
K.ctc_batch_cost() get slice index 0 of dimension 0 out of bounds error when using online trainning (batch_size=1) #7049

Closed
channingxiao opened this issue Jun 20, 2017 · 12 comments
Labels
To investigate Looks like a bug. It needs someone to investigate.

Comments

@channingxiao

Hello, I am using the CTC loss function in my model. Everything worked fine until I tried online training (batch_size = 1). The error is raised by the K.ctc_batch_cost function.
The error can be reproduced with the Keras example "image_ocr.py" by simply setting minibatch_size = 1 in line 446 (the parameter of TextImageGenerator).

I am using keras 2.0.2 with tensorflow 1.1.0 backend.
Thank you!
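For context, these are the shapes K.ctc_batch_cost(y_true, y_pred, input_length, label_length) expects for its four arguments, illustrated here with plain numpy arrays (the dimension values are placeholders, not from the actual example):

```python
import numpy as np

# Placeholder dimensions for illustration only
batch_size, max_time, num_classes, max_label_len = 1, 26, 37, 4

y_true       = np.zeros((batch_size, max_label_len), dtype=np.int32)    # padded labels
y_pred       = np.zeros((batch_size, max_time, num_classes))            # softmax output
input_length = np.full((batch_size, 1), max_time, dtype=np.int32)       # per-sample input steps
label_length = np.full((batch_size, 1), max_label_len, dtype=np.int32)  # per-sample label lengths

print(input_length.shape)  # (1, 1)
```

With batch_size = 1, the length tensors have shape (1, 1), which is the case that triggers the error below.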

@fchollet fchollet added the To investigate Looks like a bug. It needs someone to investigate. label Jun 20, 2017
@cyprienruffino

Hi! I am having the same problem with Keras 2.0.6 and TensorFlow 1.2.0.
Since I have to do online training, I worked around the problem by building minibatches containing the same sample twice, but I admit it is a rather dirty solution...
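For anyone needing the same stopgap, here is a minimal sketch of that workaround: duplicate the single sample along the batch axis before feeding it to the model (numpy illustration; duplicate_batch is a hypothetical helper name):

```python
import numpy as np

def duplicate_batch(x, y, input_length, label_length):
    """Turn a batch of one sample into a batch of two identical samples,
    so K.ctc_batch_cost never sees batch_size == 1."""
    rep = lambda a: np.repeat(a, 2, axis=0)
    return rep(x), rep(y), rep(input_length), rep(label_length)

x  = np.zeros((1, 26, 512))             # one input sample (placeholder shape)
y  = np.zeros((1, 4), dtype=np.int32)   # its padded label
il = np.array([[26]])
ll = np.array([[4]])

x2, y2, il2, ll2 = duplicate_batch(x, y, il, ll)
print(x2.shape)  # (2, 26, 512)
```

Since both samples are identical, the averaged loss and gradients are unchanged; the only cost is doing every forward/backward pass twice.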

@xisnu

xisnu commented Aug 8, 2017

Yes, I am also having the same problem. The problem is definitely related to the conversion of dense labels to sparse.

@Cerno-b

Cerno-b commented Sep 7, 2017

Glad someone already posted this.
The specific error message is

InvalidArgumentError (see above for traceback): slice index 0 of dimension 0 out of bounds.
[[Node: ctc/scan/strided_slice = StridedSlice[Index=DT_INT32, T=DT_INT32, begin_mask=0, ellipsis_mask=0, end_mask=0, new_axis_mask=0, shrink_axis_mask=1, _device="/job:localhost/replica:0/task:0/cpu:0"](ctc/scan/Shape, ctc/scan/strided_slice/stack, ctc/scan/strided_slice/stack_1, ctc/scan/strided_slice/stack_2)]]

Here is a somewhat minimal example that shows what's happening. It only occurs for batch_size exactly equal to one.

test_lstm.py.txt

@fchollet It's fairly urgent for me, so if you have any pointers on where I could look, I could aid in the investigation. I tried reading the corresponding TensorFlow source at tensorflow-master\tensorflow\core\util\strided_slice_op.cc, line 299, but as a TF beginner my progress has been slow so far.

@Cerno-b

Cerno-b commented Sep 7, 2017

Tried to get into TF debugging and was able to get this stack, if it helps:

Traceback of node construction:
[...]
7: test_lstm\test_lstm.py
Line: 46
Function:
Text: "loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')([y_pred, labels, input_length, label_length])"
8: Python35\lib\site-packages\keras\engine\topology.py
Line: 554
Function: call
Text: "output = self.call(inputs, **kwargs)"
9: Python35\lib\site-packages\keras\layers\core.py
Line: 659
Function: call
Text: "return self.function(inputs, **arguments)"
10: test_lstm\test_lstm.py
Line: 17
Function: ctc_lambda_func
Text: "return K.ctc_batch_cost(labels, y_pred, input_length, label_length)"
11: Python35\lib\site-packages\keras\backend\tensorflow_backend.py
Line: 3263
Function: ctc_batch_cost
Text: "sparse_labels = tf.to_int32(ctc_label_dense_to_sparse(y_true, label_length))"
12: Python35\lib\site-packages\keras\backend\tensorflow_backend.py
Line: 3222
Function: ctc_label_dense_to_sparse
Text: "initializer=init, parallel_iterations=1)"
13: Python35\lib\site-packages\tensorflow\python\ops\functional_ops.py
Line: 526
Function: scan
Text: "n = array_ops.shape(elems_flat[0])[0]"
14: Python35\lib\site-packages\tensorflow\python\ops\array_ops.py
Line: 509
Function: _SliceHelper
Text: "name=name)"
15: Python35\lib\site-packages\tensorflow\python\ops\array_ops.py
Line: 677
Function: strided_slice
Text: "shrink_axis_mask=shrink_axis_mask)"
16: Python35\lib\site-packages\tensorflow\python\ops\gen_array_ops.py
Line: 3744
Function: strided_slice
Text: "shrink_axis_mask=shrink_axis_mask, name=name)"
17: Python35\lib\site-packages\tensorflow\python\framework\op_def_library.py
Line: 767
Function: apply_op
Text: "op_def=op_def)"
18: Python35\lib\site-packages\tensorflow\python\framework\ops.py
Line: 2630
Function: create_op
Text: "original_op=self._default_original_op, op_def=op_def)"
19: Python35\lib\site-packages\tensorflow\python\framework\ops.py
Line: 1204
Function: init
Text: "self._traceback = self._graph._extract_stack() # pylint: disable=protected-access"

@WindQAQ
Contributor

WindQAQ commented Nov 21, 2017

@fchollet I think this problem can basically be solved by modifying the function ctc_batch_cost() in keras/backend/tensorflow_backend.py. Take a look at the following lines:

label_length = tf.to_int32(tf.squeeze(label_length))
input_length = tf.to_int32(tf.squeeze(input_length))

If batch_size is 1, the tensors label_length and input_length will be rank 0 after the squeeze; however, they should be rank 1 with shape (1,). At least for the TensorFlow API ctc_loss(), the sequence_length parameter must be a 1-D tensor (see the TensorFlow CTC loss documentation), so these lines break when input_length has rank 0. As for label_length, the input to TensorFlow's scan() must also be at least rank 1, which is why the error occurs in ctc_label_dense_to_sparse().

Hence, my basic solution is to squeeze only along axis 1, that is,

label_length = tf.to_int32(tf.squeeze(label_length, axis=1))
input_length = tf.to_int32(tf.squeeze(input_length, axis=1))

This yields rank 1 tensors for every batch size. I tried this solution on my machine, and it works well!
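The difference between the two squeezes can be seen with plain numpy, whose squeeze behaves like tf.squeeze here:

```python
import numpy as np

label_length = np.array([[5]])  # shape (1, 1): the batch_size == 1 case

all_axes = np.squeeze(label_length)          # squeezes every size-1 axis
only_ax1 = np.squeeze(label_length, axis=1)  # squeezes only axis 1

print(all_axes.shape)  # () -- rank 0, breaks ctc_loss() and scan()
print(only_ax1.shape)  # (1,) -- rank 1, as required
```

For any batch_size > 1 the two calls agree; only the single-sample batch exposes the difference.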

@saisumanth007

@WindQAQ @fchollet Could you please explain what input_length and label_length specify? According to the documentation, label_length contains the lengths of the ground-truth strings (in the case of OCR), but I'm not sure what input_length means.

@xisnu

xisnu commented Jan 29, 2018

You are right about label_length. input_length is the length of the input sequence. In the case of OCR it is the length of the sequence of feature vectors extracted from the input image.

@saisumanth007

@xisnu The input to the LSTM is (batch_size, 26, 512) in my case and the output is (batch_size, 26, 37). So what should input_length be?

@xisnu

xisnu commented Jan 29, 2018

Suppose you have three samples like this:
Input
a1 a2 a3
b1 b2 b3 b4
c1

Target
goat
mat
is

To feed this to an LSTM-CTC model you must pad the inputs to equal length, so they become:
Input
a1 a2 a3 PD
b1 b2 b3 b4
c1 PD PD PD

The input to the LSTM is then (3, 4, 1), but you also pass the actual input sequence lengths as an array [3 4 1], and of course the target lengths as another array [4 3 2].
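The example above can be written out in numpy as a quick sketch (PD and the sequence contents are placeholders):

```python
import numpy as np

seqs    = [[1, 2, 3], [4, 5, 6, 7], [8]]  # three inputs, lengths 3, 4, 1
targets = ["goat", "mat", "is"]           # target lengths 4, 3, 2

PD = 0                                    # padding value
max_len = max(len(s) for s in seqs)
padded = np.array([s + [PD] * (max_len - len(s)) for s in seqs])

input_length = np.array([len(s) for s in seqs])     # [3, 4, 1]
label_length = np.array([len(t) for t in targets])  # [4, 3, 2]

print(padded.shape)  # (3, 4)
```

The length arrays tell the CTC loss how much of each padded row is real data and how much is padding.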

@WindQAQ
Contributor

WindQAQ commented Jan 29, 2018

@saisumanth007 It should be the length of the inputs before padding, so it cannot be determined from the information you have given.

@saisumanth007

@xisnu @WindQAQ Suppose in OCR I have 3 images: image1, image2 and image3 with ground-truth strings "goat", "mat" and "is" respectively.
While training, I pad the labels to the max length, i.e. 4 in this case.
So label_length = [4,3,2] --> these are the lengths before padding.
Can we determine input_length in this case?

@WindQAQ
Contributor

WindQAQ commented Jan 29, 2018

@saisumanth007

Input to the LSTM is (batch_size, 26, 512 ) in my case

Basically, if you do not pad the inputs (the feature vectors of the images), input_length should be an array filled with 26. It depends on whether you pad the inputs. Maybe you can explain how you extract the feature vectors from the images so that I can help you directly.
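In other words, with a fixed 26-step feature sequence per image and no padding, input_length is simply a constant array, one entry per sample (batch size 8 below is just a placeholder):

```python
import numpy as np

batch_size, time_steps = 8, 26  # placeholder batch size; 26 steps per image

# One entry per sample; every image yields exactly 26 feature vectors.
input_length = np.full((batch_size, 1), time_steps, dtype=np.int32)

print(input_length.ravel())  # [26 26 26 26 26 26 26 26]
```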
