Attention Example is Not Efficient, Needs Greedy Decoding #242

Closed
neubig opened this issue Jan 13, 2017 · 3 comments
Labels: minor bug (Bugs that aren't too bad, only concern documentation, or have easy work-arounds)

Comments

neubig (Contributor) commented Jan 13, 2017

Currently the attention example is not very efficient, particularly on GPUs. For example, this for loop could be changed so that it performs a single matrix multiplication (which needs to be done only once per sentence):
https://github.com/clab/dynet/blob/master/examples/python/attention.py#L75

Also, here the model generates outputs by random sampling instead of selecting the best symbol at each step (greedy decoding), which is more in line with what we would expect from an example:
https://github.com/clab/dynet/blob/master/examples/python/attention.py#L105
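
For the greedy-decoding point, a minimal sketch of a single generation step, assuming the decoder produces an output expression per step as in attention.py (the names `decoder_state`, `w`, and `b` are placeholders here, not the example's actual variables):

```python
import numpy as np
import dynet as dy

# Sketch of one greedy decoding step: take the argmax symbol
# instead of sampling from the output distribution.
# `decoder_state` is an RNN state; `w` and `b` are the output-layer
# parameters as DyNet expressions (placeholder names).
def greedy_step(decoder_state, w, b):
    out_vector = w * decoder_state.output() + b  # unnormalized scores
    probs = dy.softmax(out_vector).npvalue()     # softmax, fetched as a numpy array
    return int(np.argmax(probs))                 # index of the most probable symbol
```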

neubig added the minor bug label on Jan 13, 2017
neubig changed the title from "Attention Example is Not Efficient, Need Greedy Decoding" to "Attention Example is Not Efficient, Needs Greedy Decoding" on Jan 13, 2017
emanjavacas (Contributor) commented

I've been looking into this example and have also done some refactoring (see PR #243), which does not touch the issues you mention. I would have time to do what you propose, though, and could perhaps add it to the standing PR, although I'd probably need some help with the DyNet matrix operations.

For a), I imagine this implies a matrix-matrix multiply of `w1` (repeatedly concatenated n times, where n is the length of the sequence) with the concatenation of all input vectors (and similarly for the other terms of `v*dy.tanh(w1*input_vector + w2dt)`).

For b), I imagine you mean sampling according to argmax.

neubig (Contributor, Author) commented Jan 16, 2017

Thanks, this would be great!

For a), this basically means we calculate `w1dt = w1 * dy.concatenate_cols(input_vectors)` once at the beginning of the sentence, then replace the loop over `attention_weight = v*dy.tanh(w1*input_vector + w2dt)` with a single `attention_weights = v*dy.tanh(dy.colwise_add(w1dt, w2dt))`.

For b), yes that's right (although I wouldn't call argmax sampling).
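
Put together, a minimal sketch of that vectorization, assuming `w1`, `w2`, and `v` are the attention parameters as DyNet expressions and `input_vectors`/`state` come from the encoder and decoder as in attention.py (the names and factoring are assumptions, not necessarily the code the PR will end up with):

```python
import dynet as dy

# Computed once per sentence: stack the encoder vectors as columns
# and multiply by w1 in a single matrix-matrix product.
def precompute_w1dt(input_vectors, w1):
    input_mat = dy.concatenate_cols(input_vectors)  # hidden_dim x seq_len
    return input_mat, w1 * input_mat                # w1dt, reused at every step

# Computed once per decoder step: a single matrix expression
# replaces the Python loop over input positions.
def attend(input_mat, w1dt, state, w2, v):
    w2dt = w2 * dy.concatenate(list(state.s()))     # decoder-state term
    scores = dy.transpose(v * dy.tanh(dy.colwise_add(w1dt, w2dt)))
    att_weights = dy.softmax(scores)                # seq_len attention weights
    return input_mat * att_weights                  # weighted context vector
```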

neubig added a commit that referenced this issue Jan 20, 2017
A pull request for #242 (greedy decoding and vectorization in attention.py)
neubig (Contributor, Author) commented Jan 23, 2017

Fixed by #257

neubig closed this as completed on Jan 23, 2017