Attention Example is Not Efficient, Needs Greedy Decoding #242

Closed
neubig opened this issue Jan 13, 2017 · 3 comments
Labels: minor bug (Bugs that aren't too bad, only concern documentation, or have easy work-arounds)

Comments

neubig (Contributor) commented Jan 13, 2017

Currently the attention example is not very efficient, particularly on GPUs. For example, this for loop could be changed so that it performs a single matrix multiplication (which needs to be done only once per sentence):
https://github.com/clab/dynet/blob/master/examples/python/attention.py#L75

Also, here the model generates outputs by random sampling instead of selecting the best symbol at each step (greedy decoding), which is more in line with what we would expect from an example:
https://github.com/clab/dynet/blob/master/examples/python/attention.py#L105
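
For the greedy-decoding point, a minimal sketch of a single generation step, assuming the decoder produces an output expression per step as in attention.py (the names `decoder_state`, `w`, and `b` are placeholders here, not the example's actual variables):

```python
import numpy as np
import dynet as dy

# Sketch of one greedy decoding step: take the argmax symbol
# instead of sampling from the output distribution.
# `decoder_state` is an RNN state; `w` and `b` are the output-layer
# parameters as DyNet expressions (placeholder names).
def greedy_step(decoder_state, w, b):
    out_vector = w * decoder_state.output() + b  # unnormalized scores
    probs = dy.softmax(out_vector).npvalue()     # softmax, fetched as a numpy array
    return int(np.argmax(probs))                 # index of the most probable symbol
```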

neubig added the minor bug label on Jan 13, 2017
neubig changed the title from "Attention Example is Not Efficient, Need Greedy Decoding" to "Attention Example is Not Efficient, Needs Greedy Decoding" on Jan 13, 2017
emanjavacas (Contributor) commented

I've been looking into this example and have also done some refactoring (see PR #243), which does not touch the issues you mention. I would have time to do what you propose, though, and could perhaps add it to the standing PR, although I'd probably need some help with the DyNet matrix operations.

For a), I imagine this implies a matrix-matrix multiply of `w1` (repeatedly concatenated n times, where n is the length of the sequence) with the concatenation of all input vectors (and similarly for the other terms of `v*dy.tanh(w1*input_vector + w2dt)`).

For b), I imagine you mean sampling according to argmax.

neubig (Contributor, Author) commented Jan 16, 2017

Thanks, this would be great!

For a), this basically means we calculate `w1dt = w1 * dy.concatenate_cols(input_vectors)` once at the beginning of the sentence, then replace the loop over `attention_weight = v*dy.tanh(w1*input_vector + w2dt)` with a single `attention_weights = v*dy.tanh(dy.colwise_add(w1dt, w2dt))`.

For b), yes that's right (although I wouldn't call argmax sampling).
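
Put together, a minimal sketch of that vectorization, assuming `w1`, `w2`, and `v` are the attention parameters as DyNet expressions and `input_vectors`/`state` come from the encoder and decoder as in attention.py (the names and factoring are assumptions, not necessarily the code the PR will end up with):

```python
import dynet as dy

# Computed once per sentence: stack the encoder vectors as columns
# and multiply by w1 in a single matrix-matrix product.
def precompute_w1dt(input_vectors, w1):
    input_mat = dy.concatenate_cols(input_vectors)  # hidden_dim x seq_len
    return input_mat, w1 * input_mat                # w1dt, reused at every step

# Computed once per decoder step: a single matrix expression
# replaces the Python loop over input positions.
def attend(input_mat, w1dt, state, w2, v):
    w2dt = w2 * dy.concatenate(list(state.s()))     # decoder-state term
    scores = dy.transpose(v * dy.tanh(dy.colwise_add(w1dt, w2dt)))
    att_weights = dy.softmax(scores)                # seq_len attention weights
    return input_mat * att_weights                  # weighted context vector
```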

neubig added a commit that referenced this issue Jan 20, 2017
A pull request for #242 (greedy decoding and vectorization in attention.py)
neubig (Contributor, Author) commented Jan 23, 2017

Fixed by #257

neubig closed this as completed on Jan 23, 2017