Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LSTM builder is not the standard LSTM #154

Closed
pmichel31415 opened this issue Nov 8, 2016 · 7 comments
Closed

LSTM builder is not the standard LSTM #154

pmichel31415 opened this issue Nov 8, 2016 · 7 comments
Labels
minor bug Bugs that aren't too bad, only concern documentation, or have easy work-arounds

Comments

@pmichel31415
Copy link
Collaborator

pmichel31415 commented Nov 8, 2016

After looking at the code in lstm.h, it seems that the LSTMBuilder cell has the following update rule :
imgtemp-rtc2r2-1

Which is not the common update rule :

imgtemp-ffi3s1-1

(i.e. in the dynet implementation the forget and input gate are coupled and there are peaky connections with the cell state c)

What are the reasons behind this choice?

Anyway I think this should be documented.

@neubig
Copy link
Contributor

neubig commented Nov 8, 2016

I've heard that the motivation is the results in this paper: https://arxiv.org/abs/1503.04069

But I agree that this should be documented.

@neubig neubig added the minor bug Bugs that aren't too bad, only concern documentation, or have easy work-arounds label Nov 8, 2016
@pmichel31415
Copy link
Collaborator Author

I see. I started documenting the lstm.

I also started implementing a vanilla LSTM. Would it be interesting to have it in the form of two optional parameters coupling and peepholes in the constructor? With default values true for compatibility.

@yoavg
Copy link
Contributor

yoavg commented Nov 8, 2016

the LSTMBuilder is quite central. I agree that it would be good to add
these, just need to ensure that:

  • no speed penalty
  • the values are saved / loaded with the model
  • we remain backward compatible with saved models

On Tue, Nov 8, 2016 at 9:31 PM, Paul Michel [email protected]
wrote:

I see. I started documenting the lstm.

I also started implementing a vanilla LSTM. Would it be interesting to
have it in the form of two optional parameters coupling and peepholes in
the constructor? With default values true for compatibility.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#154 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAcVtHrC5LRkHsWELhZuViolpdL86VyJks5q8M4VgaJpZM4Kso4N
.

@yoavg
Copy link
Contributor

yoavg commented Nov 22, 2016

After staring at the code (admittedly without pen and paper at hand, I'm commuting), I am pretty sure the code implements something else than what's written above -- for example $o_t$ uses $c_t$ (ct[i]) and not $c_{t-1}$ (i_c_tm1), and tanh is applied twice (it is also applied on $c_t$ just prior to computing $h_t$.

@redpony ?

@pmichel31415
Copy link
Collaborator Author

@yoavg yes you are right this is my mistake, I'll update the original post.

I've been working on implementing the aforementioned options to the LSTMBuilder, but I'll put it on hold until the dropout PR is merged.

@yoavg
Copy link
Contributor

yoavg commented Nov 22, 2016

this may affect the dropout PR, as I may be introducing dropout in an un-intended place (or effectively introducing it twice) -- and may not be.

@neubig
Copy link
Contributor

neubig commented Jan 4, 2017

An alternative VanillaLSTMBuilder is now available, documentation in the "Unorthodox Things" section as of this commit: 7935e79

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
minor bug Bugs that aren't too bad, only concern documentation, or have easy work-arounds
Projects
None yet
Development

No branches or pull requests

3 participants