Determine pre-allocated storage #71
Comments
I also have a question about "clones": since each clone's parameters point to the same storage, what is the difference from not using clones at all?
prealloc is an ugly but effective tweak. All clones do share the parameters, and with preallocation we also share the internal buffers used to store gradInput and some outputs. In both cases we share only the intermediate buffers; we cannot share any buffer exposed outside of the nn graph, since, as you say, the main goal of using clones is to keep fully independent modules. So, in other words, each clone can be represented as:
(diagram not preserved)
For outputs, we have an additional constraint: some modules use their outputs to calculate gradInput, so those outputs cannot be shared at all. I hope this helps. If you want to add a peephole connection to the current LSTM, the safest option is to turn off preallocation.
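To make the role of clones concrete, here is a minimal Python sketch (not Torch code). The classes `SharedParams` and `ScaleClone` are hypothetical stand-ins: every clone points at the same parameter object but keeps its own output, whereas a single module reused across time steps only remembers its last output.

```python
class SharedParams:
    """Hypothetical stand-in for parameter storage shared by all clones."""
    def __init__(self, w):
        self.w = w


class ScaleClone:
    """One clone: shared parameters, private output state (y = w * x)."""
    def __init__(self, params):
        self.params = params
        self.output = None

    def forward(self, x):
        self.output = self.params.w * x
        return self.output


inputs = [1.0, 2.0, 3.0]
params = SharedParams(2.0)

# One clone per time step: each keeps the output of its own step.
clones = [ScaleClone(params) for _ in inputs]
for clone, x in zip(clones, inputs):
    clone.forward(x)
clone_outputs = [clone.output for clone in clones]

# A single module reused for every step: earlier outputs are overwritten.
single = ScaleClone(params)
for x in inputs:
    single.forward(x)
single_output = single.output
```

With clones, the per-step outputs needed for the backward pass are all still available; with one reused module, only the last step's output survives.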
@jsenellart-systran About choosing which nodes share outputs: it seems that every node in torch has its own overridden functions such as "updateGradInput" and "accGradParameters". If I am right, when either of these two functions uses "self.output", that node's output should not be shared between clones. As you said, a sigmoid node should not share outputs because its "updateGradInput" function actually uses "self.output". But neither of the linear node's two functions uses "self.output" at all, which is why the linear node in the LSTM can share both inputs and outputs.
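The rule described above can be illustrated with a small Python sketch (not Torch code, and far simpler than the real preallocation logic). The classes `Sigmoid` and `Scale` and the helper `first_step_grad` are hypothetical; `Scale` is a scalar stand-in for a linear layer. Two clones run forward on different inputs, then the first clone runs backward, once with private output buffers and once with a shared one.

```python
import math


class Sigmoid:
    def __init__(self, output_buffer):
        self.output = output_buffer  # one-element list, possibly shared

    def forward(self, x):
        self.output[0] = 1.0 / (1.0 + math.exp(-x))
        return self.output[0]

    def backward(self, grad_output):
        # The gradient is computed from the stored output: y * (1 - y).
        y = self.output[0]
        return grad_output * y * (1.0 - y)


class Scale:
    """Scalar stand-in for a linear layer: y = w * x."""
    def __init__(self, w, output_buffer):
        self.w = w
        self.output = output_buffer

    def forward(self, x):
        self.output[0] = self.w * x
        return self.output[0]

    def backward(self, grad_output):
        # The gradient w.r.t. the input never reads self.output.
        return self.w * grad_output


def first_step_grad(make_clone, share_output):
    """Forward two clones, then return the first clone's input gradient."""
    shared = [0.0]
    clones = [make_clone(shared if share_output else [0.0]) for _ in range(2)]
    for clone, x in zip(clones, [0.0, 3.0]):
        clone.forward(x)
    return clones[0].backward(1.0)


sig_private = first_step_grad(Sigmoid, share_output=False)
sig_shared = first_step_grad(Sigmoid, share_output=True)
lin_private = first_step_grad(lambda buf: Scale(2.0, buf), share_output=False)
lin_shared = first_step_grad(lambda buf: Scale(2.0, buf), share_output=True)
```

In this sketch the sigmoid gradient changes once the second clone overwrites the shared buffer, while the `Scale` gradient is identical in both cases, which matches the reasoning that only modules reading "self.output" in their backward functions need private outputs.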
When pre-allocation is enabled, how can I determine which nodes are used to calculate gradients?
In the LSTM implementation, some nodes share both input and output storage between clones, while other nodes share only inputs.
I want to add a peephole connection to the current LSTM, and I am quite confused.
It seems that to decide which nodes share both and which share only inputs when a new model is deployed, I have to fully understand how nodes are handled in gModule ...
Any idea?
Thanks.