[1.7.x] Backport of LSTM and GRU fix (#17898) and RNN op (#17632) #18316
Conversation
Hey @bgawrych, thanks for submitting the PR.
CI supported jobs: [clang, centos-cpu, windows-gpu, windows-cpu, website, unix-cpu, miscellaneous, centos-gpu, unix-gpu, sanity, edge]
Force-pushed from 78f410d to 5323ac3.
@mxnet-bot run ci [edge, windows-gpu]
Jenkins CI successfully triggered: [edge, windows-gpu]
@bgawrych Please help to rebase the code; the failure of …
* Changed relevant function args to index_t
* Added nightly test for RNN
* Added fix for LSTM, GRU, RNN-ReLU, RNN-tanh
* Using const instead of literals
* Added nightly test for RNN ReLU & tanh, LSTM, GRU
* Type assertion to force evaluation of output NDArray
* Incorporated latest round of comments
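For context, here is a minimal sketch of the kind of large-tensor check the nightly test performs. This is not the actual test from the PR; it assumes an MXNet 1.x build with int64 tensor support (`USE_INT64_TENSOR_SIZE=1`) and enough host memory for the illustrative shape below:

```python
# Hypothetical large-tensor RNN check (not the PR's actual nightly test).
# The sequence length is chosen so the operator's internal index
# arithmetic exceeds int32, which is what the index_t change addresses.
import mxnet as mx

LARGE_SEQ_LEN = 2**28                      # illustrative value
batch_size, input_size, hidden_size = 4, 4, 4

data = mx.nd.random.uniform(shape=(LARGE_SEQ_LEN, batch_size, input_size))
layer = mx.gluon.rnn.LSTM(hidden_size)
layer.initialize()

out = layer(data)
out.wait_to_read()   # force MXNet's lazy engine to actually run the kernel
assert out.shape == (LARGE_SEQ_LEN, batch_size, hidden_size)
```

The commit's "type assertion to force evaluation" serves the same purpose as `wait_to_read()` here: without forcing evaluation, MXNet's deferred execution engine may never run the kernel and an overflow would go unnoticed.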
…pache#18203)
* Fix input gradient calculation for bidirectional LSTM
  For bidirectional LSTM with number of layers > 2, the input gradient calculation was incorrect. The wrong results were caused by overwriting the y derivative (dy) tensor with the calculated x derivative (dx) tensor before the right-to-left layer could use dy for its own gradient calculations. The proposed fix uses additional space to avoid the overwrite.
* Fix gradient calculation for GRU
  For GRU with number of layers > 2, the i2h_weight gradient for the middle layers (all except the first and last) was incorrect. The wrong calculations were caused by assigning the output pointer to the input instead of calculating a new input pointer.
* Enable tests for GRU and LSTM gradients
* Fix comments
* Change loop iteration deduction
* Add more test cases for fused rnn layers
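A rough illustration of the bidirectional-LSTM bug described above, using the Gluon API. This is a finite-difference spot check, not the project's actual unit test; the shapes and tolerance are made up:

```python
# Sketch: spot-check one input-gradient element of a 3-layer bidirectional
# LSTM against a central finite difference. Before this fix, dy was
# overwritten by dx before the right-to-left pass consumed it, so the
# analytic gradient disagreed with the numeric one for num_layers > 2.
import mxnet as mx
from mxnet import autograd

layer = mx.gluon.rnn.LSTM(8, num_layers=3, bidirectional=True)
layer.initialize()

x = mx.nd.random.uniform(shape=(5, 2, 4))   # (seq_len, batch, input)
x.attach_grad()
with autograd.record():
    loss = (layer(x) ** 2).sum()
loss.backward()
analytic = x.grad[0, 0, 0].asscalar()

# Central finite difference on the same input element.
eps = 1e-2
xp, xm = x.copy(), x.copy()
xp[0, 0, 0] += eps
xm[0, 0, 0] -= eps
numeric = ((layer(xp) ** 2).sum() - (layer(xm) ** 2).sum()).asscalar() / (2 * eps)
assert abs(analytic - numeric) < 1e-2       # tolerance is illustrative
```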
@mxnet-bot run ci [unix-gpu]
Jenkins CI successfully triggered: [unix-gpu]
@ciyongch Everything is alright :) on branch 1.x too
LGTM. Thanks @bgawrych, ping @TaoLv @pengzhao-intel to take a review and help merge.
LGTM
Description
Fix for "LSTM and GRU layers without DNNL enabled give wrong gradients" (#17898); a GRU gradient spot check is sketched below
[Large Tensor] Fixed RNN op (#17632)
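As referenced above, a similar spot check for the GRU middle-layer i2h_weight gradient, the other symptom of #17898. Again a hedged sketch with illustrative shapes; looking the parameter up by the `l1_i2h_weight` suffix is an assumption about Gluon 1.x parameter naming:

```python
# Sketch: spot-check one i2h_weight gradient element of the middle layer
# of a 3-layer GRU against a finite difference. Before the fix, middle
# layers (all but first and last) reused the output pointer as input,
# corrupting their i2h_weight gradients.
import mxnet as mx
from mxnet import autograd

layer = mx.gluon.rnn.GRU(8, num_layers=3)
layer.initialize()
x = mx.nd.random.uniform(shape=(5, 2, 4))

with autograd.record():
    loss = (layer(x) ** 2).sum()
loss.backward()

# 'l1_*' is layer index 1, the middle layer; the name suffix is an
# assumption about Gluon 1.x parameter naming.
w = [p for n, p in layer.collect_params().items()
     if n.endswith('l1_i2h_weight')][0]
analytic = w.grad()[0, 0].asscalar()

# Central finite difference on the same weight element.
eps = 1e-2
wd = w.data()
wd[0, 0] += eps
lp = (layer(x) ** 2).sum().asscalar()
wd[0, 0] -= 2 * eps
lm = (layer(x) ** 2).sum().asscalar()
wd[0, 0] += eps                              # restore the weight
numeric = (lp - lm) / (2 * eps)
assert abs(analytic - numeric) < 1e-2        # illustrative tolerance
```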
Checklist
Essentials
Comments