This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[1.7.x] Backport of LSTM and GRU fix (#17898) and RNN op (#17632) #18316

Merged: 2 commits into apache:v1.7.x from the 1.7rnn_fx branch, Jun 3, 2020

Conversation

bgawrych
Contributor

@bgawrych bgawrych commented May 14, 2020

Description

Fix for LSTM and GRU layers without DNNL enabled give wrong gradients #17898
[Large Tensor] Fixed RNN op #17632

Checklist

Essentials

  • Changes are complete (i.e. I finished coding on this PR)
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Comments

@mxnet-bot

Hey @bgawrych , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [clang, centos-cpu, windows-gpu, windows-cpu, website, unix-cpu, miscellaneous, centos-gpu, unix-gpu, sanity, edge]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@bgawrych bgawrych force-pushed the 1.7rnn_fx branch 2 times, most recently from 78f410d to 5323ac3 May 18, 2020 07:26
@bgawrych bgawrych changed the title from "[1.7.x] Backport of fix LSTM and GRU layers gradient calculations" to "[1.7.x] Backport of LSTM and GRU fix (#17898) and RNN op (#17632)" May 18, 2020
@bgawrych
Contributor Author

@mxnet-bot run ci [edge, windows-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [edge, windows-gpu]

@ciyongch
Contributor

@bgawrych Please rebase the code; the failure of the edge job has already been fixed.

@ciyongch
Contributor

ciyongch commented Jun 1, 2020

Hi @bgawrych, it seems the failed cases are still caused by the CI itself. Please hold on for a while; I've created PR #18452 to address the failure.

@ciyongch
Contributor

ciyongch commented Jun 2, 2020

Hi @bgawrych, PR #18456 was merged to fix the CI issue, so it's OK to re-trigger the CI now. Thanks!

connorgoggins and others added 2 commits June 2, 2020 08:41
* Changed relevant function args to index_t

* Added nightly test for RNN

* Added fix for LSTM, GRU, RNN-ReLU, RNN-tanh

* Using const instead of literals

* Added nightly test for RNN ReLU & tanh, LSTM, GRU

* Type assertion to force evaluation of output NDArray

* Incorporated latest round of comments
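
The index_t change listed above targets element counts that no longer fit in a 32-bit integer. The following is a minimal sketch of that failure mode, written in Python rather than the operator's C++. It assumes a 32-bit signed index on one side and a 64-bit index on the other; the helper names and the shape are hypothetical and are not taken from the MXNet source.

```python
import ctypes


def flat_size_int32(seq_len, batch, hidden):
    """Element count stored in a 32-bit signed integer (wraps on overflow)."""
    # ctypes integer types do no overflow checking, so the value wraps
    # the same way a two's-complement 32-bit int would.
    return ctypes.c_int32(seq_len * batch * hidden).value


def flat_size_int64(seq_len, batch, hidden):
    """Element count stored in a 64-bit signed integer."""
    return ctypes.c_int64(seq_len * batch * hidden).value


# Hypothetical RNN input shape with 2**31 elements, one more than
# the largest value a 32-bit signed integer can hold.
shape = (2**15, 2**10, 2**6)

wrapped = flat_size_int32(*shape)  # wraps to a negative number
correct = flat_size_int64(*shape)  # keeps the true element count
```

A negative or truncated size used as a loop bound or buffer offset is the kind of error that wider function arguments are meant to rule out.
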
…pache#18203)

* Fix input gradient calculation for bidirectional LSTM

For bidirectional LSTM with number of layers > 2, the input gradient calculation was incorrect.
The cause was that the y derivative (dy) tensor was overwritten by the
calculated x derivative (dx) tensor before the right2left layer could use dy for its own
gradient calculations.
The proposed fix uses additional space to avoid the overwrite.

* Fix gradient calculation for GRU

For GRU with number of layers > 2, the i2h_weight gradient for
layers in the middle (all except the last and first) was incorrect.
The wrong calculations were caused by assigning the output pointer to
the input instead of calculating a new input pointer.

* Enable tests for GRU and LSTM gradients

* Fix comments

* Change loop iteration deduction

* Add more test cases for fused rnn layers
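
The overwrite described in the bidirectional LSTM commit message above can be illustrated with a toy example. This is only a sketch of the aliasing pattern, not the MXNet kernel: each direction is reduced to a scalar multiply, and the function name, the weights, and the `alias_output` flag are hypothetical.

```python
def input_grad(dy, w_l2r, w_r2l, alias_output):
    """Toy input gradient for a two-direction layer.

    Each direction contributes w * dy to dx. With alias_output=True,
    dx is written into the same list as dy, which mimics the bug:
    the second direction then reads values the first one overwrote.
    """
    dx = dy if alias_output else [0.0] * len(dy)

    # left2right pass writes its contribution into dx
    for i in range(len(dy)):
        dx[i] = w_l2r * dy[i]

    # right2left pass needs the ORIGINAL dy to add its contribution
    for i in reversed(range(len(dy))):
        dx[i] += w_r2l * dy[i]

    return dx


# Separate output buffer: dx = (w_l2r + w_r2l) * dy
fixed = input_grad([1.0, 2.0], 2.0, 3.0, alias_output=False)

# Shared buffer: the right2left pass sees dy already scaled by w_l2r
buggy = input_grad([1.0, 2.0], 2.0, 3.0, alias_output=True)
```

Here `fixed` is `[5.0, 10.0]` while `buggy` is `[8.0, 16.0]`. Writing dx into additional space, as the commit message describes, keeps dy intact for the second direction.
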
@bgawrych
Contributor Author

bgawrych commented Jun 2, 2020

@mxnet-bot run ci [unix-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [unix-gpu]

@bgawrych
Contributor Author

bgawrych commented Jun 2, 2020

@ciyongch Everything is alright :) branch 1.x too

Contributor

@ciyongch ciyongch left a comment

LGTM. Thanks @bgawrych , ping @TaoLv @pengzhao-intel to take a review and help merge.

Contributor

@pengzhao-intel pengzhao-intel left a comment

LGTM

@pengzhao-intel pengzhao-intel merged commit 4a830db into apache:v1.7.x Jun 3, 2020

5 participants