
[WIP] Reduce numerical error on numerical gradient calculations #15770

Closed · wants to merge 1 commit

Conversation

@larroy (Contributor) commented Aug 6, 2019

Description

During numerical gradient checking, the output symbol is multiplied by a random matrix, which dramatically increases the numerical error. With this change, the gradient is still checked at the same location, but without the loss of precision. Adding a constant instead of multiplying also has the benefit that the symbolic gradient output is always the same, while the numerical one changes very little.

Fixes #11720
Overall, this will reduce the flakiness of tests that use numerical gradients.
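
The effect is easiest to see with a toy gradient checker. Below is a minimal NumPy sketch of the idea, not the actual helper in `python/mxnet/test_utils.py`; `numeric_grad`, `f`, `proj`, and `const` are illustrative names. It compares the central-difference gradient of `sum(f(x) * proj)` (the current scheme) with that of `sum(f(x) + const)` (the proposed one) against the analytic gradients.

```python
import numpy as np

def numeric_grad(loss, x, eps=1e-4):
    """Central-difference estimate of d loss(x) / dx, one element at a time."""
    grad = np.zeros_like(x)
    for i in np.ndindex(x.shape):
        orig = x[i]
        x[i] = orig + eps
        plus = loss(x)
        x[i] = orig - eps
        minus = loss(x)
        x[i] = orig                       # restore the perturbed element
        grad[i] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.RandomState(0)
x = rng.randn(5, 5)
f = lambda t: t ** 3                      # toy "operator"; d sum(f) / dx = 3x^2

proj = rng.randn(*x.shape)                # random projection head (current scheme)
const = rng.randn(*x.shape)               # random additive constant (proposed scheme)

# Current scheme: loss = sum(f(x) * proj), analytic gradient = 3x^2 * proj.
# The random proj entries multiply into the finite-difference error.
err_mul = np.abs(numeric_grad(lambda t: np.sum(f(t) * proj), x)
                 - 3 * x ** 2 * proj).max()

# Proposed scheme: loss = sum(f(x) + const), analytic gradient = 3x^2.
# The constant cancels in (plus - minus), so it adds no multiplicative noise.
err_add = np.abs(numeric_grad(lambda t: np.sum(f(t) + const), x)
                 - 3 * x ** 2).max()

print(err_mul, err_add)                   # err_add is typically the smaller of the two
```

Because the additive constant cancels exactly in the central difference, the finite-difference error is no longer amplified by the magnitudes of the random head, which is the precision gain described above.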

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

@larroy larroy requested a review from szha as a code owner August 6, 2019 21:39
@larroy larroy force-pushed the numerical_grad_fix branch from fb9e53c to 998b3f8 Compare August 6, 2019 21:48
@larroy larroy changed the title Reduce numerical error on numerical gradient calculations [WIP] Reduce numerical error on numerical gradient calculations Aug 6, 2019
@larroy (Contributor, Author) commented Aug 6, 2019

@marcoabreu add [pr-work-in-progress]

@larroy (Contributor, Author) commented Aug 8, 2019

@mxnet-label-bot add [pr-work-in-progress]

@marcoabreu marcoabreu added the pr-work-in-progress PR is still work in progress label Aug 8, 2019
@ChaiBapchya (Contributor) commented
@larroy
Thanks for diving deep on this issue!
If this solves the problem (adding instead of multiplying the random matrix), that would be great! Can you address the merge conflicts and retrigger the CI?

Also, I skimmed through a few CI pipelines; the errors seem to be related to this change.

@larroy (Contributor, Author) commented Sep 8, 2019

Hi, I don't have time to follow up on this one.

@larroy larroy closed this Sep 8, 2019
Labels
pr-work-in-progress PR is still work in progress
Development

Successfully merging this pull request may close these issues.

test_operator.test_laop_3 has fixed seed that can mask flakiness