
[WIP] Reduce numerical error on numerical gradient calculations #15770

Closed · wants to merge 1 commit

Conversation

@larroy (Contributor) commented Aug 6, 2019

Description

During numerical gradient checking, the output symbol is multiplied by a random matrix, which dramatically increases the numerical error. With this change, the gradient is still checked at the same location, but without the loss of precision. Adding a constant instead of multiplying also has the benefit that the symbolic gradient output is always the same, while the numerical one changes very little.

Fixes #11720
Overall, this will reduce the flakiness of tests that use numerical gradients.
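
The effect is easiest to see with a toy gradient checker. Below is a minimal NumPy sketch of the idea, not the actual helper in `python/mxnet/test_utils.py`; `numeric_grad`, `f`, `proj`, and `const` are illustrative names. It compares the central-difference gradient of `sum(f(x) * proj)` (the current scheme) with that of `sum(f(x) + const)` (the proposed one) against the analytic gradients.

```python
import numpy as np

def numeric_grad(loss, x, eps=1e-4):
    """Central-difference estimate of d loss(x) / dx, one element at a time."""
    grad = np.zeros_like(x)
    for i in np.ndindex(x.shape):
        orig = x[i]
        x[i] = orig + eps
        plus = loss(x)
        x[i] = orig - eps
        minus = loss(x)
        x[i] = orig                       # restore the perturbed element
        grad[i] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.RandomState(0)
x = rng.randn(5, 5)
f = lambda t: t ** 3                      # toy "operator"; d sum(f) / dx = 3x^2

proj = rng.randn(*x.shape)                # random projection head (current scheme)
const = rng.randn(*x.shape)               # random additive constant (proposed scheme)

# Current scheme: loss = sum(f(x) * proj), analytic gradient = 3x^2 * proj.
# The random proj entries multiply into the finite-difference error.
err_mul = np.abs(numeric_grad(lambda t: np.sum(f(t) * proj), x)
                 - 3 * x ** 2 * proj).max()

# Proposed scheme: loss = sum(f(x) + const), analytic gradient = 3x^2.
# The constant cancels in (plus - minus), so it adds no multiplicative noise.
err_add = np.abs(numeric_grad(lambda t: np.sum(f(t) + const), x)
                 - 3 * x ** 2).max()

print(err_mul, err_add)                   # err_add is typically the smaller of the two
```

Because the additive constant cancels exactly in the central difference, the finite-difference error is no longer amplified by the magnitudes of the random head, which is the precision gain described above.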

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

@larroy larroy requested a review from szha as a code owner August 6, 2019 21:39
@larroy larroy force-pushed the numerical_grad_fix branch from fb9e53c to 998b3f8 Compare August 6, 2019 21:48
@larroy larroy changed the title Reduce numerical error on numerical gradient calculations [WIP] Reduce numerical error on numerical gradient calculations Aug 6, 2019
@larroy (Contributor, Author) commented Aug 6, 2019

@marcoabreu add [pr-work-in-progress]

@larroy (Contributor, Author) commented Aug 8, 2019

@mxnet-label-bot add [pr-work-in-progress]

@marcoabreu marcoabreu added the pr-work-in-progress PR is still work in progress label Aug 8, 2019
@ChaiBapchya (Contributor) commented
@larroy
Thanks for diving deep on this issue!
If this solves the problem (adding instead of multiplying the random matrix), that would be great! Can you address the merge conflicts and retrigger the CI?

Also, I skimmed through a few CI pipelines; the errors seem to be related to this change.

@larroy (Contributor, Author) commented Sep 8, 2019

Hi, I don't have time to follow up on this one.

@larroy larroy closed this Sep 8, 2019
Labels
pr-work-in-progress PR is still work in progress
Development

Successfully merging this pull request may close these issues.

test_operator.test_laop_3 has fixed seed that can mask flakiness