This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Flaky test: test_preloaded_multi_sgd #16345

Open
szha opened this issue Oct 1, 2019 · 6 comments

Comments

@szha
Member

szha commented Oct 1, 2019

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-16343/1/pipeline#step-476-log-1053

Likely caused by #16122 in which the test was added.

cc @Caenorst @apeforest

@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended label(s): Test, Flaky

@Caenorst
Contributor

Caenorst commented Oct 1, 2019

I don't see any absurd difference:

Error 1.013672 exceeds tolerance rtol=0.001000, atol=0.001000.  Location of maximum error:(0, 2, 0, 3), a=-0.203125, b=-0.204346

 a: array([[[[-0.624   , -0.4946  , -0.3997  ,  0.06415 ,  0.2402  ,
          -0.00757 ],
         [-0.03027 , -0.1873  , -0.284   , -0.2961  ,  0.5986  ,...

 b: array([[[[-0.6245   , -0.495    , -0.4001   ,  0.0641   ,  0.2397   ,
          -0.00769  ],
         [-0.03076  , -0.1875   , -0.2844   , -0.2966   ,  0.598    ,...

So I suggest bumping rtol to 5e-3 or 1e-2.
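A rough sketch of how the reported error relates to the tolerances, assuming the check divides the absolute difference by atol + rtol * |b| (an assumption that approximately reproduces the logged 1.013672; the printed a and b are rounded float16 values, so the match is not exact). The helper name `violation` is hypothetical and not part of the MXNet test utilities.

```python
def violation(a, b, rtol, atol):
    """Ratio of |a - b| to the allowed tolerance; a value above 1 means failure.

    Assumed formula, chosen because it roughly matches the logged error.
    """
    return abs(a - b) / (atol + rtol * abs(b))


# Values at the location of maximum error, copied from the log above.
a, b = -0.203125, -0.204346

current = violation(a, b, rtol=1e-3, atol=1e-3)     # tolerances used by the failing test
with_5e3 = violation(a, b, rtol=5e-3, atol=1e-3)    # first proposed rtol
with_1e2 = violation(a, b, rtol=1e-2, atol=1e-3)    # second proposed rtol

print(f"rtol=1e-3: {current:.4f}")
print(f"rtol=5e-3: {with_5e3:.4f}")
print(f"rtol=1e-2: {with_1e2:.4f}")
```

Under this assumption the failing element only just exceeds the limit with the current tolerances, and either proposed rtol brings it back under 1.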

@ChaiBapchya
Contributor

ChaiBapchya commented Oct 1, 2019

@Caenorst can you try bumping it up and then running that particular test, say, 10k times on unix-gpu using this command, with MXNET_TEST_COUNT raised to the desired number of runs:

MXNET_TEST_COUNT=1 nosetests --logging-level=DEBUG --verbose -s tests/python/gpu/test_operator_gpu.py:test_preloaded_multi_sgd

Thanks.

@Caenorst
Contributor

Caenorst commented Oct 2, 2019

It turned out that values very close to 0 are the most inaccurate, so I'm bumping atol instead of rtol.
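A minimal sketch of why raising rtol does little for values near zero, assuming the usual closeness criterion |a - b| <= atol + rtol * |b|. The sample values and the bumped tolerances below are hypothetical; the thread does not state the final atol.

```python
def within_tolerance(a, b, rtol, atol):
    """Closeness check of the form |a - b| <= atol + rtol * |b| (assumed criterion)."""
    return abs(a - b) <= atol + rtol * abs(b)


# Hypothetical pair where the reference value b is very close to zero.
a, b = 1.2e-3, 1.0e-4

# Raising rtol tenfold adds only rtol * |b| = 1e-6 to the allowed error.
rtol_bumped = within_tolerance(a, b, rtol=1e-2, atol=1e-3)

# Raising atol widens the allowed error regardless of the magnitude of b.
atol_bumped = within_tolerance(a, b, rtol=1e-3, atol=2e-3)

print("rtol bumped:", rtol_bumped)
print("atol bumped:", atol_bumped)
```

Because the relative term scales with |b|, it vanishes as b approaches zero, which leaves atol as the only effective knob for those elements.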


5 participants