This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Mark test_layer_norm and test_norm flaky #20091

Merged 1 commit on Apr 3, 2021

barry-jin
Contributor

Description

These two tests sometimes take longer than 20 minutes to run:

[2021-03-25T17:51:39.813Z] 1200.88s call     tests/python/unittest/test_operator.py::test_layer_norm[float32-0.001-0.001-in_shape_l1-finite_grad_check_l1-1]
[2021-03-25T17:51:39.813Z] 1200.40s call     tests/python/unittest/test_operator.py::test_layer_norm[float64-0.0001-0.0001-in_shape_l2-finite_grad_check_l2-1]
[2021-03-25T17:51:39.813Z] 1200.14s call     tests/python/unittest/test_operator.py::test_norm

https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fcentos-cpu/detail/PR-20087/9/pipeline

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

@mxnet-bot

Hey @barry-jin, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [miscellaneous, unix-cpu, windows-gpu, unix-gpu, centos-cpu, website, clang, sanity, centos-gpu, edge, windows-cpu]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

lanking520 added the pr-awaiting-testing label (PR is reviewed and waiting CI build and test) on Mar 25, 2021
@leezu
Contributor

leezu commented Mar 25, 2021

Marking the test as flaky would re-run the test up to 3 times, taking 60 minutes. It may be better to disable the test until it's clear why it hangs on the CD tests.
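
The 60-minute figure above can be reproduced with a small sketch. This is not the retry plugin's actual implementation: `retry_until_pass` and `worst_case_minutes` are hypothetical helpers, and the 20-minute per-run limit is an assumption read off the roughly 1200 s durations in the description.

```python
def retry_until_pass(test_fn, max_runs=3):
    """Call test_fn until it passes or max_runs attempts are used up.

    Returns the number of attempts made; re-raises the last failure.
    Hypothetical helper, only meant to mirror "re-run up to 3 times".
    """
    for attempt in range(1, max_runs + 1):
        try:
            test_fn()
            return attempt
        except AssertionError:
            if attempt == max_runs:
                raise


def worst_case_minutes(max_runs=3, per_run_limit_minutes=20):
    """Upper bound on wall time if every attempt runs into the time limit."""
    return max_runs * per_run_limit_minutes


# Assumed values: 3 attempts, 20 minutes per attempt.
print(worst_case_minutes())
```

A test that hangs on every attempt pays the full per-run limit each time, which is why disabling it can be cheaper than retrying it.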

@barry-jin
Contributor Author

barry-jin commented Mar 25, 2021

It looks like test_norm and test_layer_norm became the slowest tests after the openmp submodule was removed in #19953.
unix-cpu Python3: CPU pytest slowest 50 before this commit
https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/master/2485/pipeline/282

[2021-03-01T06:10:05.320Z] ============================= slowest 50 durations =============================
[2021-03-01T06:10:05.320Z] 246.53s call     tests/python/unittest/test_operator.py::test_broadcast_binary_op
[2021-03-01T06:10:05.320Z] 185.37s call     tests/python/unittest/test_operator.py::test_order
[2021-03-01T06:10:05.320Z] 110.09s call     tests/python/unittest/test_operator.py::test_psroipooling
[2021-03-01T06:10:05.320Z] 97.39s call     tests/python/unittest/test_operator.py::test_layer_norm[float32-0.001-0.001-in_shape_l1-finite_grad_check_l1-0]
[2021-03-01T06:10:05.320Z] 95.54s call     tests/python/unittest/test_operator.py::test_layer_norm[float64-0.0001-0.0001-in_shape_l2-finite_grad_check_l2-0]
[2021-03-01T06:10:05.321Z] 90.15s call     tests/python/unittest/test_operator.py::test_layer_norm[float64-0.0001-0.0001-in_shape_l2-finite_grad_check_l2-1]
[2021-03-01T06:10:05.321Z] 88.62s call     tests/python/unittest/test_operator.py::test_layer_norm[float32-0.001-0.001-in_shape_l1-finite_grad_check_l1-1]
[2021-03-01T06:10:05.321Z] 71.50s call     tests/python/unittest/test_operator.py::test_convolution_dilated_impulse_response
[2021-03-01T06:10:05.321Z] 71.45s call     tests/python/unittest/test_operator.py::test_bilinear_resize_op
[2021-03-01T06:10:05.321Z] 43.48s call     tests/python/unittest/test_operator.py::test_convolution_independent_gradients
[2021-03-01T06:10:05.321Z] 42.75s call     tests/python/unittest/test_operator.py::test_stack
[2021-03-01T06:10:05.321Z] 24.36s call     tests/python/unittest/test_operator.py::test_multi_proposal_op
[2021-03-01T06:10:05.321Z] 24.18s call     tests/python/unittest/test_operator.py::test_laop_2
[2021-03-01T06:10:05.321Z] 21.11s call     tests/python/unittest/test_operator.py::test_layer_norm[float16-0.01-0.01-in_shape_l0-finite_grad_check_l0-1]
[2021-03-01T06:10:05.321Z] 20.08s call     tests/python/unittest/test_operator.py::test_layer_norm[float16-0.01-0.01-in_shape_l0-finite_grad_check_l0-0]
[2021-03-01T06:10:05.321Z] 18.79s call     tests/python/unittest/test_operator.py::test_reduce
[2021-03-01T06:10:05.321Z] 10.60s call     tests/python/unittest/test_operator.py::test_batchnorm[True-False-False-shape2-BatchNorm]
[2021-03-01T06:10:05.321Z] 10.39s call     tests/python/unittest/test_operator.py::test_batchnorm_training
[2021-03-01T06:10:05.321Z] 10.11s call     tests/python/unittest/test_operator.py::test_batchnorm[True-True-False-shape2-BatchNorm]
[2021-03-01T06:10:05.321Z] 9.05s call     tests/python/unittest/test_operator.py::test_l2_normalization

unix-cpu Python3: CPU pytest slowest 50 for this commit
https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-19953/3/pipeline/282/

[2021-03-01T11:36:42.961Z] ============================= slowest 50 durations =============================
[2021-03-01T11:36:42.961Z] 600.14s call     tests/python/unittest/test_operator.py::test_layer_norm[float64-0.0001-0.0001-in_shape_l2-finite_grad_check_l2-1]
[2021-03-01T11:36:42.961Z] 591.61s call     tests/python/unittest/test_operator.py::test_layer_norm[float32-0.001-0.001-in_shape_l1-finite_grad_check_l1-1]
[2021-03-01T11:36:42.961Z] 543.72s call     tests/python/unittest/test_operator.py::test_layer_norm[float16-0.01-0.01-in_shape_l0-finite_grad_check_l0-1]
[2021-03-01T11:36:42.961Z] 541.24s call     tests/python/unittest/test_operator.py::test_layer_norm[float32-0.001-0.001-in_shape_l1-finite_grad_check_l1-0]
[2021-03-01T11:36:42.961Z] 483.78s call     tests/python/unittest/test_operator.py::test_batchnorm[False-True-False-shape2-BatchNorm]
[2021-03-01T11:36:42.961Z] 438.33s call     tests/python/unittest/test_operator.py::test_batchnorm[False-False-False-shape2-BatchNorm]
[2021-03-01T11:36:42.961Z] 428.82s call     tests/python/unittest/test_operator.py::test_norm
[2021-03-01T11:36:42.961Z] 406.73s call     tests/python/unittest/test_operator.py::test_batchnorm[True-False-False-shape2-BatchNorm]
[2021-03-01T11:36:42.961Z] 363.46s call     tests/python/unittest/test_operator.py::test_layer_norm[float64-0.0001-0.0001-in_shape_l2-finite_grad_check_l2-0]
[2021-03-01T11:36:42.961Z] 340.72s call     tests/python/unittest/test_operator.py::test_reduce
[2021-03-01T11:36:42.961Z] 297.12s call     tests/python/unittest/test_operator.py::test_layer_norm[float16-0.01-0.01-in_shape_l0-finite_grad_check_l0-0]
[2021-03-01T11:36:42.961Z] 262.40s call     tests/python/unittest/test_operator.py::test_batchnorm[True-True-False-shape2-BatchNorm]
[2021-03-01T11:36:42.961Z] 200.51s call     tests/python/unittest/test_operator.py::test_laop_2
[2021-03-01T11:36:42.961Z] 177.58s call     tests/python/unittest/test_operator.py::test_broadcast_binary_op
[2021-03-01T11:36:42.961Z] 163.79s call     tests/python/unittest/test_operator.py::test_batchnorm[False-False-True-shape2-BatchNorm]
[2021-03-01T11:36:42.961Z] 154.08s call     tests/python/unittest/test_operator.py::test_batchnorm[True-False-False-shape2-SyncBatchNorm]
[2021-03-01T11:36:42.961Z] 150.28s call     tests/python/unittest/test_operator.py::test_batchnorm[False-True-True-shape2-BatchNorm]
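
The two listings can be compared mechanically rather than by eye. The following is a minimal sketch, assuming the log lines keep the `[timestamp] <seconds>s call <test id>` shape shown above; `parse_durations` and `slowdowns` are hypothetical helper names, not part of pytest or the MXNet CI tooling. The sample lines are copied from the two logs.

```python
import re

# Matches pytest duration lines as they appear in the Jenkins logs above, e.g.
# "[2021-03-01T06:10:05.321Z] 24.18s call     tests/...::test_laop_2".
DURATION_LINE = re.compile(
    r"^\[[^\]]+\]\s+(?P<secs>\d+(?:\.\d+)?)s\s+call\s+(?P<test>\S+)\s*$"
)


def parse_durations(log_text):
    """Return {test_id: seconds} for every 'call' duration line in log_text."""
    durations = {}
    for line in log_text.splitlines():
        match = DURATION_LINE.match(line.strip())
        if match:
            durations[match.group("test")] = float(match.group("secs"))
    return durations


def slowdowns(before, after):
    """Return [(test_id, after/before ratio)] for tests present in both runs,
    largest ratio first."""
    common = before.keys() & after.keys()
    ratios = ((test, after[test] / before[test]) for test in common)
    return sorted(ratios, key=lambda item: item[1], reverse=True)


# Sample lines copied from the "before" and "for this commit" logs.
BEFORE = """\
[2021-03-01T06:10:05.320Z] 246.53s call     tests/python/unittest/test_operator.py::test_broadcast_binary_op
[2021-03-01T06:10:05.321Z] 24.18s call     tests/python/unittest/test_operator.py::test_laop_2
[2021-03-01T06:10:05.321Z] 18.79s call     tests/python/unittest/test_operator.py::test_reduce
"""
AFTER = """\
[2021-03-01T11:36:42.961Z] 340.72s call     tests/python/unittest/test_operator.py::test_reduce
[2021-03-01T11:36:42.961Z] 200.51s call     tests/python/unittest/test_operator.py::test_laop_2
[2021-03-01T11:36:42.961Z] 177.58s call     tests/python/unittest/test_operator.py::test_broadcast_binary_op
"""

for test_id, ratio in slowdowns(parse_durations(BEFORE), parse_durations(AFTER)):
    print(f"{ratio:6.1f}x  {test_id}")
```

Only tests that appear in both listings are compared, so a test such as test_norm, which is absent from the "before" excerpt, is skipped rather than guessed at.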

lanking520 added the pr-awaiting-review label (PR is waiting for code review) and removed the pr-awaiting-testing label (PR is reviewed and waiting CI build and test) on Mar 25, 2021
szha merged commit 4552d4f into apache:master on Apr 3, 2021
szha added a commit that referenced this pull request on Apr 5, 2021
leezu pushed a commit that referenced this pull request on Apr 7, 2021
barry-jin deleted the flaky_norm branch on August 4, 2021 21:22