test battery: flaky tests #2894
Comments
Leaving unassigned, a good issue for any other newcomer to the project too 🎉 |
Sorry, I hijacked the issue to record other flaky tests as well. |
Thanks @matthewrmshin !!! With that list, working on this issue will be a thousand times easier. Thanks!!! |
Yes, I also notice that |
Another vote for |
Could we (eventually) actually automate a procedure that can quantify 'flakiness' to highlight individual test issues? Notably, we could create a script/workflow that will run (say, overnight) the test battery &/or Travis CI a number of times & compare failures on each set to pick up on any bad tests. Then we could send out emails prompting us to get any issues fixed. There is actually some interesting literature about this, such as this 2014 study which I just skim-read, if anyone is looking for some bed-time reading! |
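(Editor's note: a rough sketch of what that overnight "flakiness" tally could look like. This is purely hypothetical: the `cylc test-battery` invocation, the run count, and the assumption of TAP-style "not ok" output are placeholders, not project conventions.)

```python
#!/usr/bin/env python3
"""Hypothetical nightly 'flakiness' tally: run the test battery several
times and count how often each test fails.  The battery command and the
TAP-style 'not ok' output it is assumed to produce are placeholders."""
import collections
import re
import subprocess

RUNS = 10  # e.g. launched overnight by cron
BATTERY_CMD = 'cylc test-battery -j 4'  # assumed entry point

failures = collections.Counter()
for _ in range(RUNS):
    proc = subprocess.run(
        BATTERY_CMD, shell=True, capture_output=True, text=True)
    for line in proc.stdout.splitlines():
        # Assume failing tests show up as "not ok NN - <name>" lines.
        match = re.match(r'not ok\s+\d+\s*-?\s*(.+)', line)
        if match:
            failures[match.group(1).strip()] += 1

# The per-test failure rate is the 'flakiness' score to report by email.
for test, count in failures.most_common():
    print(f'{test}: failed {count}/{RUNS} runs')
```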
Travis CI already "knows" which tests are flaky (in its environment, at least) because it sometimes has to run those ones twice to get the test battery to pass. Maybe we can just get it to notify us of which tests it has to run twice - shouldn't be too hard to arrange? |
I am aware of that, but I think two is statistically not a great sample size as an indication of test reliability. Though your comment made me wonder if there is a Travis CI configuration setting for the number of re-runs? If we up that, then only the relatively small number of possibly-flaky tests would need to be re-run, instead of us manually having to restart whole jobs in test battery runs whenever flaky tests crop up.
That would certainly be useful & hopefully simple to configure. I'll have a look through the Travis CI docs to see what is possible. |
I believe we manage the rerunning of failed tests ourselves, in our Travis CI config. |
(By "knows" I meant our T-CI build script knows, so maybe we could just add a line to that to send a notification out somehow). |
True! But we would gradually identify all flaky tests as they randomly fail over many separate T-CI runs. |
The best outcome for this issue is that all tests are fixed and we'll no longer have to re-run them as we do now. |
Sure. But thinking about the future, flaky tests do cause issues now & then, so having some procedures for monitoring & fixing unreliable ones could be nice. |
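(Editor's note: the "rerun failed tests ourselves" logic mentioned above appears to be the pattern visible in the `.travis/cover.py` traceback further down this thread. A simplified sketch of that pattern; the `fn_tests` command string is a placeholder, not the project's real script name.)

```python
# Simplified sketch of the "rerun failed tests" step (the real logic lives
# in .travis/cover.py; the command string here is a placeholder).
from subprocess import call

fn_tests = './run-functional-tests'  # placeholder for the real test command
if call(fn_tests + ' -j 5', shell=True) != 0:  # nosec - first full pass
    # Some tests failed, so rerun only those - the "--state=failed" call
    # visible in the cover.py traceback further down this thread.
    call(fn_tests + ' --state=failed -j 5', shell=True)  # nosec
```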
A general comment - we should provide clear comments in our functional tests on exactly how they are supposed to work, because some of them do very strange things in order to set up certain test conditions that are difficult to reproduce "naturally". Case in point: #2929 (comment) - |
@hjoliver Don't worry. I did manage to work out what was going on - and it was (sort of) my fault. |
Note on
Comparing with my local environment, it has the same or higher settings for memory/files/etc. Except
But I do not believe this could cause random failures... I guess. |
OK, the test is failing as the "health check settings" log line is different:
Running locally I see |
how to hammer flaky tests on Travis CI |
(Having done that with |
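(Editor's note: "hammering" a suspect test generally just means running it in a tight loop to estimate its failure rate. A minimal local sketch, assuming the functional tests are `prove`-runnable `.t` files; the test path below is just an example from later in this thread, and the run count is arbitrary.)

```python
# Minimal sketch: hammer one suspect test repeatedly to estimate how flaky
# it is.  Test path and run count are placeholders.
import subprocess

TEST = 'tests/restart/21-task-elapsed.t'  # example path from this thread
RUNS = 50

failed = 0
for _ in range(RUNS):
    result = subprocess.run(['prove', TEST], capture_output=True)
    if result.returncode != 0:
        failed += 1

print(f'{TEST}: {failed}/{RUNS} runs failed')
```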
Bit off topic I think, but an interesting post about a team that collected reasons for their flaky tests and produced a nice summary of each issue, post mortem, etc.: https://samsaffron.com/archive/2019/05/15/tests-that-sometimes-fail I like the author's optimism (I think he's a co-founder of Discourse BTW). |
Haha, the first paragraphs describe our situation exactly 😬 |
A few Travis CI runs on chunk 3/4 have failed with:
Traceback (most recent call last):
File ".travis/cover.py", line 38, in <module>
main()
File ".travis/cover.py", line 33, in main
call(fn_tests + ' --state=failed -j 5', shell=True) # nosec
File "/opt/python/3.7.1/lib/python3.7/subprocess.py", line 317, in call
with Popen(*popenargs, **kwargs) as p:
File "/opt/python/3.7.1/lib/python3.7/subprocess.py", line 769, in __init__
restore_signals, start_new_session)
File "/opt/python/3.7.1/lib/python3.7/subprocess.py", line 1447, in _execute_child
restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory
I've added this to the "flaky test" list above. |
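(Editor's note: `[Errno 12]` here means `fork()` failed with ENOMEM while `subprocess` was spawning the rerun - the Travis VM had no memory headroom at that moment. One possible band-aid, sketched against the `cover.py` call shown in the traceback, would be to pause and retry the call once; `fn_tests` is a placeholder and this is speculation, not an agreed fix.)

```python
# Hypothetical tweak to the call shown in the traceback above: if fork()
# fails with ENOMEM, wait a little for memory to be freed and retry once.
import errno
import time
from subprocess import call

fn_tests = './run-functional-tests'  # placeholder for the real test command
try:
    call(fn_tests + ' --state=failed -j 5', shell=True)  # nosec
except OSError as exc:
    if exc.errno != errno.ENOMEM:
        raise
    time.sleep(30)  # arbitrary grace period for memory to be released
    call(fn_tests + ' --state=failed -j 5', shell=True)  # nosec
```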
Log for a failure in Travis where a test failed due to "Contact info not found for suite": https://gist.github.com/kinow/1429a911fae2e7cbbb7dcec63e64c653 Looks like when this error occurs, some tests that are not in the list of flaky tests fail. Which could indicate a resource unintentionally shared between tests? |
May have a fix for tests/restart/21-task-elapsed.t |
@matthewrmshin I've added two tests that failed today in a trivial PR, but it was just so I don't forget about them. Should I, instead, create a PR for each test that fails under Travis - but is not related to the change and passes locally on the same branch - to move that test to the flaky tests folder? |
👍 |
Then we can close this issue? - until someone dreams up a better way to handle flaky tests (which can be a new issue). |
+1 ! |
Will raise a PR for the flaky tests in a few minutes. Closing this one for now 🎉 |
Just to record that the test battery can fail randomly or in certain scenarios (under a particular time zone, when an environment variable is present, etc.). This ticket is just to check if it'd be possible to make it more stable, perhaps avoid re-running it and, of course, avoid sporadic failures.
[Edited by @matthewrmshin ] Since #3224, sensitive tests have gained their own hierarchy in the source tree under flakytests/ and are run serially under Travis CI. New flaky tests should be moved into that hierarchy, with the view that it may be possible to move some back to live with the other non-flaky tests. (I have commented out the original list, as the source tree is now the master list.)
List (post #3224 and #3286):
./tests/cylc-reset/03-output-2.t - failed in Remove obsolete command category "hook". #3311
./tests/registration/02-on-the-fly.t - failed in Remove obsolete command category "hook". #3311

(When adding new ones, please bear in mind that the list is in alphabetical order.)