dev: identifying flaky tests? #5195

mrienstra · 2022-10-26T06:36:00Z

I noticed some tests:
packages/create-astro/test/typescript-step.test.js#L37-L83
(which I feel somewhat responsible for, since I added them)
... seem to be flaky. I first noticed them failing in this PR:
https://github.com/withastro/astro/actions/runs/3316348321/jobs/5478061799
And then again today in this PR:
https://github.com/withastro/astro/actions/runs/3324584748/jobs/5496363068

I've seen other failing tests recently that didn't seem to be related to the PR. I can dig up some other examples later, or if anyone else knows of some offhand, please chime in.

Anyway -- putting aside for the moment that I need to fix those tests -- it would be great to have some way of analyzing test failures and spotting patterns. Or maybe that already exists for the Astro repo and I'm just not aware of it. So opening this issue to see what others think.
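To make "spotting patterns" a bit more concrete, here is a rough sketch of the kind of analysis I have in mind. It assumes test results from many CI runs have already been collected somewhere as plain records. The record shape (`test`, `sha`, `passed`) and the function name `findFlakyTests` are made up for illustration and are not anything that exists in the Astro repo. The heuristic: a test that both passed and failed on the same commit is probably flaky, since the code did not change between those runs.

```javascript
// Hypothetical record shape (an assumption, not an existing format):
//   { test: string, sha: string, passed: boolean }
// A test counts as flaky for a commit when that commit has at least one
// passing run and at least one failing run of the same test.
function findFlakyTests(records) {
  // Group outcomes by (test, sha). JSON.stringify gives a collision-safe key.
  const outcomes = new Map();
  for (const { test, sha, passed } of records) {
    const key = JSON.stringify([test, sha]);
    const entry = outcomes.get(key) ?? { test, passes: 0, failures: 0 };
    if (passed) entry.passes += 1;
    else entry.failures += 1;
    outcomes.set(key, entry);
  }

  // Count, per test, how many commits showed mixed results.
  const mixed = new Map();
  for (const { test, passes, failures } of outcomes.values()) {
    if (passes > 0 && failures > 0) {
      mixed.set(test, (mixed.get(test) ?? 0) + 1);
    }
  }

  // Worst offenders first.
  return [...mixed.entries()]
    .map(([test, mixedCommits]) => ({ test, mixedCommits }))
    .sort((a, b) => b.mixedCommits - a.mixedCommits);
}
```

A test that fails on one commit and passes on a later one is deliberately not flagged, since that could just be a real bug getting fixed. Only mixed results on the same commit count.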

I skimmed a few articles on the subject:

Detect and track flaky Mocha tests - Sept 2021 - BuildPulse:
- Demonstrates usage of mocha-junit-reporter & mocha-multi-reporters with GitHub Actions.
- Is a little vague re: storing and analyzing, as that's what their service does.
- They have an open source plan waitlist, and they responded quickly to my email when I asked about it. My initial impression is favorable.

How to Find and Eliminate Flaky Tests - Jul 2022 - Semaphore:
- Doesn't give too much info on capturing failures for analysis, aside from how to do so using their test reports feature.
- Semaphore offers a free plan for open source projects.

CircleCI can detect flaky tests and offers free credits for open source, but that might be overkill.

Kiwi TCMS also looks like overkill, but maybe there's an easy way to use it just for test telemetry...?

Retries can be helpful for flaky tests: see the Mocha docs for retries; WebdriverIO also has a nice summary. (Astro already retries Playwright tests 3 times in CI, and 5 times in packages/integrations/prefetch.) Retries are not recommended for unit tests, but the packages/create-astro tests I mentioned sit somewhere between unit tests and e2e tests, and maybe some of the other flaky tests fall into this category too.
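For anyone unfamiliar with what a retry does, here is a minimal plain-JavaScript sketch of the semantics. In Mocha itself this is built in: you call `this.retries(n)` inside a `describe` or `it` callback written as a regular `function` (not an arrow function). The helper below, `withRetries`, is a hypothetical stand-in that only illustrates the behavior. It is synchronous for simplicity and is not how Mocha implements it.

```javascript
// Hypothetical helper, for illustration only: run `fn`, and if it throws,
// run it again, up to `retries` extra times (so at most retries + 1 attempts).
// `fn` receives the zero-based attempt index.
function withRetries(fn, retries) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      // First successful attempt wins; report how many attempts it took.
      return { value: fn(attempt), attempts: attempt + 1 };
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  // Every attempt failed: surface the last error, as a failed test would.
  throw lastError;
}
```

Recording the attempt count matters for the analysis side: a test that only passes on the second or third attempt is exactly the kind of thing retries would otherwise hide.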

Found this package:

Dylanlan/mocha-bad-test-finder: I'll give it a whirl locally. Seems similar to something I've done before while debugging flaky Playwright tests. Wouldn't help with capturing & analyzing failures in CI.

matthewp · 2022-10-26T12:46:05Z

We have a few. I have a list on Discord (I'll move it to the #dev channel and ping you). Would love to continue the discussion there if possible. My opinion is that there are only 2 or 3 of these that cause false positives in PRs, and we should just skip or remove them.

matthewp closed this as completed Oct 26, 2022
