Non-deterministic test suite execution failures #5024

Closed
symbiont-joseph-kachmar opened this issue Aug 29, 2019 · 6 comments · Fixed by #5757

General summary/comments (optional)

stack test seems to non-deterministically fail when executing a test suite for a multi-project repository with parallelism enabled (e.g. stack build -j8 --test) with the following errors:

/tmp/stack-7235c340cf08101f/test-ghc-env: openBinaryFile: resource busy (file is locked)

/tmp/stack-7235c340cf08101f/test-ghc-env: openBinaryFile: resource busy (file is locked)

/tmp/stack-7235c340cf08101f/test-ghc-env: openBinaryFile: resource busy (file is locked)

/tmp/stack-7235c340cf08101f/test-ghc-env: openBinaryFile: resource busy (file is locked)

/tmp/stack-7235c340cf08101f/test-ghc-env: openBinaryFile: resource busy (file is locked)

/tmp/stack-7235c340cf08101f/test-ghc-env: openBinaryFile: resource busy (file is locked)

I haven't been able to pin down a reliable reproduction; however, if I were to guess, I'd say it has something to do with testGhcEnvRelFile, possibly either colliding with an existing one in this call:

writeFileUtf8Builder fp ghcEnv

...or bubbling up from Cabal or ghc somehow?
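
For context on the error text itself: GHC's runtime keeps a per-process multiple-reader/single-writer lock on regular files, and "resource busy (file is locked)" is what a second writer gets while another handle on the same file is still open. The sketch below tries to reproduce that message in isolation. It is not Stack's code: `secondOpenError` and the `test-ghc-env-demo` file name are made up for illustration, and it assumes the `directory` package that ships with GHC.

```haskell
import Control.Exception (IOException, bracket, try)
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO

-- | Hypothetical helper (not part of Stack): create a fresh temp file, keep
-- its handle open, and try to open the same path for writing a second time.
-- Returns the exception raised by the second open, if there was one.
secondOpenError :: IO (Maybe IOException)
secondOpenError = do
  tmpDir <- getTemporaryDirectory
  bracket
    (openBinaryTempFile tmpDir "test-ghc-env-demo")   -- first handle (read/write)
    (\(path, h) -> hClose h >> removeFile path)       -- always clean up
    (\(path, _) -> do
        r <- try (openBinaryFile path WriteMode) :: IO (Either IOException Handle)
        case r of
          Left e   -> pure (Just e)                   -- second writer rejected
          Right h2 -> hClose h2 >> pure Nothing)      -- no lock was enforced

main :: IO ()
main = do
  r <- secondOpenError
  case r of
    Just e  -> putStrLn ("second open rejected: " ++ show e)
    Nothing -> putStrLn "second open succeeded (no lock enforced here)"
```

If this behaves the same way inside Stack, two jobs in one process writing to one shared path at the same moment would be enough to trigger the failure, which would also explain why it only shows up with parallelism enabled.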

Steps to reproduce

Unable to reliably reproduce.

Expected

stack build -j8 --test should successfully execute all tests in a multi-project repository.

Actual

stack build -j8 --test fails non-deterministically with an error message about busy resources (locked files).

Stack version

$ stack --version
Version 2.1.3, Git revision 0fa51b9925decd937e4a993ad90cb686f88fa282 (7739 commits) x86_64 hpack-0.31.2

Method of installation

  • Compiled from source via cabal-install
  • Official binary

ulidtko commented Sep 10, 2020

Can confirm, the same happens in our CI with exactly the same stack --version.

It fails when test dependencies are built and installed.

Currently we work around it by using stack test -j1, which avoids the issue.

Contributor

roberth commented Oct 8, 2020

Here's another one https://travis-ci.org/github/fpco/inline-c/jobs/734110470

Contributor

wraithm commented Feb 2, 2021

I'm experiencing the same issue in my company's CI system with GitHub Actions.


ChickenProp commented May 20, 2021

Thanks to this issue I was able to reproduce locally. Some more details:

We have a multi-project repository. Six projects have test suites, five with one each and one with two. (Some other projects have no tests.)

If I run a loop

time while stack test --fast -j8 --ta '-m nothingmatching'; do echo -e '\n---\n'; done

it fails sooner or later, usually in under a minute, with the error in question. Each test run takes about 5s. I've seen between one and four copies of the error, e.g. one attempt gave

/tmp/stack-aa1cf248d6a0dc70/test-ghc-env: openBinaryFile: resource busy (file is locked)

/tmp/stack-aa1cf248d6a0dc70/test-ghc-env: openBinaryFile: resource busy (file is locked)

/tmp/stack-aa1cf248d6a0dc70/test-ghc-env: openBinaryFile: resource busy (file is locked)

It seems like there's one copy for each project whose tests fail to run. Hypothesis: at least one project's tests will always run, so I might eventually see five copies but not six.

If I specify a single test suite (stack test projectname:test:suitename ...) or single project (stack test projectname ...) I haven't seen it fail yet, even with the two-suite project.

If I run with --verbosity debug, for the suites that run we get a message

[debug] Run process within /path/to/repo/projectname/: /path/to/repo/projectname/.stack-work/.../build/suitename/suitename -m nothingmatching

and a corresponding [debug] Process finished in 2042ms: /path/to/... afterwards. But we don't get either of those for the suites that don't run. So that narrows down where the problem might be, I guess.

This is definitely happening while running tests. I can't rule out that it also happens in other steps. But depending on the use case, for at least some users (possibly including ourselves) it might work to build and run tests separately, i.e. something like

stack test -j8 --no-run-tests
stack test -j1

This is with stack

Version 2.6.0, Git revision 23430bf10dcaf81f0436556c0b58c28b168e744b PRE-RELEASE x86_64 hpack-0.33.0

(It would probably be good to get a tarball with a minimal test case for reproduction, but I'm unlikely to get to that any time soon.)

@julialongtin

I can reproduce this every time with my project. stack really doesn't handle 56 threads well.

zyla added a commit to zyla/stack that referenced this issue Oct 27, 2021
Since the build temporary directory is shared between jobs, we can't use
the same file name for the `test-ghc-env` for all of them.

Fixes commercialhaskell#5024
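
The idea in that commit message can be sketched as follows: when several jobs share one temporary directory, each job gets its own file name instead of a single fixed `test-ghc-env`. This is an illustration under stated assumptions, not the code from the referenced commit: `envFileFor`, `runJob`, `runJobs`, the `test-ghc-env-demo-` prefix, and the job names are all hypothetical, and it assumes the `directory` and `filepath` packages that ship with GHC.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, try)
import Control.Monad (forM)
import System.Directory (getTemporaryDirectory, removeFile)
import System.FilePath ((</>))
import System.IO

-- | Hypothetical naming scheme: one env file per job in the shared directory.
envFileFor :: FilePath -> String -> FilePath
envFileFor dir job = dir </> ("test-ghc-env-demo-" ++ job)

-- | Write this job's env file, then delete it. Any exception is captured.
runJob :: FilePath -> String -> IO (Either SomeException ())
runJob dir job = try $ do
  let path = envFileFor dir job
  withBinaryFile path WriteMode (\h -> hPutStrLn h ("env for " ++ job))
  removeFile path

-- | Run all jobs concurrently in the same temp directory and collect results.
runJobs :: [String] -> IO [Either SomeException ()]
runJobs jobs = do
  dir <- getTemporaryDirectory
  dones <- forM jobs $ \job -> do
    done <- newEmptyMVar
    _ <- forkIO (runJob dir job >>= putMVar done)
    pure done
  mapM takeMVar dones

main :: IO ()
main = do
  results <- runJobs ["pkg-a", "pkg-b", "pkg-c"]   -- job names must be distinct
  let failures = length [() | Left _ <- results]
  putStrLn ("jobs failed: " ++ show failures)
```

Because no two jobs ever open the same path, the single-writer file lock cannot be contended, so `-j8` no longer needs to be traded for `-j1`.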
AshleyYakeley added a commit to AshleyYakeley/Truth that referenced this issue Mar 30, 2022
@kozak
Copy link

kozak commented Jun 7, 2022

We have been using @zyla's fix (#5757) in production for 6 months and it works perfectly.

8 participants
@ChickenProp @ulidtko @roberth @kozak @wraithm @julialongtin @symbiont-joseph-kachmar and others