-
Notifications
You must be signed in to change notification settings - Fork 554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
I am experiencing race condition while using SimpleCov with Parallel_tests #559
Comments
Have you tried setting |
I did just now, and it didn't help. Also please notice that |
Which version of simplecov are you using? Which Rails and Ruby version? Which testing framework? Nothing obvious sticks out from over here :(
sort of override each other, refuse_coverage_drop sets maximum_coverage_drop to 0 but that's just an aside here :) |
Coverage MUST NOT DROP! :-) I just ran it again, and this time the results only up to the second-last process got considered even though the last process reached 100%. I'm sure there's a race condition somehow:
|
Sorry must have tiredly missed the versions :( All we can do atm is look at the merging code and see if anything pops out as maybe being a race condition or something. If you could share a sample with us that reproduces this it'd be great but probably hard to do as it might also be related to something local (race conditions and execution speed). If that's not possibly I'll likely post some puts debugging I'd like to see when I get to look into how the code works :) |
I am so glad I documented the results from earlier! I started wiping out the resultset from previous run by adding the following to the
Result: The parallel_tests with SimpleCov still fails, BUT NO LONGER complains about:
I believe the cause of the self-contradicting summary below has changed by the action I took above:
New hypothesis:
|
For less ambiguity, I have conducted the following experiments:
Result: Conclusion:
Result: Conclusion:
Result: Conclusion:
OK, I'm back to thinking this is certainly a race condition. |
One more experiment: Fix the minimum threshold at 100%
Result is failure despite the second process hitting 100%:
Result: pass!
Note: This inconsistency not only exhibits itself on my Macbook, but also on a remote (Travis CI) Unix system. These experiments prove, I think beyond reasonable doubt, that SimpleCov has a race condition. |
OMG I see a bug #64 that has a comment from @jshraibman-mdsol on Jan 7, 2013 saying:
Seven months later, @colszowka wrote:
I am reporting the same bug 4 years (!!!) later, and I'm looking at PR #185 where a lock pattern is implemented and supposedly merged with master, but I just checked master and it doesn't have it! See the commit: This is a regression bug. The fix PR #185 somehow got lost on master. |
The problems I'm seeing are somewhat consistent with @ronwsmith 's description at the bottom of #223 (except I do see LOC numbers) Note to self: |
SOLVED! The
Note, the Result for 110% threshold, as expected:
Result for 100% threshold, as expected:
You can take that to the bank! :) |
@bf4 I just now after my above workaround started seeing the wisdom of your first reply -- but SimpleCov.pid is always being assigned after forking inside the process, no? I don't see at what level a non-forked SimpleCov class object can exist. With parallel_tests, we're always inside one of the processes, with no "controller" process except the shell script that calls the parallel_tests command. |
I should leave one more comment for anybody looking at the history of PRs: using none of the past versions of SimpleCov solved this problem for me, whereas my own workaround did. So it's not a regression bug; it's a real bug that has never been fixed, at least for someone running a Macbook Pro, Rails 5, SimpleCov on Parallel_tests. Double confirmation: Build use to take 10 minutes, now it takes 2 minutes! |
#185 got reverted because it caused problems for Windows users. See the changelog for v0.8.2. |
I set it up as per the above URL, If you do not remove
|
In the last year since 2017, I've written many more tests, causing my processes to get loads more unevenly. My solution above does fail from time to time. I have found multiple underlying causes. For example, SimpleCov is like Schroeder's cat in that if you look at "SimpleCov.result.covered_percent" at the wrong time, you might alter the outcome. Also it seems like ParallelTests puts the most load on the last process. I'm now using this heuristic to avoid the race condition by having the last starting process be the one waiting for others. The following solution seems to run into LESS RACE CONDITIONS than my previous (upvoted) solution above, and it's also less verbose and more elegant, though I won't guarantee that it completely avoids race conditions:
|
+1 |
This should be fixed with the merge of #706 thanks to @f1sherman ! Release hopefully incoming within the next days (0.17.0). |
Ah yes, please reopen/comment if it persits on master/after release. |
@PragTob I am, unfortunately, still seeing this issue even after the release of 0.17.1. I have been unable to track down the cause of the problem, but it appears that something about the order in which the processes from On our CI machine, we run with 8 cores, and thus, by default, 8 spec processes. Whenever a spec that is not Here is as much relevant information as I can put together:
And here is some sample output by the CI:
I am able to work around this problem by forcing parallel specs to use a different number of processes, but that will eventually fail, and I have to bump the number around again. It's sort of a game of musical chairs: identify a configuration where the final specs process finishes last, and use that until it stops working. Let me know if I can provide any further information that might help diagnose. |
@PragTob I had time to come back to this and add some debugging statements through the
We can clearly see that SimpleCov is correctly identifying the final process to exit, but the call to This behavior is still intermittent, and the exact same codebase can pass or fail by adjusting the number of processes running the parallel specs. I'm turning my attention to the |
@PragTob closing this out as I've found the issue. The issue does not appear to be one with What we saw was that the final process was dropping all but the most recent results from the merged results, and tracked that down to In order to see this behavior we had to have several minor coincidences — we had an error in I don't think there's any change needed to tl;dr messing with time causes unforeseen issues |
@wkirby thank you so much for looking into it so much 💚 That... does sound quite gruel overall as in hard to track down. I'd wish there was something I know we could do, maybe I'll come up with something down the line but right now I can think of nothing off the top of my head. Or well... maybe I do. We maybe use monotonic time. Hm. |
I appreciate that. The only thing I thought of was a check if That goes further down the line towards handling weird interactions with other gems, though, and I'm loathe to suggest that it's the responsibility of |
Repro:
I'm using
simplecov (0.13.0)
,ruby "2.3.3"
,parallel_tests (2.13.0)
Ran:
Expected:
The last of the 8 processes reports 100% coverage, which should make SimpleCov pass.
Actual:
The result of the 3rd-last process was picked as the merged result!
The text was updated successfully, but these errors were encountered: