cache write race condition across processes #4444
Isn't this related? #4432
I don't believe so. The case we see in our repo is the exact same file getting mocked for 2 different processes (while running in parallel), which causes the cache write operation to fail because one process has the file locked. That ticket looks like it's more about different files with the same contents, and we don't have any of those cases in the repositories where we ran into this issue.
We basically run into the same issue with our tests. One easy way to reproduce was to remove jest
Having the same issue and can't find a way around it. Jest is basically unusable for us like this.
We are trying to update to 21.2.0 from 20.0.4 and now we have the following error on our build servers:
I'm now having the same issue; tests are broken randomly.
If I run the tests with the --runInBand flag then, as expected, everything is OK.
I can see the same issue fairly consistently:
jest 21.2.1
Anything we can do to help diagnose the issue? Should I create a repro case? Is this error critical? Can it be treated as a warning rather than taking down the whole test suite? Is there a way to back off and retry?
Having a small repro would be great.
Here's the repro: https://github.com/asapach/jest-cache-issue/ (Out of curiosity I tried running it in WSL on Win10; the issue is not reproducible using the POSIX API.)
Is this just happening on Windows? I don't have access to Windows machines beyond virtual machines, so it's not the easiest for me to debug... @jeanlauliac you added
This problem is very similar to npm/write-file-atomic#10 and npm/write-file-atomic#22.
Just ran a procmon trace; here's an example of the issue:
As you can see, two processes are trying to rename the same file within 1 ms of each other, and the second one fails. I think npm/write-file-atomic#22 addresses the async version of
Would it be possible to create a repro showing that just using
Or if you could write a test within jest that shows the same error (we have AppVeyor CI), that could be a start as well. I'm not even sure what behavior we want in case of this error. Retry the write? Rerun the test? The whole test file?
OK, I'll try to create another repro. Not sure it's possible to create a jest test, because it would require spawning multiple processes, disabling/cleaning the cache, and keeping it running until it fails.
Well, firstly the issue should not even happen when
That's a good point, and should be fixed separately. That way
@cpojer thoughts on making it not be sync? Not sure how that scales. Or if you have another idea on how to fix this |
Here's the other repro with
Findings so far: the sync version fails as expected, but surprisingly the async version fails as well. This means they probably implement a retry queue only when running in the same process, which doesn't help in our case.
New flag? It's a highly misleading name. And on e.g. CI you rarely want the cache anyway, so it's just wasted resources. Or is a cache generated within a single test run used during
Awesome! Could you open up an issue against write-file-atomic? It feels like a fix should go there, and if not (if they don't want to support multiple processes writing at once) we can revisit on our side. WDYT?
A patch I tried locally that seemed to work is ignoring the error if it comes from trying to rename to a file with the same content. Since it just means another process 'won' the race, we can ignore it and move on.

```js
const cacheWriteErrorSafeToIgnore = (
  e: Error,
  cachePath: Path,
  fileData: string,
) => {
  if (
    e.message &&
    e.message.indexOf('EPERM: operation not permitted, rename') > -1
  ) {
    try {
      const currentContent = fs.readFileSync(cachePath, 'utf8');
      return fileData === currentContent;
    } catch (e) {
      // couldn't read the existing cache file; treat the error as real
    }
  }
  return false;
};
```
Upstream issue: npm/write-file-atomic#28 |
I think this means "rename" is not an atomic operation on Windows, so it breaks the assumption made by
@jwbay, your solution looks reasonable to me! 👍 Instead of using
I was about to start work on a PR for
I think adding this logic (a local queue) wouldn't fix the issue, because it happens mostly when different processes try to write to (rename to) the same file. To fix the concurrency issues once and for all, we may have to reconsider how we do caching, for example having a single process that accesses the cache, with which we communicate over some kind of IPC. Existing key/value store systems may be handy, such as
Ah, I perhaps misunderstood the issue then. The way I read it, the library already has a queuing mechanism that works nicely for async requests, but if you mix in sync requests as well you can get collisions.
My above-referenced pull request should solve this issue. At least it did for me!
@fluffynuts Thanks :) I have applied the beta version to our project. But it fails:
By the way, the |
So I can try to repro, please provide the exact version of jest you're using.
Alternatively, if your project is open source, please link me to a branch causing this, or create a minimal repro repo; the more info I have, the more likely I am to squash it. We're not seeing this error on our Jenkins CI (Windows) or any of our Windows dev machines, so right now I can't repro.
@fluffynuts strange, the ScriptTransformer contains a
Okay, I made some progress: the problem seems to be the
Okay, somehow I missed your PR. But it's exactly the same change I made in my PR 😊, so I can close mine. In nodejs/node#17921 (comment) it is mentioned that fs.exists will return false on an EPERM error, so it is clear that the current code cannot work. But why hasn't your PR been merged?
We are always on the latest released version (29.4.3 at the time).
The project is not open source, I'm sorry. We have over 5k Jest tests and a large codebase, and I guess it is related to the huge number of transpiled modules.
We are using a forked CRA v5.0.1 that uses Jest v27.4.3. Is the issue fixed in the latest jest? Should we try to upgrade? We are using
The issue happens pretty often.
I'm getting this issue too. Only a few hundred tests in my codebase. |
Running Jest tests on Windows may trigger an error like: EPERM: operation not permitted. This is related to issue jestjs/jest#4444. As one possible workaround, you may run tests with --runInBand; a script named `jest:windows:workaround:issue:4444` has been added for this.
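Assuming a standard package.json setup (the script name is taken from the comment above; the rest is illustrative), the workaround script might look like:

```json
{
  "scripts": {
    "test": "jest",
    "jest:windows:workaround:issue:4444": "jest --runInBand"
  }
}
```

--runInBand runs all tests serially in the current process instead of a worker pool, which sidesteps the cross-process cache race at the cost of parallelism.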
@AngusMacrae have you tried my hack-around?
obviously, run after |
We started using a Linux agent for Jenkins. It fixed the issue.
well, it works around the issue because linux can do COW for files in use, but yes - this is why the issue has remained unfixed in Jest: I think most people are using linux or osx so they just don't care :| |
Yes, it is a pity that nobody cares about this issue. |
@fluffynuts it seems jest v30 will break jest-patch-cache (https://www.npmjs.com/package/@jest/transform/v/30.0.0-alpha.1?activeTab=code), because now all the files are bundled into one index file.
hooray! :D PRs welcome; I don't really have the time to look at this rn and we haven't upgraded to that yet, so I guess I can put it off for a bit :D |
Upgrading |
Do you want to request a feature or report a bug?
bug
What is the current behavior?
When running tests in parallel with the new atomic cache writing, we're getting rename errors as multiple processes try to write to the same files. Even with the --no-cache option set it still hits rename errors, because it still tries to write the files.
What is the expected behavior?
--no-cache should not write cache files.
Please provide your exact Jest configuration and mention your Jest, node, yarn/npm version and operating system.
jest 21.0.1
node 6.9.2
yarn 0.27.x/1.0.0
OS Windows