Parallel workflows containing jobs with the same name use the same cache key, resulting in "Failed to save cache entry" #699
Comments
Thanks for the report. The cache entry should only be written in the post-action of the step, which in this case would be "Post build platform commons with Gradle". This error message usually results from a cache entry written during a previous execution of the workflow (with the same commit id), for example if you use "Rerun Jobs" in the UI or kick off the workflow manually. The fact that the cache entry is "almost empty" is not an indication of a problem: most of the content of Gradle User Home is extracted into separate cache entries and is not contained in the main Gradle User Home cache entry. I understand that the repository is private: are you able to share the logs (possibly redacted) or the Caching Summary with me directly? That might help me diagnose the problem. |
Thanks for your answer :)
I don't understand either, so I made a video explaining how I found the issue. To sum up, while the workflow runs, I refresh the GitHub caches page, and during the build we can see that a cache entry is saved. This cache entry contains the same thing as the cache entry that was restored initially. gradle-build-cache-bug-report.webm
I can confirm that there is no mention of the cache entry being created. I looked at the logs prior to the moment when the cache entry got saved. The only logs about the cache are in the first step, when the cache is restored. Logs:
Gradle build action settings (each step has the same settings; the only things that change are the build-root-directory and arguments, as you can see in the original post):
Cache summary (these logs are not the ones from the video, because the workaround is now on the main branch and we no longer have the issue; they come from a run that happened before the workaround):
Logs in the post-action step, only for Gradle User Home (these logs are not the ones from the video, because the workaround is now on the main branch and we no longer have the issue; they come from a run that happened before the workaround):
I can confirm that the issue happens without triggering a job rerun, as you can see in the video.
Yes, indeed, I realized that the cache entry that gets saved before the post-action step is actually the same as the one initially restored. My bad, I was wrong on this part. |
Thanks for all of the details! This is most perplexing. Only the post-action contains the logic to save a cache entry. And my understanding/experience is that post-actions will be executed after all of the main actions have run. There is also lots of logging around saving a cache entry. It doesn't seem possible that the action saves a cache entry without any evidence in the logs. I can think of only 2 explanations based on the 1. GitHub Actions is somehow executing the
|
Thank you for your response! I found the issue. I have 2 Jobs running in parallel (one for backend, one for frontend to give you some context), and both use gradle-build-action. In the logs, both generate the same cache key. The frontend job is faster than the backend job, so the frontend job saves the cache while the backend is still running. I thought that it was not possible.
According to the README:
Shouldn't the cache keys be different in my case? It's important to share the cache between jobs, but the key should differ across workflows. Message:
|
Yes the keys should be different. Here's the logic that determines the cache key, and looking at it now I can see that 2 workflows that have jobs with the same name could produce the same cache key. This is unintentional. Thanks for helping to track this down. A fix for this will be to include the workflow name/id when identifying the Job in the cache key. In the meantime, you can override this by setting the Please let me know if that allows you to remove your current workaround. I'll be sure to get this fixed for the next release. |
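To make the collision concrete, here is a minimal sketch. It assumes a deliberately simplified key format and a toy cache store; `job_only_key`, `workflow_aware_key`, and `CacheStore` are hypothetical names for illustration and are not the action's actual code. It relies on the fact that GitHub Actions cache entries cannot be overwritten once a key exists, so the second job to finish fails to save.

```python
def job_only_key(job_name: str) -> str:
    """Hypothetical simplified key that identifies the job by name alone."""
    return f"gradle-home-v1|{job_name}"


def workflow_aware_key(workflow: str, job_name: str) -> str:
    """Hypothetical key that also includes the workflow, as proposed for the fix."""
    return f"gradle-home-v1|{workflow}-{job_name}"


class CacheStore:
    """Toy stand-in for a cache whose entries are immutable once written."""

    def __init__(self) -> None:
        self._entries: dict[str, str] = {}

    def save(self, key: str, payload: str) -> bool:
        # An existing key is never overwritten; the save is rejected instead,
        # which is the situation behind "Failed to save cache entry".
        if key in self._entries:
            return False
        self._entries[key] = payload
        return True


store = CacheStore()
# Two workflows, each with a job named "build", finishing at different times.
frontend_saved = store.save(job_only_key("build"), "frontend Gradle User Home")
backend_saved = store.save(job_only_key("build"), "backend Gradle User Home")
print(frontend_saved, backend_saved)
```

With the job-only key, the faster frontend job claims the entry and the backend job's save is rejected. With `workflow_aware_key("frontend", "build")` and `workflow_aware_key("backend", "build")` the two keys differ, so both saves would succeed.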
PS There are a couple of recommendations I'd make to simplify your build/workflows.
|
I can confirm that both workflows have the same job name, "build", which is why the key was:
I changed jobs' names, and now I have two different cache entries:
Thank you for your suggestions, it is super valuable! I just made the modifications. The only thing is that I added Thank you again for your reactivity, and your work :) |
Glad to hear you got this sorted. We'll be sure to fix this in an upcoming release so that others don't hit this issue. |
Fixes #699 by avoiding cache key collisions between jobs with the same name in different workflows.
I think we're experiencing an issue after this change. 🤔
Wouldn't it be better to use the workflow filename instead of the name in the cache key? Btw, the override proposed here points to a different env var. Is that intentional? Thanks! |
Yes, I'll see if this is available via a GH env var. But I think it will still make sense to sanitize the cache key to remove commas and other illegal characters. |
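A minimal sketch of what such sanitizing could look like, assuming a conservative allow-list of characters. `sanitize_cache_key` and `MAX_KEY_LENGTH` are hypothetical names; commas are the one character known to be rejected in cache keys, while the rest of the allow-list and the 512-character limit applied here should be checked against the cache documentation before relying on them.

```python
import re

# Assumed upper bound on key length; verify against the cache documentation.
MAX_KEY_LENGTH = 512


def sanitize_cache_key(raw: str) -> str:
    """Replace each run of characters outside a conservative allow-list with '-'.

    The allow-list (letters, digits, '_', '.', '|', '-') is an assumption made
    for this sketch, not the action's actual rule.
    """
    cleaned = re.sub(r"[^A-Za-z0-9_.|\-]+", "-", raw)
    return cleaned[:MAX_KEY_LENGTH]


# A workflow name containing a comma, spaces, and an ampersand.
print(sanitize_cache_key("Build, test & deploy"))
```

Note that sanitizing can map two different workflow names to the same key fragment, which is one argument for using the workflow filename rather than its display name.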
Context
Here are the steps of our GitHub CI
From the logs of "Post build platform commons with Gradle":
We had the problem using the version 2.2.2 of this action. Updating to 2.4.2 did not fix the issue.
Source of the problem
After investigating for a few hours, it turns out that a cache entry is systematically created between the "Build platform with Gradle" step and the "Run platform fastest tests with gradle" step. This cache entry is almost empty (usually around 8MB).
There is no log whatsoever indicating that the cache entry has been saved at this time.
Workaround:
I added a step that manually deletes the cache entry created during the build. Since then, saving the Gradle home no longer fails. The whole CI now takes between 5 and 10 minutes, whereas it took more than 45 minutes before this workaround.
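The workaround step itself is not shown in this report. As one possible approach, the sketch below builds a request for the GitHub REST API endpoint that deletes Actions cache entries by key. The owner, repository, key, and token values are placeholders, and `build_delete_cache_request` is a hypothetical helper; the request is only constructed here, not sent.

```python
import urllib.parse
import urllib.request


def build_delete_cache_request(
    owner: str, repo: str, cache_key: str, token: str
) -> urllib.request.Request:
    """Build (but do not send) a DELETE request for cache entries matching a key."""
    query = urllib.parse.urlencode({"key": cache_key})
    url = f"https://api.github.com/repos/{owner}/{repo}/actions/caches?{query}"
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


# Placeholder values; in a workflow the token would come from a secret.
request = build_delete_cache_request(
    "example-owner", "example-repo", "example-cache-key", "<token>"
)
print(request.get_method(), request.full_url)
# To actually delete the entry: urllib.request.urlopen(request)
```

The token needs permission to write Actions data for the repository. Deleting the entry only hides the symptom; the underlying cause was the key collision identified later in the thread.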
Notes:
This project is private, so I cannot share it. I tried to reproduce it with an example project: the problem happened during the first action run (a cache entry for Gradle home was saved during the build; its size was around 100 KB). The next runs did not have the issue. I tried looking at the source code of this project, but I could not find the issue either.
I may try to track down the issue later and, if I find it, I will open a PR.