GH Artifact lookup times out #482
Comments
From the logs, it looks like Craft sees your CI as successful before the artifact upload job has started. Once it sees success, it moves on to trying to locate artifacts from the successful CI. You can configure the necessary jobs here.
Ah, I see, so basically the way it's currently set up is completely wrong 😅 So I need to add this job, https://github.com/getsentry/sentry-javascript/actions/runs/5715683859/job/15486522728, as the required context:

```yaml
statusProvider:
  name: github
  config:
    contexts:
      - All required tests passed or skipped
```
@asottile-sentry can you clarify what … For reference, here's the job we want to wait on:
Or is it enough to just set this, without any context?

```yaml
statusProvider:
  name: github
```
I am not sure -- I'm just going by the docs :S
…8685) (#8729) Seems like the `context` passed isn't found, which is blocking our release. Reverting for now until we know how to configure the status provider correctly (getsentry/craft#482). #uncraft
@Lms24 newest versions of Craft use …
The very first step would be to change this line: https://github.com/getsentry/sentry-javascript/blob/99e347907812873cc0c832aa25968c3db6587af1/.craft.yml#L1 to say … Moreover, the issue you originally reported doesn't seem to be related to contexts. I'll dig a bit more and report back here.
If you look at the logs starting at https://github.com/getsentry/publish/actions/runs/5715907847/job/15486244767#step:10:80, you can see that it actually just waits for all the checks to pass. The …
So here, in the artifact upload job for the failed publish, it says it successfully uploaded the artifacts at 23 minutes past the hour (https://github.com/getsentry/sentry-javascript/actions/runs/5715683859/job/15485977459#step:6:970). The publish job fails at 27 minutes past the hour (https://github.com/getsentry/publish/actions/runs/5715907847/job/15486244767#step:10:133). That means that, for some reason, the GitHub API did not make the uploaded artifacts available for about 4 minutes. That's quite long. It may be because the artifacts seem quite large. I googled a bit to see if there's any mention of an expected delay before artifacts become available but failed to find one. You may wanna raise this issue with GitHub Support for investigation. In the meantime, either increasing the number of tries or extending the delay, as you originally suggested, makes sense as a workaround.
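As a rough sketch of that workaround, the TypeScript below compares the total waiting window of a fixed polling schedule with an exponential backoff schedule. It assumes Craft waits a fixed interval between attempts; this thread only states that `ARTIFACTS_POLLING_INTERVAL` is 10s and `MAX_TRIES` is 3. `backoffSchedule`, `totalWaitMs` and `pollForArtifact` are hypothetical helpers, not Craft's API, and exponential backoff is a swapped-in technique rather than necessarily what Craft does today.

```typescript
// Hypothetical helpers (not Craft's actual code) for sizing an artifact polling loop.

// Waits (ms) between consecutive attempts: baseMs, baseMs*factor, baseMs*factor^2, ...
function backoffSchedule(maxTries: number, baseMs: number, factor = 2): number[] {
  const waits: number[] = [];
  for (let attempt = 1; attempt < maxTries; attempt++) {
    waits.push(baseMs * Math.pow(factor, attempt - 1));
  }
  return waits;
}

// Total time spent sleeping across the whole schedule.
function totalWaitMs(waits: number[]): number {
  return waits.reduce((sum, w) => sum + w, 0);
}

// Call `fetchArtifact` until it yields a value or the schedule runs out.
// `sleep` is injectable so the loop can be exercised without real delays.
async function pollForArtifact<T>(
  fetchArtifact: () => Promise<T | undefined>,
  waits: number[],
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T | undefined> {
  for (let i = 0; i <= waits.length; i++) {
    const result = await fetchArtifact();
    if (result !== undefined) {
      return result;
    }
    if (i < waits.length) {
      await sleep(waits[i]);
    }
  }
  return undefined;
}

// Assumed current behaviour: 3 tries with a fixed 10 s wait between them.
const fixed = [10000, 10000];
// Backoff alternative: 6 tries starting at 10 s and doubling each time.
const backoff = backoffSchedule(6, 10000);

console.log(totalWaitMs(fixed) / 1000); // 20
console.log(totalWaitMs(backoff) / 1000); // 310
```

Under these assumptions, the fixed schedule stops after roughly 20 seconds of waiting, well short of the ~4 minute delay observed here, while six backoff attempts cover about 310 seconds without hammering the API.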
@BYK thanks for looking into this!
I'm confused now. I thought we always use the latest Craft version when starting a release via getsentry/publish? We were only recently blocked by a bug in the latest Craft version. So why would it use an older version just for the status provider?
OK, so this, plus your last reply, plus the fact that GH is already the default status provider, suggests to me that our config doesn't need adjustments and that it's a GH problem 🤔 The weird thing here is that our initial publish attempts fail often. Really often. I'm wondering how many of those failures are due to test flakes vs. this artifact retrieval problem.
Still seems like a good idea to me.
My pleasure!
Sorry for the confusion I caused. Some years ago, we did not default to GitHub as the status provider. That changed around version 0.21.0, and we added a check: if the config file mentioned a version earlier than 0.21.0, don't default to anything; for anything else, use GitHub. Your file already mentions 0.23.1, which is past that version, so there's nothing to change or worry about. Moreover, that epoch check was removed from the code a long time ago too 😅
Exactly! If you had an issue with your config, you'd have known about it a very long time ago.
That is very easy to check and quantify: just go through the logs and see why they fail. The one you linked was due to GitHub not making artifacts available quickly enough. If this is frequent, I'd blame that, since if your tests failed, I think you'd get a separate failure email for the failed workflow on the release branch.
The artifacts, however, are available at https://github.com/getsentry/sentry-javascript/actions/runs/5752023602. I'd definitely raise this with GitHub Support. Again, I suspect it may have something to do with the number of files or the size of the unzipped artifacts.
Hmm, I'm not sure if we're missing something here. I think there might still be a problem in the status provider. Looking at the run I linked, we can see the following timestamps:
So we started downloading artifacts before the action had even completed. Shouldn't the status provider in Craft only start downloading artifacts once the entire action has completed (by default)? I have a feeling that the artifacts might only be available once the action has completed. Does this make sense?
Aha, that makes sense! Then I guess the step that calculates the combined status of the commit may have some issues. That part was always a bit tricky. The relevant code is in craft/src/status_providers/github.ts, lines 87 to 114 (at commit 58d5b3c).
We may wanna do some digging into the API and modernize that code, especially because of the "legacy checks" part.
@Lms24 it seems that, in that case, a job would need to be added to your workflow that requires all the others to finish -- then you'd use that job as the indicator for Craft's status.
I guess it's worth trying as long as sentry-javascript is the only affected repo. But it means we're adapting to broken behaviour.
I think what's happening is that there's a point where everything has passed (all statuses green, none pending), and Craft can't know that there are more things to run.
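To illustrate that point, here is a simplified, hypothetical model (not the code in Craft's `github.ts`) of why aggregating only the checks reported so far can say "success" too early, and why requiring a named context keeps the result pending until that job reports. The `CheckRun` shape, the state names and the job names other than "Upload Artifacts" are assumptions for illustration.

```typescript
// Simplified, hypothetical model of a commit's checks; not Craft's real types.
type CheckState = "pending" | "success" | "failure";

interface CheckRun {
  name: string;
  state: CheckState;
}

// Aggregate only the checks reported so far.
// A job that has not been scheduled yet is simply absent from `runs`.
function combinedState(runs: CheckRun[]): CheckState {
  if (runs.some((r) => r.state === "failure")) return "failure";
  if (runs.length === 0 || runs.some((r) => r.state === "pending")) return "pending";
  return "success";
}

// Same aggregation, but every required context must have reported success;
// a required context that is missing counts as pending.
function requiredContextsState(runs: CheckRun[], required: string[]): CheckState {
  const byName = new Map(runs.map((r): [string, CheckState] => [r.name, r.state]));
  const states = required.map((name) => byName.get(name) ?? "pending");
  if (states.some((s) => s === "failure")) return "failure";
  if (states.some((s) => s === "pending")) return "pending";
  return "success";
}

// Snapshot taken after the tests finish but before "Upload Artifacts" exists.
const snapshot: CheckRun[] = [
  { name: "Lint", state: "success" },
  { name: "Unit Tests", state: "success" },
];

console.log(combinedState(snapshot)); // success (premature)
console.log(requiredContextsState(snapshot, ["Upload Artifacts"])); // pending
```

A snapshot alone cannot tell "everything finished" apart from "the next job hasn't been created yet", which is why an explicit final job, or a required context, gives Craft something definite to wait for.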
… and other required jobs (#8751) Adjust our final "Required Tests Passed" CI job to depend not only on required _test_ jobs but on _all_ required jobs. Specifically, this adds

* Lint
* Circular deps check
* Most importantly: Upload Artifacts

With this change, we can configure Craft's status provider to specifically wait for this job, which should fix the artifacts download timeout (getsentry/craft#482). It's worth noting that this PR will only fix this timeout in our repo and that, in a way, we're adjusting to the status provider check. However, given that we're apparently the only repo where this happens, it's probably justified, and it makes things more explicit.
Soo, I just tried to set

```yaml
statusProvider:
  name: github
  config:
    contexts:
      - job_required_jobs_passed
```

but the release failed again. I'm going to try … When I query the API endpoint manually with

```shell
gh api \
  -H "Accept: application/vnd.github+json" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  /repos/getsentry/sentry-javascript/commits/583e0bd53968b56d0f63c132c32b989dbcb999cb/status
```

I get an empty array for …

```jsonc
{
  "state": "pending",
  "statuses": [],
  // ...
}
```

Not sure if this endpoint even works correctly. For instance, it always returns …
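A plausible explanation for the `"state": "pending"` response with an empty `statuses` array, based on GitHub's REST documentation: the combined status endpoint only aggregates commit statuses created through the Statuses API, whereas GitHub Actions jobs are reported as check runs through the Checks API. GitHub documents the combined `state` as `failure` if any context reports `error` or `failure`, `pending` if there are no statuses or a context is pending, and `success` if the latest status for every context is `success`. The sketch below encodes those documented rules; `CommitStatus` is a trimmed-down assumption of the response shape and `combinedCommitState` is a hypothetical name.

```typescript
// Trimmed-down, assumed shape of one entry in the combined status response.
type StatusState = "error" | "failure" | "pending" | "success";

interface CommitStatus {
  context: string;
  state: StatusState;
}

// Combined state as GitHub documents it for
// GET /repos/{owner}/{repo}/commits/{ref}/status,
// assuming `statuses` already holds the latest status per context.
function combinedCommitState(statuses: CommitStatus[]): "failure" | "pending" | "success" {
  if (statuses.some((s) => s.state === "error" || s.state === "failure")) {
    return "failure";
  }
  if (statuses.length === 0 || statuses.some((s) => s.state === "pending")) {
    return "pending";
  }
  return "success";
}

// With Actions-only CI, no commit statuses exist, so the result stays pending
// no matter how many workflow jobs have finished.
console.log(combinedCommitState([])); // pending
console.log(combinedCommitState([{ context: "ci/legacy", state: "success" }])); // success
```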
Update: I tried …
Given that there's no way to make this work with our current status checks, I'm wondering if we should use the Actions endpoint instead of the commit-based endpoints: https://docs.github.com/en/rest/actions/workflow-jobs?apiVersion=2022-11-28#get-a-job-for-a-workflow-run It seems like this is what we need to poll the status of a specific job. Now what's left to figure out is …
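A minimal sketch of polling that endpoint (`GET /repos/{owner}/{repo}/actions/jobs/{job_id}`), assuming the job ID is already known (the open question above) and a token is available. `fetchJob` and `jobOutcome` are hypothetical helper names; the request headers mirror the `gh api` call earlier in the thread, and only the documented `status` and `conclusion` fields of a workflow job are used. Treating `skipped` as success is an assumption.

```typescript
// Fields of a workflow job that this sketch relies on.
interface WorkflowJob {
  name: string;
  status: string; // e.g. "queued", "in_progress", "completed"
  conclusion: string | null; // e.g. "success", "failure", "skipped"; null until completed
}

type Outcome = "pending" | "success" | "failure";

// Map a job to the tri-state a status provider needs.
// Counting "skipped" as success is an assumption, mirroring the
// "All required tests passed or skipped" job name.
function jobOutcome(job: WorkflowJob): Outcome {
  if (job.status !== "completed") return "pending";
  return job.conclusion === "success" || job.conclusion === "skipped" ? "success" : "failure";
}

// GET /repos/{owner}/{repo}/actions/jobs/{job_id} (requires global fetch, e.g. Node 18+).
async function fetchJob(owner: string, repo: string, jobId: number, token: string): Promise<WorkflowJob> {
  const res = await fetch(`https://api.github.com/repos/${owner}/${repo}/actions/jobs/${jobId}`, {
    headers: {
      Accept: "application/vnd.github+json",
      Authorization: `Bearer ${token}`,
      "X-GitHub-Api-Version": "2022-11-28",
    },
  });
  if (!res.ok) throw new Error(`GitHub API returned ${res.status}`);
  return (await res.json()) as WorkflowJob;
}

console.log(jobOutcome({ name: "Upload Artifacts", status: "in_progress", conclusion: null })); // pending
```

Note that this endpoint only knows about GitHub Actions jobs, not status checks reported by other systems.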
… Actions endpoint should be used here, as that's specific to GitHub Actions, but the GitHub status provider (and GitHub status checks) are independent of GitHub Actions. You may wanna try the GraphQL endpoints.
Alright folks, I'm done with trying to fix this. Given that there's a somewhat working but annoying workaround (namely, telling our managers to wait ~30min before adding the accept label in getsentry/publish), I can't justify working on this any longer. My gut feeling tells me that we can't be the only repo that's affected by this, but what can I do... To whoever picks this up: if we can't easily change the current GitHub status provider, maybe the path of least friction/resistance is to create a second GitHub (Actions) status provider that uses working endpoints. Individual repos could opt in to use this provider instead, and we wouldn't risk breaking existing publishing configs that rely on the older endpoints (if there are any).
Publishes repeatedly time out for us when looking for artifacts uploaded to GitHub. One example: https://github.com/getsentry/publish/actions/runs/5715907847/job/15486244767
This happens on basically every sentry-javascript release if we approve it while the checks are still running, so it seems to me that something is not right with the waiting. Maybe we need to extend the `ARTIFACTS_POLLING_INTERVAL` (currently 10s) or increase `MAX_TRIES` (currently 3)?