
GHA jobs fail because of too many requests #162

Closed
jakirkham opened this issue Jul 15, 2024 · 7 comments

@jakirkham
Member

Have started seeing GHA jobs fail because of too many requests. For example:

#26 ERROR: toomanyrequests: Too Many Requests (HAP429).

------
 > pushing rapidsai/ci-conda:cuda11.8.0-rockylinux8-py3.10-amd64 with docker:
------

This is the same as issue rapidsai/miniforge-cuda#72, which already contains suggestions on how this could be resolved.

@jameslamb
Member

I just observed the same thing on https://github.com/rapidsai/ci-imgs/actions/runs/11614334107/job/32342353447

43 of 270 jobs failed because they hit this rate limit. All succeeded on a rebuild. Just posting to note that, although this is an old issue, it is still a problem we hit sometimes.

@jameslamb
Member

jameslamb commented Dec 3, 2024

I just merged #216, which limits this repo to running 150 jobs at a time within a workflow run. Let's see if that helps.
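
(For reference, GitHub Actions can cap concurrency within a matrix via the strategy's max-parallel setting. Below is a minimal sketch of what a 150-job cap could look like; the workflow, job, and matrix names are hypothetical and not taken from #216.)

name: build-images          # hypothetical workflow name
on: [push]

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      max-parallel: 150     # run at most 150 matrix jobs at a time within this workflow run
      fail-fast: false      # let remaining images keep building if one push is rate-limited
      matrix:
        image: [ci-conda, ci-wheel]   # placeholder entries; the real matrix is much larger
    steps:
      - uses: actions/checkout@v4
      - run: docker build -t rapidsai/${{ matrix.image }} .    # placeholder build step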

jameslamb self-assigned this Dec 5, 2024
@vyasr
Contributor

vyasr commented Dec 10, 2024

@jameslamb what data do we want to see before we close this issue?

@jameslamb
Member

🤷🏻 I don't know, there are so many factors that affect whether or not we hit this.

Let's say: if we don't see any rate-limiting errors when the next round of renovate PRs goes up on January 1st, we close this.

@jameslamb
Member

In #219 we reduced this further to 50 parallel jobs.

We saw multiple runs on different days where most jobs succeeded and none had rate-limiting issues. Ref: #219 (comment)

If the job triggered by the merge of #219 also succeeds without any rate-limiting issues, I think we should close this. Build to watch: https://github.com/rapidsai/ci-imgs/actions/runs/12816222911

@jameslamb
Member

> If the job triggered by the merge of #219 also succeeds without any rate-limiting issues, I think we should close this. Build to watch: https://github.com/rapidsai/ci-imgs/actions/runs/12816222911

Pretty similar to what we saw on #219... 265 out of the 270 build jobs succeeded, and that took about 27 minutes.

Importantly, none of the failures were related to rate-limiting.

Just restarted the failed jobs there. I'd like to see all the pushes to DockerHub also succeed without rate-limiting issues before we close this.

@jameslamb
Member

All the pushes succeeded, with no need for restarts!

https://github.com/rapidsai/ci-imgs/actions/runs/12816222911

I think this can be closed. Thanks @bdice for keeping it moving forward with #219.
