Tune release-blocking kind job resources down #19081
The kind release-blocking prowjobs have been ending up in error state over the past few weekdays, often because their pods could not be scheduled due to insufficient CPU. We've been seeing increased PR traffic due to v1.19 getting released, along with a concerted attempt to clear the backlog of v1.20 PRs that were waiting for code thaw. The kind release-blocking jobs were the only jobs migrated to k8s-infra-prow-build that reserved 7300m CPU. The next highest CPU limit is 7, used by the verify presubmit and bazel-test periodic jobs, and those jobs have not been ending up in error state. I was originally going to match the kind presubmit jobs, which use 4 CPU, but they have been failing a lot today at times I would correlate with high PR volume. I'm still tuning the memory to match, based on graphs in the issue linked to this PR.
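To make the CPU figures above easier to compare, here is a minimal sketch that normalizes Kubernetes-style CPU quantities ("7300m", "7", "4") to millicores. The helper name `cpu_to_millicores` and the job labels are illustrative assumptions, not names taken from the test-infra configs, and the parser only handles the plain and `m`-suffixed forms mentioned in this PR.

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "7300m" or "7" to millicores.

    Hypothetical helper for illustration; it handles only plain core
    counts and the "m" (millicore) suffix.
    """
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])
    # Whole or fractional cores, e.g. "7" or "0.5"
    return round(float(quantity) * 1000)


# CPU values quoted in the PR description; the labels are informal.
cpu_by_job = {
    "kind release-blocking": "7300m",
    "verify presubmit / bazel-test periodic": "7",
    "kind presubmit": "4",
}

for job, cpu in cpu_by_job.items():
    print(f"{job}: {cpu_to_millicores(cpu)}m")
```

Seen this way, the kind release-blocking jobs asked for 300m more than the next largest jobs on the cluster, which is consistent with them being the ones that failed to schedule.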
/cc @BenTheElder
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: hasheddan, spiffxp. The full list of commands accepted by this bot can be found here. The pull request process is described here.
@spiffxp: Updated the
Hoping to address #19080