
Make pull-kubernetes-e2e-gce-gpu blocking #4642

Closed
rohitagarwal003 opened this issue Sep 19, 2017 · 20 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.


@rohitagarwal003 (Member)

The pull-kubernetes-e2e-gce-gpu job has been running for a while on every PR and has been reasonably non-flaky: https://k8s-gubernator.appspot.com/builds/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-e2e-gce-gpu

We recently stopped running all non-blocking tests on every PR in #4640.

I would like to make the pull-kubernetes-e2e-gce-gpu job blocking if there are no objections.

@rohitagarwal003 (Member Author)

/assign @krzyzacy @BenTheElder @fejta

@rohitagarwal003 (Member Author)

/cc @vishh

@krzyzacy (Member)

cc @spiffxp

@krzyzacy (Member)

Let's make the GPU job work on Prow instead of Jenkins; then we can make it always run and be blocking. @BenTheElder is working towards it.
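For context, making a presubmit always run and report as required comes down to its entry in Prow's job config. A minimal sketch of what such an entry might look like (field names follow Prow's presubmit configuration; the values here are illustrative, not the actual job definition):

```yaml
presubmits:
  kubernetes/kubernetes:
  - name: pull-kubernetes-e2e-gce-gpu
    always_run: true    # run on every PR without needing a /test trigger
    optional: false     # report as a required (blocking) status context
    context: pull-kubernetes-e2e-gce-gpu
    rerun_command: "/test pull-kubernetes-e2e-gce-gpu"
    trigger: "(?m)^/test( all| pull-kubernetes-e2e-gce-gpu)$"
```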

@BenTheElder (Member)

Ref #4639 for the most recent PR. I've had a job pass now, and once that config is updated I expect this should work. We can probably officially migrate to Prow soon, but I want to see it pass more first :-)

@BenTheElder (Member)

Following up, I've gotten one run in since #4639, and it was successful.
Runs here; Sep 19 22:17:49 is the start time of the first one after that PR.
Also here, under pull-kubernetes-e2e-gce-gpu-prow.

I will run more and keep an eye on them.

@BenTheElder (Member)

This also works on release-1.7 and has 4 successful runs since #4639.
I think we can probably switch the GPU job to run on Prow any time, @mindprince.

@krzyzacy (Member)

I think we can flip it to Prow now to give @mindprince more signal :-)

@rohitagarwal003 (Member Author)

Awesome. Thanks guys!

@krzyzacy (Member)

We'll watch the job for a few days (until code freeze, maybe), and @mindprince, you will want to propose to the release team that the job be made blocking (maybe join our burndown meeting next Monday?) :-)

@krzyzacy krzyzacy assigned jdumars and unassigned BenTheElder, fejta and krzyzacy Sep 20, 2017
@krzyzacy (Member)

@jdumars FYI

@BenTheElder (Member)

This job has been pretty well-behaved; perhaps we can make it blocking after the release is done? I think we will need to propose it somewhere (the sig-testing meeting, maybe?).

@spiffxp (Member) commented Sep 29, 2017

Just bumped into a quota issue

https://k8s-testgrid.appspot.com/kubernetes-presubmits#pull-kubernetes-e2e-gce-gpu

W0929 18:01:58.316] ERROR: (gcloud.compute.networks.create) Could not fetch resource:
W0929 18:01:58.317]  - Quota 'SUBNETWORKS' exceeded.  Limit: 160.0

@BenTheElder (Member)

Xref #4472 for probably the best solution, though generally we should be fine via max_concurrency and the janitor.
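As a side note, Prow jobs can bound their parallelism with a max_concurrency field, which helps keep project quota usage in check. A hedged sketch (the limit shown is illustrative, not the job's real setting):

```yaml
- name: pull-kubernetes-e2e-gce-gpu
  max_concurrency: 4    # at most four runs at once, bounding GCE resource usage
  always_run: true
```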

@BenTheElder (Member)

Adding a note: the networking quota issue has been solved for this job and the fix is being rolled out to other jobs (thanks @MrHohn !)

@krzyzacy (Member)

/unassign @jdumars
/assign @mindprince @BenTheElder
Feel free to send or update the PR to make it blocking before 1.9.

@BenTheElder (Member)

@krzyzacy I did: #5134

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 24, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 23, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
