Testground CI timeouts after 6 hours #8731

Closed
Tracked by #10
lidel opened this issue Feb 10, 2022 · 5 comments · Fixed by #8741 or #8884
Labels: kind/bug (A bug in existing code, including security flaws); kind/maintenance (Work required to avoid breaking changes or harm to project's status quo); need/triage (Needs initial labeling and prioritization); topic/test failure (Topic test failure)

Comments

@lidel
Member

lidel commented Feb 10, 2022

Problem

Testground CI runs for 6h and then times out:

[Screenshot: 2022-02-10_22-32]

[Screenshot: 2022-02-10_22-35]

Did something change 8 days ago?

[Screenshot: 2022-02-10_22-33]

Solution

Aside from fixing these tests, should we cap them?
When it was working, a run took ~2 minutes, so perhaps we should set a cap at 15m, 1h, or similar, so that hanging jobs don't squat on workers for 6h each.
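
For reference, a minimal sketch of what such a cap could look like in a GitHub Actions workflow. GitHub cancels a job after 360 minutes by default, which matches the 6h seen above; `timeout-minutes` overrides that per job. The job name, runner, and step here are placeholders, not the contents of the real workflow file, and 15 is just one of the values floated above.

```yaml
jobs:
  testground:              # hypothetical job name
    runs-on: ubuntu-latest # assumed runner
    # Cancel the job if it runs longer than this, instead of
    # waiting for the default 360-minute (6h) limit.
    timeout-minutes: 15
    steps:
      - name: Run testground plan   # placeholder step
        run: echo "testground step goes here"
```

The same key can also be set on an individual step if only one step is prone to hanging.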

lidel added the kind/bug, topic/test failure, need/triage, and kind/maintenance labels Feb 10, 2022
galargh self-assigned this Feb 11, 2022
galargh moved this to Todo in IP Productivity 🆙 Feb 11, 2022
@galargh
Contributor

galargh commented Feb 11, 2022

I manually disabled the tests for now. I'm going to look into it and try to find out what changed.

@galargh
Contributor

galargh commented Feb 17, 2022

I've taken a closer look at the failing builds and found that the requests that were supposed to queue testground runs were failing:

Post "https://ci.testground.ipfs.team/run": unexpected EOF

The testground action didn't account for such a scenario. That's why the builds were hanging indefinitely.
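
To illustrate the failure mode, here is a rough Python sketch of "exit early on scheduling failure". It is not the code of the testground action; `queue_run`, `SchedulingError`, `main`, and the `opener` parameter are hypothetical names made up for this example. The idea is only that a failed POST to the run endpoint is turned into an immediate error, so nothing goes on to wait for a run that was never queued.

```python
import http.client
import sys
import urllib.request


class SchedulingError(RuntimeError):
    """Raised when the run could not be queued at all (hypothetical name)."""


def queue_run(url, payload, opener=urllib.request.urlopen, timeout=30.0):
    """POST the run request and return the raw response body.

    `opener` is injectable so the function can be exercised without a network.
    """
    request = urllib.request.Request(url, data=payload, method="POST")
    try:
        with opener(request, timeout=timeout) as response:
            return response.read()
    except (OSError, http.client.HTTPException) as err:
        # OSError covers urllib's URLError/HTTPError, connection resets and
        # socket timeouts; HTTPException covers truncated or malformed replies.
        raise SchedulingError(f"could not queue run at {url}: {err}") from err


def main(url, payload, opener=urllib.request.urlopen):
    """Return a process exit code: 0 if the run was queued, 1 otherwise."""
    try:
        queue_run(url, payload, opener=opener)
    except SchedulingError as err:
        # Fail the build right away instead of polling for a missing run.
        print(f"error: {err}", file=sys.stderr)
        return 1
    return 0
```

A CI entry point could pass the result of `main(...)` to `sys.exit`, so that a scheduling failure marks the build as failed within seconds rather than leaving it to the job timeout.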

galargh moved this from In Progress to In Review in IP Productivity 🆙 Feb 17, 2022
Repository owner moved this from In Review to Done in IP Productivity 🆙 Feb 18, 2022
lidel pushed a commit that referenced this issue Feb 18, 2022
* ci: set timeout on testground job
* ci: use testground action which exits early on scheduling failures

Note: this will be continued in #8731
@lidel
Member Author

lidel commented Feb 18, 2022

Let's keep this open until @galargh can switch to the upstream action with the fix from coryschwartz/testground-github-action#2:

https://github.com/ipfs/go-ipfs/blob/ce25140b69a2d1ae8d3fb6394bf269fd7b274e80/.github/workflows/testground-on-push.yml#L28-L29

lidel reopened this Feb 18, 2022
Repository owner moved this from Done to Todo in IP Productivity 🆙 Feb 18, 2022
galargh moved this from Todo to In Review in IP Productivity 🆙 Feb 23, 2022
BigLep added this to the Best Effort Track milestone Mar 10, 2022
BigLep moved this to 🏃‍♀️ In Progress in IPFS Shipyard Team Mar 10, 2022
@BigLep
Contributor

BigLep commented Mar 10, 2022

@galargh: just checking in to see whether the fix is done so we can resolve this issue.

@galargh
Contributor

galargh commented Mar 10, 2022

The issue is fixed, but we're holding off on closing it until coryschwartz/testground-github-action#2 is merged upstream so that we can stop using my fork.

Maybe we should ask about transferring the action to the testground org, or create a fork there instead?

Repository owner moved this from 🤔 Triage to 🥳 Done in InterPlanetary Developer Experience Apr 14, 2022
Repository owner moved this from 🏃‍♀️ In Progress to 🎉 Done in IPFS Shipyard Team Apr 14, 2022
3 participants