cmd/compile,runtime: frequent test timeouts on solaris-amd64-oraclerel #51443
Comments
It makes sense to increase |
"Bryan C. Mills" ***@***.***> writes:
> @golang/release, @rorth: would it make sense to set `GO_TEST_TIMEOUT_SCALE`
> on this builder to make the timeouts more generous?
I'd be ok with that, but wondered about one thing: that zone is shared
between the golang builder and an LLVM buildbot. The latter is
restricted to 8 parallel jobs, and I'd been under the impression that
the golang one would work with just 4 jobs. However, recently I've
often seen way more than those 4 jobs. Have I just been deluded about
that restriction, or has it somehow been lost?
|
While running tests, a go builder will happily grab whatever resources it has available. Are you doing anything to restrict it to 4 jobs? |
Hi Ian,
> While running tests, a go builder will happily grab whatever resources it
> has available. Are you doing anything to restrict it to 4 jobs?
Not so far. When I inherited the builder, I was under the (obviously
wrong) impression that this had been done in the builder config,
similarly to buildbot for LLVM. So there's no similar facility
available?
I should be able to use CPU shares to achieve this. Setting up a
separate zone just for the golang builder seems like a waste of effort
(and IPv4 addresses).
|
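Good point, I think the builder config will set `GOMAXPROCS`, which will limit the resource usage of each specific test, but then you will have 4 tests run in parallel and each test will happily run 4 tests in parallel. At least, I think that is how it works; my apologies if I'm getting this wrong. |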
Ian Lance Taylor ***@***.***> writes:
> Good point, I think the builder config will set `GOMAXPROCS`, which will
> limit the resource usage of each specific test, but then you will have 4
> tests run in parallel and each test will happily run 4 tests in parallel.
> At least, I think that is how it works; my apologies if I'm getting this
> wrong.
As a first step, I've now set GOMAXPROCS=8 for the builder to see if
this helps. The last runtime timeout happened before that change,
though. We'll have to see if this is enough.
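
The multiplication described in the quoted comment can be made concrete with a small sketch. It assumes what the `go` command documentation says for plain `go test`: both `-p` (test binaries run at once) and `-parallel` (tests within one binary that call `t.Parallel`) default to GOMAXPROCS. Whether the builder's own harness uses exactly these defaults is not confirmed in this thread, and the helper name `worstCaseParallelTests` is hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
)

// worstCaseParallelTests is a hypothetical helper. Given the number of test
// binaries run at once (-p) and the per-binary limit on parallel tests
// (-parallel), it returns the upper bound on tests running at the same time.
func worstCaseParallelTests(p, parallel int) int {
	return p * parallel
}

func main() {
	// Passing 0 queries the current GOMAXPROCS value without changing it.
	n := runtime.GOMAXPROCS(0)
	fmt.Printf("GOMAXPROCS=%d: up to %d parallel tests\n", n, worstCaseParallelTests(n, n))
}
```

Under these assumptions a limit of 4 allows up to 16 tests at once, and GOMAXPROCS=8 allows up to 64, so the setting bounds per-process scheduling rather than the total load on the zone.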
In parallel, I'm also running the runtime test 10000 times in a loop on
identical hardware (although on bare metal instead of a kernel
zone/VM). Of the roughly 4000 runs completed so far, 6 have taken 2+
minutes rather than the usual 16-18 seconds. I missed the
-test.timeout=3m, unfortunately, so all tests PASSed.
Unfortunately, I couldn't easily see which subtest is timing out,
otherwise I could restrict the testing to just that one if it's always
the same one.
|
Appears not to be:
2022-03-29T16:24:51-a2baae6/solaris-amd64-oraclerel |
|
|
|
Change https://go.dev/cl/408701 mentions this issue: |
For golang/go#52653.
Updates golang/go#51443.

Change-Id: Iaea8fab13ed979e54c827f0f3c4d705bdaff4ee4
Reviewed-on: https://go-review.googlesource.com/c/build/+/408701
Reviewed-by: Alex Rakoczy <[email protected]>
Auto-Submit: Bryan Mills <[email protected]>
Run-TryBot: Bryan Mills <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
There have apparently not been any more of these failures since May. I'm calling this obsolete. |
Change https://go.dev/cl/465156 mentions this issue: |
I had initially added known issues fairly aggressively in order to use them to reduce noise in 'greplogs -triage'. Now that we are using 'watchflakes' for triage, that noise reduction is no longer important (the failures are already clustered to their respective known issues), and having greyed-out cells on the dashboard makes new regressions too easy to miss.

Concretely:

- golang/go#42212 is mostly specific to x/net at this point (as golang/go#57841)
- There have been no failures matching golang/go#51001 since October.
- golang/go#52724 has been so rare lately that we hadn't yet added a 'watchflakes' pattern for it.
- There have been no failures matching golang/go#51443 since May.
- There have been no failures matching golang/go#53116 or golang/go#53093 since I enabled 'watchflakes' for the builder in December.
- The linux-amd64-perf builder seems to be passing consistently for x/benchmarks and x/tools, so there is no need to refer to golang/go#53538 to explain failures on it.

Change-Id: Ia16db2a23e5fa037a299f1f56fb26f1cf84521e1
Reviewed-on: https://go-review.googlesource.com/c/build/+/465156
Reviewed-by: Dmitri Shuralyov <[email protected]>
Reviewed-by: Dmitri Shuralyov <[email protected]>
Run-TryBot: Bryan Mills <[email protected]>
Auto-Submit: Bryan Mills <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Found new dashboard test flakes for:
2023-04-07 15:12 solaris-amd64-oraclerel go@39986d28 runtime.TestFakeTime (log)
|
Found new dashboard test flakes for:
2023-04-11 20:25 solaris-amd64-oraclerel go@28480216 runtime.TestCgoExternalThreadSIGPROF (log)
2023-04-11 20:25 solaris-amd64-oraclerel go@28480216 runtime.TestCgoExecSignalMask (log)
2023-04-11 20:25 solaris-amd64-oraclerel go@28480216 runtime.TestSigStackSwapping (log)
|
In triage, it looks to us like all the tests are just timing out. The builder just seems slow. There might be one culprit. Verbose test output would tell us which test cases are slow. CC @golang/solaris |
I forgot that @golang/solaris is empty... CC @rorth I suppose? |
The C toolchain on this builder seems particularly slow. I wonder if just raising the `GO_TEST_TIMEOUT_SCALE` for the builder might resolve things. |
"Bryan C. Mills" ***@***.***> writes:
> The C toolchain on this builder seems particularly slow. I wonder if just
> raising the `GO_TEST_TIMEOUT_SCALE` for the builder might resolve things.
The buildlet is running in a kernel zone (VM), and for some reason some
of the tests run extremely slow there, compared to bare metal on an
identical server or the host system.
For the time being, I've increased GO_TEST_TIMEOUT_SCALE to 4 as a
workaround.
Besides, I hope to migrate the buildlet to a different system (5 year
old 2 x Xeon Gold 5120 instead of 12 year old 4 x Xeon E7540) in the
coming months.
|
Duplicate of #60152 |
greplogs --dashboard -md -l -e '(?ms)\Asolaris-amd64-oraclerel.*^panic: test timed out.*FAIL\s+runtime' --since=2022-01-01
2022-03-01T21:27:42-b0db2f0/solaris-amd64-oraclerel
2022-02-28T19:00:23-b33592d/solaris-amd64-oraclerel
2022-01-08T00:24:25-90860e0/solaris-amd64-oraclerel
2022-01-06T23:39:43-042548b/solaris-amd64-oraclerel
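
For readers unfamiliar with the pattern in the greplogs command above, the following sketch applies the same expression with Go's `regexp` package. The `(?ms)` flags make `^` match at line starts and let `.` span newlines, so the pattern requires a log that begins with the builder name, later contains a line starting with `panic: test timed out`, and later still reports `FAIL` for `runtime`. The helper name `isRuntimeTimeout` and the sample log text are hypothetical, invented only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// timeoutPattern is the expression passed to greplogs via -e.
var timeoutPattern = regexp.MustCompile(
	`(?ms)\Asolaris-amd64-oraclerel.*^panic: test timed out.*FAIL\s+runtime`)

// isRuntimeTimeout reports whether a builder log matches the pattern.
// Hypothetical helper for illustration only.
func isRuntimeTimeout(log string) bool {
	return timeoutPattern.MatchString(log)
}

func main() {
	// Invented, heavily shortened log excerpt; real logs are much longer.
	sample := "solaris-amd64-oraclerel\n" +
		"panic: test timed out after 3m0s\n" +
		"FAIL\truntime\t180.012s\n"
	fmt.Println(isRuntimeTimeout(sample))
}
```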
@golang/runtime: the builder appears to only run the `-short` tests. Is there something we can feasibly do to make `-short` mode shorter?

@golang/release, @rorth: would it make sense to set `GO_TEST_TIMEOUT_SCALE` on this builder to make the timeouts more generous?