x/tools/go/internal/gcimporter: frequent failures without useful output #38415
Comments
Is it possible that this test is running out of memory or timing out? I'm not sure how to debug this failure because the test actually always prints those
Maybe! @dmitshur has tracked down several other OOM-related builder failures recently; he might have more experience with the symptoms of OOM test failures.
The TestBExportData_stdlib test has always reported these two errors (I believe they are expected), and the test is expected to pass (I don't recall why the output is reported). It would probably be a good idea to track down where this happens, verify the expected output, and silence it.
I'm not sure if this is OOM-related. In local testing, I always get non-empty output for that, even when the process is terminated by the kernel instead of the runtime:
Is it possible that the test is somehow invoking
I've tried this on several machines and it shows the expected output and then says PASS. But when it fails shouldn't the output report which test failed?
It should, but without the
In all the failure logs above, it is only printing the output once, which would indicate it executes TestBExportData_stdlib but doesn't get to TestIExportData_stdlib. If you suspect an OOM or other kernel failure, TestVeryLongFile seems a likely candidate. It's curious to me that it is only failing on the power9 and not other power machines. I will try running it multiple times to see if I can reproduce it.
Given the very high failure rate (~160 failures in April alone), I'm marking this as a release-blocker for 1.17 via #11811. (CC @golang/release)
I wonder if this failure mode is actually an out-of-memory condition (compare #45931) combined with missing error reporting on some path.
Weekly check-in: this needs to be investigated before beta 1.
Looked into this a bit today. Like Bryan, I tried to repro the silent failure mode by terminating the process in various ways. All of them (even os.Exit) resulted in some form of output on my machine. I was able to repro on gomote twice, but then the test succeeded the next ~20 times I tried, so I wasn't able to iterate on debugging. A likely next step would be to SSH into one of the machines and try dmesg. Note that most of the non-ppc build failures above are not so mysterious: test timeouts or segfaults, the latter on openbsd (which is #36563). However, there are at least some non-ppc build failures that exhibit the silent failure mode (such as 2021-05-07T20:56:39-f05e912/linux-arm64-packet). Also, note that the linux-ppc64le-power9osu machines, per their documented specs, are pretty large.
Ping; any progress?
We're currently trying to debug the test failure using https://golang.org/cl/327990. It's hard to do because it doesn't reproduce locally, and we can't actually tell which test is failing.
Friendly reminder that the RC is imminent.
I agree with that assessment. I think 7 failures per month is still too high long-term, but it's far lower than the failure rate in May. I think it's good enough not to be a release-blocker at this point.
An interesting clue from https://build.golang.org/log/f68a7c2dcb411c0067dd3bd2489f9d191072383a:
I've been trying to reproduce this problem on a few systems. I can get it to consistently fail on 2 power9's but not on another. The one where it passes has an older kernel. So far I can't get it to fail on a power8. I also tried building the test using go1.17, and it doesn't fail on the power9s when using go1.17 either. On one of the power9's where it fails consistently, here is the output log, with most of the logged information about valid imports removed. The error about constraints appears to be causing the problem.
@laboger are you testing x/tools@master using go@master? The x/tools gcimporter has been updated to support generics, so that test failure looks like it came from an older version of x/tools.
@findleyr You are correct. I didn't have the latest. Now if I try on various systems I have access to, I am not able to reproduce a failure with this test, even when using the same configuration as the build machines.
One of the recent ppc64le p9 failures shows something in the log similar to what happened on arm64:
I suspect that this is an awkward side-effect of running out of RAM in memory-intensive tests (#33959). (It would be nice if the failure mode of running out of RAM for this test were clearer, but that doesn't appear to be directly related to
Marking WaitingForInfo to remind myself to check for further unexplained failures after CL 193181.
x/tools/go/loader by default prints errors to stderr. TestBExportData_stdlib and TestIExportData_stdlib intentionally load packages with errors, resulting in spurious (and confusing) output from passing tests when the test binary fails for other reasons. Suppress these spurious errors by setting each test's types.Config to print errors to the test log instead.
For golang/go#38415
Change-Id: I93fee06c4141bb4c15bd285844668df6eec44892
Reviewed-on: https://go-review.googlesource.com/c/tools/+/360914
Trust: Bryan C. Mills <[email protected]>
Run-TryBot: Bryan C. Mills <[email protected]>
gopls-CI: kokoro <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Robert Findley <[email protected]>
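As a rough illustration of the technique the commit message above describes, the sketch below uses the standard `go/types` package directly rather than x/tools/go/loader, so it is not the code from the CL. It sets the `Error` field of `types.Config` so that expected type errors go to a caller-supplied logging function instead of stderr. The `checkQuietly` helper, the `src` snippet, and the `logf` parameter are hypothetical names; in a real test, `logf` would typically be `t.Logf`.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// src is a hypothetical one-file package containing a deliberate type error.
const src = `package p

var x int = "not an int"
`

// checkQuietly type-checks src and sends every type error to logf instead of
// stderr. It returns the number of type errors reported.
func checkQuietly(logf func(format string, args ...interface{})) (int, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "p.go", src, 0)
	if err != nil {
		return 0, err
	}
	n := 0
	conf := types.Config{
		// When Error is set, the checker reports each error to this
		// callback and keeps going rather than stopping at the first one.
		Error: func(err error) {
			n++
			logf("expected type error: %v", err)
		},
	}
	// Check also returns the first error; it is ignored here because the
	// errors are expected and have already been logged by the callback.
	_, _ = conf.Check("p", fset, []*ast.File{f}, nil)
	return n, nil
}

func main() {
	// Outside a test, print to stdout as a stand-in for t.Logf.
	n, err := checkQuietly(func(format string, args ...interface{}) {
		fmt.Printf(format+"\n", args...)
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("type errors reported:", n)
}
```

Because `t.Logf` output is only shown for failing tests or under `-v`, routing expected errors there keeps a passing run quiet while preserving the detail when something does go wrong.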
Change https://golang.org/cl/360915 mentions this issue:
Change https://golang.org/cl/360914 mentions this issue:
It does not appear to be needed any more.
For golang/go#38415
Change-Id: I5c9525a96606df93c58cc15a4cb4281f95b93902
Reviewed-on: https://go-review.googlesource.com/c/tools/+/360915
Trust: Bryan C. Mills <[email protected]>
gopls-CI: kokoro <[email protected]>
Run-TryBot: Bryan C. Mills <[email protected]>
Reviewed-by: Robert Findley <[email protected]>
TryBot-Result: Go Bot <[email protected]>
https://build.golang.org/log/50ebd814481ff8a3c0976a9ec32602bdca86e185
https://build.golang.org/log/4f4af454dd05bbeccf2dd92c3d045a5d463f6889
https://build.golang.org/log/aeb6a5acd91e7cea78a93621877395caea90f785
https://build.golang.org/log/f4f7db7b08ee567d0086336920d3faf69363b080
(CC @griesemer, @stamblerre for gcimporter.)
These sorts of patterns would be a lot easier to spot if fetchlogs worked on the x repos (#35515; CC @andybons for prioritization).