cmd/go: tests timing out on aix builder #29065
Comments
First, I've opened issue 29078 which might be the problem with However, currently the builder is stuck in a
(I don't have the rest of the backtrace, as gdb isn't fully working yet) I'm not sure this is related to the PATH problem. A few crashes that occurred seem linked to a problem inside the runtime: Moreover, is there a way to manually trigger a build? I could run some experiments to figure out what's wrong. If not, I'll leave the builder stuck until PATH is changed. This will avoid unnecessary failures and give better evidence afterwards that everything was related to PATH. |
It appears that the I did try running I tried running Line 882 in 1adbb2b
That suggests that either there is a serious hang in |
Moreover, the new package |
That's frustrating, but if you have a POSIX |
Yes, it does have it, but |
That seems dangerous: at least on other platforms, FWIW, we rejected a similar implementation for Solaris (#24684). We should probably remove the one for AIX before it gets released: we can always add it back later if needed. I'll open a separate issue. |
Filed as #29084. |
Ok thanks. |
By the way, the aix builder has been released. If you wish to do some experiments, you can. I'll provide the patch to use the POSIX version of lockedfile on AIX tomorrow. |
Great, thanks! Let's rename the file to |
Change https://golang.org/cl/152397 mentions this issue: |
AIX doesn't provide flock() syscall, it was previously emulated by fcntl calls. However, there are some differences between a flock() syscall and a flock() using fcntl. Therefore, it's safer to remove it and just provide FcntlFlock. Thus, lockedfile implementation must be moved to use FcntlFlock on aix/ppc64. Updates #29065. Fixes #29084. Change-Id: Ic48fd9f315f24c2acdf09b91d917da131a1f2dd5 Reviewed-on: https://go-review.googlesource.com/c/152397 Reviewed-by: Tobias Klauser <[email protected]> Reviewed-by: Bryan C. Mills <[email protected]>
Change https://golang.org/cl/152457 mentions this issue: |
I've made some investigations and the timeouts seem to come from different issues. The main problem is the runtime being stuck in an infinite loop during
(Sadly, I don't have more of the backtrace, as gdb isn't fully working on aix/ppc64...) I've also found various crashes which seem to be related to P objects:
I think everything is linked to AIX syscalls, which are made by calling 'asmcgocall' and C functions. How does the GC know that a thread is in a C syscall? It's possible that I've forgotten to add a GOOS="aix" somewhere to handle syscalls correctly, even though I've checked several times. Moreover, we are aware that the test machine has a really slow local disk. That's why we created a ramdisk to store the build folder. The person who provided the machine is trying to find a way to fix this slowness. However, it might be worth changing TMPDIR and GOTMPDIR to /ramdisk0/tmp, which might speed up the whole process. I still don't understand why these timeouts occur every time within the builder but never when I manually launch all.bash... Are there many differences between the two? If anyone has ideas about the origin of these bugs, they're welcome! /cc @aclements |
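As a rough illustration of the TMPDIR/GOTMPDIR suggestion above, the following sketch builds an environment for a child process with both variables pointing at the ramdisk. The helper withTmpDirs is hypothetical, and how the real builder sets its environment is not described in this thread, so treat this purely as one possible way to apply the override.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// withTmpDirs returns a copy of env in which TMPDIR and GOTMPDIR both
// point at dir. Any existing values for those variables are dropped.
// Hypothetical helper for illustration only.
func withTmpDirs(env []string, dir string) []string {
	out := make([]string, 0, len(env)+2)
	for _, kv := range env {
		if strings.HasPrefix(kv, "TMPDIR=") || strings.HasPrefix(kv, "GOTMPDIR=") {
			continue
		}
		out = append(out, kv)
	}
	return append(out, "TMPDIR="+dir, "GOTMPDIR="+dir)
}

func main() {
	cmd := exec.Command("./all.bash")
	cmd.Env = withTmpDirs(os.Environ(), "/ramdisk0/tmp")
	// cmd.Run() would start the build; this sketch only shows the overrides.
	for _, kv := range cmd.Env[len(cmd.Env)-2:] {
		fmt.Println(kv)
	}
}
```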
Updates golang/go#29065 Fixes golang/go#29078 Change-Id: Ifa9355c9dc988a460b6198913431647ec2c5e6ac Reviewed-on: https://go-review.googlesource.com/c/152457 Reviewed-by: Brad Fitzpatrick <[email protected]>
Tests are still timing out even after the new PATH (CL 152457) and the removal of Flock in cmd/go/internal/lockedfile (CL 152397). There are still some problems with the latter, but it triggers an error and not a timeout. I've been searching for a few days and still don't understand what's wrong with the builder. How different are the environment and the execution from a classic ./all.bash? The timeout during cmd/go tests seems to always be related to the TestScript/script_wait test. The sleep process seems never to be killed... Maybe there is something wrong with signals on AIX. But it's really strange that it never occurs while running ./all.bash. There is no error handling in script_test.go:interruptProcess(); I'll submit a patch which might give us more information about why sleep isn't killed. Edit: Is there a way to try a commit on only one builder? Adding the error handling isn't that simple to implement because of channels. It might be a little too heavy for just a supposition on one GOOS... The patch is ready, but it would be better if I could try it only on aix/ppc64. |
I've found why the sleep process isn't killed at the end of script_wait. |
I have no memory of doing such a thing on purpose (or accident). |
Well, it seems that this sigignore came from how we were launching our builder. It should be ok now. We'll see if it works during the next commit! |
If some thread is stuck on |
The only tests I'm seeing timing out lately are for the |
The cmd/go tests are timing out on the aix/ppc64 builder: https://build.golang.org/log/86480370eb2fba22d0f47458ad1fecf8ce9beea7
It looked similar to the Plan 9 failures reported in #29033 at first, but @bcmills suggested tracking this in a different issue, as the lockedfile implementation for plan9 is different from the other platforms.
@Helflym mentioned in #29033 (comment) that the timeout is only triggered on the builder, but not on a fresh install fetched from git. So there might be some issue with the builder.
Also, it seems the last few commits haven't been built by the aix/ppc64 builder, which also indicates some issue with the aix builder.
/cc @bradfitz