-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: SIGSEGV in race + coverage mode #60825
Comments
Thanks, I will take a look. You wrote:
however the test debris looks like it is arm64 (I will assume this is an arm64 only failure). |
Yes, it was arm64, apologies. It is also worth noting that this wasn't a flake. It failed three times in a row. I'm running a new build again now, which I will be surprised if it doesn't fail again. |
So far unable to reproduce on a regular linux-arm64 gomote (after a couple thousand runs). Weird. |
Never mind, I can see the failure now. Something about the script test setup seems to be a factor. |
It seems to require With these files, this is ok:
But this fails:
I'm not sure why this isn't failing on the linux-arm64-longtest post-submit builder. Does that not use race mode? |
This is also a puzzle for me. |
I've debugged this for a bit and it looks as though the faulting load is happening at the very end of the BSS section; I think this is some sort of off-by-one error. If I take the test case in question and add in another dummy BSS variable, e.g. var QQQ [1000]int the failure mode goes away. So somehow I think in this specific case the coverage counter slab in question winds up as the very last thing in BSS, and we wind up indexing off the end by 1 (not sure how, but that is what it looks like). |
Regarding Without Though
This doesn't happen when it is enabled implicitly.
I believe the buildinfo string is in Marking as a release blocker. With the current evidence it isn't clear that this is necessary related to either coverage or race modes, that could just be the unlucky combination. |
One other weird thing that I can't explain here: when I run the faulting program with GOCOVERDEBUG=true, I get the following trace output
This is kind of obscure but "=+= 3: pk=2 visit live fcn {i=3 F0 NC1}" is showing that we don't hit a "live" function in the counter segment until slot 3. This should not be happening given the nature of the test program-- it doesn't have any dead functions, so the first element in the counter segment should be non-zero. This also seems to point to some sort of off-by-N error. I will spend a little time in the debugger to see if I can learn more. |
In the spirit of the snake-bit dog and fighting the previous war, I'd go looking for failure to save a register somewhere. |
From the stack trace
which loads from 0x119effc, which is a valid address, last 4 bytes in BSS.
But we fault at 4 bytes after it. Where does the 4 come from? Well, this code https://cs.opensource.google/go/go/+/master:src/runtime/race_arm64.s;l=351 checks for address validity and faults on invalid address. It (blindly) does an 8-byte load (!), which makes it go over the segment... I'll send a CL. |
Change https://go.dev/cl/503937 mentions this issue: |
Thanks Cherry! Good detective work, I don't think I would have spotted that. |
@gopherbot Please open backports for 1.19 and 1.20. This bug can cause crashes on seemingly arbitrary atomic operations in race mode. |
Backport issue(s) opened: #60844 (for 1.19), #60845 (for 1.20). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases. |
Change https://go.dev/cl/503976 mentions this issue: |
Change https://go.dev/cl/503977 mentions this issue: |
… in racecallatomic In racecallatomic, we do a load before calling into TSAN, so if the address is invalid we fault on the Go stack. We currently use a 8-byte load instruction, regardless of the data size that the atomic operation is performed on. So if, say, we are doing a LoadUint32 at an address that is the last 4 bytes of a memory mapping, we may fault unexpectedly. Do a 1-byte load instead. (Ideally we should do a load with the right size, so we fault correctly if we're given an unaligned address for a wide load across a page boundary. Leave that for another CL.) Fix AMD64, ARM64, and PPC64. The code already uses 1-byte load on S390X. Fixes #60844. Updates #60825. Change-Id: I3dee93eb08ba180c85e86a9d2e71b5b520e8dcf0 Reviewed-on: https://go-review.googlesource.com/c/go/+/503937 Run-TryBot: Cherry Mui <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: Than McIntosh <[email protected]> Reviewed-by: David Chase <[email protected]> (cherry picked from commit 1a7709d) Reviewed-on: https://go-review.googlesource.com/c/go/+/503977 Auto-Submit: Dmitri Shuralyov <[email protected]> TryBot-Bypass: Dmitri Shuralyov <[email protected]> Reviewed-by: Austin Clements <[email protected]>
… in racecallatomic In racecallatomic, we do a load before calling into TSAN, so if the address is invalid we fault on the Go stack. We currently use a 8-byte load instruction, regardless of the data size that the atomic operation is performed on. So if, say, we are doing a LoadUint32 at an address that is the last 4 bytes of a memory mapping, we may fault unexpectedly. Do a 1-byte load instead. (Ideally we should do a load with the right size, so we fault correctly if we're given an unaligned address for a wide load across a page boundary. Leave that for another CL.) Fix AMD64, ARM64, and PPC64. The code already uses 1-byte load on S390X. Fixes #60845. Updates #60825. Change-Id: I3dee93eb08ba180c85e86a9d2e71b5b520e8dcf0 Reviewed-on: https://go-review.googlesource.com/c/go/+/503937 Run-TryBot: Cherry Mui <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: Than McIntosh <[email protected]> Reviewed-by: David Chase <[email protected]> (cherry picked from commit 1a7709d) Reviewed-on: https://go-review.googlesource.com/c/go/+/503976 Auto-Submit: Dmitri Shuralyov <[email protected]> Reviewed-by: Austin Clements <[email protected]>
This failure occurred on a linux-amd64-longtest builder preparing for 1.21 RC1. I don't see any others like it in greplogs.
cc @thanm @golang/runtime
The text was updated successfully, but these errors were encountered: