-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime/trace: crash during traceAdvance when collecting call stack for cgo-calling goroutine #69085
Comments
Thanks for reporting! I'm a bit surprised though. Stack movement during a syscall (and cgo call) is forbidden because pointers to stack-allocated values might be passed to the syscall. But then why is |
Actually, why is this taking the |
Indeed. I had initially suspected and then dismissed
Do you know if the right M is checked in |
That detail about being in But the thing is, in One thing that's a little fishy is Still just thinking out loud at this point... |
Ah! Good catch. Yeah, I think this is plausibly what's happening, but then (and I mean this rhetorically) why is this code valid at all? |
Oh my goodness, I see it. |
@gopherbot Please open a backport issue for Go 1.23. This bug is causing random crashes when using the tracer that have no workaround. It is also a regression that only exists in Go 1.23. The fix is also small and simple, but admittedly subtle. |
Backport issue(s) opened: #69087 (for 1.23). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases. |
Change https://go.dev/cl/608775 mentions this issue: |
Change https://go.dev/cl/608835 mentions this issue: |
…inter As of CL 580255, the runtime tracks the frame pointer (or base pointer, bp) when entering syscalls, so that we can use fpTracebackPCs on goroutines that are sitting in syscalls. That CL mostly got things right, but missed one very subtle detail. When calling from Go->C->Go, the goroutine stack performing the calls when returning to Go is free to move around in memory due to growth, shrinking, etc. But upon returning back to C, it needs to restore gp.syscall*, including gp.syscallsp and gp.syscallbp. The way syscallsp currently gets updated is automagically: it's stored as an unsafe.Pointer on the stack so that it shows up in a stack map. If the stack ever moves, it'll get updated correctly. But gp.syscallbp isn't saved to the stack as an unsafe.Pointer, but rather as a uintptr, so it never gets updated! As a result, in rare circumstances, fpTracebackPCs can correctly try to use gp.syscallbp as the starting point for the traceback, but the value is stale. This change fixes the problem by just storing gp.syscallbp to the stack on cgocallback as an unsafe.Pointer, like gp.syscallsp. It also adds a comment documenting this subtlety; the lack of explanation for the unsafe.Pointer type on syscallsp meant this detail was missed -- let's not miss it again in the future. Now, we have a fix, what about a test? Unfortunately, testing this is going to be incredibly annoying because the circumstances under which gp.syscallbp are actually used for traceback are non-deterministic and hard to arrange, especially from within testprogcgo where we don't have export_test.go and can't reach into the runtime. So, instead, add a gp.syscallbp check to reentersyscall and entersyscallblock that mirrors the gp.syscallbp consistency check. This probably causes some miniscule slowdown to the syscall path, but it'll catch the issue without having to actually perform a traceback. For #69085. Fixes #69087. Change-Id: Iaf771758f1666024b854f5fbe2b2c63cbe35b201 Reviewed-on: https://go-review.googlesource.com/c/go/+/608775 Reviewed-by: Nick Ripley <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Michael Pratt <[email protected]> Reviewed-by: Cherry Mui <[email protected]> (cherry picked from commit 54fe0fd) Reviewed-on: https://go-review.googlesource.com/c/go/+/608835
After upgrading to Go 1.23, we started seeing crashes like the following from the execution tracer:
The full stack contains company code I'd prefer not to share broadly. I've dropped most of it, but I left the goroutine for which it appears
traceAdvance
is collecting a call stack.I believe this is a problem with the
syscallbp
value introduced by https://go.dev/cl/580255. ThetraceStack
function in the above traceback is using that value for unwinding here. From a core dump, we can see thatgp.syscallsp
andgp.sched.sp
differ significantly fromgp.syscallbp
andgp.sched.bp
for the goroutine:I suspect the issue here is that
copystack
needs to adjustgp.syscallbp
, like it does forgp.sched.bp
.cc @mknyszek
The text was updated successfully, but these errors were encountered: