Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: support unwinding logical cgo stack transitions using frame pointers #58378

Closed
qmuntal opened this issue Feb 7, 2023 · 26 comments
Closed
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Debugging FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Milestone

Comments

@qmuntal
Copy link
Member

qmuntal commented Feb 7, 2023

Note: follow-up issue for #40044 (comment)

When using generic third-party debuggers (e.g. dbg or windbg) to debug a amd64 Go binary, it is currently not possible to know which piece of Go code started a cgo call once runtime.asmcgocall has switched from the Go routine stack to the system stack. This is because asmcgocall is marked as NOFRAME, so the frame pointer is not set on the function prologue, confusing the debugger unwinding process.

Notice that this is only true for amd64 binaries, as all other architectures supporting frame pointers do not mark asmcgocall as NOFRAME. Yet, at least windows/arm64 neither supports cgo stack unwinding because asmcgocall calls asmstdcall, which is marked as NOFRAME.

IMO we should at least make sure that asmcgocall and related-auxiliary functions can be unwound from the system stack up to the Go stack by placing the right frame pointer in their prologue. I'm leaving out of this discussion other functions that perform a system stack on purpose, as they are not so fundamental as asmcgocall and correctly setting the frame pointer of low-level assembly functions is not straight-forward.

I'm particularly interested on this because on Windows, WinDbg uses the frame pointer to unwind the stack (see #57302) to unwound the stack of a debugged binary or crash minidump (#57441, #49471, #20498).

It would also help fixing at least #40044 and #57698.

@golang/runtime @golang/compiler

@qmuntal qmuntal added NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. Debugging compiler/runtime Issues related to the Go compiler and/or runtime. labels Feb 7, 2023
@mknyszek mknyszek added this to the Backlog milestone Feb 15, 2023
@mknyszek
Copy link
Contributor

@qmuntal I've assigned this to you for now in triage; do you plan to work on it? (Feel free to unassign and leave a "help wanted" tag.)

@qmuntal
Copy link
Member Author

qmuntal commented Feb 16, 2023

@qmuntal I've assigned this to you for now in triage; do you plan to work on it?

Yep. In fact I've already submitted a bunch of CLs (not merged yet) that will make this issue easier to implement (although their are valuable on their own). Will update them referencing this issue.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/466835 mentions this issue: runtime: remove unnecessary NOFRAME flags on windows

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/466396 mentions this issue: runtime: use explicit NOFRAME on netbsd/amd64

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/466457 mentions this issue: runtime: use explicit NOFRAME on plan9/amd64

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/466455 mentions this issue: runtime: use explicit NOFRAME on openbsd/amd64

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/466316 mentions this issue: runtime: use explicit NOFRAME on linux/amd64

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/466458 mentions this issue: runtime: remove implicit NOFRAME heuristic support

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/466395 mentions this issue: runtime: use explicit NOFRAME on freebsd/amd64

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/466355 mentions this issue: runtime: use explicit NOFRAME on dragonfly/amd64

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/466456 mentions this issue: runtime: use explicit NOFRAME on solaris/amd64

gopherbot pushed a commit that referenced this issue Feb 22, 2023
This CL marks some linux assembly functions as NOFRAME to avoid relying
on the implicit amd64 NOFRAME heuristic, where NOSPLIT functions
without stack were also marked as NOFRAME.

Updates #58378

Change-Id: I7792cff4f6e539bfa56c02868f2965088ca1975a
Reviewed-on: https://go-review.googlesource.com/c/go/+/466316
Reviewed-by: Than McIntosh <[email protected]>
Reviewed-by: Cherry Mui <[email protected]>
Run-TryBot: Quim Muntal <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
gopherbot pushed a commit that referenced this issue Feb 23, 2023
This CL marks some dragonfly assembly functions as NOFRAME to avoid
relying on the implicit amd64 NOFRAME heuristic, where NOSPLIT functions
without stack were also marked as NOFRAME.

Updates #58378

Change-Id: I832a1a78d68a49f11df3b03fa9d50d4796bcac03
Reviewed-on: https://go-review.googlesource.com/c/go/+/466355
Run-TryBot: Quim Muntal <[email protected]>
Reviewed-by: Than McIntosh <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Cherry Mui <[email protected]>
gopherbot pushed a commit that referenced this issue Feb 24, 2023
This CL marks some freebsd assembly functions as NOFRAME to avoid
relying on the implicit amd64 NOFRAME heuristic, where NOSPLIT functions
without stack were also marked as NOFRAME.

Updates #58378

Change-Id: Ibd00748946f1137e165293df7da73278cb673bbd
Reviewed-on: https://go-review.googlesource.com/c/go/+/466395
Reviewed-by: Than McIntosh <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Cherry Mui <[email protected]>
Run-TryBot: Quim Muntal <[email protected]>
gopherbot pushed a commit that referenced this issue Feb 24, 2023
This CL marks some netbsd assembly functions as NOFRAME to avoid
relying on the implicit amd64 NOFRAME heuristic, where NOSPLIT functions
without stack were also marked as NOFRAME.

While here, and thanks to CL 466355, `asm_netbsd_amd64.s` can
be deleted in favor of `asm9_unix2_amd64.s`, which makes better
use of the frame pointer.

Updates #58378

Change-Id: Iff554b664ec25f2bb6ec198c0f684590b359c383
Reviewed-on: https://go-review.googlesource.com/c/go/+/466396
Reviewed-by: Than McIntosh <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Cherry Mui <[email protected]>
Run-TryBot: Quim Muntal <[email protected]>
gopherbot pushed a commit that referenced this issue Feb 27, 2023
This CL marks some openbsd assembly functions as NOFRAME to avoid
relying on the implicit amd64 NOFRAME heuristic, where NOSPLIT functions
without stack were also marked as NOFRAME.

Updates #58378

Change-Id: I993549df41a93255fb714357443f8b24c3dfb0a4
Reviewed-on: https://go-review.googlesource.com/c/go/+/466455
Reviewed-by: Keith Randall <[email protected]>
Run-TryBot: Quim Muntal <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
Reviewed-by: Joel Sing <[email protected]>
Reviewed-by: Cherry Mui <[email protected]>
gopherbot pushed a commit that referenced this issue Feb 27, 2023
This CL marks some solaris assembly functions as NOFRAME to avoid
relying on the implicit amd64 NOFRAME heuristic, where NOSPLIT functions
without stack were also marked as NOFRAME.

While here, I've reduced the stack usage of runtime·sigtramp by
16 bytes to compensate the additional 8 bytes from the stack-allocated
frame pointer. There were two unused 8-byte slots on the stack, one
at 24(SP) and the other at 80(SP).

Updates #58378

Change-Id: If9230e71a8b3c72681ffc82030ade6ceccf824db
Reviewed-on: https://go-review.googlesource.com/c/go/+/466456
Reviewed-by: Than McIntosh <[email protected]>
Reviewed-by: Cherry Mui <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Run-TryBot: Quim Muntal <[email protected]>
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/472195 mentions this issue: runtime: remove NOFRAME from asmcgocall and systemstack

@felixge
Copy link
Contributor

felixge commented Feb 28, 2023

👋 It's great to see your CL above. I'm currently working on adding frame pointer unwind support to the execution tracer (much faster than gentraceback), see #16638 . systemstack() not pushing frame pointers is an issue I also encountered, causing the caller frame of systemstack to be missing.

@qmuntal
Copy link
Member Author

qmuntal commented Feb 28, 2023

It was been quite a journey getting to that CL 🥲, glad that it also helps other effords other than my goal of enabling SEH unwinding on Windows.

Now that I almost managed to get rid of the implicit NOFRAME rule, it would be easier to have frame pointers in the runtime functions.

gopherbot pushed a commit that referenced this issue Mar 1, 2023
This CL removes some NOFRAME flags on Windows assembly files
for several reasons:

- windows/386 does not use a frame pointer
- Leaf frameless functions already skip the frame pointer
- Some non-leaf functions do not contain enough dragons to justify
not using the frame pointer

Updates #58378

Change-Id: I31e71bf7f769e1957a4adba91778da5af66ce1e4
Reviewed-on: https://go-review.googlesource.com/c/go/+/466835
Reviewed-by: Keith Randall <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
Run-TryBot: Quim Muntal <[email protected]>
Reviewed-by: Dmitri Shuralyov <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
gopherbot pushed a commit that referenced this issue Mar 1, 2023
This CL marks some plan9 assembly functions as NOFRAME to avoid
relying on the implicit amd64 NOFRAME heuristic, where NOSPLIT functions
without stack were also marked as NOFRAME.

Updates #58378

Change-Id: Ic8c9ab5c1a0897bebc6c1419ddc903a7492a1b0a
Reviewed-on: https://go-review.googlesource.com/c/go/+/466457
TryBot-Bypass: Quim Muntal <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
Reviewed-by: Cherry Mui <[email protected]>
gopherbot pushed a commit that referenced this issue Mar 1, 2023
All amd64 OSes already make use of the NOFRAME flag wherever is
required, so we can remove the frameless nosplit functions heuristic
code path.

Updates #58378

Change-Id: I966970693ba07f8c66da0aca83c23caad7cbbfe5
Reviewed-on: https://go-review.googlesource.com/c/go/+/466458
Reviewed-by: Cherry Mui <[email protected]>
Run-TryBot: Quim Muntal <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Than McIntosh <[email protected]>
@nightlyone
Copy link
Contributor

Please consider a release note about that semantic change that may surprise assembler code outside of the tree. @qmuntal

@qmuntal
Copy link
Member Author

qmuntal commented Mar 2, 2023

Please consider a release note about that semantic change that may surprise assembler code outside of the tree. @qmuntal

Sure

@aclements I'm having problems on CL 472195 trying to get TestTracebackSystemstack happy. It seems that gentraceback doesn't like the fact that systemstack now has a frame pointer. Manage to fix it, but without knowing why and probably by chance. As you are working on refactoring that gentraceback, could you provide some guidance on how to properly fix that?

@cherrymui
Copy link
Member

FWIW, I'm still not convinced that saving frame pointers is a good idea for stack transition functions like systemstack, at least not implicitly. If we want to use the frame pointer to express stack transition, I think we may want to be very explicit.

@qmuntal
Copy link
Member Author

qmuntal commented Mar 2, 2023

FWIW, I'm still not convinced that saving frame pointers is a good idea for stack transition functions like systemstack, at least not implicitly. If we want to use the frame pointer to express stack transition, I think we may want to be very explicit.

Your opinion worth alot 😸. What I don't understand is what could go wrong when saving the frame pointer on functions that change the stack. And, if there are caveats, what could be worse than not having any stack unwinding at all?

@cherrymui
Copy link
Member

cherrymui commented Mar 2, 2023

I'm not sure if it could go wrong, just that as the stack transition is special enough I think it is clearer that we do it more explicitly, with comments. It may also be simpler if we save the FP only when the stack transition actually happens, and not on the tail call case.

For the traceback failure (I didn't read the code or the failure message, just guessing), could it be that the frame layout of systemstack_switch need to match? Also the comment at gosave_systemstack_switch may be helpful.

@aarzilli
Copy link
Contributor

aarzilli commented Mar 3, 2023

I remembered that one of the problems we had with delve's stack unwinder was the morestack/newstack path. Maybe you have already accounted for it, I thought it mention it just in case.

@felixge
Copy link
Contributor

felixge commented Mar 3, 2023

the stack transition is special enough I think it is clearer that we do it more explicitly, with comments

Not sure I follow. Are you talking about some sort of compiler directive like this?

//go:emitfp
systemstack(func() { ... })

Sorry if this makes no sense, just trying to make sure I understand your comments 😅 .

If we want to use the frame pointer to express stack transition, I think we may want to be very explicit.

FWIW, while working on CL 463835 I found that frame pointer unwinding through systemstack seems to be working already because systemstack doesn't clobber the BP register, so it gets saved by the callee of systemstack. This causes an unwinder to miss the caller of systemstack, but the stack transition itself seems to be working.

@qmuntal
Copy link
Member Author

qmuntal commented Mar 3, 2023

For the traceback failure (I didn't read the code or the failure message, just guessing), could it be that the frame layout of systemstack_switch need to match? Also the comment at gosave_systemstack_switch may be helpful.

Thanks for the advice, the frame layout of systemstack_switch did have to match. Tests are green now without having to modify gentraceback.

I'm not sure if it could go wrong, just that as the stack transition is special enough I think it is clearer that we do it more explicitly, with comments.

Have the same questions as @felixge, plus one data point more: arm64 versions of asmcgocall and systemstack already place the frame pointer on the stack.

It may also be simpler if we save the FP only when the stack transition actually happens, and not on the tail call case.

I tried this at some point, but it would create another (and probably bigger) issue: the FP would be stored outside the function prologue, which is non-standard. Doing so will make it difficult to programmatically generate DWARF unwinding info using the FP instead of stack deltas, and AFAIK will not work on Windows with SEH unwinding info, which requires the FP to be on the prologue, and the prologue is limited to 255 bytes.

@cherrymui
Copy link
Member

//go:emitfp
systemstack(func() { ... })

No. I was just thinking explicitly saving FP in the assembly code. If you all prefer the implicit saving, that is fine. But I think somewhere in the code we should have a comment mentioning that the frame pointer is chained through the stack transition.

arm64 versions of asmcgocall

Not sure if that is actually correct (https://golang.org/cl/241080). It does do the implicit saving, though.

@felixge
Copy link
Contributor

felixge commented Mar 4, 2023

No. I was just thinking explicitly saving FP in the assembly code. If you all prefer the implicit saving, that is fine.

Ah, got it! I don't have a preference on that. If it's better to do this "by hand" in the assembly rather than letting the compiler generate it, that works for me. As long as the frame pointer gets pushed I'm happy 😄.

gopherbot pushed a commit that referenced this issue Mar 13, 2023
This CL removes the NOFRAME flag from runtime.asmcgocall,
runtime.systemstack and runtime.mcall so the compiler can place
the frame pointer on the stack.

This will help unwinding cgo stack frames, and might be all what's
needed for tools that only use the frame pointer to unwind the stack.
That's not the case for gdb, which uses DWARF CFI, and windbg,
which uses SEH. Yet, having the frame pointer correctly set lays
the foundation for supporting cgo unwinding with DWARF CFI and SEH.

Updates #58378

Change-Id: I7655363b3fb619acccd9d5a7f0e3d3dec953cd52
Reviewed-on: https://go-review.googlesource.com/c/go/+/472195
Run-TryBot: Quim Muntal <[email protected]>
Auto-Submit: Michael Pratt <[email protected]>
Reviewed-by: Cherry Mui <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Michael Pratt <[email protected]>
@qmuntal
Copy link
Member Author

qmuntal commented Apr 21, 2023

Closing this issue as done. There has been many CLs during go1.21 dev cycle to improve the frame pointer unwinding through the cgo transition, and there are even some tests verifying that there are no regressions in this area, i.e. @felixge's CL 474916.

Thanks everyone for the discussion and the reviews!

@qmuntal qmuntal closed this as completed Apr 21, 2023
@github-project-automation github-project-automation bot moved this from Todo to Done in Go Compiler / Runtime Apr 21, 2023
@qmuntal qmuntal modified the milestones: Backlog, Go1.21 Apr 21, 2023
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/501516 mentions this issue: doc/go1.21: NOFRAME heuristic changes

gopherbot pushed a commit that referenced this issue Jun 8, 2023
For #58378

Change-Id: I960b97f33a8bf29d3a9622b58d278544d0970a38
Reviewed-on: https://go-review.googlesource.com/c/go/+/501516
Reviewed-by: Cherry Mui <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Run-TryBot: Quim Muntal <[email protected]>
Reviewed-by: David Chase <[email protected]>
@golang golang locked and limited conversation to collaborators Jun 7, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Debugging FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Projects
None yet
Development

No branches or pull requests

7 participants