proposal: runtime: support tracking goroutine ancestor tracebacks #22289
Change https://golang.org/cl/70993 mentions this issue.
@ianlancetaylor I totally goofed on https://go-review.googlesource.com/#/c/70992/. Abandoned that change and followed up here.
CC @aclements
Any updates on this? |
It does seem like the storage costs could get rather high. And it also seems like the output could get quite confusing for a large program. But on the other hand, it's optional. You have a clear proposal and a change, but it might help to have a clearer use case. What is the kind of problem that would be more easily solved with this additional information?
So the use case is derived from an issue we hit in production with a library like https://github.com/10gen/mgo. The readers and writers of a MongoDB operation cursor are goroutines that are created from other goroutines. The issue we hit was that we were connecting to MongoDB on behalf of customers in one part of the code, as well as connecting to our own MongoDB for our backing database. It turned out a leak was present in our customer code. Nailing this down took a while, but this new option would have let us immediately identify the customer code that "spawned" that goroutine. Another example which is totally not real is the usage of […]. Finally, complete visibility via this option into a server that isn't behaving well is beneficial in a situation where you have no clue what could be wrong. JVM thread dumps have been very beneficial to me for this reason in the past.
@ianlancetaylor How is that for a use case? Hoping to get this in during the 1.11 cycle :) |
Out of curiosity, how much would it help to simply know the ID of the goroutine that spawned the goroutine? If the spawning goroutine hasn't exited, that would let you find it in the stack trace and at least see what it's currently doing. That's something we could easily do all the time.
Could you elaborate on this? I didn't think JVM thread dumps included the stack trace of spawning threads. |
For the first question: it would only help marginally, since in this use case the concern is not with what the goroutine is doing now, but with what it was doing at the time the child goroutine was spawned.

For the second question: correct, JVM thread dumps do not do this. I was just saying how having more information can make debugging unknown issues easier in scenarios like the one I outlined above. From the Go I have seen and written, it seems more common to liberally spawn goroutines from other goroutines than it does in a JVM-based language like Java. I'd imagine a similar pattern being useful for event loops as well, where you would see where each event originated from and so on. The absence of this can make callback-oriented code very hard to debug in certain cases when the originating code is the source of the issue.
Thanks. A few more questions:

I see in your prototype that you capture the entire textual stack trace, including arguments. Would it be okay to just capture the PCs and not the arguments? This would be significantly less data, since we could delay symbolizing those until printing the traceback.

How far up the "goroutine stack" do you need to go? There are obviously potential asymptotic problems with capturing all the way up to main. What if, for example, the number passed to GODEBUG=tracebackancestors=x was the number of parent goroutines to track? This would bound the more troubling dimension of the space requirements, while still giving you a fair amount of debugging power.
Yeah, only PCs is just fine. There's honestly probably not too much value in having the arguments there, as those are much more likely to refer to garbage data. Having the ability to pass a depth limit seems like a good idea. Are you open to accepting a value like -1 to make it unbounded, or would you prefer the user to just pass a high value in that case?
I think I'd rather the user just pass a high value. For example, even if the user passes, say, 100, that's practically infinite for debugging purposes, but it still bounds the damage that can be done by a huge chain of goroutines. Would you mind updating your proposed CL to do just the PCs and add the bound? It looked like it was already in reasonably good shape. |
Yeah, makes sense. Sure, will add the bound as well as the deferred symbolization of PCs. It will result in a little more net added code, since I was relying on the simple backtraces. I'm going to work on this today since I'm going to mostly be out this Wednesday-Friday (PAX East!).
Thanks! I suspect you're right that it will be slightly more code, but I don't think by much. Printing will take a little more effort, since I don't think we have a way to print traceback from a PC slice right now. |
…h GODEBUG="tracebackancestors=N" Currently, collecting a stack trace via runtime.Stack captures the stack for the immediately running goroutines. This change extends those tracebacks to include the tracebacks of their ancestors. This is done with a low memory cost and only utilized when debug option tracebackancestors is set to a value greater than 0. Resolves golang#22289 Change-Id: I527b0af222bb3c757c7fd30363e889cec51d8f45
…"tracebackancestors=1" Currently, collecting a stack trace via runtime.Stack captures the stack for the immediately running goroutines. This change extends those tracebacks to include the tracebacks of their ancestors. This is done with a low memory cost and only utilized when debug option tracebackancestors is set to 1. Resolves golang#22289 Change-Id: I7edacc62b2ee3bd278600c4a21052c351f313f3a
@aclements put up a new patchset just in time before travel! Not much more code was added. New backtraces look like this for the same sample code above:
Diff between previous and new method:
Per comments from @aclements after discussion with Go runtime/compiler team, this is accepted with a bound on the pain it can cause (as in discussion above). |
@rsc great! Let me know if more modifications need to be made to the CL or if the team will just implement on its own. |
Currently, collecting a stack trace via `runtime.Stack` captures the stack for the immediately running goroutines. I'd like to propose a change that extends those tracebacks to include the tracebacks of their ancestors. The goal would be to use as little memory as possible while gating the cost behind a `GODEBUG` option, `tracebackancestors`, set to 1.

Description of the option
Approach
As this option is enabled at startup of the runtime, any new `g` that is created via `newproc` will reference the `callergp` in order to derive a trace and attach it to the new `g`. It will also copy the `callergp`'s traces. This has a non-negligible storage cost as goroutines grow to a large number, depending on the source of the goroutines.

Example
The following file demonstrates the output of the proposed change with multiple goroutine invocations:
Running it with the flag on produces:
Assumptions
There is no problem allocating memory on the heap from within a `systemstack` call with respect to garbage collection (I haven't looked far enough to prove that we're still in a typical goroutine while doing `newproc1`).