runtime: use parent goroutine's stack for new goroutines #27345

josharian · 2018-08-29T17:39:06Z

This is a speculative idea from discussion with @aclements. Jotting it down so it doesn't get lost and to open discussion.

Spinning off many goroutines that grow their stack and then exit can be a source of performance pain. See e.g. #18138. One approach to fixing this is tracking and using statistics about stack growth behavior per spawn site.

Another possible approach is to:

switch immediately in the scheduler from parent goroutine to spawned goroutine
use the (free part of) the parent's stack as the spawn's stack
relocate the spawn's stack as needed

The idea here is that the parent goroutine's stack is likely to have plenty of free space, so if the spawn is short-lived, we can avoid doing any stack allocation for it at all, much less multiple stack growths.

If the spawn lives long enough that the parent goroutine starts running again, then this is worse. So it might be worth tracking some basic statistics on the fly to decide whether to do this.

aclements · 2018-09-06T21:58:41Z

One common pattern where this may be problematic is where a parent spawns a goroutine and blocks on receiving the result of that goroutine from an unbuffered channel. In the basic form of this design, this would always force a stack-rip. A few ways I can think to get around this:

Encourage people to use a single-element channel instead. Not so great for existing code.
Statically recognize this "send and exit" pattern and compile it into a combined operation that does them in the opposite order (the difference is unobservable). The value would still have to be buffered somewhere until the parent reached the channel receive. This may be tricky because there's no way to bound the amount of buffering necessary.
Let the parent keep running until it blocks (like today), and then start running the new goroutine on the parent's stack. We'd need somewhere to stash the goroutine's argument frame. Here we could at least bound the amount of buffering for the argument frames, since we could always fall back to spawning a full goroutine.

For solution 2 or 3, we could always do something like _defer objects, which are manually allocated and freed by the runtime.

Cilk (which inspired this idea) structurally enforces something like solution 2, but the spawning code always has to pre-arrange a place to put the result.

cherrymui · 2018-09-06T23:52:57Z

How about allocating the child goroutine's stack in the same allocation but not right next to the parent's stack? That is, suppose the parent has 100K allocated stack space, say (base, base+100K), and is currently using 60K. When it spawns the child, we can put the child's stack at, say, base+90K, and adjust the parent's stack bound. When either the parent's stack grows to 90K, or the child's stack grows to 10K, (or the parent fires up more goroutines which uses up the stack), relocate the stack.

This way, both goroutines are runnable (if not otherwise blocked), and the scheduler can choose whatever appropriate.

CAFxX · 2018-09-07T22:40:07Z

When it spawns the child, we can put the child's stack at, say, base+90K, and adjust the parent's stack bound.

If and when #25999 (comment) materializes then this could even be done precisely by knowing how much stack space both parent and child actually require.

RLH · 2018-09-09T13:43:58Z

In Go terms I believe Cilk basically turned the go command into a call but the calling frame was prepared in such a way that the parent's goroutine stack could be copied, stolen, and then resumed by another P. I never figured out the actual mechanism used in Cilk stealing but Go's ability to relocate stacks is unique and seems to be part of the solution for Go. I suspect that we could copy the child goroutine instead but that seems racier. This would also bound the number of goroutines the scheduler would have to manage as well as a lot of other nice properties discussed in the Cilk papers. Go should preserve these bounds if we do this.

…

On Fri, Sep 7, 2018 at 6:40 PM, Carlo Alberto Ferraris < ***@***.***> wrote: When it spawns the child, we can put the child's stack at, say, base+90K, and adjust the parent's stack bound. If and when #25999 (comment) <#25999 (comment)> materializes then this could even be done precisely by knowing how much both parent and child need in term of stack space. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#27345 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AA7Wnx2xixDi0c2XgX3C78od7v5KuIijks5uYvXUgaJpZM4WSA6H> .

aclements · 2018-09-14T18:08:46Z

This would also bound the number of goroutines the scheduler would have to
manage as well as a lot of other nice properties discussed in the Cilk
papers. Go should preserve these bounds if we do this.

I think it's worth striving for similar bounds, but Cilk was able to achieve these bounds strictly only because it had a very strict fork/join model of parallelism, where the parent function couldn't even return until all of its child threads had returned/exited. Because of Go's ad hoc parallelism, the best we could do (which would still be quite good) is to say, if your program is structured like X, then it will have some space and time bounds.

Let the parent keep running until it blocks (like today), and then start running the new goroutine on the parent's stack. We'd need somewhere to stash the goroutine's argument frame. Here we could at least bound the amount of buffering for the argument frames, since we could always fall back to spawning a full goroutine.

@pjweinb pointed out to me today that we could switch to the child when the parent blocks or tries to start a second child goroutine. Bounding it to a single pending goroutine makes the bookkeeping much simpler and may go towards ensuring Cilk-like space bounds. We could also combine this with @cherrymui's idea, which would give us a place to put the pending argument frame: when the parent go's the child, we immediately assign most of the parent's remaining stack to the child, put the goroutine's argument frame in its section of the stack, shrink the parent's stack bounds, and let the parent keep running. When the parent blocks, tries to start another child, or encounters its stack bound, we switch to the child (without moving the parent's stack at that time).

jadbox · 2019-04-19T04:46:51Z

This seems promising- any updates about working towards improving goroutine speculative stack spawning?

aclements · 2019-04-19T12:00:30Z

Sadly, no. Too much other work on our collective plates right now.

josharian added the Performance label Aug 29, 2018

josharian added this to the Unplanned milestone Aug 29, 2018

FiloSottile added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Aug 30, 2018

randall77 mentioned this issue May 28, 2019

runtime: optimization to reduce P churn #32113

Open

gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 7, 2022

mknyszek added this to Go Compiler / Runtime Jul 7, 2022

mknyszek removed this from Go Compiler / Runtime Jul 13, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

runtime: use parent goroutine's stack for new goroutines #27345

runtime: use parent goroutine's stack for new goroutines #27345

josharian commented Aug 29, 2018

aclements commented Sep 6, 2018

cherrymui commented Sep 6, 2018

CAFxX commented Sep 7, 2018 •

edited

Loading

RLH commented Sep 9, 2018 via email

aclements commented Sep 14, 2018

jadbox commented Apr 19, 2019

aclements commented Apr 19, 2019 via email

runtime: use parent goroutine's stack for new goroutines #27345

runtime: use parent goroutine's stack for new goroutines #27345

Comments

josharian commented Aug 29, 2018

aclements commented Sep 6, 2018

cherrymui commented Sep 6, 2018

CAFxX commented Sep 7, 2018 • edited Loading

RLH commented Sep 9, 2018 via email

aclements commented Sep 14, 2018

jadbox commented Apr 19, 2019

aclements commented Apr 19, 2019 via email

CAFxX commented Sep 7, 2018 •

edited

Loading