runtime: change golang scheduler implementation to a lock-free stack #53398
That is not obviously true. At least, it's not obvious to me. And, sorry for being pedantic, do you mean a thread pool or a goroutine pool? After all, in Go threads and goroutines are different things. CC @bcmills
You've marked this as a proposal but this seems like a pure performance issue, not a change in API. Performance improvements do not need to go through the proposal process. You've marked this as runtime but I don't understand what would change in the runtime package.
Looking at the code, it's a goroutine pool. I don't see how things like
@ianlancetaylor it's a goroutine pool. I have used the words "threads" and "goroutine pool" interchangeably, but that apparently created some confusion. My bad.

Again, my bad. I didn't really choose the words well. What I meant was the addition of this library to the sync package specifically. Limiting goroutines for concurrency and aggressively re-using them leads to improved performance. The problem with creating unlimited goroutines is that a new goroutine is allocated for every request, and the fire-and-forget model also increases the burden on the garbage collector, which has to collect and free the memory used by goroutines; this is detrimental in the long run (especially for cases like HTTP servers). But if we use a goroutine pool as demonstrated by fasthttp and gnet, we can gain performance improvements by considerably reducing allocations as well as memory usage. Also, with unbounded goroutines there will be more time spent context switching than doing actual work, which hurts latency.
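For readers unfamiliar with the pattern being discussed, here is a minimal sketch of a goroutine pool (this is my own illustration of the general idea of reusing a fixed set of workers, not itogami's actual implementation):

```go
package main

import (
	"fmt"
	"sync"
)

// Pool runs submitted tasks on a fixed set of long-lived worker
// goroutines, so each task avoids a fresh goroutine allocation.
type Pool struct {
	tasks chan func()
	wg    sync.WaitGroup
}

// NewPool starts `workers` goroutines that loop over the task channel.
// (A real pool would also offer a shutdown method that closes the channel.)
func NewPool(workers int) *Pool {
	p := &Pool{tasks: make(chan func())}
	for i := 0; i < workers; i++ {
		go func() {
			for task := range p.tasks {
				task()
				p.wg.Done()
			}
		}()
	}
	return p
}

// Submit hands a task to one of the reused workers.
func (p *Pool) Submit(task func()) {
	p.wg.Add(1)
	p.tasks <- task
}

// Wait blocks until every submitted task has finished.
func (p *Pool) Wait() { p.wg.Wait() }

func main() {
	var mu sync.Mutex
	sum := 0
	p := NewPool(4) // four workers, reused across all 100 tasks
	for i := 1; i <= 100; i++ {
		i := i
		p.Submit(func() {
			mu.Lock()
			sum += i
			mu.Unlock()
		})
	}
	p.Wait()
	fmt.Println(sum) // 5050
}
```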
True, a hard limit has been set on concurrent connections, but there are only so many CPU cores, and with unbounded goroutines there will be more time spent context switching than doing actual work, which affects latency and throughput. Hence, limiting concurrency will actually lead to better performance. It will also reduce the strain on the garbage collector considerably. Bonus factor: since itogami is optimized for the CPU cache, retrieving goroutine pointers and calling
Limiting concurrency may lead to better performance, but if we limit the number of possible simultaneous net/http connections that will break real programs. We can't make a change like that.
Relevant issue: #6785. Not limiting the number of simultaneous net/http requests also exhausts sockets and breaks programs; isn't that why that issue was filed? IMO having no restrictions is good in theory, but there are limits on hardware, and sooner or later a choke point will arrive during the application's run.
We obviously couldn't enable a default server-side max concurrency limit, in any case. It would break use cases like servers implementing long-polling. I suppose it could be an option, but it would be incredibly rarely used, since it has significant sharp edges (e.g. if you include a library with a longpoll handler, it'll take down your entire server in relatively short order). The notion of a fixed goroutine pool depends on it being expensive to "allocate" new goroutines. It seems much cleaner for the runtime to directly fix that cost (if it exists), rather than adding a new API that everyone will have to use. As an example: if a goroutine runs off the end of its stack, and assuming that nothing else can have references to a goroutine once it's exited (right? with the possible exception of unsafe code, which we don't owe any API compatibility promise to), it seems relatively straightforward to reuse the memory for that goroutine on the next go statement. (I suppose this is in some sense having the runtime itself pool and reuse goroutine structures, but with the advantage that it will transparently improve all programs, as opposed to a subset.)
Regarding long-polling: yes, this pool is ill-suited for that purpose. But it is useful for extremely short-lived connections, the kind we see in high-frequency trading servers as well as multiplayer game servers.

Agreed, this would be the best approach. Recycling goroutines is better than fire-and-forget for high-throughput cases. Doing so will allow
I was doubtful of itogami's claim to be really fast. I then made a generator-based approach, where the pool workers have reverse bindings to a closure from the consumer to get more tasks, which is about 1.95x faster in this benchmark. If we remove the sleep statement from

This is not a new pattern; it is simple and short, and variations of it (lazily generating tasks when workers are not busy) can be seen in many codebases. I don't think there needs to be a library for this. The work-generator pattern has a different API which is slightly less convenient to use, but I don't think there are lots of apps that can't be rewritten to use it. For example, since it is talking about

My understanding is that in Go, slow things should be hard to do. That's why we can't cast a slice of structs to a slice of interfaces, or why select doesn't accept channel slices, etc. IMO, adding a goroutine pool to the std is making a slow thing easy to do.
The runtime already does this. See
As an aside, x/sync's errgroup just recently introduced a
@smasher164 the performance of itogami is better than x/sync.ErrGroup with
Here is the benchmarking code: https://github.com/alphadose/go-threadpool-benchmarks/blob/master/general_test.go
This sentence/sentiment doesn't seem to be logically sound, as "slow" and "easy" aren't exactly comparable, more so as you seem to focus on "fast"; what do you want to express?
With golang 1.19, I made comparisons of the golang scheduler against the itogami scheduler for unlimited goroutines. Here are the benchstat results for 20 cases (code available here).
From the above results it is evident that the itogami scheduler takes almost the same time as the golang scheduler but consumes much less memory and makes half the number of allocations. The current golang scheduler serializes access to the global run queue by taking `sched.lock` first. Here is a snippet from the golang codebase displaying this:

```go
func goschedImpl(gp *g) {
	status := readgstatus(gp)
	if status&^_Gscan != _Grunning {
		dumpgstatus(gp)
		throw("bad g status")
	}
	casgstatus(gp, _Grunning, _Grunnable)
	dropg()
	lock(&sched.lock)   // acquire scheduler lock
	globrunqput(gp)     // place goroutine pointer in queue
	unlock(&sched.lock) // release lock
	schedule()
}
```

I propose changing the current golang scheduler implementation from this lock-based global run queue to a lock-free stack. There are 2 major benefits to this:-
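For reference, a lock-free stack in this context typically means a Treiber stack, where the lock/enqueue/unlock sequence collapses into a compare-and-swap retry loop on the head pointer. Below is a minimal user-space sketch of the technique (illustrative only; this is not the runtime's code, and it stores ints rather than `g` pointers):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// node links one stack entry to the next.
type node struct {
	value int
	next  *node
}

// Stack is a Treiber stack: Push and Pop retry an atomic
// compare-and-swap on the head instead of taking a lock.
// (Go's GC sidesteps the classic ABA/reclamation hazards that
// make this trickier in manually managed languages.)
type Stack struct {
	head atomic.Pointer[node]
}

func (s *Stack) Push(v int) {
	n := &node{value: v}
	for {
		old := s.head.Load()
		n.next = old
		if s.head.CompareAndSwap(old, n) {
			return // swap succeeded; no lock was ever held
		}
		// another goroutine moved head; retry
	}
}

func (s *Stack) Pop() (int, bool) {
	for {
		old := s.head.Load()
		if old == nil {
			return 0, false // stack is empty
		}
		if s.head.CompareAndSwap(old, old.next) {
			return old.value, true
		}
	}
}

func main() {
	var s Stack
	var wg sync.WaitGroup
	for i := 1; i <= 8; i++ {
		wg.Add(1)
		go func(v int) { defer wg.Done(); s.Push(v) }(i)
	}
	wg.Wait()
	sum := 0
	for v, ok := s.Pop(); ok; v, ok = s.Pop() {
		sum += v
	}
	fmt.Println(sum) // 36
}
```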
To further highlight this issue I made another micro-benchmark test here which does a simple recursive computation and sleeps for 1 microsecond in each call. Here are the results for 20 cases:-
In this case specifically, it is evident that the itogami scheduler triumphs over the current golang scheduler on all 3 metrics, including time.
Limiting the number of goroutines and re-using them as much as possible will lead to a significant gain in performance in critical areas like net/http. itogami is a thread-pool implementation which is the most efficient among all existing threadpools because it uses golang runtime internal functions like `getg()`, `goready()` etc. for scheduling. Benchmarks supporting these claims are present in the repository itself. The addition of a thread-pool to golang core will unlock lots of opportunities for solving performance-critical problems. Also, since itogami already uses runtime internal functions, it would take no effort to integrate it with the core lib.