This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
RFC/POC: Multi-scheduler for and in Julia #43649
Labels
multithreading
Base.Threads and related functionality
speculative
Whether the change will be implemented is speculative
There have been interests in supporting Julia tasks with different scheduling trade-offs. In particular, users have been requesting a way to have latency-oriented Julia tasks in a single process (#41586, #34267). Furthermore, like many problems in programming, you can always make a better solution (= scheduling policy) if you know the properties of the problem. It would be nice if the users can control the task scheduling in a more flexible manner.
There have been a couple of suggestions and proof-of-concepts trying to address this problem (#42302, #42709) which in effect "partition" the current scheduler. These approaches may be enough for the throughput-latency trade-off. But I feel a long-term vision is lacking. If we complicate the task scheduling API surface, it can make improvements harder down the line especially within the 1.x lifetime. Our scheduler is already rather complex due to the co-existence of sticky- and non-sticky- tasks. Accumulating more quirks may not be a better long-term strategy. Can we take a more principled look at the problem for maximizing the extensibility and the optimizability?
To attack this problem, I suggest looking into a way to have multiple schedulers in a single
julia
process that are implemented in Julia. The former point provides more solutions to the complex task scheduling problems including, but not limited to, throughput-latency trade-offs. The latter point indicates that we add API to access "OS-level threads" and minimize the C runtime. This let many Julia programmers iterate rapidly on the analysis, design, and implementation of the scheduler.Design requirements
Point 1 is absolutely crucial. Let me emphasize that what I'm suggesting is NOT the plugable scheduler architecture you've seen in other languages like Python/Rust/C++/... where the choice of the event loop library locks you into a sub-ecosystem. As @StefanKarpinski puts it, "Go’s superpower as a language seems to be the fact that they implemented an incredibly good built-in task-based threading system and absolutely everyone uses it" (emphasis mine). Similarly, in Julia, we'd want to avoid the plugable event loop approach that can partition our ecosystem into non-composable subsets. Thus, it is crucial for the composability of parallel programs that each component does not control how the tasks are scheduled. This is achieved by the task and I/O API ("effects" 2) that are oblivious to the scheduler and the synchronization API that can transparently handle cross-scheduler synchronizations (see proof-of-concept below); i.e., no change for these existing API is required. (In fact, I value point 1 so much that I've been very reluctant to other proposals for solving the issue.)
Point 2 is the main problem that this RFC tries to solve. If we can have multiple schedulers, the application 1 author can arrange two schedulers; one for computed-oriented tasks and one for UI. Some composition of parallel programs may be executed fast with breadth-first scheduling than depth-first scheduling or vice versa. The application author can then choose an appropriate scheduler.
The last part is a side effect of point 2 and I'm not sure if it is a desirable property. This seems to oppose to why point 1 is so important. Maybe we can design better tooling around it to avoid excessive use of per-library thread pool. For example, it may be just enough to prepare a single throughput-oriented but mostly-quiet global thread pool for all UI libraries.
Implementing schedulers in Julia is not strictly required for satisfying the above requirements. But it would let us prepare an interface for accessing OS-level threads. Having such an API may be useful for implementing interop with the external libraries that do not like Julia's stack management. This in turn extends the applicability of point 3 and also makes it easier to try fancier stack management like cactus stack down the line.
Design sketch
At a high level, I suggest implementing scheduler(s) in
Base
(i.e., in Julia) based on the worker thread API exposed from the C runtime.A worker thread 2 is simply an OS-level thread with ptls. I think it can act like the root sticky task we have today. To implement schedulers in Julia, a worker thread API needs to include spawn, wait (sleep), and scheduler (wakeup).
A scheduler is built on top of a pool of worker threads. It manages Julia
Task
s using concurrent data structures and executes them usingyieldto
(see proof-of-concept below).To support multiple schedulers, the
Task
object needs a field for storing the reference to the scheduler. This is can be used to determine the destination ofyieldto
and also to determine the queue to be used for rescheduling. The latter is required for creating synchronization APIs that work across the task sets managed by different schedulers.Proof-of-concept
Writing M:N multi-tasking scheduler for Julia's native
Task
is actually already somewhat 3 possible and you can find examples in my proof-of-concept Schedulers.jl. Of course, this is not very surprising since we haveyieldto
that exposes a full-blown coroutine interface. What is perhaps slightly less trivial is to determine the requirements for cross-scheduler communications. I solved it in Schedulers.jl by using an unbounded lock-free queue for inter-scheduler (re)scheduling (e.g., spawn/wait). This can be used as a foundation of synchronization mechanisms such as locks and channels (I haven't implemented it yet but I think it's possible from my experience in ConcurrentCollections.jl). A very preliminary performance analysis shows a nice scaling with Schedulers.jl (tkf/Schedulers.jl#26) although this could be due to some missing features compared to the native scheduler in Julia.Footnotes
Let's loosely define an application to be a program whose user initiates the
julia
process in some way in this context. That is to say, an application "owns" the givenjulia
process and hence is "allowed" to control the global resource like worker thread assignment. In this definition, a Julia script is an application and the user typing code in REPL is composing an application on the fly. ↩ ↩2This RFC is loosely inspired by the upcoming parallelism support in OCaml. In fact, what I called "worker thread" (OS thread + ptls) is (IIUC) essentially what OCaml calls a domain. The schedulers in the above picture are similar to the task pool in domainslib. To elide the dynamic branches in the scheduler, I think we need good support for effect handlers like OCaml; but this is outside the scope of this RFC. Just being able to Revise or monkey-patch the scheduler runtime is very likely to encourage people to look into the performance tuning already. ↩ ↩2
Limitations and difficulties in Schedulers.jl are discussed in https://github.com/tkf/Schedulers.jl/discussions/24 ↩
The text was updated successfully, but these errors were encountered: