Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experiment with WorkStealing #50221

Draft
wants to merge 8 commits into
base: master
Choose a base branch
from
Draft

Experiment with WorkStealing #50221

wants to merge 8 commits into from

Conversation

vchuravy
Copy link
Member

Basically a continuation from #43366.

It shows similar promise for pfib, but I was actually motivated
by the benchmark in #50202 (comment)
where we did a lot of work trying to figure out if there are no more tasks to run.

One issue I am aware off is that the underlying data-structure is not GC friendly.
Fundamentally we will only copy over data in the buffer, we are gurantueed to only
read valid data, but on the unused pieces of the buffer we have task-corpses laying
around.

A second issue is that the current implementation only allows for one thread to push! and popfirst!,
while all other threads are only allowed to steal!.

It looks like Go uses a double-ended queue.

@vchuravy vchuravy added speculative Whether the change will be implemented is speculative performance Must go faster multithreading Base.Threads and related functionality labels Jun 20, 2023
@vchuravy vchuravy changed the title Experiment in WS Experiment with WorkStealing Jun 20, 2023
@vchuravy vchuravy mentioned this pull request Jun 20, 2023
3 tasks
@vchuravy
Copy link
Member Author

For my own record (with 4 threads).

WSQueue improved pfib from ~2.1s to ~1.4
ILLS decreased pfib from ~2.1s to 5.9s

I am not sure how to make WSQueue fully correct so I will probably investigate a lock-free double ended queue.

@gbaraldi
Copy link
Member

So this triggers an assertion when building with more than one thread set because

    if (jl_base_module == NULL) {
        // nthreads > 1 requires code in Base
        jl_atomic_store_relaxed(&jl_n_threads, 1);
        jl_n_markthreads = 0;
        jl_n_sweepthreads = 0;
        jl_n_gcthreads = 0;
    }

doesn't set the threadpool stuff, though I guess maybe it should @vtjnash?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
multithreading Base.Threads and related functionality performance Must go faster speculative Whether the change will be implemented is speculative
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants