
Concurrency model for rust analyzer #7444

Open
matklad opened this issue Jan 26, 2021 · 1 comment
Labels
C-Architecture Big architectural things which we need to figure up-front (or suggestions for rewrites :0) ) E-hard S-unactionable Issue requires feedback, design decisions or is blocked on other work

Comments


matklad commented Jan 26, 2021

Zulip discussion thread: https://rust-lang.zulipchat.com/#narrow/stream/185405-t-compiler.2Fwg-rls-2.2E0/topic/Concurrency.20in.20RA

Rust analyzer is a long-lived process which oversees many concurrent activities.
The code for this lives mainly in main_loop.rs. Luckily, the amount of
concurrency we need is small, so the thing is still manageable. Nonetheless, it
is hard to understand, and, being the heart of rust-analyzer, it would be cool
to simplify it.

The list of concurrent activities we need to handle:

  • LSP requests and notifications
  • background request processing
  • cargo check/cargo metadata project loading
  • cargo check diagnostics
  • vfs

Cross-cutting concerns:

  • cancellation
    • during shutdown
    • when switching Cargo.toml, it makes sense to kill cargo check.
      • this is the bit which doesn't map nicely to trio's scopes.
  • eventually consistent non-concurrent updates
    • for cargo check / cargo metadata we want to run at most one process at a time.
      If there's a current process, we should wait for it to finish and schedule a new one.
  • selection: in vfs, main loop and flycheck we need an ability to select among n concurrent activities.

Possible approaches:

1 Actorish model

This is mostly what we do today. The main thread owns all the state and has a
mailbox. Concurrent activities happen on background threads and communicate with
the main thread asynchronously and unidirectionally, by posting messages.

Project loading looks like this:

fn load_project():
    spawn(|| {
        send(Begin)
        for {
            send(Progress)
        }
        send(End(loaded_project))
    })


fn main_loop():
    let mut state = ...;
    loop {
        match next_event() {
            Begin => start_progress(&mut state),
            Progress => report_progress(&mut state)
            End(project) => {
                end_progress(&mut state)
                switch_to(&mut state, project)
            }
        }
    }

The benefit is that it's crystal clear what actually happens, as one can
dbg!(next_event) and get a linear history of all events as data.

The drawback is the tearing of control flow -- to understand a project reload
(which conceptually is a single continuous activity), one needs to jump between
load_project and main_loop.
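The pseudocode above can be sketched as real code with std::sync::mpsc. This is a minimal, hypothetical sketch (the Event variants come from the pseudocode; the Vec<String> "state" and the string payloads are stand-ins for real server state):

```rust
use std::sync::mpsc;
use std::thread;

// Message types from the pseudocode; the String payload stands in for a
// real loaded project.
enum Event {
    Begin,
    Progress(u32),
    End(String),
}

// Background activity: posts messages, never touches the state directly.
fn load_project(tx: mpsc::Sender<Event>) {
    thread::spawn(move || {
        tx.send(Event::Begin).unwrap();
        for step in 0..3 {
            tx.send(Event::Progress(step)).unwrap();
        }
        tx.send(Event::End("project".to_string())).unwrap();
    });
}

// The main loop owns the state and applies each event in order; the event
// stream is plain data, so it can be logged or dbg!'d.
fn main_loop(rx: mpsc::Receiver<Event>) -> Vec<String> {
    let mut state = Vec::new(); // stands in for the real server state
    for event in rx {
        match event {
            Event::Begin => state.push("begin".to_string()),
            Event::Progress(n) => state.push(format!("progress {n}")),
            Event::End(project) => {
                state.push(format!("switch to {project}"));
                break;
            }
        }
    }
    state
}
```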

2 Higher-order actorish model

Rather than sending n different message types, we can have a single message type which packs a closure to be applied to the state:

fn load_project():
    spawn(|| {
        send(|state| start_process(state))
        for {
            send(|state| report_progress(state))
        }
        send(|state| {
            end_progress(state)
            switch_to(state, loaded_project)
        })
    })


fn main_loop():
    let mut state = ...;
    loop {
        next_event()(&mut state);
    }

The benefit here is somewhat more linear control flow (though everything still happens asynchronously) and extensibility with new "message types".

Also, because the set of functions is open, the set of actions supported by an Agent is also open -- a sharp contrast to the pattern-matching message handling loops provided by some other languages.

https://clojure.org/reference/agents

The drawback is that the main loop is no longer as obvious -- instead of a stream of events as data, you get a stream of closures.
Because there's no single place where state modification happens, it's harder to spot interference between distinct concurrent activities.
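In Rust terms, the "message" becomes a boxed closure sent over a channel. A minimal sketch, again with hypothetical names and a Vec<String> standing in for real state:

```rust
use std::sync::mpsc;
use std::thread;

// State is mutated only on the main thread; the string log is a stand-in.
type State = Vec<String>;
// A "message" is an opaque closure applied to the state.
type Thunk = Box<dyn FnOnce(&mut State) + Send>;

fn load_project(tx: mpsc::Sender<Thunk>) {
    thread::spawn(move || {
        tx.send(Box::new(|state: &mut State| state.push("start".into())))
            .unwrap();
        for _ in 0..2 {
            tx.send(Box::new(|state: &mut State| state.push("progress".into())))
                .unwrap();
        }
        tx.send(Box::new(|state: &mut State| {
            state.push("end".into());
            state.push("switch".into());
        }))
        .unwrap();
    });
}

// The main loop shrinks to "apply the next closure": it no longer sees
// event data, only opaque behavior.
fn main_loop(rx: mpsc::Receiver<Thunk>) -> State {
    let mut state = State::new();
    for thunk in rx {
        thunk(&mut state);
    }
    state
}
```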

3 Mutexes

The third approach is to wrap the state in a mutex. The code looks exactly like
the last example, with the difference that it actually is synchronous (a
background activity resumes after its critical section finishes, not merely
after it is scheduled).

fn load_project():
    spawn(|| {
        with_lock(|state| start_process(state))
        for {
            with_lock(|state| report_progress(state))
        }
        with_lock(|state| {
            end_progress(state)
            switch_to(state, loaded_project)
        })
    })

The main benefit here is simplicity -- each activity is fully straight-line code.

The main drawback is that we don't have a central place where all mutation happens.
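A runnable sketch of the mutex approach with std::sync::Mutex (same hypothetical Vec<String> state as above):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

type State = Vec<String>;

// Each activity is straight-line code; every mutation takes the lock and
// completes before the activity moves on.
fn load_project(state: Arc<Mutex<State>>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        state.lock().unwrap().push("start".into());
        for _ in 0..2 {
            state.lock().unwrap().push("progress".into());
        }
        {
            // One critical section covering both updates, so no other
            // activity can observe "end" without "switch".
            let mut state = state.lock().unwrap();
            state.push("end".into());
            state.push("switch".into());
        }
    })
}
```

Note that there is no main loop at all here: mutation is scattered across every call site of lock(), which is exactly the stated drawback.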

Misc

It seems the following abstraction of a concurrent process might be helpful:

let task: Task<R> = Task::spawn(|stop_token| {
    ...
    if stop_token.stop_requested() {
        return -1;
    }
    ...
    92
});

// Two-phase cancellation: we first request a stop, then we wait for the task to
// actually finish.
let task: StoppedTask = task.request_stop();
drop(task);

Alas, to be useful we need a way to plug this (a vector of these, actually) into
select, and it's unclear how to do that.
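The Task abstraction above could be sketched over std::thread with an AtomicBool as the stop token. All names here are hypothetical, taken from the sketch; the typestate split into Task and StoppedTask enforces the two-phase cancellation at compile time (and, as noted, none of this composes with select):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

// Cooperative cancellation: the task polls the token, it is never killed.
struct StopToken(Arc<AtomicBool>);

impl StopToken {
    fn stop_requested(&self) -> bool {
        self.0.load(Ordering::Relaxed)
    }
}

struct Task<R> {
    stop: Arc<AtomicBool>,
    handle: thread::JoinHandle<R>,
}

// A task whose stop has been requested but which may still be running.
struct StoppedTask<R> {
    handle: thread::JoinHandle<R>,
}

impl<R: Send + 'static> Task<R> {
    fn spawn(f: impl FnOnce(StopToken) -> R + Send + 'static) -> Task<R> {
        let stop = Arc::new(AtomicBool::new(false));
        let token = StopToken(stop.clone());
        let handle = thread::spawn(move || f(token));
        Task { stop, handle }
    }

    // Phase one: request a stop, without blocking.
    fn request_stop(self) -> StoppedTask<R> {
        self.stop.store(true, Ordering::Relaxed);
        StoppedTask { handle: self.handle }
    }
}

impl<R> StoppedTask<R> {
    // Phase two: wait for the task to actually finish.
    fn join(self) -> R {
        self.handle.join().unwrap()
    }
}
```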

@matklad matklad added C-Architecture Big architectural things which we need to figure up-front (or suggestions for rewrites :0) ) S-unactionable Issue requires feedback, design decisions or is blocked on other work E-hard labels Jan 26, 2021

matklad commented Jun 20, 2021

Maybe I've overdosed on Fyodor Mikhailovich Dostoevsky, but I think I am getting somewhere:

https://github.com/matklad/devils/blob/1553e000bbf8c6846044f1cf878024a0590a9a3d/tests/it.rs#L311-L485

bors added a commit that referenced this issue Mar 30, 2023
…paces-to-have-proc-macros, r=Veykril

fix: allow new, subsequent `rust-project.json`-based workspaces to get proc macro expansion

As detailed in #14417 (comment), `rust-project.json` workspaces added after the initial `rust-project.json`-based workspace was already indexed by rust-analyzer would not receive procedural macro expansion despite `config.expand_proc_macros` returning true. To fix this issue:
1. I changed `reload.rs` to check which workspaces are newly added.
2. Spawned new procedural macro expansion servers based on the _new_ workspaces.
    1. This is to prevent spawning duplicate procedural macro expansion servers for already existing workspaces. While the overall memory usage of duplicate procedural macro servers is minimal, this is more about the _principle_ of not leaking processes 😅.
3. Launched procedural macro expansion if any workspaces are `rust-project.json`-based _or_ `same_workspaces` is true. `same_workspaces` being true (and reachable) indicates that the build scripts have finished building (in Cargo-based projects), while the build scripts in `rust-project.json`-based projects have _already been built_ by the build system that produced the `rust-project.json`.

I couldn't really think of structuring this code in a better way without engaging with #7444.