Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimization: Vendor jobserver impl and rm thread spawning in parallel compile_objects #889

Merged
merged 22 commits into from
Nov 11, 2023
Merged

Conversation

NobodyXu
Copy link
Collaborator

This PR contains a simpler, vendored implementation of jobserver optimized for cc, it supports non-blocking try_acquire.

By using the non-blocking try_acquire and async-as-state-machine, I was able to completely eliminate usage of additional thread inside parallel version of compile_objects.

@thomcc
Copy link
Member

thomcc commented Oct 22, 2023

This looks great at first glance, but it might not be for a week or two before I can actually review it (it's a big PR).

@NobodyXu
Copy link
Collaborator Author

but it might not be for a week or two before I can actually review it (it's a big PR).

That's totally fine, it's huge and change a lot of stuff.

I might also polish it a bit in the meantime and added more testing for the parallel compile_objects to detect regression we previously seen (e.g. passing more objects than the cores, msrv).

@thomcc
Copy link
Member

thomcc commented Oct 22, 2023

Yep, feel free.

src/job_token.rs Outdated Show resolved Hide resolved
src/job_token.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
@osiewicz
Copy link
Contributor

osiewicz commented Oct 24, 2023

Overall I like your approach and I think it solves the issue that we've had with initial channel-based implementation nicely, though I think it would be nice to explain:

  • why do we have a static token queue in the first place
  • why we can't really store the tokens in the queue and have to use an external jobserver for that (that's due to Rust not supporting Drop for static).

I didn't look much into the jobserver reimplementation, but I'm assuming it's a verbatim copy of jobserver with try_acquire slapped on top; if so, I'd suggest to state which version it is based on so that one could verify in the future if there's a good reason (e.g. a bugfix) to sync them again.
I also think that it would be pretty cool to split this up a bit.
Overall though this looks pretty cool to me, good job. =)

@NobodyXu
Copy link
Collaborator Author

  • why do we have a static token queue in the first place

The static token queue is for cases where there's no jobserver configured and we need to create a new jobserver.

Instead of porting the code to create a new jobserver, it's much simpler to have a static token queue implemented using atomics.

  • why we can't really store the tokens in the queue and have to use an external jobserver for that (that's due to Rust not supporting Drop for static).

I think there's some comments in the inherited_jobserver mod.

I didn't look much into the jobserver reimplementation, but I'm assuming it's a verbatim copy of jobserver with try_acquire slapped on top; if so, I'd suggest to state which version it is based on so that one could verify in the future if there's a good reason (e.g. a bugfix) to sync them again.

It's actually based on jobslot, a fork of jobserver.

But even so it's not identical apart from try_acquire, I have simplified the code a bit and I will port that back to jobslot later.

I also think that it would be pretty cool to split this up a bit.

Can you please elaborate on which part to split up?

Overall though this looks pretty cool to me, good job. =)

Thank you and thanks for reviewing my code!

@osiewicz
Copy link
Contributor

osiewicz commented Oct 25, 2023

Oh yeah, I've meant that we should probably have comments for the first 2 points; by static I've meant that we have this JOBSERVER global that we've introduced in the previous PR; what problems does it solve? Why can't we use the job queue there? The answer to both of these questions is obvious to me and you given that we've cooperated and we have the understanding of the problem, but it's not going to be obvious to someone who reads this code for the first time.

Can you please elaborate on which part to split up?

I think the async executor should be in a separate module (something like executor, idk). Backoff logic could be hidden away as well, as this code (IMO) mixes the job scheduling (high level) with polling for the futures.

@NobodyXu
Copy link
Collaborator Author

Oh yeah, I've meant that we should probably have comments for the first 2 points;

That makes sense, I will add that.

I think the async executor should be in a separate module (something like executor, idk). Backoff logic could be hidden away as well, as this code (IMO) mixes the job scheduling (high level) with polling for the futures.

That's a good idea!

I will extract anything related to it as a separate module.

@NobodyXu NobodyXu requested a review from osiewicz October 25, 2023 11:42
@NobodyXu
Copy link
Collaborator Author

cc @nrc since this PR creates a small async executor and it might help your "Async fundamentals initiative: portable and interoperable" in adding an small single-thread async executor to libstd.

Copy link
Contributor

@osiewicz osiewicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@NobodyXu
Copy link
Collaborator Author

It seems that this PR have some MSRV issues, will wait for #890 to be merged and rebase.

@NobodyXu
Copy link
Collaborator Author

NobodyXu commented Nov 1, 2023

Running cargo +1.46 c --all-features gives me the following errors:

error[E0658]: or-patterns syntax is experimental
  --> src/job_token/unix.rs:88:17
   |
88 |                 Some(libc::O_RDONLY) | Some(libc::O_RDWR),
   |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: see issue #54883 <https://github.com/rust-lang/rust/issues/54883> for more information

error[E0658]: or-patterns syntax is experimental
  --> src/job_token/unix.rs:89:17
   |
89 |                 Some(libc::O_WRONLY) | Some(libc::O_RDWR),
   |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: see issue #54883 <https://github.com/rust-lang/rust/issues/54883> for more information

error[E0658]: use of unstable library feature 'slice_strip'
  --> src/job_token/unix.rs:28:21
   |
28 |                 arg.strip_prefix(b"--jobserver-fds=")
   |                     ^^^^^^^^^^^^
   |
   = note: see issue #73413 <https://github.com/rust-lang/rust/issues/73413> for more information

error[E0658]: use of unstable library feature 'slice_strip'
  --> src/job_token/unix.rs:29:37
   |
29 |                     .or_else(|| arg.strip_prefix(b"--jobserver-auth="))
   |                                     ^^^^^^^^^^^^
   |
   = note: see issue #73413 <https://github.com/rust-lang/rust/issues/73413> for more information

error[E0658]: use of unstable library feature 'slice_strip'
  --> src/job_token/unix.rs:33:31
   |
33 |         if let Some(fifo) = s.strip_prefix(b"fifo:") {
   |                               ^^^^^^^^^^^^
   |
   = note: see issue #73413 <https://github.com/rust-lang/rust/issues/73413> for more information

error[E0599]: no method named `split_once` found for reference `&str` in the current scope
  --> src/job_token/unix.rs:64:31
   |
64 |         let (read, write) = s.split_once(',')?;
   |                               ^^^^^^^^^^ method not found in `&str`

error: aborting due to 6 previous errors

Some errors have detailed explanations: E0599, E0658.
For more information about an error, try `rustc --explain E0599`.
error: could not compile `cc`.

To learn more, run the command again with --verbose.

The first one is rust-lang/rust#79278 stablised in 1.53, the second one is rust-lang/rust#77853 stablised in 1.51 , the last one is rust-lang/rust#81940 stablised in 1.52

Given that rust 1.53 is released on Jun 18, 2021 and it has been more than two years, @thomcc would it makes sense to bump the MSRV (again)?

Also 1.46 has problem with linking on latest MacOS and fetching from crates.io is super slow in CI, so it makes sense to bump MSRV to a newer version.

@thomcc
Copy link
Member

thomcc commented Nov 3, 2023

would it makes sense to bump the MSRV (again)?

I'm fine with it. I'd be okay bumping as new as 1.63 (a bit older than a year, and also compatible with debian stable), but beyond that I don't see a reason for compat with old Rust.

(I'll be actually reviewing this this weekend, just was skimming through the comments)

It supports non-blocking `try_acquire` and is much simpler than the one
provided by `jobserver`

Signed-off-by: Jiahao XU <[email protected]>
Also fixed compilation errors in mod `job_token`

Signed-off-by: Jiahao XU <[email protected]>
Remove use of mpsc since the future is executed on one single thread
only.

Signed-off-by: Jiahao XU <[email protected]>
The mpsc is stored in a global variable and Rust never calls
`Drop::drop` on global variables, so they are never released.

This commit removes the mpsc and replaces that with an `AtomicBool` for
the implicit token to fix this, also dramatically simplifies the code.

Signed-off-by: Jiahao XU <[email protected]>
Return `Ok(None)` instead of `Err()` if no token is ready.

Signed-off-by: Jiahao XU <[email protected]>
`O_RDWR` is a valid access mode for both read and write end of the pipe.

Signed-off-by: Jiahao XU <[email protected]>
@NobodyXu NobodyXu requested a review from thomcc November 4, 2023 23:45
Copy link
Member

@thomcc thomcc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really good. Clean and nice to read, especially for code that could be much more complex. I left a lot of comments but I don't think any should be tough to resolve (anything that is a big change we should punt on until after this release, to avoid releasing too many possibly-breaking things at once).

src/async_executor.rs Show resolved Hide resolved
src/job_token/unix.rs Outdated Show resolved Hide resolved
// Note that we could use `num_cpus` here but it's an extra
// dependency that will almost never be used, so
// it's generally not too worth it.
let mut parallelism = 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think perhaps we could use (at least as an upper bound) https://doc.rust-lang.org/nightly/std/thread/fn.available_parallelism.html if we bump to 1.59.

Did you say that some folks (@dot-asm? Sorry if I'm misremembering) needed support for older Rust than that?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, @dot-asm is against bumping msrv past 1.56

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dot-asm can you make a defense as to why? It would be pretty nice to use available_parallelism here to avoid oversaturating the CPU on machines without a lot of concurrency (which includes a lot of vms/containers and machines people use for CI)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't NUM_JOBS going to be set most of the time when cc is being invoked through Cargo? Thus making parallelism's default value less likely to be used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair point. This is fine for now then, but I think we should this in a version or so, even if only when parallelism is active. For now, you're right that it's probably not a big deal.

(I might reach out to @dot-asm to see why they're stuck and how common their case is likely to be though)

Anyway can you change the comment to say that it should use available_parallelism once MSRV is bumped?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I've added a "TODO" comment.

src/job_token/unix.rs Outdated Show resolved Hide resolved
src/job_token.rs Outdated Show resolved Hide resolved
src/job_token.rs Outdated Show resolved Hide resolved
src/job_token/unix.rs Show resolved Hide resolved
src/job_token/windows.rs Outdated Show resolved Hide resolved
src/job_token/windows.rs Show resolved Hide resolved
src/lib.rs Show resolved Hide resolved
Co-authored-by: Thom Chiovoloni <[email protected]>
@thomcc thomcc mentioned this pull request Nov 5, 2023
Signed-off-by: Jiahao XU <[email protected]>
Since the manual specifies that only `--jobsewrver-auth` will be used
and windows does not have the concept of fds anyway.

Signed-off-by: Jiahao XU <[email protected]>
// Note that we could use `num_cpus` here but it's an extra
// dependency that will almost never be used, so
// it's generally not too worth it.
let mut parallelism = 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair point. This is fine for now then, but I think we should this in a version or so, even if only when parallelism is active. For now, you're right that it's probably not a big deal.

(I might reach out to @dot-asm to see why they're stuck and how common their case is likely to be though)

Anyway can you change the comment to say that it should use available_parallelism once MSRV is bumped?

src/job_token/unix.rs Show resolved Hide resolved
src/job_token/windows.rs Show resolved Hide resolved
THREAD_SYNCHRONIZE, WAIT_ABANDONED, WAIT_FAILED, WAIT_OBJECT_0, WAIT_TIMEOUT,
};

const WAIT_ABANDOEND_ERR_MSG: &str = r#" The specified object is a mutex object that was not released by the thread that owned the mutex object before the owning thread terminated. Ownership of the mutex object is granted to the calling thread and the mutex state is set to nonsignaled.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is way more error information than we should provide, and none of it is likely to be useful for us. It's also wrong, because we're waiting on a semaphore object, but it says it was a mutex object, and describes some of the concerns around abandoned mutex objects.

Also, Windows semaphore objects aren't owned by a process/thread (unlike windows mutex objects), so I am pretty sure that WAIT_ABANDONED can never be returned for one anyway.

So we should probably just change this to something like "Wait on jobserver semaphore returned WAIT_ABANDONED" (it probably doesn't need to be const this way too), and add a comment in try_acquire that we believe this to be impossible for a semaphore.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I've updated the errmsg and added the comment.

@NobodyXu NobodyXu requested a review from thomcc November 7, 2023 09:26
Copy link
Member

@thomcc thomcc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the work, this is great!

@thomcc thomcc merged commit fcedb00 into rust-lang:main Nov 11, 2023
16 checks passed
@NobodyXu NobodyXu deleted the jobserver branch November 11, 2023 23:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants