Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fast WaitOnAddress-based thread parker for Windows. #77618

Merged
merged 8 commits into from
Dec 14, 2020

Conversation

m-ou-se
Copy link
Member

@m-ou-se m-ou-se commented Oct 6, 2020

This adds a fast futex-based thread parker for Windows. It either uses WaitOnAddress+WakeByAddressSingle or NT Keyed Events (NtWaitForKeyedEvent+NtReleaseKeyedEvent), depending on which is available. Together, this makes this thread parker work for Windows XP and up. Before this change, park()/unpark() did not work on Windows XP: it needs condition variables, which only exist since Windows Vista.


Unfortunately, NT Keyed Events are an undocumented Windows API. However:

  • This API is relatively simple with obvious behaviour, and there are several (unofficial) articles documenting the details. [1]
  • parking_lot has been using this API for years (on Windows versions before Windows 8). [2] Many big projects extensively use parking_lot, such as servo and the Rust compiler itself.
  • It is the underlying API used by Windows SRW locks and Windows critical sections. [3] [4]
  • The source code of the implementations of Wine, ReactOs, and Windows XP are available and match the expected behaviour.
  • The main risk with an undocumented API is that it might change in the future. But since we only use it for older versions of Windows, that's not a problem.
  • Even if these functions do not block or wake as we expect (which is unlikely, see all previous points), this implementation would still be memory safe. The NT Keyed Events API is only used to sleep/block in the right place.

[1]: http://www.locklessinc.com/articles/keyed_events/
[2]: Amanieu/parking_lot@43abbc964e
[3]: https://docs.microsoft.com/en-us/archive/msdn-magazine/2012/november/windows-with-c-the-evolution-of-synchronization-in-windows-and-c
[4]: Windows Internals, Part 1, ISBN 9780735671300


The choice of fallback API is inspired by parking_lot(_core), but the implementation of this thread parker is different. While parking_lot has no use for a fast path (park() directly returning if unpark() was already called), this implementation has a fast path that returns without even checking which waiting/waking API to use, as the same atomic variable with compatible states is used in all cases.

@rust-highfive
Copy link
Collaborator

r? @kennytm

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 6, 2020
@m-ou-se
Copy link
Member Author

m-ou-se commented Oct 6, 2020

@rustbot modify labels: +T-libs +O-windows +A-concurrency

@rustbot rustbot added A-concurrency Area: Concurrency O-windows Operating system: Windows T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Oct 6, 2020
@m-ou-se
Copy link
Member Author

m-ou-se commented Oct 8, 2020

@retep998 If I understood correctly, you're the right person to ask about Windows APIs in Rust?

@programmerjake
Copy link
Member

AFAICT this looks correct. Note that an acquire operation only synchronizes with the most recent release operation on a particular atomic variable (see LLVM Lang spec), so two unparks followed by a park all on different threads would only have the park thread synchronize with whichever of the two unparks came later, making reading any data written by the earlier unparking thread technically a data race. This is technically allowed by park's documentation, but could be unexpected.

@m-ou-se
Copy link
Member Author

m-ou-se commented Oct 27, 2020

@rustbot ping windows

@rustbot
Copy link
Collaborator

rustbot commented Oct 27, 2020

Hey Windows Group! This bug has been identified as a good "Windows candidate".
In case it's useful, here are some instructions for tackling these sorts of
bugs. Maybe take a look?
Thanks! <3

cc @arlosi @danielframpton @gdr-at-ms @kennykerr @luqmana @lzybkr @retep998 @rylev @sivadeilra

@kennykerr
Copy link
Contributor

I really cannot condone the use of undocumented APIs. I would highly recommend dropping support for Windows XP. There are so many great concurrency APIs that are available starting with Windows 7. This would dramatically simplify the code and avoid the need to dynamically load anything.

I did however want to comment on the compat_fn macro - I was curious how that was implemented. It uses GetModuleHandleW to find the module to pass to GetProcAddress when it should instead be using LoadLibraryW. Using GetModuleHandleW assumes the module has already been loaded by the calling process. That may be a safe bet for certain functions that you want to “delay load” but certainly not all. LoadLibraryW will just go the extra mile of also loading the module if it has not already been loaded. There's also no need to call FreeLibrary for such APIs so it should be a pretty simple fix.

@m-ou-se
Copy link
Member Author

m-ou-se commented Oct 27, 2020

I really cannot condone the use of undocumented APIs. I would highly recommend dropping support for Windows XP. There are so many great concurrency APIs that are available starting with Windows 7. This would dramatically simplify the code and avoid the need to dynamically load anything.

Windows 7 does not have anything futex-like (like WaitOnAddress for Windows 8+) though.

Do you think there's a significant difference between fully dropping support for XP (and Vista, etc.) versus a best-effort implementation using this undocumented API? Lots of Rust software already uses this API, including Rustc itself, through parking_lot.

I did however want to comment on the compat_fn macro - I was curious how that was implemented. It uses GetModuleHandleW to find the module to pass to GetProcAddress when it should instead be using LoadLibraryW. Using GetModuleHandleW assumes the module has already been loaded by the calling process. That may be a safe bet for certain functions that you want to “delay load” but certainly not all. LoadLibraryW will just go the extra mile of also loading the module if it has not already been loaded. There's also no need to call FreeLibrary for such APIs so it should be a pretty simple fix.

Thanks! Opened #78444 for that.

@kennykerr
Copy link
Contributor

I think WaitOnAddress and friends are some of the few that are new to Windows 8. Most of the slim locks, condition variables, thread pool primitives - all of which are incredibly fast and powerful - are available on Windows 7 (some Vista) and later. Those are indispensable for high-performance concurrency and completely missing from Windows XP.

I haven't looked too closely at the Rust thread parker requirements, but why not you just use an event e.g. CreateEventW/SetEvent/WaitForSingleObject? Events may sound "heavyweight" but they are very efficient in practice (and they've been around forever).

@oldnewthing may also have some suggestions for WaitOnAddress alternatives.

@ChrisDenton
Copy link
Member

For Vista+ there's SlimRW/ConditionVariable? I think they're basically wrappers around keyed events anyway, no?

@kennykerr
Copy link
Contributor

For Vista+ there's SlimRW/ConditionVariable? I think they're basically wrappers around keyed events anyway, no?

That's correct. I wrote about that here.

@m-ou-se
Copy link
Member Author

m-ou-se commented Oct 27, 2020

For Vista+ there's SlimRW/ConditionVariable? I think they're basically wrappers around keyed events anyway, no?

Yeah, that's what Rust's Mutex and Condvar use when available.

WaitOnAddress/keyed events are be a much nicer building block, especially for things that are not exactly mutexes or condition variables, because it allows inlining the atomic compare-exchange fast path and only requires a call to the API when waiting is necessary. A thread parker based on a condition variable also needs a mutex and a separate state variable too, making it less efficient and more complicated than just WaitOnAddress on a single byte.

The std code does not know at compile time on which Windows version it is going to run, so if it has an implementation based on WaitOnAddress, it still needs a fallback that works on Windows 7. The keyed events api as a fallback fits nicely, and doesn't even require checking availablility of the new api in the fast path (as in this PR). Any other fallback will need this check on every park()/unpark() (just like Mutex already does to decide between the slim lock and a critical section), or always use the older API.

@luqmana
Copy link
Member

luqmana commented Oct 27, 2020

Replacing keyed events with the classic CreateEvent/SetEvent/WaitForSingleObject is fairly straightforward. I haven't tried to compile this but the general idea is hopefully clear https://gist.github.com/a40c879e6bdcfe3ca4b67948302dd337 (though note there is some error handling missing there).

Edit: Updated for per-parker event handles

@m-ou-se
Copy link
Member Author

m-ou-se commented Oct 27, 2020

Replacing keyed events with the classic CreateEvent/SetEvent/WaitForSingleObject is fairly straightforward. I haven't tried to compile this but the general idea is hopefully clear https://gist.github.com/a40c879e6bdcfe3ca4b67948302dd337 (though note there is some error handling missing there).

Uh, that uses a single event object for all thread parkers, without using anything as key. How would that work? How would unpark() cause the right park()'ed thread to wake up?

@luqmana
Copy link
Member

luqmana commented Oct 27, 2020

Replacing keyed events with the classic CreateEvent/SetEvent/WaitForSingleObject is fairly straightforward. I haven't tried to compile this but the general idea is hopefully clear https://gist.github.com/a40c879e6bdcfe3ca4b67948302dd337 (though note there is some error handling missing there).

Uh, that uses a single event object for all thread parkers, without using anything as key. How would that work? How would unpark() cause the right park()'ed thread to wake up?

Ah, true. Hmm. What's the cost of an extra Handle in Parker? Not quite as slim as the futex impls but still better than the generic one.

@ChrisDenton
Copy link
Member

To summarise some assorted thoughts based on discussions here and elsewhere:

One of the issues with the current compatibility shim is that it hard codes legacy dll names. Loading of the new sync APIs doesn't have this issue as it uses the API set ("api-ms-win-core-synch-l1-2-0").

Another issue is that loading a module restricts where Rust code can be used. However, if this dynamic loading becomes limited to park/unpark then it at least reduces the chances of this being an issue in practice (though park is used by sync::once).

Using undocumented ntdll functions is more controversial. Perhaps it would help if this was gated by an explicit runtime check of the OS version? I mean, people still aren't going to be comfortable with this but at least it would be explicitly limited.


These issues may be solved more satisfactorily in the future by either a target_os_version cfg option or by more fine grained separation of targets. Either would allow for different implementations at compile time. But until then there doesn't seem to be a way to square this circle without using undocumented APIs?

@kennykerr
Copy link
Contributor

They're not really legacy DLL names - they're the actual DLL names. The API sets are just forwarders but the OS loader can figure it out either way. The actual DLL names are preferred IMO because they work down-level before API sets were introduced. The official onecore and onecoreuap libs we use internally and provide through the SDK redirect to the actual DLL names where possible.

Dynamic loading is a normal part of Windows programming. Version checks on the other hand are extremely bad for reliability and compat. I'd sooner recommend the undocumented APIs than have you start checking version numbers. 😉

@ChrisDenton
Copy link
Member

Ah, elsewhere it was mentioned that hard coding dll names like "kernel32" made things more difficult in the lab test when the OS being used doesn't have kernel32.

@camelid camelid added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 20, 2020
@Dylan-DPC-zz
Copy link

r? @dtolnay

@rust-highfive rust-highfive assigned dtolnay and unassigned kennytm Nov 20, 2020
@Dylan-DPC-zz
Copy link

@dtolnay any updates on reviewing this?

@dtolnay
Copy link
Member

dtolnay commented Dec 14, 2020

I am not confident enough about the tradeoffs here — it would be good to get some additional sets of critical eyes on this implementation from the libs team.

@rust-lang/libs
@rfcbot fcp merge

@rfcbot
Copy link

rfcbot commented Dec 14, 2020

Team member @dtolnay has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Dec 14, 2020
@BurntSushi
Copy link
Member

I like the fact that this matches Wine and ReactOS. If we're doing what they're doing, then I think we're in good company.

However, I am a bit skeptical of adding support for Windows XP where it seems like none existed before? I don't understand everything involved here, but it sounds like the XP support is incidental? That is, even if we didn't endeavor to add XP support, we would still get it by virtue of supporting Windows 7. Am I understanding that correctly?

@m-ou-se
Copy link
Member Author

m-ou-se commented Dec 14, 2020

it sounds like the XP support is incidental? That is, even if we didn't endeavor to add XP support, we would still get it by virtue of supporting Windows 7. Am I understanding that correctly?

Yes.

@Amanieu
Copy link
Member

Amanieu commented Dec 14, 2020

I don't think an FCP is needed here since this doesn't change any user-visible APIs. I reviewed both code paths and they look correct to me, so I'm just going to merge this as it is.

@bors r+

@bors
Copy link
Contributor

bors commented Dec 14, 2020

📌 Commit 03fb61c has been approved by Amanieu

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 14, 2020
@bors
Copy link
Contributor

bors commented Dec 14, 2020

⌛ Testing commit 03fb61c with merge fa41639...

@bors
Copy link
Contributor

bors commented Dec 14, 2020

☀️ Test successful - checks-actions
Approved by: Amanieu
Pushing fa41639 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Dec 14, 2020
@bors bors merged commit fa41639 into rust-lang:master Dec 14, 2020
@rustbot rustbot added this to the 1.50.0 milestone Dec 14, 2020
@rfcbot rfcbot removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Dec 14, 2020
@m-ou-se m-ou-se deleted the windows-parker branch January 14, 2021 18:58
@m-ou-se m-ou-se changed the title Add fast futex-based thread parker for Windows. Add fast WaitOnAddress-based thread parker for Windows. Aug 26, 2021
fn keyed_event_handle() -> c::HANDLE {
const INVALID: usize = !0;
static HANDLE: AtomicUsize = AtomicUsize::new(INVALID);
match HANDLE.load(Relaxed) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May need use atomic init_once to guard this, stop calling NtCreateKeyedEvent multiple times

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-concurrency Area: Concurrency merged-by-bors This PR was explicitly merged by bors. O-windows Operating system: Windows S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.