Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use futex-based synchronization on Apple platforms #122408

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

joboet
Copy link
Member

@joboet joboet commented Mar 12, 2024

Last week, Apple released macOS version 14.4, which introduced a public futex API called os_sync_wait_on_address (Apple has failed to include the documentation provided in the defining header os/os_sync_wait_on_address.h on its website). As the private API backing these functions has been around since macOS 10.12, our minimum supported version, it can be used as fallback without risking breakage in future versions.

This PR thus switches over Mutex to os_unfair_lock, which is movable and supports priority inheritance, and all other synchronization primitives (Condvar, RwLock, Once and thread parking) to the futex-based ones also used on Linux and Windows.

TODO: implement miri shims

@joboet joboet added the A-atomic Area: Atomics, barriers, and sync primitives label Mar 12, 2024
@joboet joboet marked this pull request as draft March 12, 2024 20:36
@rustbot
Copy link
Collaborator

rustbot commented Mar 12, 2024

r? @m-ou-se

rustbot has assigned @m-ou-se.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added O-unix Operating system: Unix-like S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Mar 12, 2024
library/std/src/sys/pal/unix/futex.rs Show resolved Hide resolved

// These syscalls appeared with macOS 10.12.
weak! {
pub fn __ulock_wait(u32, *mut c_void, u64, u32) -> c_int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mind explaining more what the thought process of these going undetected is? Seems like a risky chance that might require a hot fix if the MAS or iOS app store scanners see through this.

In reality I think this won't be enough and we need to call ulock things via syscall directly instead. This means that the "fallback" Apple futex implementation is only going to be usable on macOS.

I know Electron apps on the MAS use direct syscalls to emulate private functions, so this is probably what Rust should do too on macOS with precedence:

#define	SYS_ulock_wait     515
#define	SYS_ulock_wake     516
#define	SYS_ulock_wait2    544

For iOS, detect if the os_sync_ functions are available or fallback to the non-futex implementation :/ Calling a syscall directly is forbidden per the iOS headers so std can't do the same as macOS:

__WATCHOS_PROHIBITED __TVOS_PROHIBITED
__OS_AVAILABILITY_MSG(ios,deprecated=10.0,"syscall(2) is unsupported; "
    "please switch to a supported interface. For SYS_kdebug_trace use kdebug_signpost().")
__OS_AVAILABILITY_MSG(macosx,deprecated=10.12,"syscall(2) is unsupported; "
    "please switch to a supported interface. For SYS_kdebug_trace use kdebug_signpost().")
int	 syscall(int, ...);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a "spirit of the law" kind of situation. Apple understandably wants App Store software to continue working in newer versions, so they prohibit private API use, as they want to be able to remove or change that API at will. By only using the private API as fallback and not linking it directly, we fulfil this wish. If we directly linked the private API instead, we'd get dynamic linker errors if they remove it, so we can't do that.

On the other hand, we break the "letter of the law" as we are using private API. In my opinion, this is totally fine and justified, but of course, this is only my interpretation. It would be great if someone from Apple could look over this, just so that we know that they are aware of this. My past attempt at reaching out to them for this has been unsuccessful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you consider the solution of atomic-wait? They use libc++.

https://github.com/m-ou-se/atomic-wait/blob/main/src/macos.rs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But these only exist starting with macOS 11, which is way above our minimum of 10.12, so the ulock fallback would be needed regardless.

Copy link
Contributor

@madsmtm madsmtm Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sets precedent for using private APIs in the standard library (our current use of weak! is only used for symbols that are available in newer versions), which I think is a fairly big deal, and should perhaps be discussed more broadly (in a separate issue? (that could be FCP'ed?)).

Somewhat related is #114186. Tagging in particular @thomcc and @workingjubilee, as they seem to have been involved in this kind of stuff before?


At the very least, we should use the dlsym! macro explicitly here, to make it very clear that we're not weakly linking the symbol (which would be very visible in the binary), but actually loading it at runtime, and thereby evading the App Store's checks (dlsym itself seems to be allowed).

I believe this is sufficient such that we do not need to do raw syscalls (which I remember to have caused problems for Go in the past since as the ABI isn't stable (?)).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we shouldn't be calling this directly, it'll trip the app store static analysis.

wait(UL_COMPARE_AND_WAIT | ULF_NO_ERRNO, addr, value, 0);
}
} else {
panic!("your system is below the minimum supported version of Rust");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These panics also get to be deleted if raw syscalls are used :)

Copy link
Contributor

@madsmtm madsmtm Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do keep the panics, I'd go for an abort here instead, this is not recoverable in any way and avoiding the overhead from unwind landing pads would be nice.

@bors
Copy link
Contributor

bors commented Mar 13, 2024

☔ The latest upstream changes (presumably #122423) made this pull request unmergeable. Please resolve the merge conflicts.

@joboet joboet added S-blocked Status: Blocked on something else such as an RFC or other implementation work. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 8, 2024
bors added a commit to rust-lang/miri that referenced this pull request Jul 14, 2024
RalfJung pushed a commit to RalfJung/rust that referenced this pull request Jul 14, 2024
Copy link
Contributor

@madsmtm madsmtm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, nice work, this will be a great perf improvement!

I know it is a draft, but once you're ready, it'd be nice to split the Mutex changes apart from the futex changes, if possible.

@rustbot label O-apple

Comment on lines +12 to +15
target_os = "macos",
target_os = "ios",
target_os = "tvos",
target_os = "watchos",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use target_vendor = "apple", as that also includes visionOS. Also applies elsewhere.

//!
//! On Apple platforms, priority inheritance is the default for locks. To avoid
//! having to use pthread's mutex, which needs some tricks to work correctly, we
//! instead use `os_unfair_lock`, which is small, movable and supports priority-
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

movable

Could you provide some documentation for this claim? The comment on os_unfair_lock states:

In Swift, note that use of the & operator on an unfair lock can copy or move the lock memory, leading to misbehavior. Use an OSAllocatedUnfairLock to safely wrap access to the lock memory instead. If you use os_unfair_lock APIs directly, always make sure to store and use the lock in memory with a stable address.

Which seems to hint at the lock not being movable? Or am I misunderstanding?

Copy link
Contributor

@madsmtm madsmtm Oct 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found rust-lang/miri#3859, rust-lang/miri#3745 (comment) and rust-lang/miri#3745 (comment), but is any of that documented, and can it be relied on?

Also discussed in #131005.

Copy link
Member

@RalfJung RalfJung Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If std plans to actually move macOS locks then Miri should probably properly emulate whatever the macOS behavior is there -- but absent any documentation, it seems dubious to rely on them being movable? (Even more so if not even Swift relies on them being movable.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure they are movable. You initialize the lock with a constant #DEFINE and the size of the lock isn't even large enough to store a pointer, so all the operations just read/write into the 32 bits available. There are no init/deinit operations either. Based on the header, seems like the time it cares about the address is when you've locked it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that safe Rust can move a lock while it is held:

use std::mem;
use std::sync::Mutex;

fn main() {
    let m = Mutex::new(0);
    mem::forget(m.lock());
    // Move the lock while it is "held" (really: leaked)
    let m2 = m;
    // Now try to acquire the lock again.
    let _guard = m2.lock();
}

We better be sure that this deadlocks (and will deadlock in all future macOS versions) before committing to this implementation strategy.

Comment on lines +150 to +154
/// See os/os_sync_wait_on_address.h (Apple has failed to upload the documentation
/// to their website) for documentation of the public API and
/// https://github.com/apple-oss-distributions/xnu/blob/1031c584a5e37aff177559b9f69dbd3c8c3fd30a/bsd/sys/ulock.h#L69
/// for the header file of the private API, along with its usage in libpthread
/// https://github.com/apple-oss-distributions/libpthread/blob/d8c4e3c212553d3e0f5d76bb7d45a8acd61302dc/src/pthread_cond.c#L463
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

library/std/src/sys/pal/unix/futex.rs Show resolved Hide resolved
Comment on lines +268 to +285
if let Some(wake) = os_sync_wake_by_address_any.get() {
unsafe { wake(addr, size_of::<u32>(), OS_SYNC_WAKE_BY_ADDRESS_NONE) == 0 }
} else if let Some(wake) = __ulock_wake.get() {
// __ulock_wake can get interrupted, so retry until either waking up a
// waiter or failing because there are no waiters (ENOENT).
loop {
let r = unsafe { wake(UL_COMPARE_AND_WAIT | ULF_NO_ERRNO, addr, 0) };

if r >= 0 {
return true;
} else {
match -r {
libc::ENOENT => return false,
libc::EINTR => continue,
err => panic!("__ulock_wake failed: {}", Error::from_raw_os_error(err)),
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, are you sure this is correct? The implementation of os_sync_wake_by_address_any seems to just call __ulock_wake without a loop? So either we need to loop both places, or in none of them?

Same goes for futex_wake.

@rustbot rustbot added O-apple Operating system: Apple (macOS, iOS, tvOS, visionOS, watchOS) labels Sep 26, 2024
@joboet joboet added the I-libs-nominated Nominated for discussion during a libs team meeting. label Oct 10, 2024
@Dylan-DPC Dylan-DPC added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed S-blocked Status: Blocked on something else such as an RFC or other implementation work. labels Oct 14, 2024
@the8472
Copy link
Member

the8472 commented Oct 23, 2024

We discussed this during today's T-libs meeting (albeit with low attendance). We might be willing to go along with the changes on the theory that apps are validated on current OS versions and will use current stable APIs and so shouldn't cause issues and that the fallback APIs have been available for a long time and so won't suddenly disappear on older versions.

But in previous cases where we've started to rely on internal APIs we've leaned on platform experts and their investigations how risky that would be. So we'd like to do the same here. @rustbot ping apple

@rustbot
Copy link
Collaborator

rustbot commented Oct 23, 2024

Hey Apple notification group! This issue or PR could use some Apple-specific
guidance. Could one of you weigh in? Thanks <3

(In case it's useful, here are some instructions for tackling these sorts of
issues).

cc @BlackHoleFox @hkratz @inflation @madsmtm @nvzqz @shepmaster @thomcc

@the8472
Copy link
Member

the8472 commented Oct 23, 2024

This sets precedent for using private APIs in the standard library (our current use of weak! is only used for symbols that are available in newer versions), which I think is a fairly big deal, and should perhaps be discussed more broadly (in a separate issue? (that could be FCP'ed?)).

We do use internal APIs on windows (NtCreateFile), but that became necessary to fix a CVE and was used after looking at the microsoft ecosystem official projects. And windows is a more open platform in a way (hah) since they don't review most software. So it's not fully comparable.

@madsmtm
Copy link
Contributor

madsmtm commented Oct 23, 2024

on the theory that apps are validated on current OS versions and will use current stable APIs and so shouldn't cause issues and that the fallback APIs have been available for a long time and so won't suddenly disappear on older versions.

Just to be clear, the problematic API here is not so much the newly added APIs (os_sync_wait_on_address etc.), but the fallback implementation for older OS'es (__ulock_wait) that might trip Apple's analysis, and make it very hard for Rust users to submit their application to the App Store.


It would be nice if someone could try to submit an application for review in the App Store that uses this branch of rustc, and see if Apple's review picks up on the usage of __ulock_wait via. dlsym or not, that would appease most of my worries here.

@joshtriplett joshtriplett added the relnotes Marks issues that should be documented in the release notes of the next release. label Nov 6, 2024
@m-ou-se m-ou-se added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed I-libs-nominated Nominated for discussion during a libs team meeting. S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). labels Nov 20, 2024
const OS_UNFAIR_LOCK_INIT: os_unfair_lock = os_unfair_lock { _opaque: 0 };

extern "C" {
fn os_unfair_lock_lock(lock: *mut os_unfair_lock);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A blog from firefox states that os_unfair_lock has poor performance: https://hacks.mozilla.org/2022/10/improving-firefox-responsiveness-on-macos/

Is it still true?

(I do not deny that os_unfair_lock is better than current pthread implementation.)

Grimeh added a commit to Grimeh/atomic-wait that referenced this pull request Dec 1, 2024
Shamelessly stolen from pending PR: rust-lang/rust#122408
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-atomic Area: Atomics, barriers, and sync primitives O-apple Operating system: Apple (macOS, iOS, tvOS, visionOS, watchOS) O-unix Operating system: Unix-like relnotes Marks issues that should be documented in the release notes of the next release. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.