-
Notifications
You must be signed in to change notification settings - Fork 194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use_file: Remove use of spin-locks #125
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine, as the open() syscall will never block when opening a device
This is better, but, in theory, and if you are really unlucky, can still lead to a priority inversion, if the thread is preempted. Non-blocking massively reduces chances of preemption, but does not make them zero. Moreover, we are doing a syscall and I have vague recollections that the kernel uses syscalls as a chance to check if the quant is exhausted, so preemtion probability might actually be higher than in the uniformly distributed case. So, this is likely ok, but not almost surely ok :)
I agree, after looking over the code, I think removing spinning entirely is the best bet (and also makes the code more readable). In the very rare case we race on the |
It just occurred to me that if we are using the open syscall here then we ... can just call pthread_lock as well? |
I looked into doing that originally, and then again when I was fixing this issue. It seems much more complicated to do than the current code. There are a bunch of pthread methods you have to call (or at least the stdlib calls them, I'm not sure the complexity (vs this PR) is worth it. |
stdlib needs to guard agains reentrant locking, which we don't have to, so the implementation is pretty straight forward (modulo the lack of I like it because it doges the other edge case -- unbounded number of concurrently opened file descriptors (not that my assertion from the blog that an init functions would be run at most #cores times is wrong). |
Well, which I think we don't need to do, for the following reasons:
|
I was just going to mention this. File-descriptors are a finite resource on most OSes, and I'd be more comfortable with a solution that does not risk exhausting them in pathological cases. |
// numbers. The file will be opened exactly once. All successful calls will | ||
// return the same file descriptor. This file descriptor is never closed. | ||
fn get_rng_fd() -> Result<libc::c_int, Error> { | ||
static FD: AtomicUsize = AtomicUsize::new(LazyUsize::UNINIT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how often this needs to happen, or whether this happening is worth optimizing, but a different way to implement this is to do something like this to keep the common path branchless: https://github.com/calebzulawski/multiversion/pull/6/files#diff-0a878480aac95b54dd822d02c4ad345eR76
cc @TethysSvensson - it might be a thing worth attempting in a subsequent PR after an approach that works "correctly" gets merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gnzlbg The reason why that trick works in multiversion is, that the value we are trying to lazily compute is already a function pointer. This means that the best-case performance is always going to involve at least 1 indirect function call. The trick makes sure that the common path does not pay anything in addition to that 1 indirect function.
Here it looks like you are trying to lazily compute a file descriptor. As far as I can tell, the current cost in the common path is a single conditional branch on a relaxed load. This appears to me to be cheaper than alternative cost of 1 indirect function call on a relaxed load.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do have one suggestion though, assuming that you would ever want to inline get_rng_fd
. I think in that case, you would want to inline the initial check, while keeping the slow initializer un-inlined. I am imagining something like this:
#[inline]
fn get_rng_fd() -> Result<libc::c_int, Error> {
static FD: AtomicUsize = AtomicUsize::new(LazyUsize::UNINIT);
if let Some(fd) = get_fd() {
Ok(fd)
} else {
initialize_slow()
}
#[inline(always)]
fn get_fd() -> Option<libc::c_int> { ... }
#[inline(always)]
fn initialize_slow() -> Result<libc::c_int, Error> { ... }
}
However if I understand this code correctly, the I don't think this function is meant to be inlined(?), so in that case it does not really matter either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However if I understand this code correctly, the I don't think this function is meant to be inlined(?), so in that case it does not really matter either way.
This is correct, the only external function of this crate is getrandom::getrandom
which is not marked #[inline]
, so I don't think marking internal functions #[inline]
really does anything.
In fact it looks like (in release mode) the compiler just inlines everything in this file into use_file::getrandom_inner
. I checked with the current nightly build.
EDIT: here's the ASM when compiled with --release
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, noticed only a couple of unrelated nits.
Signed-off-by: Joe Richey <[email protected]>
Don't spin when polling /dev/random. We also remove the use of spin locks when opening the persistent fd for platforms that require it. For both these cases, we can just use the pthread lock/unlock methods in libc. This includes adding Mutex and DropGuard abstractions. Signed-off-by: Joe Richey <[email protected]>
We no longer use spin-locks anywhere in getrandom, so remove any interfaces which spin. Signed-off-by: Joe Richey <[email protected]>
@dhardy @newpavlov This PR is ready for review/merging. Comments have been addressed, and the CI is passing.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! I have only one question and if @dhardy does not have any objections I will do the merge.
// before returning, making sure we don't violate the pthread_mutex_t API. | ||
static MUTEX: Mutex = Mutex::new(); | ||
unsafe { MUTEX.lock() }; | ||
let _guard = DropGuard(|| unsafe { MUTEX.unlock() }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it guaranteed that destructor will run right before function exit (be it return
or panic)? For some reason I thought that with NLL drop place can change depending on liveness analysis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NLL (and its future extensions w/ Polonious or what not) is just for Lifetimes. Drop
is still scope based, so the drop
method is always executed when the object leaves scope. As the guard is at function scope, it drop
s on return
or on panic
unwinding.
The liveness analysis is just used to determine if a set of borrows are permitted. This is why (modulo compiler bugs) NLL was introduced without it being a breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question above, otherwise looks good to me.
} | ||
unsafe fn lock(&self) { | ||
let r = libc::pthread_mutex_lock(self.0.get()); | ||
debug_assert_eq!(r, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we not check the return value in all builds? (Also unlock
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stdlib also uses debug_assert
, and man page specifically says that lock/unlock basically can't fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha. Closer reading of the possible error values shows that none should be applicable in this case, and if any were that would be a code error, so okay. (Assuming EAGAIN
is specific to recursive locks.)
|
||
unsafe impl Sync for Mutex {} | ||
|
||
struct DropGuard<F: FnMut()>(F); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised this isn't a part of the core
library!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Often times, this does not work in higher level code, because closure borrows the environment, and so you can't modify it. We only able to use it because we close over low-level file descriptor, which is Copy
.
To be clear, it is an oftentimes useful utility, but it is not as useful in practice as it could seem.
EDIT: you can, of course, make a guard smart-pointer with DerefMut (https://docs.rs/scopeguard/1.0.0/scopeguard/#scope-guard-with-value), but that's a more complex API with some design choices.
Remove the general purpose spin-lock from getrandom, and don't spin
when polling /dev/random. We also remove the use of spin locks when
opening the persistent fd for platforms that require it.
For both these cases, we can just use the pthread lock/unlock methods
in libc. We also do some minor cleanup to better make use of
Result
types and
DropGuard
s.Essentially, this change does the "standard" Mutex based synchronization approach, it just has to do it without
libstd
. With this change, we continue to have the property thatgetrandom
uses at most one file descriptor per process.EDIT: Updated description to indicate that we are removing all uses of spin locks.
Thanks to @matklad for pointing this out. See his blog post about
getrandom
and matklad/once_cell#61.Signed-off-by: Joe Richey [email protected]