Improve RDRAND implementation #24
Overall looks good to me except for two small nitpicks.
@newpavlov this should be ready to merge, provided you're OK with an implementation that needs up to two calls to I have an alternative implementation that only calls
Personally I prefer the alternative implementation. We could even drop If we decide to keep the current implementation, then you can make

    fn get_rand_unaligned(dest: &mut [u8]) -> Result<(), Error> {
        for chunk in dest.chunks_mut(mem::size_of::<u64>()) {
            // One random u64 per chunk; the final chunk may be shorter than 8 bytes.
            let data = get_rand_u64()?;
            let n = chunk.len();
            chunk.copy_from_slice(&data.to_ne_bytes()[..n]);
        }
        Ok(())
    }

@dhardy
Oh that's much nicer, I just updated this PR to use essentially that code. The generated assembly still gives what we expect.
I'm still trying to figure out if there is a way to get the alternative implementation working without as much unsafe code. I'll let you know my progress.
So after looking at this some more, it’s probably not worth hyper-optimizing this to maximize the number of aligned accesses. On x86 the same code gets emitted for aligned and unaligned accesses anyway (unaligned ones just happen to be slower). I’ll rewrite this to just use chunks_exact_mut, which should make everything fairly short. Note: since most of the time this is used to fill a freshly allocated 128/256-bit key, everything will usually be properly aligned regardless.
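For illustration, here is a minimal sketch of the chunks_exact_mut approach described above. It is not the PR's actual code: `fill_with` and its `next_u64` closure parameter are hypothetical names, and the closure stands in for whatever produces one random `u64` (in the PR, the RDRAND call). Full 8-byte chunks are copied whole, and only a trailing partial chunk, if there is one, needs a truncated copy.

```rust
use core::mem;

/// Fills `dest` with words produced by `next_u64`.
/// `next_u64` is a hypothetical stand-in for the RDRAND-backed generator.
fn fill_with<E>(
    dest: &mut [u8],
    mut next_u64: impl FnMut() -> Result<u64, E>,
) -> Result<(), E> {
    const WORD: usize = mem::size_of::<u64>();

    // Every chunk yielded here is exactly WORD bytes long, so the copy
    // has a fixed size known at compile time.
    let mut chunks = dest.chunks_exact_mut(WORD);
    for chunk in &mut chunks {
        chunk.copy_from_slice(&next_u64()?.to_ne_bytes());
    }

    // Whatever did not fit into a full chunk (0 to WORD - 1 bytes).
    let tail = chunks.into_remainder();
    if !tail.is_empty() {
        let n = tail.len();
        tail.copy_from_slice(&next_u64()?.to_ne_bytes()[..n]);
    }
    Ok(())
}
```

With an 11-byte buffer this makes two generator calls: one for the first 8 bytes and one for the 3-byte tail. For a 16- or 32-byte key the tail branch is never taken.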
@dhardy @newpavlov this should be ready for final review/merging, see the updated PR description.
This change makes a few improvements to be closer to Intel's recommendations:

- RETRY_LIMIT is the one recommended by Intel.
- chunks_exact_mut to elide most calls to memcpy. See generated assembly.
- unsafe code except that used to invoke the RDRAND intrinsic.
- x86_64-fortanix-unknown-sgx

Stylistically, the code is now much easier to read. The file has also been renamed to rdrand.rs to better reflect what it does. This diff makes it easier to see what has actually changed.
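As a rough sketch of the RETRY_LIMIT item, a bounded retry loop could look like the following. The PR's actual constant and function are not shown in this thread, so the value 10 is an assumption based on Intel's DRNG software implementation guidance, and `retry_rdrand` and its `step` closure are hypothetical names. On x86_64, `step` would wrap a single call to the `core::arch::x86_64::_rdrand64_step` intrinsic, which reports success through its return value.

```rust
/// Assumed value: Intel's guidance is to retry RDRAND up to 10 times
/// before treating the hardware as having failed.
const RETRY_LIMIT: usize = 10;

/// Calls `step` until it yields a value or the retry budget runs out.
/// `step` is a hypothetical stand-in for one RDRAND invocation.
fn retry_rdrand(mut step: impl FnMut() -> Option<u64>) -> Option<u64> {
    for _ in 0..RETRY_LIMIT {
        if let Some(value) = step() {
            return Some(value);
        }
    }
    // Every attempt failed; the caller should surface an error.
    None
}
```

Keeping the loop separate from the intrinsic call means the only unsafe code left is the one-line wrapper around the instruction itself.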