Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redesign of PRG-related APIs #413

Open
divergentdave opened this issue Jan 13, 2023 · 2 comments
Open

Redesign of PRG-related APIs #413

divergentdave opened this issue Jan 13, 2023 · 2 comments

Comments

@divergentdave
Copy link
Collaborator

divergentdave commented Jan 13, 2023

While profiling IdpfPoplar, I've noticed significant contributions from memcpy of AES internal states and allocation and freeing of a Vec<u8> buffer. (memcpy took up 20% of a profile all-told. Allocation and freeing of the buffer mentioned below took 3%.) We could cut back on the CPU time needed for these overhead tasks with some API and internal changes, I think.

The PrgAes128 struct is 752 bytes. It wraps a Cmac<Aes128>, which consists of expanded round keys, a block for the CMAC state, and another block and u8 offset for buffering input. Looking at assembly listings, callers of PrgAes128::into_seed_stream() perform a memcpy of 752 bytes prior to the call, and then the listing of PrgAes128::into_seed_stream() contains another memcpy of 752 bytes. Similarly, calls to Prng::from_seed_stream(seed_stream) are preceded by a memcpy of 768 bytes. (the size of SeedStreamAes128)

I think we want to keep the semantics that into_seed_stream() consumes the PRG state. (which rules out changing it to take &mut self) We could box these RustCrypto types, which would make PrgAes128 and SeedStreamAes128 much smaller, and eliminate memcpys when destructuring the structs or passing them between functions.

Prng contains a SeedStream, a buffer, and some integers. The buffer is sized to hold 128 encoded field elements. For Field64, this is 1KiB. Poplar1 only ever needs two field elements from each seed input, so significant work goes to waste allocating, zeroing, and deallocating this. We could add a capacity argument to the constructors, to right-size this buffer. Prio3 may benefit from this change as well. (I think Prio3 instantiations use fewer field elements than Prio2 in typical use) Prng is a pub(crate) API, so we only need to update internal uses.

One extra optimization: I think we could replace Cmac<Aes128> with Cmac<Aes128Enc>, and Ctr64BE<Aes128> with Ctr64BE<Aes128Enc>, to avoid deriving and storing an extra set of expanded round keys. This was incorrect, Aes128, Aes128Enc, and Aes128Dec all hold the same set of round keys. They only differ in what ZST tokens they hold to control whether they implement encryption and decryption traits.

@cjpatton
Copy link
Collaborator

cjpatton commented Jan 13, 2023

Great analysis! However I would suggest holding off on further work on this for now.

For security reasons, in VDAF-04 we will replace PrgAes128 with an extensible output hash function, such as SHA-3 or KangarooTwelve: cfrg/draft-irtf-cfrg-vdaf#106.

One concern about this change is that it'll incur a performance penalty. The cost is negligible for Prio3 (see benchmarks reported in that issue), as the CPU time seems to be dominated by field arithmetic. It remains to be seen whether Poplar1 will be performant enough with SHA-3; but even if we want a PRG based on AES, we plan to move to some fixed-key mode: cfrg/draft-irtf-cfrg-vdaf#32.

@cjpatton
Copy link
Collaborator

cjpatton commented Jan 9, 2025

@divergentdave is this still relevant?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants