Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize multiplicative inverse #1111

Open
divergentdave opened this issue Aug 19, 2024 · 4 comments
Open

Optimize multiplicative inverse #1111

divergentdave opened this issue Aug 19, 2024 · 4 comments

Comments

@divergentdave
Copy link
Collaborator

divergentdave commented Aug 19, 2024

Multiplicative inverses show up in a few places in Prio3, as a correction for adding multiple shares of constants in circuits, and in NTT inverse operations to properly scale outputs. In each case, we take an ordinal number, project it into the field, and then invert it. (This is all likely more performance-relevant for smaller circuit sizes than for larger sizes, since multiplications for NTTs grow faster than any for inverses)

Where possible, we could hoist these inverses into circuit or gadget constructors, to precompute this inverse once and reuse it.

Otherwise, we could specialize inv() by using "addition chain exponentiation" instead of "square and multiply exponentiation" that the more-generic pow() does. The idea is that, instead of computing x, x^2, x^4, x^8, etc., and multiplying selected power of two powers together, we instead do an expensive search upfront for a good routine that computes a different set of intermediate values, using squares and multiplies, to get the same result. Since all our primes are chosen such that they are both close to a power of two, and are such that p-1 is very smooth, with many factors of 2, then the exponent we use for inversion via Euler's theorem, p-2, has a lot of ones in its binary expansion. This translates to lots of multiplies, and counting multiply operations would get us something just under double the bit width of the prime. All our primes are big enough that doing an exhaustive search for optimal addition chains is impractical, but https://github.com/mmcloughlin/addchain can find very good addition chains quickly. This would let us take the number of multiply operations per inverse from just under double the bit width to just over the bit width. (hat tip to Filippo's stream: https://www.twitch.tv/videos/2211473764)

Edit: it is hilarious, actually, that this got issue number 1111, because _1111 is the sort of temporary variable name you would commonly see in addition chain exponentiation routines

@cjpatton
Copy link
Collaborator

@armfazh came up with the following for his Go implementation:
https://github.com/cloudflare/circl/blob/main/vdaf/prio3/arith/fp64/inverse.go
https://github.com/cloudflare/circl/blob/main/vdaf/prio3/arith/fp128/inverse.go

These may have been generated by fiat-crypto, I'm not 100% sure.

@divergentdave
Copy link
Collaborator Author

Yup, those look like addition chain routines, so same idea. (It seems fiat-crypto doesn't use addition chain exponentiation, there are just a couple files related to it without any other references)

@armfazh
Copy link
Contributor

armfazh commented Jan 17, 2025

Correct, I've generated addition chains with @mmcloughlin library.
It's a one time search, so no need to run the search online.

On the other hand, fiat-crypto implements a method based on GCD, and that implementation may be faster than exponentiation, but at the expense of ensuring constant-time protections.

I have preference for the exponentiation method.

@armfazh
Copy link
Contributor

armfazh commented Jan 17, 2025

The optimizations I used in CIRCL for inverses are twofold:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants