-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize multiplicative inverse #1111
Comments
@armfazh came up with the following for his Go implementation: These may have been generated by fiat-crypto, I'm not 100% sure. |
Yup, those look like addition chain routines, so same idea. (It seems fiat-crypto doesn't use addition chain exponentiation, there are just a couple files related to it without any other references) |
Correct, I've generated addition chains with @mmcloughlin library. On the other hand, fiat-crypto implements a method based on GCD, and that implementation may be faster than exponentiation, but at the expense of ensuring constant-time protections. I have preference for the exponentiation method. |
The optimizations I used in CIRCL for inverses are twofold:
|
Multiplicative inverses show up in a few places in Prio3, as a correction for adding multiple shares of constants in circuits, and in NTT inverse operations to properly scale outputs. In each case, we take an ordinal number, project it into the field, and then invert it. (This is all likely more performance-relevant for smaller circuit sizes than for larger sizes, since multiplications for NTTs grow faster than any for inverses)
Where possible, we could hoist these inverses into circuit or gadget constructors, to precompute this inverse once and reuse it.
Otherwise, we could specialize
inv()
by using "addition chain exponentiation" instead of "square and multiply exponentiation" that the more-genericpow()
does. The idea is that, instead of computing x, x^2, x^4, x^8, etc., and multiplying selected power of two powers together, we instead do an expensive search upfront for a good routine that computes a different set of intermediate values, using squares and multiplies, to get the same result. Since all our primes are chosen such that they are both close to a power of two, and are such that p-1 is very smooth, with many factors of 2, then the exponent we use for inversion via Euler's theorem, p-2, has a lot of ones in its binary expansion. This translates to lots of multiplies, and counting multiply operations would get us something just under double the bit width of the prime. All our primes are big enough that doing an exhaustive search for optimal addition chains is impractical, but https://github.com/mmcloughlin/addchain can find very good addition chains quickly. This would let us take the number of multiply operations per inverse from just under double the bit width to just over the bit width. (hat tip to Filippo's stream: https://www.twitch.tv/videos/2211473764)Edit: it is hilarious, actually, that this got issue number 1111, because
_1111
is the sort of temporary variable name you would commonly see in addition chain exponentiation routinesThe text was updated successfully, but these errors were encountered: