Always enforce necessary checks when deserializing points (and other improvements in library) #17

Closed · arhag opened this issue Nov 2, 2023 · Fixed by #16

arhag (Member) commented Nov 2, 2023

The functions that convert from serialized input (whether in Jacobian, affine, or compressed form, and whether big- or little-endian encoded) to a g1 or g2 should always ensure that all of the prime field elements are less than the modulus. If any field element is greater than or equal to the modulus, the deserialization should fail by returning an empty optional.

This change guarantees that the critical precondition (that the field elements are less than the modulus) is satisfied for all arithmetic functions that operate on field elements (with two exceptions discussed later). The function implementations do not have well-specified behavior when that precondition does not hold (more precisely, they should still have well-specified behavior as long as the product of the two operands of a binary add/subtract/multiply is less than the square of the modulus). This means they could give different results than other implementations, or even give different results over time (as the implementation is changed or optimized), which could be a source of consensus failure given that this library is intended to be used in blockchain environments.

In fact, we already saw this issue in practice with this library: on some inputs outside the valid mathematical domain of these functions, the different implementations (x86 ASM vs. C++ for the low-level arithmetic operations like _mul) gave different results. This was caught by the "garbage tests" (so called because they intentionally provide garbage input to the host functions to verify that they give back the same consistent garbage output) of Leap (AntelopeIO/leap#1701) when built for ARM. Note that ARM itself was not the issue; rather, the garbage output recorded in the test was generated from the x86 ASM implementation but did not match the garbage output from the C++ implementation (regardless of which target it was compiled for).

The code should also be updated to explicitly state this critical precondition for the functions that require it for correct (well-specified) behavior. This could be through comments, but ideally would also be through an assert that is compiled out in release mode.

A test case should be added to verify that deserializing a point that does not satisfy the critical precondition for any of its field elements results in the empty optional being returned.

I mentioned above that there are two exceptions to the claim that, if the critical precondition is satisfied for field elements produced by the deserialization functions, then the critical precondition is satisfied for all arithmetic functions that operate on field elements.

The first exception is the calls to _mul within fp2::square and fp2::squareAssign. This means that, unless the implementation of fp2::square and fp2::squareAssign is changed, we cannot enforce the critical precondition on the _mul function. We could state that the critical precondition holds for the output of the _mul function, but we would need a relaxed precondition on the inputs to _mul that still guarantees _mul produces correct, well-specified results. Perhaps that relaxed precondition is that the product of the operands is less than the square of the modulus. But even so, it needs to be proven (and also demonstrated through tests) that the _mul calls within fp2::square and fp2::squareAssign are always made with operands satisfying that relaxed precondition. If this is not the case, then the implementations must be changed to guarantee it.

The second exception is within fp::inverse. I haven't verified whether this is ruled out for some reason, but it appears that the first _double within the for loop could be called on a u that is not less than the modulus. This should be investigated, and more generally the correctness of the inverse implementation should be audited for edge cases (including that it works as expected regardless of whether the field elements are in Montgomery form). If the problem is real, one option is to change the implementation of fp::inverse so that the case is checked for at runtime (just prior to the "Phase 2" comment) and corrected if necessary, so that the critical precondition is satisfied for the _double function. Another option is to demonstrate that the relaxed precondition (that two times the input to _double is less than the square of the modulus) is sufficient to ensure the behavior of _double is well defined and that it always outputs a result satisfying the critical precondition; in that case no changes to fp::inverse would be necessary, but the comment (and assert) in the _double function would have to be adjusted to require only the relaxed precondition.

There is an additional precondition that is desirable for g1 and g2, which is that the points are actually on the elliptic curve. This is something the deserialization functions will verify if the check parameter is set to true. Unlike the critical precondition that the field elements are all less than the modulus, we do not want to unconditionally do this isOnCurve validation because it is a more expensive operation. So it should remain conditionally validated only when check == true. However, the default values for the check parameters of the various functions should be true.

There is less concern about this precondition being satisfied for operands provided to the various operations implemented in this library. It is true that if one or more of the operands to an operation like adding g1 points does not satisfy isOnCurve(), then, generally, their sum will not satisfy isOnCurve() either. So the functions would be operating on operands that are not within the mathematically valid domain of the operation, where behavior is not well specified, and garbage input would still give garbage output. But there is significantly lower risk of different implementations giving different garbage output for the same garbage input, or of it changing over time, given the nature of these operations. So we may wish to document that the relevant functions guarantee the post-condition that the resulting point is on the curve, conditional on the input points also being on the curve. But at this time, there is no requirement to make it a precondition of such functions (and particularly not one enforced with asserts) that all of the input points be on the curve.

However, we should add test cases to this repository (and ensure they run for all three variations of the implementation: pure C++; x86 with BMI2 and ADX extensions; and x86 without those extensions) that: use arbitrary g1 and g2 instances in which the field elements are all less than the modulus but the instances do not satisfy isOnCurve(); do math operations (e.g. adding, scaling, weighted sum) on them; and then compare the outputs to hard-coded expected results. This provides the safeguard against a changing implementation that the garbage tests in Leap were intended to provide. The appropriate place to keep those tests is in this library, not necessarily in Leap.

Some other improvements are recommended as part of this issue:

  1. Rename the following member functions for all the field element types: sub -> subtract; subAssign -> subtractAssign; mul -> multiply (there is no need to rename the mulBy* functions); mulAssign -> multiplyAssign; neg -> negate; conj -> conjugate. Also, rename sub to subtract, and neg to negate in g1 and g2.

  2. Add explicit to the constructors of fp, fp2, fp6, fp12, g1, and g2 that take an std::array. Consider making the constructor of fp that takes an array private (and adding fp2, g1, and g2 as friends of the class); if this complicates things too much then don't worry about making the constructor private, but either way add a comment that it must be constructed with a value that is less than the modulus. (See the constructor sketch after this list.)

  3. Remove fp2::ladd and fp2::ldouble since they are not used anywhere. Also, consider removing _laddAssign since _ladd could be used to do the same thing just as efficiently. (A similar argument applies to removing _addAssign and _subAssign, and to replacing _lsubAssign with the more general _lsub.) Also consider removing _doubleAssign since _double could be used to do the same thing just as efficiently. In fact, we could even consider replacing usage of _double with _add and _ldouble with _ladd, but it is best not to, since it would not be as fast: _add (and _ladd) use a set of add and addc instructions with indirect addressing, whereas the corresponding instructions in _double (and _ldouble) can avoid the indirect addressing.

  4. For consistency, add the following member functions to fp: add, addAssign, dbl, doubleAssign, subtract, subtractAssign, negate, multiply, multiplyAssign, square, and squareAssign. For all the field types, consider implementing the arithmetic logic in the *Assign function and then implementing the base operation in terms of its corresponding *Assign function by copying the first operand rather than starting from a default-constructed field element. For example, add can be implemented in terms of addAssign, and multiply in terms of multiplyAssign (see the sketch after this list). This reduces code duplication and ensures consistency of the various implementations.

  5. The implementation of fp::exp should be changed to use squareAssign and multiplyAssign instead of using _mul directly. The implementation of fp2::exp should be changed to use z.squareAssign(); instead of z = z.square(); and z.multiplyAssign(*this); instead of z = z.mul(*this);. The implementation of fp6::exp, fp12::exp, and fp12::cyclotomicExp should be changed to use z.multiplyAssign(*this); instead of z = z.mul(*this);.

  6. Rename g1::mulScalar to g1::scale and g2::mulScalar to g2::scale.

  7. Rename g1::multiExp to g1::weightedSum and g2::multiExp to g2::weightedSum.

  8. Consider adding addAssign, subtractAssign, and doubleAssign member functions to g1 and g2 and then reimplementing add, subtract, and dbl member functions in terms of their corresponding assign versions. Then, the implementations of g1::weightedSum and g2::weightedSum could use addAssign and doubleAssign instead of copying the temporary.

  9. For g1::weightedSum and g2::weightedSum, do not return an optional. To make this change, the specification of the functions should be changed such that they effectively fill in any missing points with the zero point and any missing scalars with the scalar zero. In practice, this means that instead of using scalars.size() as the bound of the for loop in the implementation, the implementation should use std::min(points.size(), scalars.size()). (See the sketch after this list.)

  10. For g1::weightedSum and g2::weightedSum, consider an implementation that avoids using floating point numbers, which can be a source of non-determinism; this is concerning given the intended use of this library within blockchain applications. The only use of floating point in the implementation is to calculate the log10 of the size of the provided vectors, and even that only happens if the size is greater than or equal to 32. One simple solution is to only allow the function to be called with vectors of size less than 32. This can be a precondition of the function, and if it is not satisfied the function could just return the zero point. Users of this library, e.g. Leap, would be required to respect the precondition. For example, the Leap host function that uses weightedSum could return an error if the caller of the host function provides too many points/scalars. Alternatively, it could satisfy requests of arbitrary size by repeatedly calling weightedSum on different chunks and then summing over all of the results. This approach also allows us to remove the yield parameter from weightedSum, since the function call should be fast enough for vectors of size less than 32; it would be the responsibility of the caller, e.g. Leap, to call yield after calling weightedSum on each chunk. An alternative solution is to allow weightedSum to work on arbitrarily sized vectors (up to, say, 2^32 in size) but to use a fast integer implementation of log10 (e.g. a search over a precomputed table of powers of 10; see the sketch after this list). With this approach, however, the implementation should make more fine-grained yield() calls to ensure that, even for large vectors, not too much time (less than 1 ms) passes between successive yield() calls.

  11. Add a few consistency tests on the field element operations (primarily fp, but if appropriate similar ones can be done for fp2, fp6, and fp12); a sketch of the first two follows the list:

    • Check that doubling x (via the dbl or doubleAssign member functions) is consistent with adding x to itself (via the add or addAssign member functions). Do this with field elements that are both less than half the modulus and greater than half the modulus.
    • Check that squaring x (via the square or squareAssign member functions) is consistent with multiplying x with itself (via the multiply or multiplyAssign member functions).
    • Check that using weightedSum (for both g1 and g2) is consistent with computing the result using scale and add calls. Do this at least for points and scalars vector sizes: 0, 1, 2, 7, 8, 9, 31, 32, 63, 64, 65, 511, 512, 513. Ideally, the test would construct the random points and scalars once for the maximum case and then reuse them for the smaller cases.