BLS12-381 Crypto Primitives #1071
Conversation
@@ -338,6 +338,8 @@ struct controller_impl {
   set_activation_handler<builtin_protocol_feature_t::get_code_hash>();
   set_activation_handler<builtin_protocol_feature_t::get_block_num>();
   set_activation_handler<builtin_protocol_feature_t::crypto_primitives>();
   set_activation_handler<builtin_protocol_feature_t::bls_primitives>();
   bls12_381::init();
Is this the correct place for the library's `init` function? The call to `init` only needs to happen once when `nodeos` starts. It is a live dispatcher that checks whether the `adc` and `bmi2` CPU features are available and, if so, sets the faster asm routines.
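The one-shot requirement can be made robust even if multiple activation handlers (or plugin startups) race to initialize the library. A minimal sketch, with hypothetical names (`bls_sketch`, `expensive_init`, `init_calls` are illustrative, not the Leap or bls12-381 API):

```cpp
#include <atomic>
#include <mutex>

// Hypothetical sketch (not the real bls12_381 API): run a one-time library
// init exactly once per process, no matter how many callers race to it.
namespace bls_sketch {

inline std::atomic<int> init_calls{0}; // instrumentation for illustration only

// stand-in for bls12_381::init(): probe CPU features (adc, bmi2) and
// install the faster asm routines when available
inline void expensive_init() {
    init_calls.fetch_add(1, std::memory_order_relaxed);
}

inline void init_once() {
    static std::once_flag flag;          // shared across all callers
    std::call_once(flag, expensive_init); // runs expensive_init at most once
}

} // namespace bls_sketch
```

With this pattern it would not matter whether the activation handler fires more than once; the dispatcher setup still happens exactly one time.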
} )
( builtin_protocol_feature_t::bls_primitives, builtin_protocol_feature_spec{
   "BLS_PRIMITIVES",
   fc::variant("2c302889ce2db124878f7c6092cf27519d450559ed2fdfbad14f532d90fc5139").as<digest_type>(),
This SHA256 hash of the comment message below is probably incorrect. I haven't been able to reproduce any of the other hashes either, so I assume the one I generated is wrong.
The SHA256 hash includes the trailing newline. For example, for GET_BLOCK_NUM, hash the string between the `/**/`, quoted below, including the trailing newline but not the newline before "Builtin":
"Builtin protocol feature: GET_BLOCK_NUM
Enables new `get_block_num` intrinsic which returns the current block number.
"
So yours would be 01969c44de35999b924095ae7f50081a7f274409fdbccb9fc54fa7836c76089c
Hello Kevin. Thank you very much for clarifying that. I tried a few different things but not that. I am now able to reproduce all the hashes.
Corrected the hash value.
sv.reserve(n);
for(uint32_t i = 0; i < n; i++)
{
   bls12_381::g1 p = bls12_381::g1::fromJacobianBytesLE({reinterpret_cast<const uint8_t*>(points.data() + i*144), 144}, false, true);
Before using the `span<>`s passed as arguments, we need to validate that the size of each span is what is required. Consider, for example, a contract currently with 64KB of memory calling `bls_g1_exp(64KB-8, 4, ..., n=1, ...)`. The host function validation wrappers will allow this call because the `points` span is 4 bytes long and within the 0 to 64KB-1 valid range of memory. However, `bls12_381::g1::fromJacobianBytesLE({reinterpret_cast<const uint8_t*>(points.data() + i*144), 144})` then goes on to use 144 bytes starting from 64KB-8. This will access invalid WASM memory, leading to undefined behavior (what will likely happen in this case, at this point in time, is a recovered SEGV which leaks the memory allocated by the `std::vector<>::reserve()` above).
Host functions must ensure they only access memory within the passed `span<>` ranges. So instead, we need to validate that `points.size() == n*144`. Consider, for example, this host function:
leap/libraries/chain/webassembly/crypto.cpp
Lines 119 to 126 in ba919c7
int32_t interface::alt_bn128_add(span<const char> op1, span<const char> op2, span<char> result ) const {
   if (op1.size() != 64 || op2.size() != 64 || result.size() < 64 ||
       bn256::g1_add(std::span<const uint8_t, 64>{(const uint8_t*)op1.data(), 64},
                     std::span<const uint8_t, 64>{(const uint8_t*)op2.data(), 64},
                     std::span<uint8_t, 64>{ (uint8_t*)result.data(), 64}) == -1)
      return return_code::failure;
   return return_code::success;
}
Notice how if `op1` is not 64 bytes an error is returned. Since `bls_g1_exp()` does not return anything, the only reasonable option for `bls_g1_exp` may be to assert that `points.size() == n*144` (i.e. fail the transaction).
This same guidance applies to `result` as well. Notice that the above `alt_bn128_add` example checks that `result` is at least 64 bytes. That may not be ideal and may have been an oversight in the original implementation -- I would suggest enforcing that the `result` here is exactly 144 bytes.
(This feedback applies to all the added host functions.)
Thanks for the detailed explanation and example scenario. I was expecting the execution to fail in those cases but didn't consider potential memory leaks. A pretty big deal, especially for network nodes that run indefinitely. Of course, these range checks are very important.
Updates in my most recent commit(s) include:
- added checks for all `span<>` ranges
- made use of `return_code::success`/`return_code::failure` as in the BN host functions
- also see related changes in `cdt` (proper error propagation back to the smart contract and then handling with `eosio::check`, again, exactly as in the BN host functions)
- fixed indentation of tabs (3 spaces instead of 4)
   bls12_381::g2 p_g2 = bls12_381::g2::fromJacobianBytesLE({reinterpret_cast<const uint8_t*>(g2_points.data() + i*288), 288}, false, true);
   bls12_381::pairing::add_pair(v, p_g1, p_g2);
}
bls12_381::fp12 r = bls12_381::pairing::calculate(v);
Whenever seeing a loop in a host function, we need to be mindful of whether a malicious contract can cause the loop to take "too long". In this case, how long does `bls_pairing` take if n = 80000 (I believe the approximate realistic maximum currently)?
We don't have hard rules, but historical guidance is that a host function should "yield" (allow the deadline timer to cancel the transaction) at least once per millisecond. But this is a manual yield the host function must perform. As an example,
leap/libraries/chain/webassembly/crypto.cpp
Lines 216 to 227 in ba919c7
void interface::sha3( span<const char> input, span<char> output, int32_t keccak ) const {
   bool _keccak = keccak == 1;
   const size_t bs = eosio::chain::config::hashing_checktime_block_size;
   const char* data = input.data();
   uint32_t datalen = input.size();
   fc::sha3::encoder enc;
   while ( datalen > bs ) {
      enc.write( data, bs);
      data += bs;
      datalen -= bs;
      context.trx_context.checktime();
   }
See how every `eosio::chain::config::hashing_checktime_block_size` worth of input to hash will call `context.trx_context.checktime();`.
And, the general historical guidance is that host functions with conditions that fail to yield for 5+ms would be considered a security defect and must be fixed.
So, we may need to add `context.trx_context.checktime()` calls in this loop depending on how long it takes to perform these operations.
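The periodic-yield pattern described above can be sketched generically. `process_pairs` and its default interval are hypothetical stand-ins, not the actual host function:

```cpp
#include <cstdint>
#include <functional>

// Hypothetical sketch of the "yield at least once per millisecond" guidance:
// a host-function loop polls a checktime callback every `interval` iterations
// so the deadline timer can cancel a runaway transaction.
namespace bls_sketch {

inline void process_pairs(uint32_t n,
                          const std::function<void()>& checktime,
                          uint32_t interval = 4) {
    for (uint32_t i = 0; i < n; ++i) {
        // ... per-pair work (deserialize points, add the pair) goes here ...
        if (i % interval == 0)
            checktime(); // may throw past the deadline, aborting the transaction
    }
}

} // namespace bls_sketch
```

The interval would be tuned so that the work between two polls stays well under a millisecond on realistic hardware.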
Again, thanks for taking the time to explain everything so well. I really appreciate it. I had already noticed calls to `context.trx_context.checktime()` here and there, but forgot to ask about it.
There are three host functions with loops. Following your rule of thumb (approx. one check per ms), I did some time measuring on my machine to get an idea of how long each loop iteration runs. Here are the results and the reasoning behind the numbers I chose for the `checktime()` intervals:
- `g1_exp`: `155µs` per loop iter => times 10 = `1.55ms`
- `g2_exp`: `260µs` per loop iter => times 6 = `1.56ms`
- `pairing`: `395µs` per loop iter => times 4 = `1.58ms`

Assuming significantly better HW on BP nodes, I thought aiming for `1.5ms` on my machine should be fine. Also, in the case of `pairing`, having at least 3 pairs going through without `checktime()` overhead would be a very common case for smart contracts that perform groth16 proof verifications (fwiw).
Updates in the most recent commit(s) include:
- added `context.trx_context.checktime()` calls at reasonable intervals
   if(i%4 == 0)
      context.trx_context.checktime();
}
bls12_381::fp12 r = bls12_381::pairing::calculate(v);
In the case of `v.size() == 80000`, how long does `bls12_381::pairing::calculate()` take?
On my machine, about 20 seconds. I was thinking about it after I read your previous comment: does that mean we need to be able to pass the `yield()` function to `pairing::calculate()` and execute it there at reasonable intervals as well? (Edit: probably not, since now with the `checktime()` inside the loop it wouldn't be possible to add that many pairs to `v` anymore.)
> since now with the `checktime()` inside the loop it wouldn't be possible to add that many pairs to `v` anymore

Yeah, good point. But is there some number that takes, for example, 25ms to make it through the loop, but then takes 100ms to make it through `bls12_381::pairing::calculate()`? That too would be considered a security defect. I'm not sure of the timing for these operations, so maybe it's a non-issue.
Good question. So I did some measuring again:
It takes `v.size() == 58` pairs for the entire loop to take `25ms` (on my machine). The call to `pairing::calculate(v)` then takes about an additional `14.7ms`.
So the pairing is about half of the cost of the loop. I'm not exactly sure how those numbers would vary on a faster machine, but I guess it would be a similar ratio between loop execution and pairing calculation.
It's "only" ~10ms but I think we should be prudent and protect against that -- it definitely would fail the previous guidelines we had back in the bug bounty days.
Since we control the library, the best choice is to pass in the ability for `bls12_381::pairing::calculate()` to poll a yield function. An example of that is:
leap/libraries/chain/webassembly/crypto.cpp
Lines 137 to 139 in 7c0694b
int32_t interface::alt_bn128_pair(span<const char> g1_g2_pairs) const {
   auto checktime = [this]() { context.trx_context.checktime(); };
   auto res = bn256::pairing_check({(const uint8_t*)g1_g2_pairs.data(), g1_g2_pairs.size()}, checktime);
where the declaration of that is
/// pairing_check calculates the Optimal Ate pairing for a set of points.
/// @param marshaled_g1g2_pairs marshaled g1 g2 pair sequence
/// @return -1 for unmarshal error, 0 for unsuccessful pairing and 1 for successful pairing
int32_t pairing_check(std::span<const uint8_t> marshaled_g1g2_pairs, std::function<void()> yield);
if you wanted to make the yield optional for users, you could potentially do something like
int32_t pairing_check(std::span<const uint8_t> marshaled_g1g2_pairs, std::function<void()> yield = std::function<void()>());
or just require users to pass in a stub function. Either way is fine; importantly, either way ultimately still allows the library to be used outside of leap.
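In isolation, the optional-yield signature suggested above might look like the following. `heavy_calculate` is a hypothetical stand-in for a routine like `pairing::calculate()`; the polling cadence of 8 is arbitrary:

```cpp
#include <functional>

// Hypothetical sketch of an optional-yield library routine. A default-
// constructed std::function keeps the library usable outside Leap: callers
// that don't care about deadlines simply omit the argument.
namespace bls_sketch {

inline int heavy_calculate(int rounds,
                           std::function<void()> yield = std::function<void()>()) {
    int acc = 0;
    for (int i = 0; i < rounds; ++i) {
        acc += i;                 // stand-in for one chunk of the Miller loop
        if (yield && i % 8 == 0)  // operator bool is false for an empty function
            yield();              // polled only when the caller supplied one
    }
    return acc;
}

} // namespace bls_sketch
```

The `if (yield && ...)` guard is what makes the stub-free default safe: an empty `std::function` is never invoked.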
(there is another option: add a subjective limit to the size of `n`; for example,
leap/libraries/chain/webassembly/crypto.cpp
Lines 146 to 156 in 7c0694b
int32_t interface::mod_exp(span<const char> base,
                           span<const char> exp,
                           span<const char> modulus,
                           span<char> out) const {
   if (context.control.is_speculative_block()) {
      unsigned int base_modulus_size = std::max(base.size(), modulus.size());
      if (base_modulus_size < exp.size()) {
         EOS_THROW(subjective_block_production_exception,
                   "mod_exp restriction: exponent bit size cannot exceed bit size of either base or modulus");
      }
but I can't find any compelling reason to pick this choice due to our control of the bls lib)
Agree. Since there is a `leap` branch of the library now, let's try to tailor it perfectly for leap integration.
I added a `yield` function pointer parameter to `pairing::calculate()` (`pairing::miller_loop()` respectively) and `g2::multiExp()`, since this one could also run for about `15ms` in the `25ms`-loop worst case. Here are the numbers for all three functions (my machine):
- G1 multi-exponentiation: with `v.size() == 161` the loop takes `25ms` => `g1::multiexp(v)` then takes only `5.1ms`
- G2 multi-exponentiation: with `v.size() == 96` the loop takes `25ms` => `g2::multiexp(v)` then takes `15.3ms`
- Pairing: with `v.size() == 58` the loop takes `25ms` => `pairing::calculate(v)` then takes `14.7ms`
The timings of `g1::multiExp()` should be fine, I guess, since 25ms + 5.1ms ≅ 30ms. So if `5ms` to `7ms` is an acceptable 'overhead' for the execution time of either `pairing::calculate()` or `g1/g2::multiExp()` after their corresponding loops (assuming the worst-case vector sizes listed above with `25ms` of preceding loop execution time), then only one `yield()` call approximately in the middle of the execution of `pairing` and `multiexp` should be fine, right? The worst-case scenario is then about 25ms + ~7ms ≅ 32ms total execution time in case the transaction fails.
It was a bit tricky, though, to integrate the `yield()` call accurately into the middle of the `pairing` and `multiexp` functions because there are multiple loops (including nested loops), some of which depend on `v.size()`. However, I believe I found a good way to add it approximately in the middle of each function's execution path, so that the time check is performed once if `v` exceeds a certain size.
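The "single yield near the middle, only for large inputs" approach described above can be sketched as follows; `multiexp_like`, the threshold, and the per-item work are all illustrative assumptions, not the library's actual structure:

```cpp
#include <cstddef>
#include <functional>

// Hypothetical sketch: a long-running routine polls the yield callback once,
// roughly halfway through its outer iteration space, and only when the input
// is large enough for the check to be worthwhile.
namespace bls_sketch {

inline long multiexp_like(std::size_t n,
                          const std::function<void()>& yield,
                          std::size_t yield_threshold = 16) {
    long acc = 0;
    const std::size_t midpoint = n / 2;
    for (std::size_t i = 0; i < n; ++i) {
        acc += static_cast<long>(i * i); // stand-in for one window of work
        if (n >= yield_threshold && i == midpoint)
            yield(); // one deadline check roughly halfway through
    }
    return acc;
}

} // namespace bls_sketch
```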
@mschoenebeck: I think we should pass the yield function into `g1::multiExp()` as well for consistency.
We should not assume that there will always be a limit of 30 ms in a transaction. For example, we are proposing that the network upgrade to a higher max transaction time. With a max transaction time of 150 ms, the user could pass a 5x larger array and thus increase the time of the `g1::multiexp` function call by 5x as well.
Additionally, I think the yield functions should be called frequently enough within the implementation so that the host functions cannot be abused to exceed the deadline by more than 5 ms even if we allow the transaction to execute indefinitely. We are proposing to raise the limit to 150 ms. We may push it to 250 ms in future. But in the longer term future, we may be able to take advantage of parallelism to even allow a transaction to execute for longer than even a block interval! And even if that does not happen, we may do that with read-only transactions which could potentially call these host functions. And we need to ensure read-only transactions respect our deadlines as well to maintain availability of the various features of the system.
Added the `yield()` call to `g1::multiexp`. Within the implementation, `yield()` is called inside the loop that reads the parameters passed to the host function, every 10 (`g1`) or 6 (`g2`) iterations. That should be frequent enough, taking the above numbers into consideration, right?
…bled checks in 'pairing' and 'map' functions
regenerate deepmind log for new BLS_PRIMITIVES
I have one question about the correct place to link the bls library. I noticed that I linked the bls lib to
   BOOST_CHECK_EQUAL(p.inCorrectSubgroup(), false);
   g2 p_res = g2::fromJacobianBytesBE(hexToBytes<288>("0x158a2a1e3ce68c49f9795908aa3779c6919ed5de5cbcd1d2a331d0742d1eb3cb28014006b5f686204adb5fdca73aea570ee0f0d58880907c8de5867dd99b6b7306b2c3de4a1537e6d042f2b8e44c8086853728cc246726016b0fcf993db3d759005f8ac0cb55113c857c5cf3f83d9b624ce9a2a0a00a1206777cf935721c857b322a611ed0703cf3e922bfb8b19a1f5e10a341b2191ab5a15d35f69850d2adb633e5425eecb7f38dd486a95b3f74d60f3ee6cf692b3c76813407710630763f7605b3828c19203f661732a02f7f546ab354694128bbe5a792a9db4a443c0fe10af0df2bc1b8d07aee99bd6f8c6b26847011aa31634f42f722d52022c736369db470576687fdf819cf15a0db4c01a0bd7028ee17cefdf6d66557d47fb725b6d00f"), false, true).value();
   BOOST_CHECK_EQUAL(p.equal(p_res), true);
} FC_LOG_AND_RETHROW();
These tests of garbage input are good, but I would like to see unit tests that directly exercise the code paths of the host functions integrated into Leap, run with its unit tests. They should cover: these garbage cases; failure cases that return -1; and at least one successful case for each function (taken from the library's test vectors).
This will require adding a new test contract.
See how it was done for the alt_bn_128 host functions: https://github.com/eosnetworkfoundation/mandel/pull/316/files#diff-87c8848613f6d5f711354edd61feeebcaae9b6acd4f5f16ca651075b989dc929
Thanks for clarifying that. I added a test contract `bls_primitives_test` to test all host functions. All tests pass except for one: for some reason the test for `pairing` fails on `eos-vm-oc` with a segmentation fault. I have no idea why. The same test passes for `eos-vm` and `eos-vm-jit`. I am not exactly sure how to debug this, though. Any ideas what could go wrong here?
Yeah, that failure is because the host function is named `bls_pairing`, which doesn't match:
leap/libraries/chain/include/eosio/chain/webassembly/eos-vm-oc/intrinsic_mapping.hpp
Line 277 in 712b1a0
"env.bls_g1_pairing",
Fixing the entry in intrinsic_mapping.hpp will resolve it.
(in 2.0 this mismatch would have ended up as a compile error, we lost that somewhere between 2.0 & 3.1 -- #621)
Thanks, Matt. Fixed in: 59cc737
I synced with main. However, I had to resolve a merge conflict in
Quick update: I was able to resolve the
Overall this looks good to me. There are some small tweaks the ENF will do including getting the CI to fully pass. But we are good as it is to merge this into the bls_integraton
branch.
Added library mschoenebeck/bls12-381 as a submodule. For `x86_64` there are high-speed asm implementations of all low-level arithmetic functions available, which optionally make use of the `adc` and `bmi2` CPU extensions (if available on the host machine, via a live dispatcher in the `init` call).
Added the following types to `leap`:
Added the following host functions to `leap` (according to https://eips.ethereum.org/EIPS/eip-2537):
Check out mschoenebeck/aggsigtest for a sample contract using those new host functions.