Skip to content
This repository has been archived by the owner on Jul 14, 2023. It is now read-only.

Add host function calls for elliptic curve arithmetic to Polkadot #8

Closed
wants to merge 17 commits into from

Conversation

achimcc
Copy link

@achimcc achimcc commented Mar 13, 2023

No description provided.

@tomaka
Copy link
Contributor

tomaka commented Mar 13, 2023

Note that this repository is about Polkadot, not Substrate. The word "Substrate" should ideally not be mentioned at all.

Copy link

@bkchr bkchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as @tomaka said.

Also not sure we already need this. AFAIK this is still in experimental mode.

proposals/0000-arkworks-integration.md Outdated Show resolved Hide resolved
@achimcc achimcc changed the title Add host function calls for ark-substrate to Substrate repo Add host function calls akrwokrs to Polkadot Mar 14, 2023
@achimcc achimcc changed the title Add host function calls akrwokrs to Polkadot Add host function calls for arkworks to Polkadot Mar 14, 2023
@achimcc
Copy link
Author

achimcc commented Mar 14, 2023

Same as @tomaka said.

Also not sure we already need this. AFAIK this is still in experimental mode.

I removed all references to Substrate. I can also close the PR for now and re-open later?

@bkchr
Copy link

bkchr commented Mar 14, 2023

Same as @tomaka said.
Also not sure we already need this. AFAIK this is still in experimental mode.

I removed all references to Substrate. I can also close the PR for now and re-open later?

Are we 100% sure that we will need all these host functions and also that they will enable the functionality we want to have?

@achimcc
Copy link
Author

achimcc commented Mar 14, 2023

In theory, we could live without all of them. But every single host functions brings a performance improvement for an arithmetic operation on elliptic curve points, between 1.34x and up to 10.57x, so I would consider their existence as justified! They speed up the usage of the elliptic curves BLS12_381, BLS12_377, BW6_761, ED_ON_BLS12_377, ED_ON_BLS12_381, AFAIK they are all required somewhere for either Sassafrass or ZKP proof verification @burdges ?

@burdges
Copy link

burdges commented Mar 14, 2023

We do want the multi_miller_loop, final_exponentiation, and msm host calls for BLS12-381 in polkadot this year.

We usually reference benchmarks to the repo that generates them, so that others could rerun them. It's the benchmarks which justify adding those host calls, but not adding other host calls, so they should be referenced from the PPP document. We'll quote some facts from them here though I guess..

As for other curves, but We'll discuss the benchmarks this week or next to make decisions, but..

Right now the ed_on_* msms do not benefit so much, not sure the problems.

We'll likely do BW6-767 for BLS12-381 but arkworks just adding this arkworks-rs/curves#154 We only need BW6-767 so that polkadot and kusama can track one another's finality, not sure if this is really a priority. We'll discuss BLS12-377 and BW6-761 too.

@burdges
Copy link

burdges commented Mar 14, 2023

@bkchr How do you guys future proof error types in host calls? At present multi_miller_loop cannot fail, but Alistair noticed cofactor checking could happen inside the Miller loop, which requires an error type. We do not envision ever returning more than just this error type, but who knows.

Also, should we merge the error types for different host calls in the same subsystem? Ala,

pub enum PairingError {
    InternalPanic,
    FinalExpInverse, 
    MillerLoopCofactor,
}

@AlistairStewart
Copy link

We want to be able to do with host functions all these and more. We'll want more curves than these. A key reason that Ethereum right now is so bad for ZK is that the only elliptic curves it has precompiles for are the BN254 curve for pairings, which isn't so secure these days, or secp256k1, which is bad for proving systems because it isn't FFT friendly. Support for more interesting curves would make Polkadot more attractive for ZK developers. In particular SNARK friendly 2-chains of pairing curves like BW6-681/BLS12-377 here or for pairing cycles like the Pasta curves that aren't in this PR. That means that the things Aleo or Mina are doing aren't possible on Ethereum and they aren't really efficient enough on Substrate right now but they could be, with more host functions.

Now I did question whether we need seperate host functions for every curve, but @burdges seems to think that will make maintanability easier.

@achimcc
Copy link
Author

achimcc commented Mar 17, 2023

We want to be able to do with host functions all these and more. We'll want more curves than these. A key reason that Ethereum right now is so bad for ZK is that the only elliptic curves it has precompiles for are the BN254 curve for pairings, which isn't so secure these days, or secp256k1, which is bad for proving systems because it isn't FFT friendly. Support for more interesting curves would make Polkadot more attractive for ZK developers. In particular SNARK friendly 2-chains of pairing curves like BW6-681/BLS12-377 here or for pairing cycles like the Pasta curves that aren't in this PR. That means that the things Aleo or Mina are doing aren't possible on Ethereum and they aren't really efficient enough on Substrate right now but they could be, with more host functions.

Now I did question whether we need seperate host functions for every curve, but @burdges seems to think that will make maintanability easier.

I remember that Ashutov's original implementation used a call function handler and exposed only one new host function.

But I also remember, that @pepyakin mentioned to me, the only acceptable way to implement it, would be to expose one host call for every arithmetic operation.

@burdges
Copy link

burdges commented Mar 18, 2023

We already address versioning and deprecation from the substrate side, doing it again wastes effort, unless the substrate flavor winds up inadequate somehow, which seems unlikely.

@achimcc Can you fix the scalar multiplication benchmarks? like https://github.com/achimcc/substrate-arkworks-examples/blob/4dcd81ffe1be1fe64e10252c703d61a8a80d366d/pallets/template/src/bls12_381.rs#L153

At present, they come out insanely fast with variable time arithmetic, which prevents us understanding the other benchmarks. You'll need random-ish scalars and points generated before the benchmark starts, like with the others.

@achimcc
Copy link
Author

achimcc commented Mar 21, 2023

I did some mistakes in my benchmarking setup. But the MSM benchmarks where correct, just affine and scalar multiplication where way too fast since they were only multiplying everything by a factor 2 / doubling curve points everywhere. It seems like we need a few more host functions, aka affine and projective mul's, at least for bls12_381, bls12_377 and bw6_761. I updated the benchmark results table of this PR accordingly!

@bkchr
Copy link

bkchr commented Mar 22, 2023

@bkchr How do you guys future proof error types in host calls? At present multi_miller_loop cannot fail, but Alistair noticed cofactor checking could happen inside the Miller loop, which requires an error type. We do not envision ever returning more than just this error type, but who knows.

If we just need to report that there was an error, we could just use () as error type? Or do you expect some kind of special handling of the error types?

@burdges
Copy link

burdges commented Mar 22, 2023

Yes, we should just use () for the error type in both cases I guess, thanks for the sanity check. ;)

Reasoning:

A final exponentiation could fail if you feed it bad data, only doable with parachain code that's either intentionally flawed, or messes up G_T computations, ala MIPP ala snarkpack or marlin, etc. It's still just an invalid transaction/batch, but not even reachable from polnk, groth16, etc. without batching. In theory, there are type II pairings where even the Miller loop between a point and itself outputs a zero, but likely nobody ever uses type II, much less these shitty ones.

In future, we might've miller loops return an extra error that represents points not in the subgroup, which currently we'd check in deserialization. It's interesting to distinguish this cases from the first, but only because it says you've fucked up the code differently. And parachain could still do distinguish if necessary since these are separate host calls.

@achimcc
Copy link
Author

achimcc commented Mar 24, 2023

Yes, we should just use () for the error type in both cases I guess, thanks for the sanity check. ;)

Reasoning:

A final exponentiation could fail if you feed it bad data, only doable with parachain code that's either intentionally flawed, or messes up G_T computations, ala MIPP ala snarkpack or marlin, etc. It's still just an invalid transaction/batch, but not even reachable from polnk, groth16, etc. without batching. In theory, there are type II pairings where even the Miller loop between a point and itself outputs a zero, but likely nobody ever uses type II, much less these shitty ones.

In future, we might've miller loops return an extra error that represents points not in the subgroup, which currently we'd check in deserialization. It's interesting to distinguish this cases from the first, but only because it says you've fucked up the code differently. And parachain could still do distinguish if necessary since these are separate host calls.

I replaced PairingError by () everywhere in the code.

@SupremoUGH
Copy link

Why not support bn254 and ed_on_bn254 as well?

Copy link

@bkchr bkchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RFC is still to specific. This is not about introducing artworks, it is about introducing eliptic curves host functions to Polkadot. Other implementations of Polkadot will probably not use artworks.

proposals/0000-arkworks-integration.md Outdated Show resolved Hide resolved
proposals/0000-arkworks-integration.md Outdated Show resolved Hide resolved
proposals/0000-arkworks-integration.md Outdated Show resolved Hide resolved
@achimcc
Copy link
Author

achimcc commented Apr 7, 2023

Why not support bn254 and ed_on_bn254 as well?

Right now we want to focus on the mentioned host functions, we might add them later on.

@seunlanlege
Copy link

seunlanlege commented Apr 10, 2023

Please lets add group element additions as well, i see we already have multiplications

@achimcc
Copy link
Author

achimcc commented Apr 10, 2023

Please lets add group element additions as well, i see we already have multiplications

@burdges @bkchr should I do it? Would add even more host functions. We could also try to only add additions, AFAIK the mul's use the additions under the hood via the double and add algorithm for computations?

@achimcc
Copy link
Author

achimcc commented Apr 10, 2023

However, I remember that in my original benches, we were benchmarking mul's with a multiplication by two instead of using a random field element, this was effectively a curve point addition and the benchmarks revealed a slow down when using host functions due to host call and (de-)serialization overhead. So I'd be dubious that we achieve a performance improvement by host calls for additions! @seunlanlege

@seunlanlege
Copy link

So I'd be dubious that we achieve a performance improvement by host calls for additions! @seunlanlege

I think it's worth investigating, I'm happy to run the needed benchmarks myself. Lots of use cases for computing the sum of a lot of group elements.

@achimcc
Copy link
Author

achimcc commented Apr 11, 2023

So I'd be dubious that we achieve a performance improvement by host calls for additions! @seunlanlege

I think it's worth investigating, I'm happy to run the needed benchmarks myself. Lots of use cases for computing the sum of a lot of group elements.

I looked into it: I changed the arkworks traits with several PR's to prepare the arkworks integrations. Those PR's allow us to pass custom methods for affine and projective mul's. However, we can not overwrite the default models for add's now. This means we would either have to fork large parts of arkworks.rs/algebra models crate, or we would need to to do more PR's prior. Someone could investigate the performance improvements first by forking the model's completely and continue with some PR's in case there is any. But I'm rather sure that there wouldn't be any performance improvement, so I'd recommend not to do this now and to consider it as a possible (but unlikely) follow up PR.

@achimcc achimcc changed the title Add host function calls for arkworks to Polkadot Add host function calls for elliptic curve arithmetic to Polkadot May 4, 2023
@burdges
Copy link

burdges commented May 7, 2023

It's always nice if someone else tries their hand at benchmarks @seunlanlege We'd various difficulties with our own benchmarks.

Indeed, if you read the single scalar multiplications in https://github.com/achimcc/substrate-arkworks-examples/blob/main/benchmarks-comparison.md then you'll see they report way too fast. This is because those still use the scalar 2 and the scalar mults are variable time. See https://github.com/achimcc/native-bench-arkworks/blob/47d02ddc4fef3ac1bc61f149e9f75723456458fc/src/ed_on_bls12_377.rs#L20 for example. I'd seen what I wanted from the benchmarks so I felt no need to ask for improvements.

In fact, addition should be some fixed multiple (1-0.33) of those single 2*point multiplications in the current benchmarks, so yes you'd have a 5x speed up, as opposed to the 6-8x speed up seen elsewhere.

We've various small concerns around addition though like projective vs affine coordinates, ala mixed addition, and doublings, and special cofactor clearing optimizations. In many cases, we should choose if we give them separate hostcalls or if we suck them up into an existing hostcall at the cost of an extra if check. We might also choose where subgroup checks occur. It all risks a maintenance burden though.

We should really do partial aggregation for protocols which see significant usage, which invariably absorbs more of the additions. Schnorr signatures like sr25519 have half-aggregation. SnarkPack is way more powerful.

We need some substrate work before partial aggregation: substrate needs non-Merklised transient storage #11 which your PVF creates in on_initialize, appends to in transactions, and sends to the batch verifier in on_finalize. We'll mostly need people to actually implement and deploy partial aggregation though. It's massively easier to do partial aggregation on substrate/polkadot than on smart contract platforms, so doing all this gives polkadot advantages unavailable to ETH, et al.

tl;dr We should spend resources on partial aggregation instead of smart contract flavored optimizations.

@Noc2
Copy link
Contributor

Noc2 commented Jul 14, 2023

We will archive this repository in favor of the new RFCs repository by the Fellowship. So, I'm closing the issue here. I recommend opening a PR at the new repository.

@Noc2 Noc2 closed this Jul 14, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants