Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pathfinder_getProof RPC endpoint #714

Closed
pscott opened this issue Nov 7, 2022 · 20 comments · Fixed by #726
Closed

Add pathfinder_getProof RPC endpoint #714

pscott opened this issue Nov 7, 2022 · 20 comments · Fixed by #726

Comments

@pscott
Copy link
Contributor

pscott commented Nov 7, 2022

Similarly to EIP-1186, I suggest to add a pathfinder_getProof endpoint to pathfinder.
This would allow applications like storage proofs (courtesy of our friends at herodotus) on starknet.

Specification
As part of the pathfinder (not starknet because it's not part of the formal RPC specification), an additional method called pathfinder_getProof should be defined as follows:

patfhinder_getProof
Returns the account- and storage-values of the specified account including the Merkle-proof.

Parameters

  1. DATA: 1 felt - address of the account.
  2. ARRAY - array of storage-keys which should be proven and included. See starknet_getStorageAt
  3. QUANTITY | BLOCKHASH | TAG - integer block number / block hash / the string "latest" or "pending"

Returns
Object - An object:

  • contractProof: ARRAY - Array of MerkleTree-Nodes, starting with the root of the contract, using the contract address as path to go down along the tree.
  • contractData: An OPTIONAL object consisting of:
    • contractClassHash: DATA 1 felt - Class hash. See official documentation.
    • contractNonce: DATA 1 felt - contract's nonce. See official documentation.
    • root: DATA 1 felt - Root of the contract state tree. All storage will deliver a MerkleProof starting with this root.
    • contractStateVersion: DATA 1 felt - represents the state version. Might be changed in the future by Starkware.
    • storageProofs: ARRAY - Array of storage-entries as requested. Each entry is an array of MerkleTree-Nodes corresponding to the nodes encountered while going down the tree with using key as path.

Note

  • proofs can be proofs of membership or non membership. Meaning if a contract does not exist, contractProof will correspond to the list of nodes closest to the desired contract, thus showing that the exact contract doesn't exist. The same logic applies to storage proofs.
  • contractData object will be set to 0 if the contract doesn't exist.
@CHr15F0x
Copy link
Member

CHr15F0x commented Nov 7, 2022

@pscott is this more of a feature request or do you have plans to submit a PR of your own?

@pscott
Copy link
Contributor Author

pscott commented Nov 7, 2022

@pscott is this more of a feature request or do you have plans to submit a PR of your own?

I'd like to submit a PR on my own, but thought it would be a good idea to create an issue to discuss the design around it with you guys! :)

@Mirko-von-Leipzig
Copy link
Contributor

Mirko-von-Leipzig commented Nov 8, 2022

TL;DR: output also requires the following for a fully verifiable merkle proof, in order to link the global state root to a given contract's storage element:

  1. Merkle proof from global root to contract state hash
  2. Components to verify the contract state hash links the global root to the contract's root
    1. contract's class hash
    2. contract's nonce
    3. contract's state root (already included in proposed output)

Actual post

Looks reasonable to me. One thing that is missing from the return object - should it not include the merkle proof for the contract root as well? This is also present in EIP-1186 in the form of the following parts of the output:

  • balance: QUANTITY - the balance of the account. See eth_getBalance
  • codeHash: DATA, 32 Bytes - hash of the code of the account. For a simple Account without code it will return "0xc5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470"
  • nonce: QUANTITY, - nonce of the account. See eth_getTransactionCount
  • storageHash: ... (not this, this is the contract root equivalent)
  • accountProof: ARRAY - Array of rlp-serialized MerkleTree-Nodes, starting with the stateRoot-Node, following the path of the SHA3 (address) as key.

I'll explain how and what this is in StarkNet, and why it is required, below.

In StarkNet the only state part verifiable for given block is the global state root (both on L1 and L2). For this to be a verifiable merkle proof we would therefore need to connect the contract's state root to this global root.

Conceptually how this works is that there are two different merkle-patricia tries in StarkNet state. A global state tree - this connects the global root where each leaf contains a contract's state hash (note: not contract root). The keys in this global tree are the contract's address. A contract's state hash is a hash over the contract root, contract nonce and state hash version. And the second tree contains the contract storage values.

In order to get from global root to a specific contract storage value, one therefore has to do the following:

  1. Traverse the global tree from the global root using the contract's address. This gives us the contract's state hash.
  2. Lookup the contract's state root using the state hash from (1).
  3. Traverse the contract tree from the state root from (2). This gives us the storage value.

In order for us to have a fully verifiable merkle proof, we therefore need all three of these steps to be verifiable. Your current output allows for verifying only (3). So in addition, we need a merkle proof for (1), and then for (2) we require providing all the values that take part in the hash calculation.

The end user can then:

  1. Verify the merkle proof of (1), which links the global root to the given contract state hash.
  2. Verify the hash calculation of (2) which verifies that the contract root is indeed linked to the global root.
  3. Verify the merkle proof (3), which links the storage value to the contract root, and therefore to the global root.

There is one further concern I have, and that is regarding the verification of the state hash (2). The calculation is:

state_hash = H(H(H(class_hash, contract_root), contract_nonce), RESERVED)

RESERVED is a constant (currently 0), which presumably allows StarkWare to change this function in the future based on this switch.

The concern I have is that contract_nonce is not verifiable from here. I suppose we can assume that the caller has this information from beforehand and can check that the nonce we give matches their known value. And similarly for class_hash except of course that this is a constant once it is known. edit nevermind, I think the fact that the hash computes is proof enough of the nonce.

Another question is whether we should also give the value of RESERVED.. but that's a question for StarkWare and how they want to use this value in the future. For now we can ignore it.

edit ^^ I think we need to output this reserved value as well.

@Mirko-von-Leipzig
Copy link
Contributor

I've made some minor edits to the post above. Essentially the crossed out section on nonces, and added that I think we do need to output reserved value as well. Unclear at this stage what to name it, but we can ask starkware what this value represents.

@pscott
Copy link
Contributor Author

pscott commented Nov 8, 2022

@Mirko-von-Leipzig Thank you so much for this answer! I'm struggling to wrap my head around starknet's storage, and your comment is really a goldmine. (btw for anyone who might be reading this one day, I would recommend reading this official doc also).

I have a couple of comments / questions:

  1. You say the key for the global state tree is the contract's address. Am I correct in saying that, to find a leaf that corresponds to a certain contract, one simply needs to use the binary representation of the contract address, and go down the trie (going left when we see a 0, and right when we see a 1) ?
  2. If we call the first tree the global state tree and the second tree the contract tree, am I correct in saying that there is not only a single contract tree but there's actually one contract tree per leaf in the global state tree?
  3. What is the key for a contract tree? Is it simply a number that goes from 0 -> 2^251 - 1 (and follows the same logic as explained in 1.) ?

Thanks!

@Mirko-von-Leipzig
Copy link
Contributor

@Mirko-von-Leipzig Thank you so much for this answer! I'm struggling to wrap my head around starknet's storage, and your comment is really a goldmine. (btw for anyone who might be reading this one day, I would recommend reading this official doc also).

I have a couple of comments / questions:

1. You say the `key` for the global state tree is the contract's address. Am I correct in saying that, to find a leaf that corresponds to a certain contract, one simply needs to use the binary representation of the contract address, and go down the trie (going left when we see a `0`, and right when we see a `1`) ?

2. If we call the first tree the `global state tree` and the second tree the `contract tree`, am I correct in saying that there is not only a single `contract tree` but there's actually one contract tree per leaf in the `global state tree`?

3. What is the `key` for a `contract tree`? Is it simply a number that goes from `0 -> 2^251 - 1` (and follows the same logic as explained in 1.) ?

Thanks!

  1. Yes, that is correct. You can see our code here. This wrapper specifically takes in a ContractAddress before converting it to bits to use with the actual tree it is wrapping. This gives us type safety.
  2. Technically yes. There is technically a new tree for each contract, for each block. However, this would be a storage nightmare so under the hood we store all of these in a single database table. This is possible because its all just hashes, and if a node has the same hash then it has the same contents and we only have to store once. i.e. all the contract trees can share nodes.
  3. This key is the address of the storage item you are wanting. And yes it follows the same logic as in (1). The code for this follows wrapper logic for type safety.

@pscott
Copy link
Contributor Author

pscott commented Nov 8, 2022

Super useful! Thanks! I'm glad my questions were correct, it means I understood what you said in the first comment x)

Will be opening a draft PR soon hopefully! :)

@Mirko-von-Leipzig
Copy link
Contributor

Mirko-von-Leipzig commented Nov 8, 2022

Super useful! Thanks! I'm glad my questions were correct, it means I understood what you said in the first comment x)

Will be opening a draft PR soon hopefully! :)

I think a good PR draft starting point is to get the merkle proof working for a single tree. So no RPC, or anything else -- just an additional function to the merkle tree implementation which returns the merkle proof of a given key.

Another thing to consider ito API is what happens if the contract does not exist, or the storage key does not exist. It would not be enough to simply return an error -- since this query is about proving the state of something, the caller needs to be able to verify this is true i.e. verify that indeed, the contract or storage key does not exist.

What I saw somewhere in either the EIP or some Ethereum implementation was that they return a merkle proof that proves it does not exist. You return the list of nodes as close to the desired key as possible (with the final node proving non-existance because it has deviated from the key). This will necessarily end in either: a leaf node, which is not the desired key, or an edge node which deviates from the desired key's path.

For simplicity, it may be best to ignore missing keys initially, until later in the PR. Up to you of course.

@pscott pscott changed the title Add pathfinder_getProof RPC endpoint Add pathfinder_getStorageProofs RPC endpoint Nov 28, 2022
@FawadHa1der
Copy link

@Mirko-von-Leipzig @pscott Great work on this. Thanks so much. I have been playing around with the output of pathfinder_getProof and format is in json. I think there is some argument for it to be instead be RLP encoded in line with the eth_getProof format. The proofs in some use cases need to be verified on-chain and json deserialization of course is not feasible there and will have to be massaged into RLP again by the client. This could also be an improvement later on but wanted to raise a question. Thanks again Scott for working on this

@Mirko-von-Leipzig
Copy link
Contributor

I think there is some argument for it to be instead be RLP encoded in line with the eth_getProof format. The proofs in some use cases need to be verified on-chain and json deserialization of course is not feasible there and will have to be massaged into RLP again by the client.

I'm hesitant to mix different encoding formats just because Ethereum has done it. A counter-argument is that everyone else that is not doing onchain verification will have to now decode it again before its usuable.

Does onchain verification for this type of merkle tree already exist?

@pscott
Copy link
Contributor Author

pscott commented Nov 29, 2022

My 2cents on the matter:

  1. Not every chain uses RLP encoding.
  2. Client can always RLP encode the output if need be.
  3. L1 will probably not verify starknet proofs (simply because it's too expensive).
  4. However, Optimistic Rollups or EVM compatible ZK Rollups might want to verify those proofs.
  5. For these reasons I agree having RLP output directly from pathfinder is a nice option

My solution: allow this endpoint to have the --rlp option that would have the output RLP encoded (using the rlp crate).

@FawadHa1der
Copy link

I think there is some argument for it to be instead be RLP encoded in line with the eth_getProof format. The proofs in some use cases need to be verified on-chain and json deserialization of course is not feasible there and will have to be massaged into RLP again by the client.

I'm hesitant to mix different encoding formats just because Ethereum has done it. A counter-argument is that everyone else that is not doing onchain verification will have to now decode it again before its usuable.

Does onchain verification for this type of merkle tree already exist?

I should have been clearer, my motivation is for it to be onchain friendly format, be it RLP or SSZ etc. Just a cursory search did bring up Aragon repo where its being used https://github.com/aragon/aragon-factory/blob/fbb8963509020737dd647feb726fab66e4222ffe/contracts/vocdoni/lib.sol

Just like Scott mentioned there will be use cases where it needs to be verified on chain especially when transferring data between 2 different L2s directly without going through L1 which will be expensive. I think this will be a big use case in the future to transfer data between rollups directly and if the format does not need to be transformed massaged etc it will reduce developer pain/errors and possibly re-use the verifier on different chains/l2s/rollups. Errors on-chain tend to be way more expensive than off-chain and taking data directly from pathfinder_api into the verifier will help in that.

@Mirko-von-Leipzig
Copy link
Contributor

I think there is some argument for it to be instead be RLP encoded in line with the eth_getProof format. The proofs in some use cases need to be verified on-chain and json deserialization of course is not feasible there and will have to be massaged into RLP again by the client.

I'm hesitant to mix different encoding formats just because Ethereum has done it. A counter-argument is that everyone else that is not doing onchain verification will have to now decode it again before its usuable.
Does onchain verification for this type of merkle tree already exist?

I should have been clearer, my motivation is for it to be onchain friendly format, be it RLP or SSZ etc. Just a cursory search did bring up Aragon repo where its being used https://github.com/aragon/aragon-factory/blob/fbb8963509020737dd647feb726fab66e4222ffe/contracts/vocdoni/lib.sol

Just like Scott mentioned there will be use cases where it needs to be verified on chain especially when transferring data between 2 different L2s directly without going through L1 which will be expensive. I think this will be a big use case in the future to transfer data between rollups directly and if the format does not need to be transformed massaged etc it will reduce developer pain/errors and possibly re-use the verifier on different chains/l2s/rollups. Errors on-chain tend to be way more expensive than off-chain and taking data directly from pathfinder_api into the verifier will help in that.

That's a fair point. Establishing a baseline standard would be helpful ito ecosystem. I think SSZ would make more sense as its replacing RLP where possible? I'm not too informed about the trade-offs however.

@pscott pscott changed the title Add pathfinder_getStorageProofs RPC endpoint Add pathfinder_getProof RPC endpoint Dec 1, 2022
@pscott
Copy link
Contributor Author

pscott commented Dec 2, 2022

After thinking about it again, it seems to me that:

  1. All other RPC endpoints return JSON objects, and having this call return a SSZ could be very strange for the end user
  2. This means the end user will need to handle both JSON and SSZ (if we decide to only support SSZ).
  3. If we decide to support SSZ as an option, then this means code burden for this repo.
  4. Not everyone will want SSZ and we might get requests for RLP or other encoding schemes.

Given all of that, I'm in favor of keeping JSON right now, and potentially think about moving everything to SSZ (or considering having the SSZ option as output, if there is a strong case for it) :)

@pscott
Copy link
Contributor Author

pscott commented Dec 5, 2022

@Mirko-von-Leipzig I realize now I think we also need to return the global_state_tree root (aka GlobalRoot). Indeed I believe it is a useful piece of information for the user who would want to verify the contract_proof :)

I guess technically one can always run the starknet_getBlockWithTxHashes request and extract the corresponding root from there but I feel like it's a little bit cumbersome?

@Mirko-von-Leipzig
Copy link
Contributor

@Mirko-von-Leipzig I realize now I think we also need to return the global_state_tree root (aka GlobalRoot). Indeed I believe it is a useful piece of information for the user who would want to verify the contract_proof :)

I guess technically one can always run the starknet_getBlockWithTxHashes request and extract the corresponding root from there but I feel like it's a little bit cumbersome?

The root should always be present in the global tree's proof (unless of course the tree is empty, but that is a bit of an extreme edge case). So you can extract it from there if you want it to be an additional field?

@pscott
Copy link
Contributor Author

pscott commented Dec 5, 2022

The root should always be present in the global tree's proof (unless of course the tree is empty, but that is a bit of an extreme edge case). So you can extract it from there if you want it to be an additional field?

Hmm currently the roo tis not present in the global tree's proof. A proof looks like this for example: http://sprunge.us/M7EOWR

As you can see, the first element is a Binary node, not a StarkHash. Indeed, every node in ProofNode will either be BinaryProofNode or EdgeProofNode anyway, so it's not possible to just have a hash.

Now ofc, the user can use the first node and hash it! :) Just adds some complexity

@Mirko-von-Leipzig
Copy link
Contributor

The root should always be present in the global tree's proof (unless of course the tree is empty, but that is a bit of an extreme edge case). So you can extract it from there if you want it to be an additional field?

Hmm currently the roo tis not present in the global tree's proof. A proof looks like this for example: http://sprunge.us/M7EOWR

As you can see, the first element is a Binary node, not a StarkHash. Indeed, every node in ProofNode will either be BinaryProofNode or EdgeProofNode anyway, so it's not possible to just have a hash.

Now ofc, the user can use the first node and hash it! :) Just adds some complexity

Ah sorry I misread your question. Thing is, the caller / user must have the root from some other source -- otherwise all he knows is that you gave a proof for some root. He needs some way of trusting that the root is correct for the block ID he requested. i.e. he must have it already somehow. Us providing it doesn't help, just means its an untrusted value.

@pscott
Copy link
Contributor Author

pscott commented Dec 5, 2022

Yes I agree there's not point in having the root except... when testing. Which I am doing right now haha :p

@FawadHa1der
Copy link

Just wanted to give Scott a shoutout, thank you!! Fantastic job on this. And to pathfinder team for incorporating this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants