Skip to content
This repository has been archived by the owner on Jan 19, 2021. It is now read-only.

Support for proofs of null/absence. Dried up prove/verify. #82

Merged
merged 1 commit into from
Jun 22, 2019

Conversation

zmitton
Copy link
Contributor

@zmitton zmitton commented Feb 14, 2019

The best data structure for a proof, is actually a merkle-patricia-tree itself. This tree can be built by batching all the node values of the proof into the underlying db at their keccak as key. This optimizes multiple proofs by not having a need for duplicates. The verification takes place by simply using this tree that you built, as if it were the real merkle-patricia-tree. You can do any operations on it that you would normally do to the main one. The only subtlety, is that if at any point during traversal of said tree, it tries to “step in” to a hash value that it cant find in the underlying db, this means the proof is missing pieces and is invalid.

This will correctly handle null leaves. When performing get() on a value that is null, it will find its target node index (of the 17) that contains an empty byte array. This means the key corresponded to null. You can still even do put, because you will again arrive at an empty byte array and knowing that anything the rest of the way down said path was not initialized, can put the extension node to the new value and then hash each node back to the root as usual.

The current serialization format will work fine, but now can include multiple proofs with nodes in any ordering, except that the root node should always be at index[0].

@zmitton zmitton force-pushed the null-proof branch 2 times, most recently from fe5687a to fe225f9 Compare February 14, 2019 23:20
@zmitton
Copy link
Contributor Author

zmitton commented Feb 15, 2019

fixes #64

@coveralls
Copy link

coveralls commented Feb 15, 2019

Coverage Status

Coverage increased (+0.7%) to 93.761% when pulling 2f80f33 on zmitton:null-proof into 88f729d on ethereumjs:master.

@zmitton
Copy link
Contributor Author

zmitton commented Feb 15, 2019

fixes #47

@zmitton
Copy link
Contributor Author

zmitton commented Feb 15, 2019

fixes #19

@zmitton
Copy link
Contributor Author

zmitton commented Feb 15, 2019

Fixes #17

@holgerd77 holgerd77 requested a review from s1na February 15, 2019 09:49
@holgerd77
Copy link
Member

Hi @s1na, can you take a look at this, think you are actually deeper into the library than me? This will probably fail due to the Tape issue after merge on another CI run on master. Think it should nevertheless be ok to merge (if you are ok with it) and then directly merge-in #81 to have this fixed.

@holgerd77
Copy link
Member

@zmitton 4 fixes by one PR, that sounds at least extremely tempting! 😄 Thanks for the PR!

@zmitton
Copy link
Contributor Author

zmitton commented Feb 15, 2019

Looks like it 👍. That last one was completely accidental

src/baseTrie.js Outdated
@@ -49,6 +49,38 @@ module.exports = class Trie {
this.root = root
}

static fromProof(proofNodes, cb, proofTrie){
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what you are making here could be called a sparse trie . optional proofTrie argument so you can build onto an existing sparse trie is you want.

@@ -49,6 +49,38 @@ module.exports = class Trie {
this.root = root
}

static fromProof(proofNodes, cb, proofTrie){
let opStack = proofNodes.map((nodeValue) => {
return {type: 'put', key: ethUtil.keccak(nodeValue), value: ethUtil.toBuffer(nodeValue)}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its kindof like the verification is done in this step because when you traverse the tree you already know each key is the hash of the node it's storing

src/baseTrie.js Outdated
@@ -434,7 +467,8 @@ module.exports = class Trie {
childKey.push(childIndex)
const priority = childKey.length
taskExecutor.execute(priority, taskCallback => {
self._lookupNode(childRef, childNode => {
self._lookupNode(childRef, (e, childNode) => {
if(e){ return cb(e, null)}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the line that makes my tests pass, but I added the line to anywhere _lookupNode is being called (except in checkRoot which I believe is expected to just return false in that case).

So probably it could use some more tests that make sure it's properly propagating the other errors (but current tests continue to pass).

it('should not crash if given a non-existant root', function (t) {
var root = new Buffer('3f4399b08efe68945c1cf90ffe85bbe3ce978959da753f9e649f034015b8817d', 'hex')
var trie = new Trie(null, root)

trie.get('test', function (err, value) {
t.equal(value, null)
t.end(err)
t.notEqual(err, null)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note The behavior here has changed slightly, but reading the objective of the test I think it's ok

Copy link
Contributor

@s1na s1na left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cool! I like the general approach. I suggest we break this into two PRs, one for modifying the _lookupNode behaviour (to make sure it has no unintended consequences), and one for the proofs.

@zmitton
Copy link
Contributor Author

zmitton commented Feb 18, 2019

ok yeah, I can do that. And might add some more tests before I re-submit the later

@holgerd77
Copy link
Member

Any update on this?

@zmitton
Copy link
Contributor Author

zmitton commented Feb 26, 2019

As requested by @s1na I've put the fix to 17 in its own PR. It might be another week or so before I submit the other half separately because I would like to test a few more angles from my own repo eth-proof . This is not so easy because I want to build specific configurations of trees to get the edge cases

@holgerd77
Copy link
Member

Just for mental preparation 😄: this will need some rebase and squashing of commits once ready. Is this necessary that you do this on top of the other code changes (haven't looked deeper into the code)? This will - probably - a bit difficult to get this out again once the other PR is merged?

@zmitton
Copy link
Contributor Author

zmitton commented Mar 18, 2019

ok, I have squashed all the commits into 1 per pull-request. Both should be ready to go

@holgerd77
Copy link
Member

Ah, this would need to have the commit from the merged PR removed. It also needs a rebase.

@holgerd77
Copy link
Member

(also: linting is currently failing, run npm run lint before pushing)

@zmitton
Copy link
Contributor Author

zmitton commented Mar 19, 2019

I'm not sure what you mean. This PR wont work without 0cd83dc underneith it.

can you describe the exact steps you would like done or maybe just cherry pick them in or something?

@holgerd77
Copy link
Member

The _lookupNode callback PR #83 has now been merged, therefore the respective commit should be removed from this PR, otherwise it is applied twice (with git rebase -i master from your branch, and then using drop on the commit).

Then git rebase master has to be applied to get the branch on top of latest master state and finally you should run npm run lint, fix the linting errors and add the fixes (eventually also with git rebase -i master and then squash a linting correcting commit to the main work commit).

Does this help? Did I overlook something?

@holgerd77
Copy link
Member

Hi @zmitton do you find some time to update this here? I would love to merge and do a release! 😀

@holgerd77
Copy link
Member

@zmitton This is not to get on your nerves 😄, just a gentle reminder in case this gets forgotten: if you find some time, an update would be appreciated, especially since we are really close to the finish line. 🎯

@zmitton
Copy link
Contributor Author

zmitton commented May 3, 2019

@holgerd77 luckily i didn't have to do any of the fancy stuff. Just a normal rebase from master. I believe it achieves the desired result. Let me know :)

@lgtm-com
Copy link

lgtm-com bot commented May 3, 2019

This pull request fixes 1 alert when merging 73e7b6d into 88f729d - view on LGTM.com

fixed alerts:

  • 1 for Useless assignment to local variable

Comment posted by LGTM.com

Copy link
Contributor

@s1na s1na left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think it's a really interesting idea, and if it actually works I'm sure it'll have its
use-cases.

I'd feel more confident if there was any other client which also had proof of absence, as I'm not sure if there are no edge cases.

In addition to the comments below, I tried to integrate this branch into ethereumjs-vm which failed (for reasons not related to this PR).

static prove (trie, key, cb) {
trie.findPath(key, function (err, node, remaining, stack) {
if (err) return cb(err)
let p = stack.map((stackElem) => { return stackElem.serialize() })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're including embedded nodes (specifically those with length < 32) in the proof in comparison with the previous prove method. Is that necessary for the non-existence proofs?

})
}

static verifyProof (rootHash, key, proofNodes, cb) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't root of proofTrie be checked against rootHash here?

Copy link
Contributor Author

@zmitton zmitton May 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually yeah that's true. It's hard to say who's job this is. But if it's going to take the argument rootHash it should verify it. somehting like this will do it

  static verifyProof (rootHash, key, proofNodes, cb) {
    let proofTrie = new Trie(null, rootHash)
    Trie.fromProof(proofNodes, (error, proofTrie) => {
      if (error) cb(new Error('Invalid proof nodes given'), null)
      proofTrie.get(key, (e, r) => {
        return cb(e, r)
      })
    }, proofTrie)

@@ -135,16 +167,11 @@ module.exports = class Trie {
cb(null, new TrieNode(node))
} else {
this.db.get(node, (err, value) => {
if (err) {
throw err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

db.get doesn't return an error if value is not found (just null for value). I wouldn't remove this if there's no special reason for this, as I can imagine in future db.get could potentially return errors for other reasons (e.g. validation of input).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

db.get doesn't return an error if value is not found
There is a special reason for it. The way in which this tree is traversed and the property that is is a merkle tree, the child node must always exist in a valid tree.

I can imagine in future db.get could potentially return errors for other reasons
Im still returning that error below. Im just not blowing up as before (throwing an error during async function was just unhelpful blow up)

test/proof.js Outdated
@@ -37,30 +37,53 @@ tape('simple merkle proofs generation and verification', function (tester) {
})
})
},
function (cb) {
function (cb) { // should create a valid proof of null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I wish you had simply added new test cases instead of modifying existing ones. Are these 3 cases not valid anymore? Going through them I have a hard time distinguishing between different result of verifyProof:

  • If proof is for a key that doesn't exist and verifyProof checks the same key, val and err are null
  • If proof is for a key and verifyProof checks a totally random key (that wasn't in proof), there'll be error
  • In the first test case, proof is for key2bb, then it tries to verifyProof for key2, and it again returns null for val and err. I.e. the same proof is also a valid proof of absence for key2, which is certainly interesting, although maybe unexpected?

I'm not sure if users would need to distinguish between all the various states? One benefit of the previous approach in my eyes is that only the value for which the proof was generated would be verified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If proof is for a key that doesn't exist and verifyProof checks the same key, val and err are null

the proof should prove the key is associated with null. It returns the correct value -> null, and it does not raise an error, because the verify function, having been called on that key, returned the sufficient nodes to prove this.

If proof is for a key and verifyProof checks a totally random key (that wasn't in proof), there'll be error

since this proof does not contain sufficient nodes to prove the random key, verifyProof(randomKey) will return an error. It does not matter if randomKey exists or not. It matter that the proof contains the correct nodes to verify such.

In the first test case, proof is for key2bb, then it tries to verifyProof for key2, and it again returns null for val and err. I.e. the same proof is also a valid proof of absence for key2, which is certainly interesting, although maybe unexpected?

Yes, this was an interesting edge-case that I wanted to test. I did not properly document it. In this case, the proof happens to contain enough nodes to prove the value at key2 because traversing into key22 would touch all the same nodes as traversing into key2 (and an extra). So the proof from key22 will be valid to prove the value of any shorter key.

One benefit of the previous approach in my eyes is that only the value for which the proof was generated would be verified

I dont think thats a benefit except maybe for testing. It also has a major drawback in that it does not allow proofs to be efficiently combined. This approach allows the "proofTree" to be drop-in replacement for a "regular tree" (it can even do put). A light client could in the future process state transitions using existing EVM software.

partial/sparse tree support for adding to existing sparse tree

lint fixup

verify root as well
@lgtm-com
Copy link

lgtm-com bot commented May 3, 2019

This pull request fixes 1 alert when merging 2f80f33 into 88f729d - view on LGTM.com

fixed alerts:

  • 1 for Useless assignment to local variable

Comment posted by LGTM.com

@zmitton
Copy link
Contributor Author

zmitton commented May 6, 2019

I'd feel more confident if there was any other client which also had proof of absence, as I'm not sure if there are no edge cases.

I made the npm package that does use all this stuff by the way

Copy link
Contributor

@s1na s1na left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checked, Geth has a similar behaviour for null keys, from their documentation:

If the trie does not contain a value for key, the returned proof contains all
nodes of the longest existing prefix of the key (at least the root node), ending
with the node that proves the absence of the key.

I'm going to approve the changes. I'd appreciate it however if someone else would also review.

@holgerd77
Copy link
Member

Ok, will merge here now. I would suggest we release this as v3.1.0?

@holgerd77 holgerd77 merged commit b8f612b into ethereumjs:master Jun 22, 2019
@s1na
Copy link
Contributor

s1na commented Jun 24, 2019

I'll have to look deeper, but just having a quick look at the previous merged PRs I think we might have to do a v4.0.0 release, as there were some changes to the API.

A tiny change (this doesn't warrant a major release though) is #83. But the major change is #74. It drops the getRaw and other ..Raw methods, and requires passing a DB instance to the constructor.

One option (that I tend to favor) is to add a simple wrapper to baseTrie for now which calls db.get, etc. (for backwards-compatibility), and don't change how the constructor accepts the db. Then we can do a v3.1.0 release.

Something else I suggest we change is the ES6 export syntax introduced in #71 in the util files (e.g. https://github.com/ethereumjs/merkle-patricia-tree/pull/71/files#diff-6ab8792413c5dfbbbfa72dfcdedbb5d1). I had some problems with it in the browser.

If you agree with the above suggestions I can go ahead and prepare a PR.

@holgerd77
Copy link
Member

Sounds good, would be great if you prepare a PR.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants