Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rehashing support for inclusion + consistency proofs. #305

Merged
merged 13 commits into from
Jan 30, 2017

Conversation

Martin2112
Copy link
Contributor

This allows us to serve proofs at arbitrary tree sizes. Snapshot recomputation annotates portions of the path that require rehashing and this is then used to recalculate a subtree node when storage returns the nodes. Add some proof tests at the node / path calculation level to improve coverage. If proofs are at STH corresponding sizes the new code paths are not taken so probably safe!

Sorry this PR is a bit big but it was proving tricky to separate out.

Copy link
Contributor

@codingllama codingllama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've still got to have a better look at
"server/proof_fetcher_test.go" (just glanced into it, haven't really looked into the test tables or TestTreeX tests).

The other files I think I've reviewed properly, though.

return []NodeFetch{}, fmt.Errorf("invalid params ts: %d index: %d, bitlen:%d", treeSize, index, maxBitLen)
func CalcInclusionProofNodeAddresses(snapshot, index, treeSize int64, maxBitLen int) ([]NodeFetch, error) {
if snapshot > treeSize || index >= snapshot || index < 0 || snapshot < 1 || maxBitLen <= 0 {
return []NodeFetch{}, fmt.Errorf("invalid params s: %d index: %d ts: %d, bitlen:%d", snapshot, index, treeSize, maxBitLen)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: s/[]NodeFetch{}/nil

Ditto for others occurrences.

For consistency with my previous nits. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return []NodeFetch{}, fmt.Errorf("invalid params prior: %d treesize: %d, bitlen:%d", previousTreeSize, treeSize, maxBitLen)
func CalcConsistencyProofNodeAddresses(snapshot1, snapshot2, treeSize int64, maxBitLen int) ([]NodeFetch, error) {
if snapshot1 > snapshot2 || snapshot1 > treeSize || snapshot2 > treeSize || snapshot1 < 1 || snapshot2 < 1 || maxBitLen <= 0 {
return []NodeFetch{}, fmt.Errorf("invalid params s1: %d s2: %d tss: %d, bitlen:%d", snapshot1, snapshot2, treeSize, maxBitLen)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/tss/ts/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// tree state at the snapshot size differs from the size we've stored it at. The calculations
// also need to take into account missing levels, see the tree diagrams in this file.
// If called with snapshot equal to the tree size returns empty. Otherwise, assuming no errors,
// the output of this should always be exactly one node. Either a copy of one of the nodes in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"the output of this should always be exactly one node". I think it should say "at least one node".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After rehashing it's one node. Added a clarification.

// Nothing to do
return []NodeFetch{}, nil
} else if snapshot > treeSize {
return fetches, fmt.Errorf("recomputePastSnapshot: %d does not exist for tree of size %d", snapshot, treeSize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: return nil, fmt.Errorf("...")

Yes, "fetches" is empty here, but it seems a bit strange to return a named variable together with an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// This is the index of the last node that actually exists in the underlying tree
lastNodeAtLevel := treeSize - 1

// Work up towards, the root we may find the node we need without needing to rehash if
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "Work up towards, the root we may find..." sounds a bit strange. Should it be: "Work up towards the root. We may find ... "?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah typo fixed.

return trillian.Proof{LeafIndex: leafIndex, ProofNode: r.proof}, r.proofError
}

// dedupAndFetchNodes() removes duplicates from the set of fetches and then passes the result to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first I understood that it removes duplicates and returns a "deduped" slice, which would make the index matching at line 27 break. We should probably mention that it dedups the query, but returns duplicates in the resulting slice.

var rehashTests = []rehashTest{
{
desc: "no rehash",
index: int64(126),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm struggling to see how the index matches the leaves / tree size. Could you add a comment to clarify? (Ditto for others.)

Maybe I should take a break :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just testing that the right value gets copied in. May not be actually required as I can't remember why it's there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we could remove them it would be better. They definitely confused me. (Still do tbh.)


func TestRehasher(t *testing.T) {
for _, rehashTest := range rehashTests {
r := newRehasher()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would prefer to run the tests against fetchNodesAndBuildProof() instead of directly using the rehasher. We're reaching inside private functions anyway, but the boundary of proof_fetcher.go seems to be fetchNodesAndBuildProof.

Ditto for TestDedupFetcher / dedupAndFetchNodes below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rehasher is notionally separate API and I did consider pulling it out but it's not a massive amount of code. Most tests do call fetchNodesAndBuildProof. The dedup stuff has gone now.

nodes, err := dedupAndFetchNodes(tx, 37, dedupTest.input)

if err == nil && dedupTest.storageError != nil {
t.Fatalf("%s: got nil, want error: %v", dedupTest.desc, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

t.Errorf and continue? Or maybe do the checks inside a switch? It doesn't seem like a mismatch here should stop other test cases.

Ditto for the Fatal below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't sure because it's just going to dump hex strings that don't match. I think printing a bunch of these is no extra help but willing to be convinced otherwise.

if err != nil && dedupTest.storageError == nil {
t.Fatalf("%s: got error: %v, want nil", dedupTest.desc, err)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: If we get a (correct) error response we're still going ahead and checking got != want, but instead we should check that the errors match and stop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was removed with the dedup code.

// are valid. There must be at least one fetch. All fetches must have the same rehash state and if
// there is only one fetch then it must not be a rehash. If all checks pass then the fetches
// represent one node after rehashing is completed.
func checkRecomputation(fetches []NodeFetch) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. SGTM.

nodes = append(nodes, node)
}

for i, node := range nodes {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit strange. Having a package check its own logic is one thing, assuming the storage implementation might break in some random manner is another. If we can't trust queries by ID we're in a rough place.

If you feel this adds a real benefit, please push back. I wanted to try to make my case once more, but I'll stop now.

var n2n3n4 = &trillian.Node{NodeHash: th.HashChildren(h4, th.HashChildren(h3, h2))}
var n4n5 = &trillian.Node{NodeHash: th.HashChildren(h5, h4)}

var rehashTests = []rehashTest{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(still a nit) IMO, the smaller the scope the better. If those matter to a single test, having it inside the test both tells me that and moves them closer to where they're used.

var rehashTests = []rehashTest{
{
desc: "no rehash",
index: int64(126),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we could remove them it would be better. They definitely confused me. (Still do tbh.)

for ts := 2; ts <= 32; ts++ {
mt := treeAtSize(ts)
r := testonly.NewMultiFakeNodeReaderFromLeaves([]testonly.LeafBatch{
{TreeRevision: 3, Leaves: expandLeaves(0, ts-1), ExpectedRoot: expectedRootAtSize(mt)},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Extract "3" to a treeRevision variable and use it below, so it doesn't look like a magic number at other lines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


for s := 2; s <= ts; s++ {
for l := 0; l < s; l++ {
fetches, err := merkle.CalcInclusionProofNodeAddresses(int64(s), int64(l), int64(ts), 64)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Extract maxBitLen := 64

})

for s := 2; s <= ts; s++ {
for l := 0; l < s; l++ {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I realize Go has this thing with short variable names, but I would much prefer "s" to be "snapshot", "l" to be "leaf", "r" to be "nodeReader" and so on. It's a bit hard to parse the CalcInclusionProof call below without jumping around a bit.


for s := 2; s <= 32; s++ {
for l := 0; l < s; l++ {
fetches, err := merkle.CalcInclusionProofNodeAddresses(int64(s), int64(l), 32, 64)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Same readability comments as above (longer names, extracting magic numbers to named vars).


for s1 := 2; s1 < ts; s1++ {
for s2 := s1 + 1; s2 < ts; s2++ {
fetches, err := merkle.CalcConsistencyProofNodeAddresses(int64(s1), int64(s2), int64(ts), 64)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: As above (readability)

(This one is actually easier to parse than the others.)

if err != nil {
panic(err)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Remove vertical whitespace?

(Same for the following functions.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Still think this looks ugly.

@codingllama
Copy link
Contributor

Please address the remaining comments, but feel free to merge afterwards. They're mostly nits anyway.

@Martin2112
Copy link
Contributor Author

Rebased to pick up recent changes before starting rework.

Copy link
Contributor

@codingllama codingllama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved (barring merge conflicts)

@Martin2112 Martin2112 merged commit 21fa859 into google:master Jan 30, 2017
@Martin2112 Martin2112 deleted the add_rehashing branch January 30, 2017 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants