Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

trie/pathdb: state iterator (snapshot integration pt 4) #30654

Merged
merged 2 commits into from
Dec 16, 2024

Conversation

rjl493456442
Copy link
Member

@rjl493456442 rjl493456442 commented Oct 22, 2024

It's highly recommended to review the changes by comparing the files directly.

In this pull request, the state iterator is implemented. It's mostly a copy-paste from
the original state snapshot package, but still has some important changes to highlight
here:

(a) The iterator for the disk layer consists of a diff iterator and a disk iterator.

Originally, the disk layer in the state snapshot was a wrapper around the disk, and
its corresponding iterator was also a wrapper around the disk iterator. However,
due to structural differences, the disk layer iterator is divided into two parts:

  • The disk iterator, which traverses the content stored on disk.
  • The diff iterator, which traverses the aggregated state buffer.

Checkout BinaryIterator and FastIterator for more details.


(b) The staleness management is improved in the diffAccountIterator and diffStorageIterator

Originally, in the diffAccountIterator, the layer’s staleness had to be checked within the Next
function to ensure the iterator remained usable. Additionally, a read lock on the associated diff
layer was required to first retrieve the account blob. This read lock protection is essential to
prevent concurrent map read/write. Afterward, a staleness check was performed to ensure the
retrieved data was not outdated.

The entire logic can be simplified as follows: a loadAccount callback is provided to retrieve
account data. If the corresponding state is immutable (e.g., diff layers in the path database),
the staleness check can be skipped, and a single account data retrieval is sufficient. However,
if the corresponding state is mutable (e.g., the disk layer in the path database), the callback
can operate as follows:

func(hash common.Hash) ([]byte, error) {
    dl.lock.RLock()
    defer dl.lock.RUnlock()

    if dl.stale {
        return nil, errSnapshotStale
    }
    return dl.buffer.states.mustAccount(hash)
}

The callback solution can eliminate the complexity for managing concurrency with the read
lock for atomic operation.

@rjl493456442 rjl493456442 force-pushed the snapshot-integration-p4 branch 2 times, most recently from 0c0db5a to 409da23 Compare October 23, 2024 06:10
@holiman holiman changed the title Snapshot integration p4 trie/pathdb: state iterator (snapshot integration pt 2) Oct 23, 2024
@holiman holiman changed the title trie/pathdb: state iterator (snapshot integration pt 2) trie/pathdb: state iterator (snapshot integration pt 4) Oct 23, 2024
@rjl493456442 rjl493456442 force-pushed the snapshot-integration-p4 branch 4 times, most recently from aac2c48 to 352e348 Compare December 2, 2024 02:12
@rjl493456442 rjl493456442 force-pushed the snapshot-integration-p4 branch from 352e348 to 6d1581a Compare December 10, 2024 02:12
triedb/pathdb/database.go Outdated Show resolved Hide resolved
// holdableIterator is a wrapper of underlying database iterator. It extends
// the basic iterator interface by adding Hold which can hold the element
// locally where the iterator is currently located and serve it up next time.
type holdableIterator struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't a great name; how about cachingIterator ? the elements that are "held" are basically cached so that you don't have to iterate the 2nd time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or "replayableIterator", since the iteration is "replayable"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or "pin"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am open for renaming it, for sure. But I don't want to change it in this pull request.

It's a copy-paste from the state snapshot and it's easier to not change it for review.

}
it.key = common.CopyBytes(it.it.Key())
it.val = common.CopyBytes(it.it.Value())
it.atHeld = false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a bit weird that a function called Hold will set atHeld to false.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's polish it in the following PRs, this pr is already huge enough and let's focus on the core changes.

if !bytes.Equal(it.Value(), []byte(content[order[idx]])) {
t.Errorf("item %d: value mismatch: have %s, want %s", idx, string(it.Value()), content[order[idx]])
}
// Should be safe to call discard multiple times
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean "hold" instead of "discard" ?

Comment on lines +29 to +37
a Iterator
b Iterator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have more descriptive names, e.g. left and right ? Same thing with k. Is that a key ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is about a primary and a secondary iterator. IMO a and b is slightly better than left and right. And yes, k is the current key. I guess it could be renmed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to rename in a following PR.

// newBinaryAccountIterator creates a simplistic account iterator to step over
// all the accounts in a slow, but easily verifiable way.
//
// nolint:all
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need this nolint here ? I don't see anything obviously wrong.

Comment on lines +41 to +46
switch bytes.Compare(hashI[:], hashJ[:]) {
case -1:
return -1
case 1:
return 1
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
switch bytes.Compare(hashI[:], hashJ[:]) {
case -1:
return -1
case 1:
return 1
}
comparison := bytes.Compare(hashI[:], hashJ[:])
if comparison != 0 {
return comparison
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to change it in a following PR.

@@ -98,6 +98,17 @@ func (s *stateSet) account(hash common.Hash) ([]byte, bool) {
return nil, false // account is unknown in this set
}

// mustAccount returns the account data associated with the specified address
// hash. The difference is this function will return an error if the account
// is not found.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't it the wrong name then ? For example, we used to have MustCommit which would panic if it couldn't commit the block.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah maybe tryAccount?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, let me find a more reasonable name

@@ -110,6 +121,19 @@ func (s *stateSet) storage(accountHash, storageHash common.Hash) ([]byte, bool)
return nil, false // storage is unknown in this set
}

// mustStorage returns the storage slot associated with the specified address
// hash and storage key hash. The difference is this function will return an
// error if the storage slot is not found.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor

@holiman holiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, most code is is copied. However

if the corresponding state is mutable (e.g., the disk layer in the path database), the callback can operate as follows:

Where is that located?

Comment on lines +29 to +37
a Iterator
b Iterator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is about a primary and a secondary iterator. IMO a and b is slightly better than left and right. And yes, k is the current key. I guess it could be renmed

@@ -98,6 +98,17 @@ func (s *stateSet) account(hash common.Hash) ([]byte, bool) {
return nil, false // account is unknown in this set
}

// mustAccount returns the account data associated with the specified address
// hash. The difference is this function will return an error if the account
// is not found.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah maybe tryAccount?

priority: depth + 1,
})
case *diffLayer:
// The state set in diff layer is immutable and will never be stale,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we never merge into layers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For diff layers in the pathdb, yes they are immutable and we never modify the content of them.

The only mutable container is the buffer in the disk layer, which will accept the state set from the diff layers on top.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only mutable container is the buffer in the disk layer, which will accept the state set from the diff layers on top.

That's nice. We could do the same in the snap layers. Instead of having n mutable layers, where only the bottom-most difflayer is the one that is actually mutable, we could treat them differently.

@rjl493456442
Copy link
Member Author

holiman
holiman previously approved these changes Dec 13, 2024
Copy link
Contributor

@holiman holiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

gballet
gballet previously approved these changes Dec 13, 2024
Copy link
Member

@gballet gballet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went over your responses. LGTM

@rjl493456442 rjl493456442 dismissed stale reviews from gballet and holiman via 5a5a70e December 16, 2024 07:01
@rjl493456442 rjl493456442 force-pushed the snapshot-integration-p4 branch 3 times, most recently from d5fbbb8 to 6cef4a6 Compare December 16, 2024 07:14
@rjl493456442 rjl493456442 force-pushed the snapshot-integration-p4 branch from 6cef4a6 to 4ab8eef Compare December 16, 2024 07:27
Copy link
Contributor

@holiman holiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rjl493456442 rjl493456442 added this to the 1.14.13 milestone Dec 16, 2024
@rjl493456442 rjl493456442 merged commit bc1ec69 into ethereum:master Dec 16, 2024
3 checks passed
@fjl fjl modified the milestones: 1.14.13, 1.15.0 Jan 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants