
harden and speed up block sync #3358

Merged: 2 commits into unstable, Feb 7, 2022
Conversation

arnetheduck (Member):

The `GetBlockBy*` server implementation currently reads SSZ bytes from the
database, deserializes them into a Nim object, then serializes them right
back to SSZ - here, we eliminate the deserialize/serialize steps and send the
bytes straight to the network. Unfortunately, the snappy recoding must still
be done because of differences in framing.
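The byte pass-through can be sketched as follows; this is a minimal illustration in Python, with JSON standing in for SSZ and a dict standing in for the database (all names here are hypothetical, not nimbus-eth2 APIs):

```python
import json

# Hypothetical stand-ins: JSON plays the role of SSZ, a dict the role of the
# block database. Function names are illustrative only.
db = {"root": json.dumps({"slot": 1, "body": "data"}).encode()}

def serve_block_roundtrip(root: str) -> bytes:
    # Old path: decode the stored bytes into an object, then re-encode them.
    block = json.loads(db[root])       # deserialize
    return json.dumps(block).encode()  # serialize right back

def serve_block_raw(root: str) -> bytes:
    # New path: send the stored bytes straight to the network.
    return db[root]

assert serve_block_roundtrip("root") == serve_block_raw("root")
```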

Also, the quota system makes one giant request for quota right before
sending all blocks - this means that a 1024-block request is "paused" for a
long time, then all blocks are sent at once, causing a spike in database
reads that may make the requesting client time out before any block is sent.
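The pacing change can be sketched like this; a hedged illustration with hypothetical names (`acquire_quota` is a stand-in, not the real quota API):

```python
# Instead of one giant up-front quota request for e.g. 1024 blocks, quota is
# requested per block, so sending starts immediately and database reads are
# spread out over the lifetime of the request.

def send_blocks_upfront(blocks, acquire_quota):
    acquire_quota(len(blocks))  # long pause, then a burst of sends
    return list(blocks)

def send_blocks_paced(blocks, acquire_quota):
    sent = []
    for block in blocks:
        acquire_quota(1)        # pay as you go, one block at a time
        sent.append(block)
    return sent

requests = []
assert send_blocks_paced([10, 20, 30], requests.append) == [10, 20, 30]
assert requests == [1, 1, 1]    # quota drawn incrementally, not in one lump
```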

Finally, on the reading side we make several copies of blocks as they
travel through various queues - this was not noticeable before, but it
becomes a problem in two cases: bellatrix blocks are up to 10 MB (instead
of roughly 30-40 KB), and when backfilling, we process a lot more of them, a
lot faster.
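The cost of copying at each queue hop can be illustrated as follows; a Python sketch in which an explicit `deepcopy` models Nim's value-copy semantics (Python itself hands out references by default, so the reference-passing variant is the natural one here):

```python
import copy
from collections import deque

# Illustrative only: a large block moving through two queues. The deepcopy
# models a full value copy at each hop; handing the same reference through
# the queue avoids the duplication entirely.
block = {"slot": 1, "payload": bytearray(1024)}  # stand-in for a ~10 MB block

def hop_with_copy(q_in: deque, q_out: deque) -> None:
    q_out.append(copy.deepcopy(q_in.popleft()))  # one full copy per hop

def hop_by_reference(q_in: deque, q_out: deque) -> None:
    q_out.append(q_in.popleft())                 # same object, zero copies

q1, q2 = deque([block]), deque()
hop_by_reference(q1, q2)
assert q2[0] is block                            # nothing was duplicated
```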

  • fix status comparisons for nodes syncing from genesis (harden status message handling #3327 was a bit
    too hard)
  • don't hit database at all for post-altair slots in GetBlock v1
    requests

github-actions (bot) commented Feb 3, 2022:

Unit Test Results

12 files ±0   806 suites ±0   31m 28s ⏱️ (-2m 5s)
1 652 tests ±0   1 606 ✔️ (+2)   46 💤 (-2)   0 ❌ (±0)
9 665 runs ±0   9 565 ✔️ (+4)   100 💤 (-4)   0 ❌ (±0)

Results for commit adc1682. Comparison against base commit 49282e9.


```nim
# TODO Semantically, this request should return a non-ref, but doing so
# runs into extreme inefficiency due to the compiler introducing
# hidden copies - in future nim versions, this should be revisited
# TODO This code is more complicated than it needs to be, in order to please
```
zah (Contributor) commented Feb 4, 2022:

How would this code be simpler? It already looks pretty devoid of boilerplate or any other elements obscuring the intention.

arnetheduck (Member, Author):

Writing a client function manually, or separately from the server, in a way that has the right signature (which is different from the server's signature, because of bytes and ref and so on) would be a lot easier to read and audit as a consumer of the code (such as the REST layer).

Basically, a client for an RPC and a server implementation are two separate entities, and it's useful to think of them that way as well: it's wrong, for example, that in order to consume an RPC I have to import everything that's needed to implement the server. Most of the time, when I'm looking for the client code, the first reaction is to look at this implementation.

zah (Contributor) commented Feb 4, 2022:

Most people reading this comment will assume it claims that the body of the function below is more complicated than it needs to be. Perhaps you could be a bit more specific in the comment.

Otherwise, what you describe above is not "simpler" - it will be more complicated in the sense that it requires more modules and definitions that must be kept in sync, but I do agree that it optimizes for the metrics you cite.

arnetheduck (Member, Author):

well, even this code is more complicated: it uses a typed MultipleChunksResponse with a type that is never used for forming the response - instead, we go into "raw" mode with a special function whose only role is to throw away the unused type. A straightforward implementation would get rid of these indirections as well.

Complexity includes the complexity of understanding the code, not just writing it - in fact, I tend to find it a good investment to maximise simplicity for the reader at the expense of the writer, so that the code ends up "globally" simpler: choosing what information to hide and what to make explicit has a direct effect on the perceived complexity of using the code.

arnetheduck (Member, Author):

```nim
else:
  (await peer.beaconBlocksByRoot(BlockRootsList items)).map() do (blcks: seq[phase0.SignedBeaconBlock]) -> auto:
    blcks.mapIt(ForkedSignedBeaconBlock.init(it))
  (await beaconBlocksByRoot(peer, BlockRootsList items)).map(
```
zah (Contributor) commented Feb 4, 2022:

Still quite a lot of copying going on here. Would you like the network layer to allocate the blocks as refs directly? We can also bite the bullet and start using refs inside the branches of ForkedSignedBeaconBlock to avoid more copying in other places as well.

arnetheduck (Member, Author):

> start using refs inside the branches of ForkedSignedBeaconBlock to avoid more copying in other places as well.

this goes a bit too far - it would infect the spec code with the weak guarantees of ref. Already, the tests needed changing because of the mutability differences between value types and reference types, and this is not a source of bugs I'd like to have to consider inside the security moat that the chaindag is.

also, it's actually not that hard to make the code avoid these copies as well - but semantically, what I'm really waiting for is move + lent support in the language. With that in place, we can write semantically correct code and avoid most copies in a principled way. The profiler says that this copy is not really an issue, but should it turn into one, there's a simple way to avoid it as well, especially if we drop the v1 request (which should be safe to do).

zah (Contributor):

Technically, it's possible to use ref fields inside the "Forked" types and still present a similar external view of the data if we introduce accessors in place of the currently directly exposed fields. The accessors would dereference the ref fields and preserve the property that an immutable Forked object can provide you only with immutable fields.
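The accessor idea might look roughly like this in Python; a hedged sketch where the class shapes and names are hypothetical, not the actual nimbus-eth2 types:

```python
# Hypothetical sketch: the forked wrapper stores a shared reference, but
# exposes the data only through a read-only accessor, so callers never see
# (or copy, or mutate) the ref itself.

class Phase0Block:
    def __init__(self, slot: int) -> None:
        self.slot = slot

class ForkedSignedBeaconBlock:
    def __init__(self, phase0_ref: Phase0Block) -> None:
        self._phase0 = phase0_ref     # shared ref, never copied

    @property
    def slot(self) -> int:
        return self._phase0.slot      # accessor dereferences the ref

shared = Phase0Block(slot=42)
a = ForkedSignedBeaconBlock(shared)
b = ForkedSignedBeaconBlock(shared)   # second view over the same data
assert a.slot == b.slot == 42
assert a._phase0 is b._phase0         # no copy was made
```

The read-only property mirrors the invariant described above: an immutable wrapper hands out only immutable views.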

arnetheduck (Member, Author):

well, it's a good workaround to keep in mind should we find that we need it - at this point, even at backfill speeds, we really don't. It's also only needed in asynchronous layers, in general - in non-async code, we're already pretty good at avoiding copies. This is another reason to prefer explicit ref Xxx over ref object: we can be precise about where a reference is necessary (either for semantic or for efficiency reasons).

The sync manager code in particular was vulnerable to this effect because it was shuffling seq:s of blocks around in several queues - it would in fact have been possible to use a ref seq[Forked... instead, for the same efficiency benefit.

all that said, I'm not strongly opposed to the idea, but I find it an ugly hack / workaround that I would only introduce if forced, since better options (hopefully) will work in future nim:s.

This also reminds me of the importance of having `let a = b.x` be an alias, and not a copy - this is a critical change that needs to happen in nim for all my beautiful theory to become practical - that, and fixing all the unnecessary-temporary bugs, of which there are plenty.
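For contrast, Python already gives `a = b.x` the alias semantics being asked for from Nim's `let`; a small illustration (the names here are purely illustrative):

```python
# In Python, reading an attribute binds a reference to the field's object
# rather than copying it - the aliasing behaviour the comment wants for
# `let a = b.x` in Nim.

class Holder:
    def __init__(self) -> None:
        self.x = bytearray(b"block bytes")

b = Holder()
a = b.x               # alias: no copy of the payload is made
a[0:5] = b"BLOCK"
assert a is b.x                           # same underlying object
assert b.x == bytearray(b"BLOCK bytes")   # mutation visible through b.x
```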

arnetheduck (Member, Author):

in writing this comment, I started liking the ref seq idea even more, fwiw ;)

zah merged commit c7abc97 into unstable on Feb 7, 2022.
zah deleted the harden-block-sync branch on Feb 7, 2022 at 17:20.