Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

harden status message handling #3327

Merged
merged 1 commit into from
Jan 27, 2022
Merged

harden status message handling #3327

merged 1 commit into from
Jan 27, 2022

Conversation

arnetheduck
Copy link
Member

Additional sanity checking of the status message exchanged during a
fresh connection:

  • check that head and finalized make sense, slot-wise
  • verify that finalized root lies on the canonical chain, when possible
  • re-check these things for every status message during sync

Additional sanity checking of the status message exchanged during a
fresh connection:

* check that head and finalized make sense, slot-wise
* verify that finalized root lies on the canonical chain, when possible
* re-check these things for every status message during sync
@github-actions
Copy link

github-actions bot commented Jan 26, 2022

Unit Test Results

     12 files  ±0     794 suites  ±0   36m 15s ⏱️ - 1m 11s
1 617 tests ±0  1 569 ✔️ ±0    48 💤 ±0  0 ±0 
9 517 runs  ±0  9 413 ✔️ ±0  104 💤 ±0  0 ±0 

Results for commit c7e88a6. ± Comparison against base commit fdb76b8.

♻️ This comment has been updated with latest results.

@zah zah merged commit 84b6ad8 into unstable Jan 27, 2022
@zah zah deleted the sync-msg-unviable branch January 27, 2022 16:46
arnetheduck added a commit that referenced this pull request Feb 3, 2022
The `GetBlockBy*` server implementation currently reads SSZ bytes from
database, deserializes them into a Nim object then serializes them right
back to SSZ - here, we eliminate the deser/ser steps and send the bytes
straight to the network. Unfortunately, the snappy recoding must still
be done because of differences in framing.

Also, the quota system makes one giant request for quota right before
sending all blocks - this means that a 1024 block request will be
"paused" for a long time, then all blocks will be sent at once causing a
spike in database reads which potentially will see the reading client
time out before any block is sent.

Finally, on the reading side we make several copies of blocks as they
travel through various queues - this was not noticeable before but
becomes a problem in two cases: bellatrix blocks are up to 10mb (instead
of .. 30-40kb) and when backfilling, we process a lot more of them a lot
faster.

* fix status comparisons for nodes syncing from genesis (#3327 was a bit
too hard)
* don't hit database at all for post-altair slots in GetBlock v1
requests
arnetheduck added a commit that referenced this pull request Feb 4, 2022
The `GetBlockBy*` server implementation currently reads SSZ bytes from
database, deserializes them into a Nim object then serializes them right
back to SSZ - here, we eliminate the deser/ser steps and send the bytes
straight to the network. Unfortunately, the snappy recoding must still
be done because of differences in framing.

Also, the quota system makes one giant request for quota right before
sending all blocks - this means that a 1024 block request will be
"paused" for a long time, then all blocks will be sent at once causing a
spike in database reads which potentially will see the reading client
time out before any block is sent.

Finally, on the reading side we make several copies of blocks as they
travel through various queues - this was not noticeable before but
becomes a problem in two cases: bellatrix blocks are up to 10mb (instead
of .. 30-40kb) and when backfilling, we process a lot more of them a lot
faster.

* fix status comparisons for nodes syncing from genesis (#3327 was a bit
too hard)
* don't hit database at all for post-altair slots in GetBlock v1
requests
zah pushed a commit that referenced this pull request Feb 7, 2022
* harden and speed up block sync

The `GetBlockBy*` server implementation currently reads SSZ bytes from
database, deserializes them into a Nim object then serializes them right
back to SSZ - here, we eliminate the deser/ser steps and send the bytes
straight to the network. Unfortunately, the snappy recoding must still
be done because of differences in framing.

Also, the quota system makes one giant request for quota right before
sending all blocks - this means that a 1024 block request will be
"paused" for a long time, then all blocks will be sent at once causing a
spike in database reads which potentially will see the reading client
time out before any block is sent.

Finally, on the reading side we make several copies of blocks as they
travel through various queues - this was not noticeable before but
becomes a problem in two cases: bellatrix blocks are up to 10mb (instead
of .. 30-40kb) and when backfilling, we process a lot more of them a lot
faster.

* fix status comparisons for nodes syncing from genesis (#3327 was a bit
too hard)
* don't hit database at all for post-altair slots in GetBlock v1
requests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants