harden status message handling #3327

arnetheduck · 2022-01-26T16:20:41Z

Additional sanity checking of the status message exchanged during a
fresh connection:

* check that head and finalized make sense, slot-wise
* verify that finalized root lies on the canonical chain, when possible
* re-check these things for every status message during sync
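The checks listed above can be sketched roughly as follows. This is a minimal Python illustration, not the PR's Nim code: the `Status` fields mirror the status message of the consensus networking protocol, while `check_status`, `canonical_root_at_slot`, the exact rejection rules, and the epoch-0 exemption are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional

SLOTS_PER_EPOCH = 32  # mainnet preset value


@dataclass(frozen=True)
class Status:
    finalized_root: bytes
    finalized_epoch: int
    head_root: bytes
    head_slot: int


def check_status(
    status: Status,
    wall_slot: int,
    canonical_root_at_slot: Callable[[int], Optional[bytes]],
) -> Optional[str]:
    """Return None if the status looks plausible, else a rejection reason.

    `canonical_root_at_slot` is a hypothetical lookup into the local chain;
    it returns None when the local node cannot answer for that slot.
    """
    finalized_slot = status.finalized_epoch * SLOTS_PER_EPOCH

    # Slot-wise sanity (assumed rules): head is not in the future and
    # the finalized checkpoint is not ahead of the head.
    if status.head_slot > wall_slot:
        return "head slot is in the future"
    if finalized_slot > status.head_slot:
        return "finalized checkpoint is ahead of head"

    # Assumption: a peer that has not finalized anything yet (epoch 0) may
    # report a placeholder root, so the root check is skipped in that case.
    if status.finalized_epoch > 0:
        known = canonical_root_at_slot(finalized_slot)
        # "When possible": only compare if the local node knows that slot.
        if known is not None and known != status.finalized_root:
            return "finalized root is not on our canonical chain"
    return None
```

Running the same function on every status message received during sync, not only on the initial handshake, corresponds to the third item in the list.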


github-actions · 2022-01-26T20:37:20Z

Unit Test Results

    12 files ±0   794 suites ±0 36m 15s ⏱️ - 1m 11s
1 617 tests ±0 1 569 ✔️ ±0   48 💤 ±0 0 ❌ ±0
9 517 runs ±0 9 413 ✔️ ±0 104 💤 ±0 0 ❌ ±0

Results for commit c7e88a6. ± Comparison against base commit fdb76b8.

♻️ This comment has been updated with latest results.


* harden and speed up block sync

The `GetBlockBy*` server implementation currently reads SSZ bytes from the database, deserializes them into a Nim object, then serializes them right back to SSZ. Here, we eliminate the deserialize/serialize steps and send the bytes straight to the network. Unfortunately, the snappy recoding must still be done because of differences in framing.

Also, the quota system makes one giant request for quota right before sending all blocks. This means that a 1024-block request will be "paused" for a long time, then all blocks will be sent at once, causing a spike in database reads that may make the reading client time out before any block is sent.

Finally, on the reading side we make several copies of blocks as they travel through various queues. This was not noticeable before but becomes a problem in two cases: bellatrix blocks are up to 10 MB (instead of 30-40 KB), and when backfilling, we process a lot more of them a lot faster.

* fix status comparisons for nodes syncing from genesis (#3327 was a bit too hard)
* don't hit database at all for post-altair slots in GetBlock v1 requests
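To make the quota problem described above concrete, here is a small deterministic Python sketch. It is an illustration only: `send_times`, the one-token-per-block cost, the empty starting bucket, and the refill rate of 128 tokens per second are all hypothetical, and the per-block variant is shown for contrast rather than as the approach this change necessarily takes.

```python
from typing import List


def send_times(blocks: int, refill_per_second: float, per_block: bool) -> List[float]:
    """Simulated time (in seconds) at which each block is sent.

    Assumes a quota bucket that starts empty, refills at a constant rate,
    and charges one token per block.
    """
    if per_block:
        # Quota is requested one block at a time, so block i can go out
        # as soon as i + 1 tokens have accumulated.
        return [(i + 1) / refill_per_second for i in range(blocks)]
    # Quota for the whole response is requested up front: nothing is sent
    # until all tokens are available, then every block goes out at once.
    wait = blocks / refill_per_second
    return [wait] * blocks


upfront = send_times(1024, 128.0, per_block=False)
spread = send_times(1024, 128.0, per_block=True)
print(upfront[0], spread[0])  # time until the first block leaves, per strategy
```

Under these assumptions both strategies finish at the same time, but with the up-front request the requester sees nothing until the very end, which is the window in which it may time out.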

zah merged commit 84b6ad8 into unstable Jan 27, 2022

zah deleted the sync-msg-unviable branch January 27, 2022 16:46

arnetheduck mentioned this pull request Feb 3, 2022

harden and speed up block sync #3358

Merged
