Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

async batch verification (+40% sig verification throughput) #5176

Merged
merged 5 commits into from
Aug 3, 2023

Conversation

arnetheduck
Copy link
Member

When batch verification is done, the main thread is blocked reducing concurrency.

With this PR, the new thread signalling primitive in chronos is used to offload the full batch verification process to a separate thread allowing the main threads to continue async operations while the other threads verify signatures.

Similar to previous behavior, the number of ongoing batch verifications is capped to prevent runaway resource usage.

In addition to the asynchronous processing, 3 addition changes help drive throughput:

  • A loop is used for batch accumulation: this prevents a stampede of small batches in eager mode where both the eager and the scheduled batch runner would pick batches off the queue, prematurely picking "fresh" batches off the queue
  • An additional small wait is introduced for small batches - this helps create slightly larger batches which make better used of the increased concurrency
  • Up to 2 batches are scheduled to the threadpool during high pressure, reducing startup latency for the threads

Together, these changes increase attestation verification throughput under load up to 40%.

More CPU cores used (110 vs 140% usage - the cores here are running other nimbus' - on my node, CPU hovers around 180% usage after this PR):
image

Better attestation throughput (7k -> ~10k/slot)
image

Increased batch size (leading to more efficient verification):
image

Depends on:
status-im/nim-chronos#406
status-im/nim-blscurve#154
status-im/nim-taskpools#33
#5169

Copy link
Contributor

@mratsim mratsim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent.

In that case we can probably go a bit further, create the Taskpool directly in another thread.

Because in this case, assuming you have a Rpi with 4 cores, only 3 will work on cryptography and there is 1 which will handle the rest but I think the rest might not be heavy enough to fully utilize it.

In that case we have a slight oversubscription but the OS routinely handles that fine.

Main issue is changing taskpools to handle work submitted from a thread that is not part of the threadpool.

# If the hardware is too slow to keep up or an event caused a temporary
# buildup of signature verification tasks, the batch will be dropped so as to
# recover and not cause even further buildup - this puts an (elastic) upper
# bound on the amount of queued-up work
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens to the dropped work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the attestations are dropped instead of being processed -> no gossip etc

@github-actions
Copy link

github-actions bot commented Jul 10, 2023

Unit Test Results

         9 files  ±0    1 077 suites  ±0   44m 54s ⏱️ + 2m 28s
  3 723 tests ±0    3 444 ✔️ ±0  279 💤 ±0  0 ±0 
15 874 runs  ±0  15 569 ✔️ ±0  305 💤 ±0  0 ±0 

Results for commit 7d4cbaa. ± Comparison against base commit 32e8bfe.

♻️ This comment has been updated with latest results.

Base automatically changed from pre-check-block to unstable July 11, 2023 16:55
# Pruning the queue here makes sure we catch up with processing if need be
batchCrypto.pruneBatchQueue() # Skip old batches
proc batchVerifyTask(task: ptr BatchTask) {.nimcall.} =
# Task suitable for running in taskpools - look, no GC!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see you're just skipping the GC.

@arnetheduck
Copy link
Member Author

In that case we can probably go a bit further, create the Taskpool directly in another thread.

with a Flowvar in hand, one can wait for specific tasks instead of the whole pool so which thread "owns" the pool should ideally be less critical - it's already an mpmc:ish kind of setup where tasks can spawn more tasks and wait for tasks from anywhere, no? This, along with a preference to post tasks to not-this-thread.

potentially one should allow Flowvar[void] for synchronization purposes when the data is transferred out-of-band as often happens.

@zah zah force-pushed the multi-parallel branch from 8aae9dc to 649d7d3 Compare July 27, 2023 13:33
@zah zah marked this pull request as ready for review July 27, 2023 13:33
arnetheduck and others added 3 commits August 1, 2023 18:45
When batch verification is done, the main thread is blocked reducing
concurrency.

With this PR, the new thread signalling primitive in chronos is used to
offload the full batch verification process to a separate thread
allowing the main threads to continue async operations while the other
threads verify signatures.

Similar to previous behavior, the number of ongoing batch verifications
is capped to prevent runaway resource usage.

In addition to the asynchronous processing, 3 addition changes help
drive throughput:

* A loop is used for batch accumulation: this prevents a stampede of
small batches in eager mode where both the eager and the scheduled batch
runner would pick batches off the queue, prematurely picking "fresh"
batches off the queue
* An additional small wait is introduced for small batches - this helps
create slightly larger batches which make better used of the increased
concurrency
* Up to 2 batches are scheduled to the threadpool during high pressure,
reducing startup latency for the threads

Together, these changes increase attestation verification throughput
under load up to 30%.
@zah zah force-pushed the multi-parallel branch from 649d7d3 to f84faaf Compare August 1, 2023 15:45
@zah zah merged commit b8a3241 into unstable Aug 3, 2023
@zah zah deleted the multi-parallel branch August 3, 2023 08:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants