v2.1: rpc: improve latency by not blocking worker threads polling IO notifications (backport of #3242) #4412
Problem
Some RPC operations are CPU bound and run for a significant amount of time. Those operations end up blocking worker threads that are also used to handle IO notifications, so notifications are not polled often enough and the whole RPC server can become slow and exhibit high latency. When latency gets high enough it can exceed request timeouts, leading to failed requests.
Summary of Changes
This PR makes some of the most CPU-expensive RPC methods use `tokio::task::spawn_blocking` to run CPU-hungry code. This way the worker threads doing IO don't get blocked and latency is improved.

The methods changed so far include:
getMultipleAccounts
getProgramAccounts
getAccountInfo
getTokenAccountsByDelegate
getTokenAccountsByOwner
I'm not super familiar with RPC, so I've changed the methods that, from looking at the code, seem to be loading/copying a lot of data around. Please feel free to suggest more!
Test plan
Methodology for selection of CPU defaults
Run this `blocks` benchmark script while tweaking CPU params. This was run on a 48 CPU machine.
Parameters varied: `rpc_threads` and `rpc_blocking_threads`.
Methodology
Using this script for computing metrics: https://gist.github.com/steveluscher/b4959b9601093b0009f1d7646217b030, ran each of these `account-cluster-bench` suites before and after this PR:
account-info
block
blocks
first-available-block
multiple-accounts
slot
supply
token-accounts-by-delegate
token-accounts-by-owner
token-supply
transaction
transaction-parsed
version
Using a command similar to this:
Note
You can adjust the `sleep 15` if you want the validator to stack up more slots before starting the bench.
Warning
When running benches that require token accounts, supply a `mint`, `space`, and actually create the token account using the fixture found here.
Results
Warning
These results are a little messed up, because what's actually happening here is that the benchmark script is spitting out averages in 3s windows. The avg/p50/p90/p99 of those numbers is what you're seeing in this table. Not correct, but directionally correct.
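A small worked example of why percentiles of window averages are only directionally correct. The latencies below are made up for illustration, and the nearest-rank `percentile` helper is an assumption, not the method used by the benchmark script.

```rust
// Nearest-rank percentile; assumes a non-empty slice with no NaN values.
fn percentile(values: &[f64], p: f64) -> f64 {
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("NaN in input"));
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted[rank.clamp(1, sorted.len()) - 1]
}

// Arithmetic mean; assumes a non-empty slice.
fn mean(values: &[f64]) -> f64 {
    values.iter().sum::<f64>() / values.len() as f64
}

fn main() {
    // Made-up request latencies (ms), grouped into three reporting windows.
    let windows: Vec<Vec<f64>> = vec![
        vec![10.0, 10.0, 10.0, 10.0],
        vec![10.0, 10.0, 10.0, 210.0],
        vec![10.0, 10.0, 10.0, 10.0],
    ];

    // What the table reports: statistics over per-window averages.
    let window_means: Vec<f64> = windows.iter().map(|w| mean(w)).collect();
    // What a true percentile would use: every individual sample.
    let raw: Vec<f64> = windows.iter().flatten().copied().collect();

    println!("p99 of window means: {}", percentile(&window_means, 99.0));
    println!("p99 of raw samples:  {}", percentile(&raw, 99.0));
}
```

With this data the single 210 ms outlier is averaged down to 60 ms inside its window, so the p99 computed from window means understates the true tail, while a before/after comparison still moves in the right direction.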
Note
Filling in this grid would take a long time, especially if run against a mainnet RPC with production traffic. We may just choose to land this as ‘certainly better, how much we can't say exactly.’
account-info
block
blocks
first-available-block
multiple-accounts
slot
supply
token-accounts-by-delegate
token-accounts-by-owner
token-supply
transaction
transaction-parsed
version