Add ExecutionTimeEstimate mode for congestion control #20994
Conversation
Force-pushed from fc51819 to 8170065
Force-pushed from 8170065 to 221fce6
Reviewer: Some minor issues; feel free to just leave TODOs instead of fixing them immediately, since we aren't enabling this yet anyway.
```diff
@@ -1364,6 +1365,12 @@ impl SuiNode {
             }
         }

         ExecutionTimeObserver::spawn(
             &epoch_store,
             Box::new(consensus_adapter.clone()),
```
Reviewer: is this a Box of an Arc? Can we just do the Arc?
Author: this is a `Box<dyn SubmitToConsensus>`; the `SubmitToConsensus` trait is implemented for `Arc<ConsensusAdapter>`.
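To make the distinction concrete, here is a minimal self-contained sketch of the pattern (the trait and struct bodies are stand-ins, not the real Sui definitions):

```rust
use std::sync::Arc;

// Stand-ins for the real types, just to show the shape of the pattern.
trait SubmitToConsensus: Send + Sync {
    fn submit(&self, payload: &[u8]);
}

struct ConsensusAdapter;

// The trait is implemented for Arc<ConsensusAdapter> rather than for
// ConsensusAdapter itself, so the adapter can be shared cheaply.
impl SubmitToConsensus for Arc<ConsensusAdapter> {
    fn submit(&self, _payload: &[u8]) { /* forward to consensus */ }
}

fn spawn_observer(client: Box<dyn SubmitToConsensus>) {
    client.submit(b"observation");
}

fn main() {
    let adapter = Arc::new(ConsensusAdapter);
    // Box::new(adapter.clone()) boxes the Arc (a pointer-sized clone), not
    // the adapter; the Box only erases the concrete type behind `dyn`.
    spawn_observer(Box::new(adapter.clone()));
}
```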
```rust
if obs.estimates.len()
    > epoch_store
        .protocol_config()
        .max_programmable_tx_commands()
```
Reviewer: even if max commands is a reasonable starting point for this value, let's put it behind an appropriately named method that just uses that value?

And - it seems unnecessarily high? You wouldn't expect to see a transaction with 1000 unique entry points.
Author: you wouldn't expect to normally, but the limit is 1024 commands, so in theory you could?

The current implementation has no code to cap the message size beyond the command limit, so I think it is exactly correct for now to use this limit here.

However, I agree we may want to add batching or a smaller observation limit - I put a TODO for that; it's probably best to add the separate limit at the same time as the (future) code that implements it?
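For reference, the suggested accessor could look something like this sketch (a hypothetical simplification; the real ProtocolConfig lives in sui-protocol-config):

```rust
// Hypothetical, simplified stand-in for the real protocol config.
struct ProtocolConfig {
    max_programmable_tx_commands: u32,
}

impl ProtocolConfig {
    fn max_programmable_tx_commands(&self) -> u32 {
        self.max_programmable_tx_commands
    }

    // Appropriately named limit that currently just reuses the command limit;
    // a separate, smaller cap can later be introduced without touching callers.
    fn max_execution_time_observations(&self) -> u32 {
        self.max_programmable_tx_commands()
    }
}

fn main() {
    let config = ProtocolConfig { max_programmable_tx_commands: 1024 };
    assert_eq!(config.max_execution_time_observations(), 1024);
}
```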
```rust
if let Err(e) = tx_local_execution_time.try_send((ptb.clone(), timings, total_duration)) {
    // This channel should not overflow, but if it does, don't wait; just log an error
    // and drop the observation.
    warn!("failed to send local execution time to observer: {e}");
```
Reviewer: we probably want to use a metric instead of a log for this.
Author: yes, metrics are on the future-PR TODO list, but I put another TODO here specifically.
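A sketch of what the drop counter could look like, assuming a prometheus-style IntCounter (the metric name and the toy channel here are illustrative, not the real wiring):

```rust
use prometheus::{register_int_counter, IntCounter};
use std::sync::mpsc::sync_channel;

fn main() {
    // Hypothetical metric name; real metrics would live on a registry struct.
    let dropped: IntCounter = register_int_counter!(
        "execution_time_observer_dropped_observations",
        "Observations dropped because the observer channel was full"
    )
    .unwrap();

    let (tx, _rx) = sync_channel::<u64>(1);
    tx.try_send(1).unwrap(); // fill the buffer
    if let Err(e) = tx.try_send(2) {
        // Count the drop so dashboards/alerts can see it, in addition to logging.
        dropped.inc();
        eprintln!("failed to send local execution time to observer: {e}");
    }
}
```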
```rust
let key = ExecutionTimeObservationKey::from_command(command);
let local_observation =
    self.local_observations
        .entry(key.clone())
```
Reviewer: key is heavy enough that it might be worth it to avoid doing unnecessary clones here.
Author: from our chat, for posterity: maybe start with metrics and measure utilization/throughput, as avoiding the clone would complicate the code here and it's not clear how big of a difference it makes?
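Also for posterity, the clone-free shape would be roughly this (sketch with a simplified stand-in key type; `entry` takes the key by value, which is what forces the clone):

```rust
use std::collections::HashMap;

// Simplified stand-in for ExecutionTimeObservationKey.
#[derive(Hash, PartialEq, Eq)]
struct ObservationKey(String);

fn record(map: &mut HashMap<ObservationKey, u64>, key: ObservationKey) {
    // entry(key.clone()) would clone even when the entry already exists.
    // Probing with get_mut first moves the key into the map only on a miss,
    // at the cost of a second hash lookup on that (rarer) path.
    if let Some(count) = map.get_mut(&key) {
        *count += 1;
    } else {
        map.insert(key, 1);
    }
}

fn main() {
    let mut map = HashMap::new();
    record(&mut map, ObservationKey("entry_point".into()));
    record(&mut map, ObservationKey("entry_point".into()));
    assert_eq!(map[&ObservationKey("entry_point".into())], 2);
}
```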
```rust
if let Err(e) = self
    .consensus_adapter
    .submit_to_consensus(&[transaction], &epoch_store)
    .await
```
Reviewer: this can block for some time, which will probably cause the channel to fill up.

It may be better to have a separate task that handles submissions. We can send observations to it via a channel; it can pull as many observations from the channel as it can without blocking, send them all in a batch, and repeat.

We probably also need to have some sender-side rate limits on how fast we submit things.
Author: what in here would block for a long time? AFAICT it does some basic checks and then spawns a separate task for the submission in submit_unchecked?

Added a TODO for rate limiting.
Author: looked more, and even though submit_to_consensus is an async trait, the real implementation appears to have zero awaits in it - the only thing that awaits is the mock.

Makes me think it might be worth making the trait non-async and then changing the mock to just panic if try_send fails (it already has a very large channel buffer).
Reviewer: sounds good - I thought this had a potentially long delay, but obviously I was wrong.
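For the record, the batching-task idea from earlier in this thread would look roughly like this (tokio sketch; the observation type and submit function are placeholders, not the PR's code):

```rust
use tokio::sync::mpsc;

// Placeholder for a single execution-time observation.
#[derive(Debug)]
struct Observation(u64);

// Placeholder: submit the whole batch to consensus in one message.
async fn submit_batch(batch: Vec<Observation>) {
    println!("submitting batch of {}", batch.len());
}

// Dedicated task: block for the first observation, drain whatever else is
// already queued without blocking, submit it all as one batch, and repeat.
async fn submission_task(mut rx: mpsc::Receiver<Observation>) {
    while let Some(first) = rx.recv().await {
        let mut batch = vec![first];
        while let Ok(next) = rx.try_recv() {
            batch.push(next);
        }
        submit_batch(batch).await;
        // Per the thread: sender-side rate limiting (e.g. a minimum interval
        // between submissions) would slot in here.
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(128);
    tokio::spawn(submission_task(rx));
    for i in 0..10 {
        tx.send(Observation(i)).await.unwrap();
    }
    drop(tx); // closes the channel so the task exits after draining
    tokio::time::sleep(std::time::Duration::from_millis(50)).await;
}
```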
```rust
.enumerate()
.filter_map(|(i, (_, duration))| {
    duration.map(|duration| {
        let authority_index: AuthorityIndex = i.try_into().unwrap();
```
Reviewer: i as u32 seems fine here too.
Author: almost certainly you are right, but it seems better as a default to use the safe version unless there is a reason not to?
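Both spellings side by side, assuming AuthorityIndex is effectively a u32 as the comment implies:

```rust
fn main() {
    let i: usize = 3;
    let checked: u32 = i.try_into().unwrap(); // panics if i > u32::MAX
    let unchecked = i as u32;                 // silently truncates on overflow
    assert_eq!(checked, unchecked);
}
```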
```rust
// Send a new observation through consensus if our current moving average
// differs too much from the last one we shared.
// TODO: Consider only sharing observations for entrypoints with congestion.
```
Reviewer: there is a further TODO to note, which is that we shouldn't bother sharing if the consensus estimate already agrees with our local estimate.
Author: that would require adding a dependency on the estimator to the observer, which is something I was trying to avoid, but yes, it's something we definitely could consider.

Updated the TODO.
Force-pushed from 16ae058 to f54d36d
```rust
    timings: &[ExecutionTiming],
    total_duration: Duration,
) {
    assert_eq!(tx.commands.len(), timings.len());
```
Reviewer: I missed this before - this assert is not valid. Timings can be shorter due to an abort.
Author: good catch, thanks.
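The corrected invariant, sketched with simplified stand-in types (an abort can cut execution short, so the timings may be a prefix of the command list):

```rust
fn record_execution_times(commands: &[&str], timings: &[u64]) {
    // Timings can be shorter than the command list when execution aborts
    // partway through, so the valid assertion is <=, not ==.
    assert!(timings.len() <= commands.len());
    // zip stops at the shorter slice, so un-executed trailing commands are
    // simply not recorded.
    for (command, timing) in commands.iter().zip(timings) {
        println!("{command}: {timing}ms");
    }
}

fn main() {
    // Three commands, but execution aborted after the second.
    record_execution_times(&["split", "transfer", "merge"], &[2, 5]);
}
```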
Description

- `ExecutionTimeObserver` tracks local measurements of execution time and submits observations to consensus when the moving average changes by more than a threshold.
- `ExecutionTimeEstimator` records execution time observations received from consensus and computes stake-weighted medians for use in congestion control.

This PR contains the core implementation of the feature but is missing some important components required for use in production:

- storage of received observations for crash recovery
- propagation of observations to the next epoch
- metrics

The heuristics it uses are as simple as possible for a first pass and will likely require some tuning. A sketch of the estimator's core computation follows below.
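As an illustration only, here is a self-contained sketch of a stake-weighted median (simplified types; the real `ExecutionTimeEstimator` keys observations per entry point and weights them by validator stake):

```rust
use std::time::Duration;

/// Stake-weighted median: sort observations by duration, then walk up the
/// cumulative stake until at least half of the total stake is covered.
fn stake_weighted_median(mut observations: Vec<(u64 /* stake */, Duration)>) -> Option<Duration> {
    if observations.is_empty() {
        return None;
    }
    observations.sort_by_key(|&(_, duration)| duration);
    let total_stake: u64 = observations.iter().map(|&(stake, _)| stake).sum();
    let mut cumulative = 0u64;
    for (stake, duration) in observations {
        cumulative += stake;
        if 2 * cumulative >= total_stake {
            return Some(duration);
        }
    }
    unreachable!("cumulative stake always reaches total stake")
}

fn main() {
    let obs = vec![
        (10, Duration::from_millis(5)),
        (30, Duration::from_millis(7)),
        (20, Duration::from_millis(20)),
    ];
    // Total stake is 60; the 7ms observation crosses the 30-stake midpoint.
    assert_eq!(stake_weighted_median(obs), Some(Duration::from_millis(7)));
}
```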
Test plan
Added & updated unit tests. Disabled for now by protocol config.
Release notes
Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.
For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.