
Add ExecutionTimeEstimate mode for congestion control #20994

Merged
aschran merged 4 commits into main from aschran/cc-estimator-2 on Jan 31, 2025

Conversation

aschran (Contributor) commented on Jan 28, 2025

Description

  • ExecutionTimeObserver tracks local measurements of execution time and submits observations to consensus when the moving average changes by more than a threshold.

  • ExecutionTimeEstimator records execution time observations received from consensus and computes stake-weighted medians for use in congestion control.

This PR contains the core implementation of the feature but is missing some important components required for use in production:

  • storage of received observations for crash recovery
  • propagation of observations to the next epoch
  • metrics

The heuristics it uses are as simple as possible for a first pass and will likely require some tuning.
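
For intuition, here is a minimal Rust sketch of the stake-weighted median described above. The function and its types are illustrative stand-ins, not the PR's actual code:

```rust
use std::time::Duration;

/// Stake-weighted median: sort observations by duration, then walk the
/// sorted list until the accumulated stake reaches half the total stake.
fn stake_weighted_median(mut observations: Vec<(u64, Duration)>) -> Option<Duration> {
    if observations.is_empty() {
        return None;
    }
    observations.sort_by_key(|(_, duration)| *duration);
    let total_stake: u64 = observations.iter().map(|(stake, _)| stake).sum();
    let mut accumulated: u64 = 0;
    for (stake, duration) in observations {
        accumulated += stake;
        if accumulated * 2 >= total_stake {
            return Some(duration);
        }
    }
    None // unreachable: accumulated stake always reaches the total
}

fn main() {
    let observations = vec![
        (10, Duration::from_millis(5)),
        (30, Duration::from_millis(20)),
        (60, Duration::from_millis(100)),
    ];
    // Total stake is 100; the 60-stake observation crosses the halfway mark.
    assert_eq!(
        stake_weighted_median(observations),
        Some(Duration::from_millis(100))
    );
}
```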

Test plan

Added & updated unit tests. Disabled for now by protocol config.


@aschran aschran requested review from a team and mystenmark as code owners January 28, 2025 16:18

mystenmark (Contributor) left a comment

Some minor issues; feel free to turn these into TODOs instead of fixing them immediately, since we aren't enabling this yet anyway.

@@ -1364,6 +1365,12 @@ impl SuiNode {
        }
    }

    ExecutionTimeObserver::spawn(
        &epoch_store,
        Box::new(consensus_adapter.clone()),
Reviewer:

is this a Box of an Arc? Can we just do the Arc?

Author (aschran):

this is a Box<dyn SubmitToConsensus>

SubmitToConsensus trait is implemented for Arc<ConsensusAdapter>
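
For readers following the thread, a condensed sketch of the pattern being discussed; the trait and type names match the PR, but the signatures here are simplified:

```rust
use std::sync::Arc;

trait SubmitToConsensus: Send + Sync {
    fn submit(&self, payload: &[u8]); // simplified; the real trait is async
}

struct ConsensusAdapter;

// The trait is implemented for Arc<ConsensusAdapter> rather than for
// ConsensusAdapter itself, so any holder of an Arc can produce a trait object.
impl SubmitToConsensus for Arc<ConsensusAdapter> {
    fn submit(&self, _payload: &[u8]) { /* illustrative stub */ }
}

fn spawn_observer(_submitter: Box<dyn SubmitToConsensus>) { /* illustrative stub */ }

fn main() {
    let adapter = Arc::new(ConsensusAdapter);
    // Box::new(adapter.clone()) boxes the Arc as a trait object. Cloning the
    // Arc is just a refcount bump, so the "Box of an Arc" is intentional.
    spawn_observer(Box::new(adapter.clone()));
}
```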

if obs.estimates.len()
    > epoch_store
        .protocol_config()
        .max_programmable_tx_commands()
Reviewer:

even if max commands is a reasonable starting point for this value, let's put it behind an appropriately named method that just uses that value?

and - it seems unnecessarily high? you wouldn't expect to see a transaction with 1000 unique entry points

Author (aschran):

you wouldn't expect to normally, but the limit is 1024 commands, so in theory you could?

the current implementation has no code to cap the message size beyond the command limit, so I think it is exactly correct for now to use this limit here

however I agree we may want to add batching, or a smaller observation limit - I put a TODO for that; probably best to add the separate limit at the same time as the (future) code that implements it?
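
A minimal sketch of the reviewer's suggestion to hide the reused command limit behind a purpose-named accessor; ProtocolConfig here is a stand-in struct and max_execution_time_observations is a made-up name:

```rust
struct ProtocolConfig {
    max_programmable_tx_commands: u64,
}

impl ProtocolConfig {
    fn max_programmable_tx_commands(&self) -> u64 {
        self.max_programmable_tx_commands
    }

    // Named for its purpose; today it just reuses the command limit, so a
    // smaller dedicated cap can later be introduced without touching call sites.
    fn max_execution_time_observations(&self) -> u64 {
        self.max_programmable_tx_commands()
    }
}

fn main() {
    let config = ProtocolConfig { max_programmable_tx_commands: 1024 };
    assert_eq!(config.max_execution_time_observations(), 1024);
}
```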

if let Err(e) = tx_local_execution_time.try_send((ptb.clone(), timings, total_duration)) {
    // This channel should not overflow, but if it does, don't wait; just log an error
    // and drop the observation.
    warn!("failed to send local execution time to observer: {e}");
Reviewer:

we probably want to use a metric instead of a log for this

Author (aschran):

Yes, metrics are on the future-PR TODO list, but I put another TODO here specifically for this.
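
A sketch of the metric-based variant, assuming the prometheus crate and a std channel; the metric name and all surrounding names are hypothetical:

```rust
use prometheus::{IntCounter, Registry};
use std::sync::mpsc;

fn send_observation(tx: &mpsc::SyncSender<u64>, drops: &IntCounter, observation: u64) {
    if let Err(e) = tx.try_send(observation) {
        // Count the drop so overflow rates show up on dashboards instead of
        // being buried in logs; the log line can stay for debugging.
        drops.inc();
        eprintln!("failed to send local execution time to observer: {e}");
    }
}

fn main() {
    let registry = Registry::new();
    let drops = IntCounter::new(
        "execution_time_observation_drops", // hypothetical metric name
        "Observations dropped because the channel was full",
    )
    .unwrap();
    registry.register(Box::new(drops.clone())).unwrap();

    let (tx, _rx) = mpsc::sync_channel(128);
    send_observation(&tx, &drops, 42);
}
```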

let key = ExecutionTimeObservationKey::from_command(command);
let local_observation =
    self.local_observations
        .entry(key.clone())
Reviewer:

key is heavy enough that it might be worth it to avoid doing unnecessary clones here.

Author (aschran):

from our chat, for posterity: maybe start with metrics and measure utilization/throughput, since avoiding the clone would complicate the code here and it's not clear how big a difference it makes?
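
For posterity alongside the thread, one clone-free alternative: check for an existing entry first, and move the key into the map only on a miss. String stands in for the heavier ExecutionTimeObservationKey:

```rust
use std::collections::HashMap;

// `map.entry(key.clone())` clones on every call. This variant does no clone
// at all: the hot path (key present) only borrows, and a miss moves the key
// into the map. The cost is a second hash lookup on the miss path, which is
// why the thread above settled on measuring before changing anything.
fn observe(map: &mut HashMap<String, Vec<u64>>, key: String, value: u64) {
    if let Some(values) = map.get_mut(&key) {
        values.push(value);
        return;
    }
    map.insert(key, vec![value]);
}

fn main() {
    let mut map = HashMap::new();
    observe(&mut map, "entrypoint".to_string(), 1); // miss: key moves in
    observe(&mut map, "entrypoint".to_string(), 2); // hit: no clone needed
    assert_eq!(map["entrypoint"], vec![1, 2]);
}
```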

if let Err(e) = self
    .consensus_adapter
    .submit_to_consensus(&[transaction], &epoch_store)
    .await
Reviewer:

this can block for some time, which will probably cause the channel to fill up.

It may be better to have a separate task that handles submissions. We can send observations to it via a channel, it can pull as many observations from the channel as it can without blocking, send them all in a batch, and repeat.

We probably also need to have some sender-side rate limits on how fast we submit things.
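
A minimal sketch of the batching task described here, assuming a tokio mpsc channel; submit_batch stands in for the real consensus submission (where a sender-side rate limit would also live):

```rust
use tokio::sync::mpsc;

// Wait for one observation, then drain whatever else is already queued
// without blocking, and submit the whole batch at once.
async fn submission_task(mut rx: mpsc::Receiver<u64>) {
    while let Some(first) = rx.recv().await {
        let mut batch = vec![first];
        while let Ok(observation) = rx.try_recv() {
            batch.push(observation);
        }
        submit_batch(&batch).await;
    }
}

async fn submit_batch(batch: &[u64]) {
    println!("submitting {} observations in one batch", batch.len());
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(1024);
    let task = tokio::spawn(submission_task(rx));
    for i in 0..10u64 {
        tx.send(i).await.unwrap();
    }
    drop(tx); // closing the channel lets the task drain and exit
    task.await.unwrap();
}
```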

Author (aschran):

what in here would block for a long time? AFAICT it does some basic checks and then spawns a separate task for the submission in submit_unchecked?

added a TODO for the rate limit

Author (aschran):

looked more, and even though submit_to_consensus is an async trait, the real implementation appears to have zero awaits in it - the only thing that awaits is the mock

makes me think it might be worth making the trait non-async and then changing the mock to just panic if try_send fails (it already has a very large channel buffer)

Reviewer:

sounds good - I thought that this had a potentially long delay, but obviously I was wrong

.enumerate()
.filter_map(|(i, (_, duration))| {
    duration.map(|duration| {
        let authority_index: AuthorityIndex = i.try_into().unwrap();
Reviewer:

i as u32 seems fine here too

Author (aschran):

almost certainly you are right, but it seems better as a default to use the safe version unless there is a reason not to?
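
To make the tradeoff concrete, a small illustrative example: `as` wraps silently, `try_into` surfaces the error. For a validator index both behave identically on any realistic committee size, so the difference is purely the failure mode:

```rust
fn main() {
    let i: usize = 5_000_000_000; // larger than u32::MAX (on 64-bit targets)
    println!("{}", i as u32); // prints 705032704: silent wraparound
    let checked: Result<u32, _> = i.try_into();
    println!("{checked:?}"); // prints Err(TryFromIntError(())): loud failure
}
```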


// Send a new observation through consensus if our current moving average
// differs too much from the last one we shared.
// TODO: Consider only sharing observations for entrypoints with congestion.
Reviewer:

there is a further TODO to note, which is that we shouldn't bother sharing if the consensus estimate already agrees with our local estimate

Author (aschran):

that would require adding a dependency on the estimator to the observer, which is something I was trying to avoid, but yes, it's something we definitely could consider.
Updated the TODO.
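
For reference, an illustrative version of the sharing condition from the PR description: submit only when the current moving average has drifted from the last shared value by more than a threshold. The names and the 10% threshold are made up:

```rust
use std::time::Duration;

fn should_share(current_avg: Duration, last_shared: Duration) -> bool {
    const THRESHOLD: f64 = 0.1; // 10% relative drift, illustrative only
    let current = current_avg.as_secs_f64();
    let shared = last_shared.as_secs_f64();
    if shared == 0.0 {
        return current > 0.0;
    }
    (current - shared).abs() / shared > THRESHOLD
}

fn main() {
    // 20% drift from the last shared value: share a new observation.
    assert!(should_share(Duration::from_millis(120), Duration::from_millis(100)));
    // 5% drift: stay quiet.
    assert!(!should_share(Duration::from_millis(105), Duration::from_millis(100)));
}
```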

@aschran aschran requested a review from mystenmark January 30, 2025 17:32
    timings: &[ExecutionTiming],
    total_duration: Duration,
) {
    assert_eq!(tx.commands.len(), timings.len());
Reviewer:

I missed this before - this assert is not valid. Timings can be shorter due to an abort

Author (aschran):

Good catch, thanks
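
The fix this thread points at, sketched with stand-in types: an abort can cut execution short, so the timings may cover only a prefix of the commands, and the assert must be relaxed accordingly:

```rust
struct Command;
struct ExecutionTiming;

fn record_local_observations(commands: &[Command], timings: &[ExecutionTiming]) {
    // Strict equality is invalid because an abort truncates the timings; the
    // relaxed form still catches timings that outnumber the commands.
    assert!(timings.len() <= commands.len());
    // ... record one observation per (command, timing) pair ...
}

fn main() {
    // Two timings for three commands: valid when the third command aborted.
    record_local_observations(
        &[Command, Command, Command],
        &[ExecutionTiming, ExecutionTiming],
    );
}
```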

@aschran aschran merged commit d6ed8a5 into main Jan 31, 2025
47 checks passed
@aschran aschran deleted the aschran/cc-estimator-2 branch January 31, 2025 15:38