-
Notifications
You must be signed in to change notification settings - Fork 11.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Traffic Control] Add dynamic config via admin api #20887
base: main
Are you sure you want to change the base?
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
2 Skipped Deployments
|
8e20a05
to
9f63abd
Compare
9f63abd
to
813327f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not the most familiar with this code, but LGTM
e5a02bc
to
8a45392
Compare
8a45392
to
6fa8705
Compare
161d531
to
4402ce6
Compare
…erences with a new instance containing old data, then shutdown old instance
2e88644
to
8f75876
Compare
8f75876
to
2bbaaa3
Compare
2bbaaa3
to
1ca093a
Compare
1ca093a
to
14c4643
Compare
14c4643
to
a6ee6c2
Compare
…; during downtime we ignore tally requests with log line
a6ee6c2
to
5958cb6
Compare
@@ -461,19 +462,27 @@ impl TestNConnIPPolicy { | |||
|
|||
fn handle_tally(&mut self, tally: TrafficTally) -> PolicyResponse { | |||
let client = if let Some(client) = tally.direct { | |||
warn!("TESTING -- Handling tally for client: {:?}", client); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should be removed
if let Some(error_threshold) = error_threshold { | ||
match *self.error_policy.as_ref().unwrap().lock().await { | ||
TrafficControlPolicy::FreqThreshold(ref mut policy) => { | ||
policy.client_threshold = error_threshold; | ||
if let Some(dry_run) = dry_run { | ||
policy.config.dry_run = dry_run; | ||
} | ||
} | ||
TrafficControlPolicy::TestNConnIP(ref mut policy) => { | ||
policy.threshold = error_threshold; | ||
if let Some(dry_run) = dry_run { | ||
policy.config.dry_run = dry_run; | ||
} | ||
} | ||
_ => { | ||
return Err(SuiError::InvalidAdminRequest( | ||
"Unsupported prior error policy type during traffic control reconfiguration" | ||
.to_string(), | ||
)); | ||
} | ||
} | ||
} | ||
if let Some(spam_threshold) = spam_threshold { | ||
match *self.spam_policy.as_ref().unwrap().lock().await { | ||
TrafficControlPolicy::FreqThreshold(ref mut policy) => { | ||
policy.client_threshold = spam_threshold; | ||
if let Some(dry_run) = dry_run { | ||
policy.config.dry_run = dry_run; | ||
} | ||
} | ||
TrafficControlPolicy::TestNConnIP(ref mut policy) => { | ||
policy.threshold = spam_threshold; | ||
if let Some(dry_run) = dry_run { | ||
policy.config.dry_run = dry_run; | ||
} | ||
} | ||
_ => { | ||
return Err(SuiError::InvalidAdminRequest( | ||
"Unsupported prior spam policy type during traffic control reconfiguration" | ||
.to_string(), | ||
)); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could you avoid repetition here using a helper func, eg:
async fn update_policy_threshold(
policy: &Arc<Mutex<TrafficControlPolicy>>,
threshold: u64,
dry_run: Option<bool>,
) -> Result<(), SuiError> {
match *policy.lock().await {
TrafficControlPolicy::FreqThreshold(ref mut policy) => {
[...]
@@ -738,6 +824,7 @@ async fn test_traffic_sketch_allowlist_mode() { | |||
} | |||
|
|||
async fn assert_traffic_control_ok(mut test_cluster: TestCluster) -> Result<(), anyhow::Error> { | |||
telemetry_subscribers::init_for_testing(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the reason for adding this to the different tests?
can you expand on this? without the enforcement is there a risk that trafficController blocks JSONRPC requests? We do run this on fullnodes too
is this by design? I think it's desirable for validators to enable traffic control only via the api, but I understand if that's adding too much complexity to this PR. |
Description
Implement an admin API endpoint to do a hot reconfiguration of the traffic control policy without clearing blocklists or existing count min sketch.
To facilitate this, the PR introduces a refactor that allows AuthorityState to hold a reference to TrafficController, and for all sui callsites to reference this rather than instantiating separately. This is ok for our purposes (and indeed is much cleaner) because sui-node enforces that validators do not run json rpc server. Additionally, we need to make the policy a member of the traffic controller struct in order to reference it from the main thread.
Things to note:
For now, this only works when the node is previously running traffic controller with a blocklist policy. i.e. it will fail if used to enable traffic control on a node that did not already have it enabled. It will also fail if called when the node is running an allowlist policy.
Given the above, there are two intended use patterns:
Note also that this currently is supported for rpc node and validator server traffic control use cases. Bridge node is not yet supported.
Test plan
How did you test the new or updated feature?
Release notes
Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.
For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.