Add exponential backoff for cluster connections. #121
Conversation
fn default() -> Self {
    const DEFAULT_CONNECTION_RETRY_EXPONENT_BASE: u32 = 2;
    const DEFAULT_CONNECTION_RETRY_FACTOR: u32 = 100;
    const DEFAULT_NUMBER_OF_CONNECTION_RETRIESE: u32 = 6;
DEFAULT_NUMBER_OF_CONNECTION_RETRIESE => DEFAULT_NUMBER_OF_CONNECTION_RETRIES
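A minimal sketch of what the corrected default could look like once the typo is fixed. The `factor` and `number_of_retries` fields are inferred from other hunks in this diff, so treat the exact struct shape as an assumption:

```rust
#[derive(Clone)]
pub(crate) struct ExponentialBackoffStrategy {
    pub(crate) exponent_base: u32,
    pub(crate) factor: u32,
    pub(crate) number_of_retries: u32,
}

impl Default for ExponentialBackoffStrategy {
    fn default() -> Self {
        const DEFAULT_CONNECTION_RETRY_EXPONENT_BASE: u32 = 2;
        const DEFAULT_CONNECTION_RETRY_FACTOR: u32 = 100;
        // Typo fixed: RETRIESE -> RETRIES.
        const DEFAULT_NUMBER_OF_CONNECTION_RETRIES: u32 = 6;

        Self {
            exponent_base: DEFAULT_CONNECTION_RETRY_EXPONENT_BASE,
            factor: DEFAULT_CONNECTION_RETRY_FACTOR,
            number_of_retries: DEFAULT_NUMBER_OF_CONNECTION_RETRIES,
        }
    }
}
```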
let info = get_connection_info(node, params)?;
C::connect(info, response_timeout, connection_timeout, socket_addr).await
let info = get_connection_info(node, params.clone())?;
let counter = std::sync::atomic::AtomicU32::new(params.exponential_backoff.number_of_retries);
My only concern is that, if I'm not mistaken, when we try to refresh connections we actually block the entire client from new requests, since we take the write lock on the connection container, right?
So what is number_of_retries for a single node, and what is the maximum waiting time?
Since refresh_connections can be called with multiple connection identifiers, we should see whether we need to release the lock at some stages so the client won't get fully blocked (for example, retry the refresh_connections function as a whole rather than per node); see the sketch below.
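One way to read that suggestion, assuming the PR uses the backoff crate's async retry helper (the `multiplier`/`max_elapsed_time` fields in the diff point that way) and a hypothetical `refresh_all_nodes` helper: wrap the whole refresh pass in the retry loop so the write lock is dropped between attempts.

```rust
use backoff::{future::retry, ExponentialBackoff};

// Hypothetical helper, for illustration only: takes the write lock, tries each
// address once, then drops the lock before returning.
async fn refresh_all_nodes(addresses: &[String]) -> Result<(), backoff::Error<std::io::Error>> {
    // ... refresh logic ...
    Ok(())
}

// Retry the whole refresh pass rather than each node individually, so the
// connection container's write lock is released between backoff delays.
async fn refresh_with_backoff(addresses: &[String]) -> Result<(), std::io::Error> {
    retry(ExponentialBackoff::default(), || refresh_all_nodes(addresses)).await
}
```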
I'm not sure why we want this feature. What you're doing is adding exponential backoff on top of the TCP retry mechanism, which is already a form of exponential backoff, maxed out at roughly 130 seconds.
See /proc/sys/net/ipv4/tcp_syn_retries.
Why would we have two layers of exponential backoff on top of each other?
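(For context on the ~130-second figure: with the common default of tcp_syn_retries = 6, the kernel waits roughly 1, 2, 4, 8, 16, 32, and 64 seconds between successive SYN attempts, so an unanswered connect gives up after about 127 seconds.)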
params.exponential_backoff.factor as u64,
),
multiplier: params.exponential_backoff.exponent_base as f64,
max_elapsed_time: None,
We should have max_elapsed_time set to tens of minutes (if that is what this parameter means).
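For example, assuming this is the backoff crate's ExponentialBackoff (which the field names above suggest), capping the total retry window could look roughly like this; the concrete values are placeholders:

```rust
use std::time::Duration;
use backoff::ExponentialBackoff;

// Sketch: bound the total retry window instead of retrying indefinitely.
fn bounded_backoff() -> ExponentialBackoff {
    ExponentialBackoff {
        initial_interval: Duration::from_millis(100), // plays the role of `factor` above
        multiplier: 2.0,                              // plays the role of `exponent_base` above
        max_elapsed_time: Some(Duration::from_secs(10 * 60)), // give up after ~10 minutes
        ..Default::default()
    }
}
```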
}
})
})
.await
This might block the method for a very long time?
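If that turns out to be a real concern, one option (a sketch assuming a tokio runtime, not something this PR does) is to put a hard deadline around the awaited retry future:

```rust
use std::time::Duration;
use tokio::time::timeout;

// Sketch: cap how long the caller waits for the retried connect, regardless of
// how many backoff attempts remain.
async fn await_with_deadline<F, T, E>(fut: F, deadline: Duration) -> Result<T, String>
where
    F: std::future::Future<Output = Result<T, E>>,
    E: std::fmt::Display,
{
    match timeout(deadline, fut).await {
        Ok(Ok(value)) => Ok(value),
        Ok(Err(err)) => Err(format!("connect failed: {err}")),
        Err(_elapsed) => Err("connect retries exceeded the deadline".to_string()),
    }
}
```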
@@ -75,6 +76,27 @@ impl RetryParams {
    }
}

#[derive(Clone)]
pub(crate) struct ExponentialBackoffStrategy {
    pub(crate) exponent_base: u32,
Why does this differ from the ExponentialBackoff of std? Actually, the std terminology is the correct one: you have a base and a multiplier.
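If the goal is to align with that naming (strictly speaking, the backoff crate's rather than std's), one possible shape for the struct, purely as a suggestion and not part of this PR:

```rust
/// Sketch of the same configuration expressed with backoff-crate terminology.
#[derive(Clone)]
pub(crate) struct ExponentialBackoffStrategy {
    /// Initial delay in milliseconds (called `factor` in the current PR).
    pub(crate) initial_interval_millis: u32,
    /// Growth factor applied to the delay after each failed attempt
    /// (called `exponent_base` in the current PR).
    pub(crate) multiplier: u32,
    /// Maximum number of connection attempts per node.
    pub(crate) number_of_retries: u32,
}
```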
Actually, we block new requests both by taking the lock and by changing the state to PollComplete (redis-rs/redis/src/cluster_async/mod.rs, line 1579 in 2518c0f).
Issue #, if available:
valkey-io/valkey-glide#473
Description of changes:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.