Add exponential backoff for cluster connections. #121

Closed
nihohit wants to merge 1 commit from the cluster-backoff branch

Conversation


@nihohit nihohit commented Feb 18, 2024

Issue #, if available: valkey-io/valkey-glide#473

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

fn default() -> Self {
    const DEFAULT_CONNECTION_RETRY_EXPONENT_BASE: u32 = 2;
    const DEFAULT_CONNECTION_RETRY_FACTOR: u32 = 100;
    const DEFAULT_NUMBER_OF_CONNECTION_RETRIESE: u32 = 6;


Typo: DEFAULT_NUMBER_OF_CONNECTION_RETRIESE => DEFAULT_NUMBER_OF_CONNECTION_RETRIES
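For intuition, here is a minimal sketch (not the PR's code) of the delay schedule these defaults imply, assuming the wait before attempt n is factor · exponent_base^n milliseconds:

```rust
// Sketch of the schedule under the PR defaults, assuming
// delay(n) = factor * exponent_base^n milliseconds.
fn main() {
    let factor: u64 = 100; // DEFAULT_CONNECTION_RETRY_FACTOR
    let base: u64 = 2; // DEFAULT_CONNECTION_RETRY_EXPONENT_BASE
    let retries: u32 = 6; // DEFAULT_NUMBER_OF_CONNECTION_RETRIES
    let mut total_ms = 0;
    for attempt in 0..retries {
        let delay_ms = factor * base.pow(attempt);
        total_ms += delay_ms;
        println!("attempt {attempt}: wait {delay_ms} ms");
    }
    // 100 + 200 + 400 + 800 + 1600 + 3200 = 6300 ms in total.
    println!("total backoff: {total_ms} ms");
}
```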

- let info = get_connection_info(node, params)?;
- C::connect(info, response_timeout, connection_timeout, socket_addr).await
+ let info = get_connection_info(node, params.clone())?;
+ let counter = std::sync::atomic::AtomicU32::new(params.exponential_backoff.number_of_retries);


My only concern is that, if I'm not mistaken, when we try to refresh connections we actually block the entire client from new requests, since we lock the connection container with the write lock, right?
Also: what is number_of_retries for a single node, and what is the max waiting time?
Since refresh_connections can be called with multiple connection identifiers, we should see whether we need to free the lock at some stages so the client won't get fully blocked (for example, retry the whole refresh_connections function rather than per node, as sketched below).
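A hypothetical sketch of that alternative: apply the backoff around a whole refresh pass instead of per node, so any lock taken during the pass is released while sleeping. `refresh_once` is an illustrative stand-in for a single refresh_connections pass (returning how many nodes still failed); none of these names come from the PR.

```rust
use std::time::Duration;

// Hypothetical: retry the whole refresh pass, not each node. Any lock taken
// inside `refresh_once` is dropped before we sleep, so client requests can
// proceed between attempts.
async fn retry_refresh<F, Fut>(mut refresh_once: F, retries: u32, factor_ms: u64, base: u64)
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = usize>, // number of nodes that still failed
{
    for attempt in 0..retries {
        if refresh_once().await == 0 {
            return; // every node refreshed successfully
        }
        // No connection-container lock is held during the sleep.
        tokio::time::sleep(Duration::from_millis(factor_ms * base.pow(attempt))).await;
    }
}
```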


@ikolomi ikolomi left a comment


I'm not sure why we want this feature. What you're doing is adding exponential backoff on top of the TCP retry mechanism, which is already a form of exponential backoff, maxed out at about 130 seconds; see /proc/sys/net/ipv4/tcp_syn_retries.
Why would we have two exponential backoffs on top of each other?
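(For context, the ~130 s figure follows from the kernel's SYN retransmission schedule: assuming the standard 1 s initial retransmission timeout and the default tcp_syn_retries = 6, the SYN is resent after 1, 2, 4, 8, 16, and 32 seconds, and the attempt gives up 64 s after the last retransmission, for a total of about 1 + 2 + 4 + 8 + 16 + 32 + 64 = 127 s.)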

params.exponential_backoff.factor as u64,
),
multiplier: params.exponential_backoff.exponent_base as f64,
max_elapsed_time: None,


We should have max elapsed time set to tens of minutes (if that is what this parameter means).
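For scale: under the PR defaults (factor = 100 ms, base = 2, 6 retries) the delays sum to 100 · (2^6 − 1) = 6300 ms, i.e. about 6.3 s, so a cap in the tens of minutes would only bite with much larger parameters. Assuming this is the backoff crate's max_elapsed_time field, a ten-minute cap would read max_elapsed_time: Some(Duration::from_secs(600)) (value illustrative only).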

}
})
})
.await


This might block the method for a very long time?

@@ -75,6 +76,27 @@ impl RetryParams {
}
}

#[derive(Clone)]
pub(crate) struct ExponentialBackoffStrategy {
pub(crate) exponent_base: u32,


Why does this differ from ExponentialBackoff of std? Actually, the std terminology is the correct one: you have a base and a multiplier.
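For comparison, here is a minimal sketch of the same defaults expressed in the vocabulary of the backoff crate's ExponentialBackoff (presumably the type the diff builds and the comment refers to; Rust's std itself has no ExponentialBackoff). Note that number_of_retries has no direct counterpart there, since that crate bounds retries by elapsed time rather than by count:

```rust
use backoff::ExponentialBackoff;
use std::time::Duration;

fn main() {
    // The PR's { factor: 100, exponent_base: 2 } in backoff-crate terms.
    let _strategy = ExponentialBackoff {
        initial_interval: Duration::from_millis(100), // factor
        multiplier: 2.0,                              // exponent_base
        max_elapsed_time: None,                       // unbounded, as in the diff
        ..Default::default()
    };
}
```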


shachlanAmazon commented Feb 28, 2024

> My only concern is that, if I'm not mistaken, when we try to refresh connections we actually block the entire client from new requests, since we lock the connection container with the write lock, right?

Actually, we block new requests both by taking the lock and by changing the state to ConnectionState::Recover(RecoverFuture::Reconnect(future)), which, as far as I can see, blocks new requests. So no matter what we do, if we take longer to refresh connections, we'll take longer to accept new requests. We could change refresh_connections to only take the write lock on connection, but

fn poll_ready(

this will be blocked until the state is PollComplete.

@nihohit nihohit closed this Jul 12, 2024
@nihohit nihohit deleted the cluster-backoff branch July 12, 2024 10:17