Mitigate potential overloads in Round Robin load balancing in the event of node failure #676

havaker · 2023-03-23T11:26:26Z

In the default load balancing policy, round robin can overload nodes when a node fails. Under the usual round-robin order, if a node such as A fails, the next node in the sequence (in this case, B) takes over all of A's requests in addition to its own, and may become overloaded.
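To make the skew concrete, here is a minimal simulation under simplifying assumptions. Each request's plan is modeled as the node list rotated by the request index, and the request is served by the first node in that plan that is still up. The helpers `round_robin_plan` and `simulate` are hypothetical and ignore everything the real `DefaultPolicy` does with replicas, datacenters and racks.

```rust
use std::collections::HashMap;

/// Hypothetical helper: the round-robin plan for the `i`-th request is the
/// node list rotated left by `i`. Replica/DC/rack awareness is ignored.
fn round_robin_plan<'a>(nodes: &[&'a str], i: usize) -> Vec<&'a str> {
    let mut plan = nodes.to_vec();
    if !plan.is_empty() {
        let len = plan.len();
        plan.rotate_left(i % len);
    }
    plan
}

/// Hypothetical helper: sends `requests` requests and counts which node
/// actually serves each one (the first node in the plan that is not down).
fn simulate(nodes: &[&'static str], down: &[&str], requests: usize) -> HashMap<&'static str, usize> {
    let mut served = HashMap::new();
    for i in 0..requests {
        let target = round_robin_plan(nodes, i)
            .into_iter()
            .find(|n| !down.contains(n));
        if let Some(node) = target {
            *served.entry(node).or_insert(0) += 1;
        }
    }
    served
}

fn main() {
    let served = simulate(&["A", "B", "C", "D"], &["A"], 400);
    let mut counts: Vec<_> = served.into_iter().collect();
    counts.sort();
    // B serves its own turns plus every turn that would have gone to A.
    println!("{:?}", counts); // [("B", 200), ("C", 100), ("D", 100)]
}
```

With four nodes and A down, B ends up with twice the load of C and D, which is exactly the imbalance described above.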

A potential solution is to shuffle the nodes chosen in each group of a load balancing plan, which would spread the failed node's load more evenly among the remaining nodes. Note, however, that random shuffling is currently implemented only for choosing replicas in scylla::transport::load_balancing::DefaultPolicy. Shuffling all the nodes in the later stages of plan construction was considered but deemed too costly, so round robin is used instead (#612 (comment)).
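The effect of shuffling can be sketched the same way. In this simplified model, each plan is a fresh Fisher-Yates shuffle of the node group, so when A is down the first live node is equally likely to be B, C or D. The names `XorShift64Star`, `shuffle` and `simulate_shuffled` are hypothetical; the small xorshift64* generator is swapped in only so the example needs no external crates (a real implementation would more likely use something like `rand::seq::SliceRandom::shuffle`). The seed must be nonzero.

```rust
use std::collections::HashMap;

/// Minimal xorshift64* PRNG (demo only, not the driver's RNG). Seed must be nonzero.
struct XorShift64Star(u64);

impl XorShift64Star {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.0 = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }

    /// Value in `0..bound`, taken from the high bits; the tiny modulo bias is fine here.
    fn below(&mut self, bound: usize) -> usize {
        ((self.next_u64() >> 32) % bound as u64) as usize
    }
}

/// Fisher-Yates shuffle of one plan group.
fn shuffle<T>(items: &mut [T], rng: &mut XorShift64Star) {
    for i in (1..items.len()).rev() {
        let j = rng.below(i + 1);
        items.swap(i, j);
    }
}

/// Hypothetical helper: every request gets a freshly shuffled plan and is
/// served by the first node in it that is not down.
fn simulate_shuffled(
    nodes: &[&'static str],
    down: &[&str],
    requests: usize,
    seed: u64,
) -> HashMap<&'static str, usize> {
    let mut rng = XorShift64Star(seed);
    let mut served = HashMap::new();
    for _ in 0..requests {
        let mut plan = nodes.to_vec();
        shuffle(&mut plan, &mut rng);
        if let Some(node) = plan.into_iter().find(|n| !down.contains(n)) {
            *served.entry(node).or_insert(0) += 1;
        }
    }
    served
}

fn main() {
    let served = simulate_shuffled(&["A", "B", "C", "D"], &["A"], 30_000, 0x9E37_79B9_7F4A_7C15);
    let mut counts: Vec<_> = served.into_iter().collect();
    counts.sort();
    // Each surviving node should receive roughly 10_000 requests.
    println!("{:?}", counts);
}
```

The cost side of the trade-off is also visible: every plan now pays for a full shuffle (one random draw per node in the group), which is the kind of per-query overhead the issue says was judged too costly when applied to all nodes.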


havaker added the area/load-balancing label Mar 23, 2023

havaker changed the title ~~Mitigate overloads in Round Robin load balancing in the event of node failure~~ Mitigate potential overloads in Round Robin load balancing in the event of node failure Mar 23, 2023

piodul added this to the 1.1.0 milestone Mar 28, 2023

Lorak-mmk self-assigned this Nov 15, 2023

Lorak-mmk removed their assignment Jul 30, 2024
