Excessive Redistribution Events During Cluster Size Changes #1389

Closed
VinozzZ opened this issue Oct 16, 2024 · 1 comment
Labels: type: bug (Something isn't working)

Comments

@VinozzZ (Contributor)

VinozzZ commented Oct 16, 2024

Description

Currently, Refinery's trace redistribution logic can trigger several redistribution events when cluster size changes occur in quick succession, because each change immediately starts its own redistribution. The repeated redistributions increase traffic on the nodes, which can further destabilize them, especially in environments that scale frequently.

Proposed Solution:

Introduce an initial delay using a timer before starting the redistribution process. The behavior will be as follows:

  • A timer will start after a cluster size change is detected.
  • Redistribution will only begin once the timer expires.
  • If another cluster size change occurs before the timer finishes, the timer will reset.
  • This ensures that redistribution only happens once the cluster has stabilized, reducing unnecessary redistribution events.
VinozzZ added the type: bug label on Oct 16, 2024
VinozzZ added this to the v2.9 milestone on Oct 16, 2024
MikeGoldsmith self-assigned this on Oct 18, 2024
VinozzZ added a commit that referenced this issue Nov 6, 2024
## Which problem is this PR solving?

- #1389 

## Short description of the changes

- moved `redistributeNotifier` to its own file since `collect.go` is getting big
- only notify the `triggered` channel when `timer` fires
- reset `timer` when receiving a peer membership change from `r.reset`
- added tests

---------

Co-authored-by: Kent Quirk <[email protected]>
akvanhar closed this as completed on Nov 7, 2024

@akvanhar (Contributor)

akvanhar commented Nov 7, 2024

Fixed by #1403
