fix: allow a single node to activate stress relief mode during significant load increase #1256

Merged
5 commits merged on Jul 26, 2024

@VinozzZ VinozzZ commented Jul 26, 2024

Which problem is this PR solving?

Until the trace locality issue is resolved, a single peer can receive a large trace that raises its stress level significantly above that of the rest of the cluster. To address this, we can allow an individual Refinery node to go into stress relief mode when its own stress is too high, even if the cluster's isn't.

Short description of the changes

  • use the max of the individual stress level and the cluster stress level as the overall stress level when calculating stress relief activation and deactivation
  • record the stress level that determined stress relief activation as the stress_level metric
  • add tests
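
For readers skimming the description, here is a minimal Go sketch of the idea in the list above: take the max of the node's own stress level and the cluster's, use that value for the activation and deactivation decision, and report it as the metric. This is not the code from collect/stressRelief.go. The names `StressReliever`, `overallStress`, `ActivateLevel`, and `DeactivateLevel` are hypothetical, and the use of two separate thresholds is an assumption made for illustration.

```go
package main

// StressReliever is a hypothetical stand-in for the component that decides
// whether a node is in stress relief mode. Field names are assumptions.
type StressReliever struct {
	ActivateLevel   uint // enter stress relief at or above this level
	DeactivateLevel uint // leave stress relief once below this level
	active          bool
}

// overallStress returns the larger of the node's own stress level and the
// cluster-wide level, plus a label saying which of the two determined it.
func overallStress(individual, cluster uint) (uint, string) {
	if individual > cluster {
		return individual, "individual"
	}
	return cluster, "cluster"
}

// Update re-evaluates stress relief using the overall level and returns that
// level, which is the value this sketch would record as stress_level.
func (s *StressReliever) Update(individual, cluster uint) uint {
	level, _ := overallStress(individual, cluster)
	switch {
	case !s.active && level >= s.ActivateLevel:
		s.active = true
	case s.active && level < s.DeactivateLevel:
		s.active = false
	}
	return level
}

// Active reports whether stress relief mode is currently on.
func (s *StressReliever) Active() bool { return s.active }
```

With these assumed thresholds (activate at 90, deactivate below 75), a node reporting an individual level of 95 would enter stress relief even while the cluster level sits at 40, which is the single-node case this PR targets.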

@VinozzZ VinozzZ requested a review from a team as a code owner July 26, 2024 16:51
@VinozzZ VinozzZ changed the title fix: allow a single node to activate stress relief mode during signif… fix: allow a single node to activate stress relief mode during significant load increase Jul 26, 2024
@VinozzZ VinozzZ added this to the v2.7 milestone Jul 26, 2024

@kentquirk kentquirk left a comment

This is great, and it also preserves our previous stress_level metric name, which is a nice bonus.

I made one suggestion that we add all levels to the single log message.

collect/stressRelief.go (outdated, resolved)
@VinozzZ VinozzZ merged commit 70781b0 into main Jul 26, 2024
5 checks passed
@VinozzZ VinozzZ deleted the yingrong.stress_level_for_trace_locality branch July 26, 2024 17:29