Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow configuring a range for jitter to be within #17

Merged
merged 1 commit into from
Jan 9, 2025
Merged

Conversation

moskyb
Copy link
Collaborator

@moskyb moskyb commented Jan 9, 2025

Previously, jitter would only be available within a single second of the desired interval - eg, if we had jitter enabled and our next interval was 4.5 seconds, the actual calculated interval would be uniformly distributed between 4.5s and 5.5s.

The issue with this approach is that it doesn't let callers spread load out efficiently over long timescales - eg, if i have a fleet of millions of agents set to ping every ten seconds, spreading those pings out over one second is only a little better than having them hit all at once - we'd ideally like to be able to spread those pings out over a larger area of the ping interval.

This commit PR us to do just that - we can configure a custom range for jitter. To use our previous example, if we set a jitter range of (-1s, 3s], the actual intervals would be uniformly distributed between 3.5s -and 7.5s.

Previously, jitter would only be available within a single second of the desired interval - eg, if we had jitter enabled and our next interval was 4.5 seconds, the actual calculated interval would be uniformly distributed between 4.5s and 5.5s.

The issue with this approach is that it doesn't let callers spread load out efficiently over long timescales - eg, if i have a fleet of millions of agents set to ping every ten seconds, spreading those pings out over one second is only a little better than having them hit all at once - we'd ideally like to be able to spread those pings out over a larger area of the ping interval.

This commit enables us to do just that - we can configure a custom range for jitter. To use our previous example, if we set a jitter range of (-1s, 3s], the actual intervals would be uniformly distributed between 3.5s -and 7.5s.
@moskyb moskyb requested review from CerealBoy and a team January 9, 2025 00:12
Copy link

@CerealBoy CerealBoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, we can re-look at a later time if we want to make things more dynamic

@moskyb moskyb merged commit ed7f763 into main Jan 9, 2025
1 check passed
@moskyb moskyb deleted the better-jitter branch January 9, 2025 00:46
moskyb added a commit that referenced this pull request Jan 13, 2025
The `roko.Retrier`'s `String()` method returns a message about the state of the retry loop, generally something of the form `Attempt 3/19 Retrying in 6s`. Previously, when jitter was enabled for a retrier, the call to `NextInterval()` within the `String()` method and the actual retry loop itself would be totally independent, and so if random jitter was enabled, they'd return different results - ie, `String()` would say `Attempt 3/19 Retrying in 5.42038947s`, but the retry loop would calculate a totally different value for jitter, and wait a longer or shorter amount of time.

This wasn't so bad when jitter was limited to one second - `String()` might say `nest attempt in 5.01s`, and then the retrier would actually sleep for `5.99s`, but what's a second or two between friends, right?

Unfortunately, in [#17](#17), some jabroni (me) added the ability for users to customise the range that jitter falls within, so it's now possible for `String()` to lie fairly significantly about the amount of time it's going to wait till the next request. If we configured a jitter range of [-10s, 30s] for our jitter range, the next interval reported by `String()` could be off by as much as 40 seconds, which is pretty bad.

This PR fixes this issue by refactoring `roko.Retrier` a little bit - now, at the start of the retry loop - before the callback is called - and stored in the state of the retrier. This means that any accesses to the next interval time will be consistent, even with jitter enabled.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants