Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(influxv2plugin): Increase accepted retry-after header values. #9619

Merged
merged 3 commits into from
Aug 25, 2021

Conversation

philjb
Copy link
Contributor

@philjb philjb commented Aug 12, 2021

Updates the influxdata v2 output plugin that is used for influx cloud2 to respect larger retry-after header values. The current value of 60s is well below the possible values that influx cloud2 may provide. The current largest value seen is 5 minutes.

This PR increases accepted Retry-After header values to 10 minutes.

I know there is a limited cache size for buffered/delayed write requests and that this cache will begin to drop the oldest writes when the cache is full. However, if the server says that it will not accept a failed request until after 4 minutes (for example) in a retry-after, there is no benefit from trying it every 60s. That scenario creates unnecessary network and server load to reject a request that the client already knows will be rejected based on the retry-after header.

retry-after header is specifically used in cloud2 by rate limiters. If a write/read/delete request is denied because the customer has exceeded a rate limit, the 429 response will include the retry-after header specifying when the resource will be available again.

Required for all PRs:

  • Updated associated README.md.
  • Wrote appropriate unit tests.
  • Pull request title or commits are in conventional commit format (e.g. feat: or fix:)

resolves #

@telegraf-tiger telegraf-tiger bot added the feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin label Aug 12, 2021
retry := math.Max(backoff, retryAfterHeader)
retry = math.Min(retry, defaultMaxWait)
return time.Duration(retry) * time.Second
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the rounding that happens here when converting to time.Duration (int64), the first 6 retries have a retryDuration of 0s meaning there's no backoff at all.

Copy link

@brettbuddin brettbuddin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me.

@ssoroka
Copy link
Contributor

ssoroka commented Aug 16, 2021

Do we have a use case where 60s is not enough time for the back-off, or is this more academic? I have a lot of reservations about respecting really high retry-after durations. What's the max retry-after that influxdb is going to respond with? how does it calculate the duration?
In high-throughput cases, a 10 minute wait time is pretty unacceptable. 60 seconds is an eternity if you write out 10k records per second.

@philjb
Copy link
Contributor Author

philjb commented Aug 17, 2021

Do we have a use case where 60s is not enough time for the back-off

I didn't change the max backoff (still 60s) but yes, today, InfluxCloud2 may respond with a retry-after of 299 seconds. While getting near the maximum (5min) is unlikely for any given user, I've seen customers making requests at 4x the rate limit I added recently for delete requests. Such a customer would get a retry-after of ~200s if they attempted that rate today.

Most customers don't use telegraf for deletes (none?) but the same logic applies for the other cloud2 rate limits - the retry-after may be as large as 5 minutes. It's just wasted effort all around to resend a message at 60s if telegraf already knows it's going to be rejected until e.g. 120s have gone by.

Copy link
Contributor

@reimda reimda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The influxdb service won't return a large Retry-After header value without good reason. It's not the client's place to second guess the reason and ignore the header value. Looks good to me.

@ssoroka ssoroka merged commit 8daba8a into influxdata:master Aug 25, 2021
@philjb
Copy link
Contributor Author

philjb commented Oct 6, 2021

I believe this PR fixes: #9353

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants