chore(influxv2plugin): Increase accepted retry-after header values. #9619
Conversation
plugins/outputs/influxdb_v2/http.go
Outdated
```go
retry := math.Max(backoff, retryAfterHeader)
retry = math.Min(retry, defaultMaxWait)
return time.Duration(retry) * time.Second
```
Because of the rounding that happens here when converting to `time.Duration` (an int64 conversion), the first 6 retries have a retryDuration of 0s, meaning there's no backoff at all.
Looks reasonable to me.
Looks like new artifacts were built from this PR. Get them here! Artifact URLs
Do we have a use case where 60s is not enough time for the back-off, or is this more academic? I have a lot of reservations about respecting really high retry-after durations. What's the max retry-after that influxdb is going to respond with? How does it calculate the duration?
I didn't change the max backoff (still 60s), but yes, today InfluxCloud2 may respond with a retry-after of … Most customers don't use telegraf for deletes (none?), but the same logic applies for the other cloud2 rate limits: the retry-after may be as large as 5 minutes. It's just wasted effort all around to resend a message at 60s if telegraf already knows it's going to be rejected until, e.g., 120s have gone by.
The influxdb service won't return a large Retry-After header value without good reason. It's not the client's place to second-guess the reason and ignore the header value. Looks good to me.
I believe this PR fixes: #9353
Updates the influxdata v2 output plugin that is used for influx cloud2 to respect larger retry-after header values. The current cap of 60s is well below the possible values that influx cloud2 may provide; the current largest value seen is 5 minutes. This PR increases accepted `Retry-After` header values to 10 minutes.

I know there is a limited cache size for buffered/delayed write requests and that this cache will begin to drop the oldest writes when the cache is full. However, if the server says in a `retry-after` that it will not accept a failed request until after 4 minutes (for example), there is no benefit from trying it every 60s. That scenario creates unnecessary network and server load to reject a request that the client already knows will be rejected based on the `retry-after` header.

The `retry-after` header is specifically used in cloud2 by rate limiters. If a write/read/delete request is denied because the customer has exceeded a rate limit, the 429 response will include the `retry-after` header specifying when the resource will be available again.

Required for all PRs:
resolves #