
distributor, ruler: Expect METHOD_NOT_ALLOWED from pusher #7618

Closed
wants to merge 1 commit

Conversation

@narqo (Contributor) commented Mar 13, 2024

What this PR does

This is a follow-up to #7503.

I realized that the distributor (and the ruler) now need to expect METHOD_NOT_ALLOWED (gRPC Unimplemented) from the ingester if the latter has its Push gRPC method disabled. This PR addresses that.

Note that the changes here treat such a case as a client-side error, even though it is a misconfiguration (that is, I believe it is not expected to happen in practice). What do you think, @pstibrany?
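
For illustration, here's a minimal sketch of the kind of mapping this PR proposes; `mapIngesterPushError` and the package layout are hypothetical, not the actual diff, and only the gRPC status handling mirrors the idea:

```go
package distributor

import (
	"net/http"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// mapIngesterPushError translates the error returned by the ingester's Push
// into the HTTP status code reported back to the write client.
func mapIngesterPushError(err error) int {
	if err == nil {
		return http.StatusOK
	}
	if st, ok := status.FromError(err); ok && st.Code() == codes.Unimplemented {
		// The ingester has its Push gRPC method disabled: report
		// METHOD_NOT_ALLOWED so the request is treated as a client-side,
		// non-retryable error.
		return http.StatusMethodNotAllowed
	}
	// Everything else stays a server-side (retryable) error.
	return http.StatusInternalServerError
}
```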

Which issue(s) this PR fixes or relates to

n/a

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@narqo requested review from a team as code owners March 13, 2024 11:17
@pstibrany (Member)

I think we should not make this change and should keep treating it as an internal error instead. User/client errors don't trigger our alerts, but internal errors do -- and this is clearly a misconfiguration error. I think the current state is fine as-is. WDYT?

@narqo (Contributor, Author) commented Mar 13, 2024

I was thinking about this comment from Marco on my original changes:

I'm wondering if we should add a NOT_ALLOWED error, and then we map it as 405 status code and treat it as non retryable.

The concern is that, in case of the misconfiguration, the clients will keep retrying their Push requests (will they?), which may not be ideal. But I also agree with the point about alerting. I'm OK with closing this one.

@dimitarvdimitrov (Contributor)

I believe neither error type will be retried, but I may be missing something:

_, err := a.pusher.Push(user.InjectOrgID(a.ctx, a.userID), req)

I agree that it's better if the ruler fails more openly and starts triggering alerts for failed writes (MimirRulerTooManyFailedPushes) instead of failing silently.

@narqo (Contributor, Author) commented Mar 14, 2024

I believe neither error type will be retried [in the ruler], but I may be missing something

I should have noted that by "clients" I meant grafana-agent or prometheus. I think (although I haven't double-checked) they retry on 5xx.

@pstibrany (Member)

I should have noted that by "clients" I meant grafana-agent or prometheus. I think (although I haven't double-checked) they retry on 5xx.

Yes, they do. This is also part of the remote-write protocol: https://prometheus.io/docs/concepts/remote_write_spec/#retries-backoff
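
As a rough sketch of the retry rule from that spec (illustrative only, not Prometheus' or grafana-agent's actual code; the 429 behaviour depends on sender configuration):

```go
package main

import "net/http"

// shouldRetry decides whether a remote-write sender retries a failed push,
// following the spirit of the spec's "Retries & Backoff" section.
func shouldRetry(statusCode int) bool {
	switch {
	case statusCode >= 500:
		return true // 5xx responses are retried with backoff
	case statusCode == http.StatusTooManyRequests:
		return true // some senders optionally retry on 429
	default:
		return false // other 4xx responses are dropped, not retried
	}
}
```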

@dimitarvdimitrov (Contributor)

So my understanding is that the case we're solving for is when a misconfigured or old distributor replica is still using gRPC to talk to the ingesters. In this case we can either choose to discard the data that the client sent to that distributor (mapping to HTTP 400) or continue retrying (mapping to HTTP 500).

IMO it's safer and more responsible to keep retrying the data instead of throwing it on the floor. The clients can keep retrying until the misconfiguration is fixed, and they usually have some decent buffering capacity (tens of minutes, even hours). It's also possible that the Mimir cluster is in the middle of a rollout and the "unimplemented" errors are only temporary, fixing themselves once the rollout is complete.
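
To make the two options concrete, a small hypothetical sketch of the choice being weighed (the function and flag are illustrative, not Mimir code):

```go
// statusForUnimplemented shows the two mappings discussed in this thread for
// a gRPC Unimplemented error coming back from the ingester.
func statusForUnimplemented(treatAsClientError bool) int {
	if treatAsClientError {
		// This PR's proposal: 405 Method Not Allowed, the sender drops the data.
		return 405
	}
	// Status quo: 500 Internal Server Error, the sender keeps retrying with
	// backoff until the misconfiguration (or rollout) is resolved.
	return 500
}
```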

@narqo (Contributor, Author) commented Mar 15, 2024

The arguments above make very good sense; let's close this one.
