Receive: Implement a sloppy quorum #5809

Open
matej-g opened this issue Oct 21, 2022 · 2 comments

matej-g commented Oct 21, 2022

This is an idea I've had for some time; I was wondering about the opinion of others.

Is your proposal related to a problem?

Even though the receiver is scalable by increasing or decreasing the number of nodes in a hashring, when replication is on, the way the receiver handles requests (splitting each incoming request into a smaller batch per node and replicating those batched requests) means a user can afford to lose only a small number of nodes before we can no longer guarantee quorum and have to respond to the client with a failure (for example, with replication factor 3 it's only one node). In other words, increasing the number of nodes in the hashring does not increase the resiliency of the system. On the contrary, the larger the hashring, the higher the probability that more than one node will be down at the same time.
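
For illustration, here is a minimal sketch of the quorum arithmetic behind that statement, assuming the write quorum is a strict majority of the replication factor (which matches the "replication factor 3 tolerates one failure" example above); the helper names are made up for this sketch and are not actual Thanos identifiers:

```go
package main

import "fmt"

// writeQuorum returns the minimum number of successful replica writes
// needed to acknowledge a request, assuming a strict-majority quorum.
func writeQuorum(replicationFactor int) int {
	return replicationFactor/2 + 1
}

// tolerableFailures returns how many replica writes can fail while the
// write quorum can still be reached.
func tolerableFailures(replicationFactor int) int {
	return replicationFactor - writeQuorum(replicationFactor)
}

func main() {
	for _, rf := range []int{1, 3, 5} {
		fmt.Printf("replication factor %d: quorum %d, tolerable failures %d\n",
			rf, writeQuorum(rf), tolerableFailures(rf))
	}
	// e.g. replication factor 3: quorum 2, tolerable failures 1
	//      replication factor 5: quorum 3, tolerable failures 2
}
```

Note that the number of tolerable failures grows only with the replication factor, not with the size of the hashring, which is exactly the limitation described above.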

Under normal circumstances and for many use cases this might be fine - as is widely known, Prometheus clients will keep retrying a request that returns a 5xx for up to 2 hours, which should be enough to resolve most issues, so there is only a small danger of data loss. However, for users who want to guarantee higher write availability and "near-live" data querying, a time window of up to 2 hours can be unacceptable. A good example is users with alerts defined on metrics written to the receiver - in such a situation, they are clearly interested in having those metrics available ASAP. Even though the receiver does not do any rollback on partial writes (i.e. even if quorum is not reached, some requests may have been written despite the receiver returning a 5xx error), so these metrics might be available, this cannot be guaranteed, and for the duration of any outage it is unknown whether the request was written to at least one node.

Describe the solution you'd like

For cases where higher availability of receive writes is desired, we could consider accepting a sloppy quorum. This could take the form of nodes accepting and temporarily storing writes on behalf of others, and then retrying to send the data to the appropriate node with backoff. A very rough idea would be to use something akin to the agent, which would periodically retry delivering the data to the appropriate node.
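
A very rough sketch of what such a hinted-handoff retry loop could look like; all types and names here are hypothetical, not existing Thanos code, and only illustrate the "store on behalf of another node and retry with backoff" idea:

```go
package handoff

import (
	"context"
	"time"
)

// hint represents a write that could not be delivered to its intended
// node and is stored locally on behalf of that node.
type hint struct {
	targetNode string
	payload    []byte // serialized remote-write request
}

// forwarder is anything that can deliver a payload to a given node.
type forwarder interface {
	Forward(ctx context.Context, node string, payload []byte) error
}

// runHandoff drains stored hints and retries delivering each one to its
// intended node, roughly like an agent that keeps the data around until
// the target becomes reachable again.
func runHandoff(ctx context.Context, fw forwarder, hints <-chan hint) {
	for {
		select {
		case <-ctx.Done():
			return
		case h := <-hints:
			go retryDeliver(ctx, fw, h)
		}
	}
}

// retryDeliver attempts delivery with exponential backoff until it
// succeeds or the context is cancelled.
func retryDeliver(ctx context.Context, fw forwarder, h hint) {
	backoff := time.Second
	const maxBackoff = 5 * time.Minute
	for {
		if err := fw.Forward(ctx, h.targetNode, h.payload); err == nil {
			return // delivered; the hint can be dropped
		}
		select {
		case <-ctx.Done():
			return
		case <-time.After(backoff):
		}
		if backoff *= 2; backoff > maxBackoff {
			backoff = maxBackoff
		}
	}
}
```

In practice the hints would presumably need to be persisted (e.g. in a WAL, similar to what the agent does) rather than kept in memory, so they survive a restart of the node holding them.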

To ensure instant read availability, we might consider accepting a sloppy quorum only if at least one replica successfully ended up on the correct node and can be queried. Another option, although probably more complex and less performant, would be to make it possible to query the data that is being temporarily stored on behalf of another node.
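
A minimal sketch of that acceptance rule, again with hypothetical names: the request is acknowledged only if at least one replica landed on the node that actually owns the series (so it is immediately queryable) and the total number of durable copies, whether written directly or stored as a hint, reaches the quorum:

```go
package handoff

// replicaResult summarizes one replica write attempt.
type replicaResult struct {
	onCorrectNode bool  // written to the node that owns the series
	hinted        bool  // stored locally on behalf of the unreachable node
	err           error // non-nil if neither succeeded
}

// sloppyQuorumReached reports whether a request can be acknowledged under
// the relaxed rule: at least one copy on the correct node plus enough
// durable copies (direct or hinted) to reach the quorum.
func sloppyQuorumReached(results []replicaResult, quorum int) bool {
	correct, durable := 0, 0
	for _, r := range results {
		if r.err != nil {
			continue
		}
		if r.onCorrectNode {
			correct++
		}
		durable++
	}
	return correct >= 1 && durable >= quorum
}
```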

Describe alternatives you've considered

  • Take advantage of the built-in forward retry mechanism in the receiver - this works fine for intermittent network errors, but cannot help during longer node outages.
  • Increase the replication factor - with a higher replication factor, a user can afford to lose more nodes. However, this results in unnecessary data redundancy and the overhead of storing the extra replicated data, even though the user has no need for it.

Additional context

Performance implications should certainly be considered here, as receivers already have a large footprint (see e.g. #5751). Nevertheless, the trade-off might be favorable for the use cases described above.

cc @philipgough

@ahurtaud
Contributor

Hello, we are starting to implement the stateless ruler, which now queries receive for ALERTS_FOR_STATE.
We would be very interested in this: as of today, if one receive ingester is down and for whatever reason does not recover quickly, we run into problems regardless of the replication factor we set.

matej-g commented Jan 31, 2023

Thanks @ahurtaud, unfortunately I don't think I'll be able to take this on in the foreseeable future 😢, but I'd be happy to let someone else take over / help review etc.
