
Add option to propagate OffsetOutOfRange error #1183

Conversation

@muirdm commented Oct 3, 2018

When consuming a partition using a consumer group, the code
handles ErrOffsetOutOfRange errors by resetting to the "initial"
position, as specified by the user (i.e. either the oldest or newest available
offset). This, however, can be very dangerous. Say a consumer has
consumed up to offset 100 on replica A but replica B has only
replicated up to offset 99 due to temporary under-replication. During
a rebalance, sarama can end up with an offset out-of-range error if it
fetches partition metadata from replica B since the desired offset of
100 is greater than the newest offset of 99. The sarama consumer would
reset the offset in this case, which can cause reprocessing of old
data, especially if the initial offset is configured as "oldest".

This commit adds a config flag to disable this automatic reset. In the
above case, the consumer will be able to proceed normally after the
data replicates.

Resolves #1181
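The scenario above can be sketched in plain Go, without depending on sarama itself. The function and flag names below are illustrative stand-ins for the behaviour the patch changes, not sarama's actual internals:

```go
package main

import (
	"errors"
	"fmt"
)

// errOffsetOutOfRange stands in for sarama.ErrOffsetOutOfRange.
var errOffsetOutOfRange = errors.New("offset out of range")

// resolveOffset mimics the decision discussed in this PR. desired is the
// offset the consumer wants to resume from; oldest/newest describe what the
// replica currently has; initial is the configured fallback position.
func resolveOffset(desired, oldest, newest int64, resetInvalid bool, initial int64) (int64, error) {
	if desired >= oldest && desired <= newest {
		return desired, nil // offset is valid on this replica
	}
	if resetInvalid {
		return initial, nil // current behaviour: silently reset to initial
	}
	return 0, errOffsetOutOfRange // proposed behaviour: propagate the error
}

func main() {
	// Replica B is one message behind: the group has committed up to 100,
	// but this replica's newest offset is 99.
	const desired, oldest, newest = 100, 0, 99

	// With automatic reset enabled and initial == oldest, the consumer
	// silently rewinds to 0 and reprocesses the whole partition.
	off, _ := resolveOffset(desired, oldest, newest, true, oldest)
	fmt.Println("reset enabled, resume at:", off)

	// With reset disabled, the error surfaces and the caller can retry
	// once replication catches up.
	_, err := resolveOffset(desired, oldest, newest, false, oldest)
	fmt.Println("reset disabled, error:", err)
}
```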

@varun06 (Contributor) commented Feb 25, 2019

@dim can you please have a look? Also @muirdm, can you please add a test for this?

@dim (Contributor) commented Feb 28, 2019

@varun06 I don't think this is easily testable. It's not something that can be easily re-created in a test scenario.

@muirdm thanks for the detailed explanation and the patch. One question: how would you handle these cases? Assuming the ErrOffsetOutOfRange error is returned to the user, how would you decide what to do next? I wonder if there is a more generic solution to the problem.

@muirdm (Author) commented Feb 28, 2019

The error gets propagated and we retry consuming until it works.

Maybe a general solution is to only fetch offsets from the group coordinator, but I'm not really sure.
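The retry-until-it-works approach described above can be sketched as a small caller-side loop. This is a minimal sketch of the pattern, not code from this PR; the consumer function and backoff value are illustrative:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errOffsetOutOfRange stands in for sarama.ErrOffsetOutOfRange.
var errOffsetOutOfRange = errors.New("offset out of range")

// makeConsumer returns a stand-in consume attempt that fails twice (the
// replica is still catching up) and then succeeds.
func makeConsumer() func() error {
	attempts := 0
	return func() error {
		attempts++
		if attempts < 3 {
			return errOffsetOutOfRange
		}
		return nil
	}
}

// retryConsume retries while the error is out-of-range, on the assumption
// that the condition is transient and clears once replication catches up.
// Any other error is returned to the caller immediately.
func retryConsume(consume func() error, backoff time.Duration) (int, error) {
	for attempt := 1; ; attempt++ {
		err := consume()
		if err == nil {
			return attempt, nil
		}
		if !errors.Is(err, errOffsetOutOfRange) {
			return attempt, err // unexpected error: give up
		}
		time.Sleep(backoff)
	}
}

func main() {
	attempts, err := retryConsume(makeConsumer(), time.Millisecond)
	fmt.Println("succeeded after attempts:", attempts, "err:", err)
}
```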

@dim (Contributor) commented Feb 28, 2019

@muirdm the reason we added this in the first place was an endless loop of ErrOffsetOutOfRange errors when a high-volume partition (with a very low TTL) was pruned and the stored offset was < the first available offset. I think your patch is safe to apply, but I would even go a bit further and make ResetInvalidOffsets false by default.
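The failure mode described here is the opposite of the under-replication case: after pruning, the stored offset falls below the oldest retained offset, so the error is permanent and retrying alone would loop forever. A minimal sketch (names are illustrative) of why the two cases call for different handling:

```go
package main

import "fmt"

// offsetStatus classifies a stored offset against what the broker retains.
// A pruned partition leaves the stored offset permanently too old, while
// under-replication leaves it transiently too new.
func offsetStatus(stored, oldest, newest int64) string {
	switch {
	case stored < oldest:
		return "permanently invalid: data pruned, retrying will never help"
	case stored > newest:
		return "transiently invalid: replica behind, retrying can succeed"
	default:
		return "valid"
	}
}

func main() {
	// Low-TTL partition: offsets below 100 were pruned, broker retains 100-500.
	fmt.Println(offsetStatus(50, 100, 500))
	// Under-replicated follower: group committed 100, replica has up to 99.
	fmt.Println(offsetStatus(100, 0, 99))
}
```

This is why an automatic reset exists at all: in the pruned case, resetting to the oldest available offset is the only way to make progress.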

@ghost commented Feb 21, 2020

Thank you for your contribution! However, this pull request has not had any activity in the past 90 days and will be closed in 30 days if no updates occur.
If you believe the changes are still valid then please verify your branch has no conflicts with master and rebase if needed. If you are awaiting a (re-)review then please let us know.

@ghost added the stale label Feb 21, 2020
@muirdm (Author) commented Feb 24, 2020

@dim are you still in favor of changing the default to false and proceeding with this change? I don't have a good understanding of how this might affect other users, so I was trying to be conservative.

@ghost removed the stale label Feb 24, 2020
@dim (Contributor) commented Mar 31, 2020

@muirdm yes, I still think that false may be a better default. I doubt it will affect anyone TBH. Apologies for the delayed response.

@ghost commented Mar 16, 2021

Thank you for your contribution! However, this pull request has not had any activity in the past 90 days and will be closed in 30 days if no updates occur.
If you believe the changes are still valid then please verify your branch has no conflicts with master and rebase if needed. If you are awaiting a (re-)review then please let us know.

@ghost added the stale label Mar 16, 2021
@bai closed this Sep 13, 2021
Successfully merging this pull request may close these issues.

Consumer group offset can reset during rebalance if underreplicated