
Consumer group offset can reset during rebalance if underreplicated #1181

Closed
muirdm opened this issue Oct 2, 2018 · 5 comments
Labels
question, stale

Comments

@muirdm

muirdm commented Oct 2, 2018

Versions

Sarama Version: 7479983
Kafka Version: 1.0, 2.0
Go Version: 1.11

Problem Description

If you have underreplication for whatever reason (e.g. publishing messages at sarama.WaitForLocal instead of WaitForAll), a rebalance can end up resetting a consumer group partition offset back to the initial position. This happens when you consume up to, say, offset 100 on replica A, but replica B only has data up to 99 due to temporary underreplication. When rebalancing to replica B, the client will return ErrOffsetOutOfRange trying to subscribe to offset 100, and that causes the consumer to reset to the initial offset.

Is there a reason ErrOffsetOutOfRange is not propagated up to the user? Are there any cases the user would want to silently reset the consumer offset?

Note that we experienced this behavior using sarama-cluster. I am not able to reproduce the error consistently and have not reproduced it with the sarama consumer yet. However, the code seems to behave the same as sarama-cluster in this regard.
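
For reference, a minimal sketch of the kind of configuration where this bites (a sketch only, not our actual setup; broker address, group name, and import path are placeholders):

```go
package main

import (
	"log"

	"github.com/Shopify/sarama"
)

func main() {
	cfg := sarama.NewConfig()

	// Producer side: ack from the leader only, so followers can briefly
	// lag behind (temporary underreplication).
	cfg.Producer.RequiredAcks = sarama.WaitForLocal

	// Consumer side: if the committed offset is judged out of range, the
	// partition consumer silently falls back to this initial offset, i.e.
	// the group re-reads the partition from the beginning.
	cfg.Consumer.Offsets.Initial = sarama.OffsetOldest

	group, err := sarama.NewConsumerGroup([]string{"broker-1:9092"}, "example-group", cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer group.Close()

	// group.Consume(...) runs as usual; during a rebalance onto a lagging
	// replica the committed offset can exceed that replica's newest offset,
	// triggering ErrOffsetOutOfRange internally and the silent reset above.
}
```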

@harshach

@muirdm isn't the consumer reading only up to the high watermark? That would mean the data is available on the replica too.
This should only happen if you have unclean.leader.election.enable set to true. Is that the case in your cluster?

@muirdm
Author

muirdm commented Oct 18, 2018

It isn't set in our server config, so it should be false by default for our version of Kafka.

I think the issue is that the consumer group reads the group's offset from the group's coordinator broker, but chooseStartingOffset compares that offset to the "newest" offset fetched via GetOffset, which queries the partition's leader, not the group's coordinator. Since that is a different broker, it can have a lower newest offset if it is underreplicated.
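
Roughly, the comparison I mean looks like this (a simplified paraphrase of the logic, not the exact sarama source):

```go
package example

import "github.com/Shopify/sarama"

// Simplified paraphrase of the starting-offset check, not the exact sarama
// source. `committed` is the group offset fetched from the group coordinator;
// GetOffset asks the partition *leader* for its oldest/newest offsets.
func chooseStartingOffset(client sarama.Client, topic string, partition int32, committed int64) (int64, error) {
	newest, err := client.GetOffset(topic, partition, sarama.OffsetNewest)
	if err != nil {
		return 0, err
	}
	oldest, err := client.GetOffset(topic, partition, sarama.OffsetOldest)
	if err != nil {
		return 0, err
	}
	switch {
	case committed == sarama.OffsetNewest:
		return newest, nil
	case committed == sarama.OffsetOldest:
		return oldest, nil
	case committed >= oldest && committed <= newest:
		return committed, nil
	default:
		// The coordinator's committed offset can be ahead of an
		// underreplicated leader's newest offset, so we land here and the
		// consumer then falls back to Consumer.Offsets.Initial.
		return 0, sarama.ErrOffsetOutOfRange
	}
}
```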

@ghost

ghost commented Feb 21, 2020

Thank you for taking the time to raise this issue. However, it has not had any activity on it in the past 90 days and will be closed in 30 days if no updates occur.
Please check if the master branch has already resolved the issue since it was raised. If you believe the issue is still valid and you would like input from the maintainers then please comment to ask for it to be reviewed.

@ghost ghost added the stale label Feb 21, 2020
@ghost ghost closed this as completed Mar 22, 2020
@dnwe dnwe reopened this May 5, 2021
@ghost ghost removed the stale label May 5, 2021
@github-actions

Thank you for taking the time to raise this issue. However, it has not had any activity on it in the past 90 days and will be closed in 30 days if no updates occur.
Please check if the main branch has already resolved the issue since it was raised. If you believe the issue is still valid and you would like input from the maintainers then please comment to ask for it to be reviewed.

@github-actions github-actions bot added the stale label Aug 24, 2023
@dnwe
Collaborator

dnwe commented Aug 24, 2023

Fixed by #2252 and the c.Consumer.Group.ResetInvalidOffsets = false option, which propagates the ErrOffsetOutOfRange error back to the user.
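
A hedged usage sketch of that option (broker address and group name are placeholders, the Consume loop is omitted, and the import path assumes the current module location):

```go
package main

import (
	"errors"
	"log"

	"github.com/IBM/sarama"
)

func main() {
	cfg := sarama.NewConfig()
	cfg.Consumer.Group.ResetInvalidOffsets = false // don't silently fall back to Offsets.Initial
	cfg.Consumer.Return.Errors = true              // deliver errors on group.Errors()

	group, err := sarama.NewConsumerGroup([]string{"broker-1:9092"}, "example-group", cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer group.Close()

	// group.Consume(ctx, topics, handler) would run in its own loop/goroutine;
	// omitted here. With ResetInvalidOffsets disabled, an out-of-range
	// committed offset surfaces as an error instead of a silent rewind.
	for err := range group.Errors() {
		if errors.Is(err, sarama.ErrOffsetOutOfRange) {
			log.Printf("committed offset out of range, decide how to recover: %v", err)
		}
	}
}
```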

@dnwe dnwe closed this as completed Aug 24, 2023