SKYEDEN-3020 | Handle retransmission zookeeper race condition #1936

Merged
merged 5 commits into from
Jan 8, 2025


Collaborator

@MarcinBobinski MarcinBobinski commented Dec 11, 2024

Why:
The current implementation of the Retransmitter::reloadOffsets method contains a race condition. Multiple consumer instances can execute this code simultaneously, which may lead to unpredictable behavior.

Offsets are stored in ZooKeeper per partition, for example:
[1:{123}, 2:{345}, 3:{567}, 4:{678}].

The method first fetches the entire list of available partitions from ZooKeeper ([1,2,3,4]), then retrieves the specific offsets for each partition. The fetched offsets are processed for partitions assigned to the consumer and subsequently deleted from ZooKeeper. For instance, if consumer1 is assigned partitions [1,2], it will process these offsets and delete them, leaving only partitions [3,4] in ZooKeeper.

The issue arises when multiple consumers execute this process simultaneously. A consumer may fetch the list of available partitions, but by the time it attempts to fetch the offset for a partition, the offset may no longer exist because it has already been deleted by another consumer.
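The interleaving described above can be reproduced deterministically with a small sketch. This is not the Hermes code: an in-memory `TreeMap` stands in for ZooKeeper, and the class and method names (`ReloadOffsetsRace`, `listPartitions`, `readOffset`, `deleteOffset`, `simulate`) are hypothetical, chosen only for illustration. The two consumers are interleaved by hand in a single thread so the outcome is repeatable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

public class ReloadOffsetsRace {
    // Hypothetical stand-in for the ZooKeeper offset nodes: partition -> offset.
    static final Map<Integer, Long> store = new TreeMap<>();

    static List<Integer> listPartitions() {
        return new ArrayList<>(store.keySet());
    }

    static Optional<Long> readOffset(int partition) {
        return Optional.ofNullable(store.get(partition));
    }

    static void deleteOffset(int partition) {
        store.remove(partition);
    }

    /** Returns the partitions whose offset vanished between the list step and the read step. */
    static List<Integer> simulate() {
        store.clear();
        store.put(1, 123L);
        store.put(2, 345L);
        store.put(3, 567L);
        store.put(4, 678L);

        // Step 1: consumer2 lists every partition that currently has an offset.
        List<Integer> listedByConsumer2 = listPartitions();

        // Step 2: consumer1 (assigned [1, 2]) processes its offsets and deletes them.
        for (int partition : List.of(1, 2)) {
            readOffset(partition);
            deleteOffset(partition);
        }

        // Step 3: consumer2 reads the offset of each partition it listed in step 1.
        List<Integer> missing = new ArrayList<>();
        for (int partition : listedByConsumer2) {
            if (readOffset(partition).isEmpty()) {
                missing.add(partition);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println("Offsets missing for partitions: " + simulate());
    }
}
```

In this sketch the read in step 3 simply returns an empty `Optional`; against a real ZooKeeper client the equivalent read of a deleted node would typically surface as an error that the caller has to handle.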

Fix:
To resolve this, the step of listing all available partitions from ZooKeeper was removed. Instead, the method now directly fetches offsets only for the partitions assigned to the consumer. This ensures that the consumer only retrieves data for partitions it is responsible for, eliminating the possibility of fetching offsets that may have been deleted by another consumer.
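A minimal sketch of the idea behind the fix, under the same assumptions as the description: an in-memory map stands in for ZooKeeper, and the names (`AssignedPartitionsReload`, `reloadOffsets`) are hypothetical rather than taken from the actual change. Each consumer touches only the partitions assigned to it, so it never depends on a listing that another consumer may have invalidated.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class AssignedPartitionsReload {

    /**
     * Loads and deletes offsets only for the given assigned partitions.
     * The store is a hypothetical stand-in for ZooKeeper: partition -> offset.
     * Partitions without a stored offset are skipped.
     */
    static Map<Integer, Long> reloadOffsets(Map<Integer, Long> store, Set<Integer> assigned) {
        Map<Integer, Long> loaded = new TreeMap<>();
        for (int partition : assigned) {
            // In this sketch, remove() combines the read and the delete.
            Long offset = store.remove(partition);
            if (offset != null) {
                loaded.put(partition, offset);
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        Map<Integer, Long> store = new TreeMap<>(Map.of(1, 123L, 2, 345L, 3, 567L, 4, 678L));

        // consumer1 owns [1, 2] and consumer2 owns [3, 4]; neither reads the other's nodes.
        Map<Integer, Long> consumer1 = reloadOffsets(store, Set.of(1, 2));
        Map<Integer, Long> consumer2 = reloadOffsets(store, Set.of(3, 4));

        System.out.println("consumer1 loaded: " + consumer1);
        System.out.println("consumer2 loaded: " + consumer2);
        System.out.println("left in store: " + store);
    }
}
```

Because the sets of assigned partitions are disjoint in this example, the order in which the two consumers run does not change what each one loads.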

Collaborator

@moscicky moscicky left a comment

Great PR, thanks for the fix

@szczygiel-m szczygiel-m merged commit 6f6a5ed into master Jan 8, 2025
14 checks passed
@szczygiel-m szczygiel-m deleted the SKYEDEN-3020-RetransmissionFix branch January 8, 2025 13:20

3 participants