-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent multiple attempts to publish segments for the same sequence #14995
Prevent multiple attempts to publish segments for the same sequence #14995
Conversation
…or the same sequence
@AmatyaAvadhanula , could you please add some tests to verify the fixed behaviour? |
// The callback for a successful segment publish may remove a sequence from the publishingSequences, | ||
// which is racy and can allow the same sequence to be added to the set again. | ||
// Create a copy of publishing sequences to which we can only add elements, and not remove them. | ||
final Set<String> publishingSequencesSnapshot = new HashSet<>(publishingSequences); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the new code will be simpler if we snapshot both sequences
and publishingSequences
and then create a third set which contains (sequencesSnapshot
- publishingSequencesSnapshot
) and iterate over that instead of sequencesSnapshot
.
Also, I wonder if there can still be duplicates as it really depends on the timing of snapshotting publishingSequences
. Would a better solution be to track publishedSequences
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the new code will be simpler if we snapshot both sequences and publishingSequences and then create a third set which contains (sequencesSnapshot - publishingSequencesSnapshot) and iterate over that instead of sequencesSnapshot
As discussed offline, this is not being done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I wonder if there can still be duplicates as it really depends on the timing of snapshotting publishingSequences. Would a better solution be to track publishedSequences?
I think that this may not happen since the removal happens from sequences and then from publishingSequences, so this order may help. Although it doesn't seem ideal.
Thanks, I'll try to track publishedSequences instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@AmatyaAvadhanula , I assume the set of published sequences will never be too large to cause any memory concerns for the task, right?
if (!sequenceMetadata.isOpen() && | ||
!publishingSequences.contains(sequenceMetadata.getSequenceName()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style: symmetry with the next condition.
if (!sequenceMetadata.isOpen() && | |
!publishingSequences.contains(sequenceMetadata.getSequenceName()) | |
if (!sequenceMetadata.isOpen() | |
&& !publishingSequences.contains(sequenceMetadata.getSequenceName()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Yes, @kfaraz |
…pache#14995) * Prevent a race that may cause multiple attempts to publish segments for the same sequence
…pache#14995) * Prevent a race that may cause multiple attempts to publish segments for the same sequence
…pache#14995) * Prevent a race that may cause multiple attempts to publish segments for the same sequence
…pache#14995) * Prevent a race that may cause multiple attempts to publish segments for the same sequence
…pache#14995) * Prevent a race that may cause multiple attempts to publish segments for the same sequence
Description
Fixes an issue where a race can cause a second attempt to publish segments for the same sequence.
During the second publish, the segments have already been published and are no longer associated with the sequence.
This causes the tasks to fail with an error like:
org.apache.druid.java.util.common.ISE: Attempting to publish with empty segment set, but total row count was not 0: [165263].
Observations
The following sequence of events can cause the error
maybePersistAndPublishSequences
Fix
When iterating over the sequences to check if they need to published, maintain a snapshot of the publishing sequences before iterating, so that a removal on successful publish doesn't cause the race to occur.
Tested on a local cluster by setting thread level breakpoints.
Release note
This PR has: