Check pointing Leader Scheduler State #5352
Signed-off-by: Santhosh Gandhe <[email protected]>
- long updatedAtMillis = Long.parseLong((String) this.metadata.getOrDefault(Constants.UPDATED, "0"));
- long createdAtMillis = Long.parseLong((String) this.metadata.getOrDefault(Constants.CREATED, "0"));
+ long updatedAtMillis = getMetadataField(Constants.UPDATED);
+ long createdAtMillis = getMetadataField(CREATED);
nit: Constants.CREATED
addressed it
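For readers following the diff, a helper like `getMetadataField` could look roughly like the sketch below. This is an assumption for illustration only: the PR does not show the helper's body, and the real method takes just the field name and reads `this.metadata`. Here the map is passed in explicitly so the example is self-contained, and the class name `MetadataFieldSketch` is hypothetical.

```java
import java.util.Map;

// Hypothetical sketch; not the helper from this PR.
final class MetadataFieldSketch {

    // Reads an epoch-millis value from the metadata map.
    // Returns 0 when the field is missing or cannot be parsed as a long.
    static long getMetadataField(Map<String, Object> metadata, String fieldName) {
        Object value = metadata.get(fieldName);
        if (value == null) {
            return 0L;
        }
        if (value instanceof Number) {
            return ((Number) value).longValue();
        }
        try {
            return Long.parseLong(value.toString().trim());
        } catch (NumberFormatException e) {
            return 0L;
        }
    }
}
```

Compared with the removed `Long.parseLong((String) ...)` lines, a helper of this shape avoids a `ClassCastException` when the stored value is not a `String`.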
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;
import static org.mockito.internal.verification.VerificationModeFactory.times;

@ExtendWith(MockitoExtension.class)
public class CrawlerTest {
    private static final int DEFAULT_BATCH_SIZE = 50;
    Instant lastPollTime = Instant.ofEpochMilli(0);
Good to make this private
Made this private now
Instant currentTimeInstance = Instant.now();
if (Duration.between(lastLeaderSavedInstant, currentTimeInstance).toMinutes() >= 1) {
    // intermediate updates to master partition state
    updateLeaderProgressState(leaderPartition, latestModifiedTime, coordinator);
Just make sure that this won't block the creation of any partitions by filtering them out based on the timestamp. You can be a little lenient if needed by setting "lastPollTime" to slightly before the time you actually start crawling, since source coordination will dedupe. If there are no issues with this, then it is ideal, since it means fewer calls to source coordination. You can start crawling from lastPollTime - someShortAmountOfTime if there's a chance of any race conditions.
Yes, at the end of the loop I am checkpointing with the "lastPollTime" captured before the crawl started. lastPollTime - someShortAmountOfTime looks like a good idea for the next crawl. I will check the impact of this change on the JQL and experiment with it to see if we can go with that choice.
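The "lastPollTime - someShortAmountOfTime" idea from this thread can be sketched as follows. This is only an illustration under stated assumptions: the class name `CrawlWindowSketch`, the method `crawlStart`, and the overlap value are hypothetical and are not part of this PR. It relies on source coordination deduplicating any items that fall into the overlap.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch; names and the overlap value are assumptions.
final class CrawlWindowSketch {

    // Example overlap only; the thread does not settle on a value.
    static final Duration EXAMPLE_OVERLAP = Duration.ofSeconds(30);

    // Moves the crawl start slightly before the saved poll time so that
    // items modified around the boundary are not filtered out.
    // The result is clamped so it never goes before the epoch.
    static Instant crawlStart(Instant lastPollTime, Duration overlap) {
        Instant candidate = lastPollTime.minus(overlap);
        return candidate.isBefore(Instant.EPOCH) ? Instant.EPOCH : candidate;
    }
}
```

A larger overlap re-reads more items on each crawl, so the trade-off is extra deduplicated work against the risk of missing an update near the boundary.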
Looks good I think, but I have one question about the lastModifiedTime logic
@@ -51,20 +60,40 @@ public Instant crawl(Instant lastPollTime,
continue;
}
itemInfoList.add(nextItem);
if (nextItem.getLastModifiedAt().isAfter(latestModifiedTime)) {
What happens if an item that has already been added to the list gets modified and then we add a different item to the list that got modified later? Won't we end up skipping the update from the first item?
The max does not affect the JQL query. It is used only for intermediate checkpointing.
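To make the distinction concrete, here is a minimal sketch of tracking the maximum modified time purely as a progress marker. The class and method names are hypothetical. The assumption, taken from the reply above, is that this value is only written to intermediate checkpoints and never used to build the JQL query.

```java
import java.time.Instant;
import java.util.List;

// Hypothetical sketch; not the code from this PR.
final class LatestModifiedSketch {

    // Returns the latest modification time seen so far, starting from 'floor'.
    // The result is intended for intermediate checkpoints only.
    static Instant latestModified(List<Instant> modifiedTimes, Instant floor) {
        Instant latest = floor;
        for (Instant modifiedAt : modifiedTimes) {
            if (modifiedAt.isAfter(latest)) {
                latest = modifiedAt;
            }
        }
        return latest;
    }
}
```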
hooray
Signed-off-by: Santhosh Gandhe <[email protected]>
e2fd7be
Description
When crawling a large Jira site, the initial crawl can take longer than the leader's lease time, so the lease may expire before the leader finishes crawling. This fix checkpoints frequently while the crawl is in progress, which both renews the lease and saves the leader's intermediate crawl state.
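The approach described above can be sketched as a loop that saves progress whenever a fixed interval has elapsed. This is a simplified illustration, not the code in this PR: the class `PeriodicCheckpointSketch`, the injected clock, and the `checkpoint` callback are assumptions that stand in for the real coordinator call (`updateLeaderProgressState` in the diff) so that the example can run on its own.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Iterator;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch of interval-based checkpointing during a long crawl.
final class PeriodicCheckpointSketch {

    // Walks the items' modification times and invokes 'checkpoint' whenever
    // at least 'interval' has passed since the last save.
    // Returns the number of intermediate checkpoints written.
    static int crawl(Iterator<Instant> modifiedTimes,
                     Supplier<Instant> clock,
                     Duration interval,
                     Consumer<Instant> checkpoint) {
        Instant lastSaved = clock.get();
        Instant latestModified = Instant.EPOCH;
        int checkpoints = 0;
        while (modifiedTimes.hasNext()) {
            Instant modifiedAt = modifiedTimes.next();
            if (modifiedAt.isAfter(latestModified)) {
                latestModified = modifiedAt;
            }
            Instant now = clock.get();
            if (Duration.between(lastSaved, now).compareTo(interval) >= 0) {
                // In the real plugin this step would also renew the leader's lease.
                checkpoint.accept(latestModified);
                lastSaved = now;
                checkpoints++;
            }
        }
        return checkpoints;
    }
}
```

Injecting the clock as a `Supplier<Instant>` is a design choice made for this sketch so the interval logic can be exercised without waiting in real time; production code could pass `Instant::now`.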
Issues Resolved
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.