[CELEBORN-1825] The map task ended too early, which caused problems with data consistency #3056

Open: zhaostu4 wants to merge 1 commit into main

zhaostu4 (Contributor) commented Jan 7, 2025

What changes were proposed in this pull request?

Fix the map task ending too early.

Why are the changes needed?

In PushDataRpcResponseCallback.updateLatestPartition, newloc.hostAndPushPort and latest.hostAndPushPort can be equal. When these two keys are equal, the inFlightRequestTracker count is reset to zero, and the map task eventually ends prematurely. The specific trigger conditions are as follows:

  1. replicate.enabled=true
  2. The revive request carries replica information (*_REPLICA).
  3. The new primary host is the same as the previous primary host.
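
The failure mode described above can be illustrated with a minimal, self-contained sketch. This is not Celeborn's code: the `Tracker` class, the `moveNaive` and `moveGuarded` helpers, and the `host-a:9092` key are hypothetical stand-ins, and the sketch assumes the tracker keeps a set of in-flight batch ids per `hostAndPushPort` key and that a location update adds the batch under the new key before removing it from the old one. The guard shown is one possible way to avoid the problem, not necessarily the change made in this PR.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class InFlightSketch {
  // Hypothetical stand-in for an in-flight tracker: batch ids per "host:pushPort" key.
  static final class Tracker {
    private final Map<String, Set<Integer>> inFlight = new ConcurrentHashMap<>();

    void addBatch(int batchId, String hostAndPushPort) {
      inFlight.computeIfAbsent(hostAndPushPort, k -> ConcurrentHashMap.newKeySet()).add(batchId);
    }

    void removeBatch(int batchId, String hostAndPushPort) {
      Set<Integer> batches = inFlight.get(hostAndPushPort);
      if (batches != null) {
        batches.remove(batchId);
      }
    }

    // Total number of batches still considered in flight across all keys.
    int total() {
      return inFlight.values().stream().mapToInt(Set::size).sum();
    }
  }

  // Assumed naive update: register under the new key, then drop from the old key.
  // If both keys are equal, the add is a no-op and the remove drops the batch entirely.
  static void moveNaive(Tracker tracker, int batchId, String latest, String newLoc) {
    tracker.addBatch(batchId, newLoc);
    tracker.removeBatch(batchId, latest);
  }

  // One possible guard (an assumption): skip the move when the key did not change.
  static void moveGuarded(Tracker tracker, int batchId, String latest, String newLoc) {
    if (!newLoc.equals(latest)) {
      moveNaive(tracker, batchId, latest, newLoc);
    }
  }

  public static void main(String[] args) {
    String sameHost = "host-a:9092"; // hypothetical hostAndPushPort value

    Tracker naive = new Tracker();
    naive.addBatch(7, sameHost);
    moveNaive(naive, 7, sameHost, sameHost);
    System.out.println("naive:   " + naive.total());   // prints "naive:   0"

    Tracker guarded = new Tracker();
    guarded.addBatch(7, sameHost);
    moveGuarded(guarded, 7, sameHost, sameHost);
    System.out.println("guarded: " + guarded.total()); // prints "guarded: 1"
  }
}
```

In the naive variant the batch is still outstanding, yet the tracker reports zero in-flight requests, so anything that waits for the count to reach zero would consider the map task finished.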

Does this PR introduce any user-facing change?

No.

How was this patch tested?

GA (GitHub Actions CI).

zhaostu4 changed the title from "CELEBORN-1825 The map task ended too early, which caused problems with data consistency" to "[CELEBORN-1825] The map task ended too early, which caused problems with data consistency" on Jan 7, 2025

FMX (Contributor) commented Jan 7, 2025

Lots of CI failures usually mean that something is wrong with your PR. Please fix them first.
