cdc: Data may be lost during leader transfer #193

Closed
zeminzhou opened this issue Aug 3, 2022 · 2 comments · Fixed by #196
Labels
type/bug Something isn't working

Comments

zeminzhou commented Aug 3, 2022

Bug Report

1. Describe the bug

In TiKV, when a leader transfer occurs, the old leader may still broadcast one last resolved ts, and that resolved ts may be larger than the first causal timestamp the new leader assigns after the transfer.

Details, as reconstructed from the log:

  1. At 20:04:06.105, a leader transfer from A to B occurs; B's causal ts is now 435080671018356250.
  2. At 20:04:06.106, A, still acting as leader, broadcasts resolved ts (435080671018356650).
  3. At 20:04:06.116, B becomes leader and appends ts (435080671018356350) to a new key.
  4. At 20:04:06.132, cdc receives resolved ts (435080671018356650) from A.
  5. At 20:04:06.225, cdc starts a new request to B with start ts (435080671018356650).

As a result, B never sends the key with ts (435080671018356350) to cdc, because that ts is smaller than the start ts of the new request.
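The effect of the timeline can be illustrated with a small sketch. It is not TiKV or cdc code: `Write`, `events_after`, and the key name are hypothetical, and it assumes that a request with a given start ts only receives writes whose ts is greater than that start ts. The two timestamps are the ones from the log above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    key: str
    ts: int


def events_after(writes, start_ts):
    """Return the writes a new cdc request would receive.

    Assumption of this sketch: only writes with ts > start_ts are sent.
    """
    return [w for w in writes if w.ts > start_ts]


# Timestamps taken from the log in this report.
RESOLVED_TS_FROM_A = 435080671018356650  # broadcast by the old leader A
NEW_KEY_TS_ON_B = 435080671018356350     # assigned by the new leader B

# The key name is hypothetical; only the timestamp matters here.
writes_on_b = [Write("new-key", NEW_KEY_TS_ON_B)]

# cdc restarts from A's resolved ts, which is ahead of B's write.
delivered = events_after(writes_on_b, start_ts=RESOLVED_TS_FROM_A)
missed = [w for w in writes_on_b if w not in delivered]
```

Here `delivered` is empty and `missed` contains the new key, which matches the data loss described in this report.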

2. Minimal reproduce step (Required)

  1. Start cdc.
  2. Put 5000000 key/value pairs into the src tikv.
  3. Get the checksums of the src tikv and the dst tikv.
  4. If the checksums are the same, repeat steps 2-3.
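The loop in steps 2-4 can be sketched as follows. This is only an in-memory model: plain dicts stand in for the src and dst tikv clusters, `checksum`, `run_round`, and `reproduce` are hypothetical helpers, and the XOR-of-CRC32 checksum is an assumption chosen for illustration, not necessarily the checksum TiKV computes.

```python
import zlib


def checksum(store):
    """Order-independent checksum: XOR of CRC32 over each key/value pair."""
    acc = 0
    for key, value in store.items():
        acc ^= zlib.crc32(key + b"\x00" + value)
    return acc


def run_round(src, dst, round_no, pairs, lost_keys=()):
    """Steps 2-3: write a batch to src, replicate it to dst, compare checksums."""
    batch = {
        f"r{round_no}-k{i}".encode(): f"v{i}".encode() for i in range(pairs)
    }
    src.update(batch)
    for key, value in batch.items():
        if key not in lost_keys:  # simulates cdc failing to deliver a key
            dst[key] = value
    return checksum(src) == checksum(dst)


def reproduce(max_rounds, pairs, lost_keys=()):
    """Step 4: repeat while checksums match; return the first mismatching round."""
    src, dst = {}, {}
    for round_no in range(1, max_rounds + 1):
        if not run_round(src, dst, round_no, pairs, lost_keys):
            return round_no
    return None
```

For example, `reproduce(3, 10, lost_keys={b"r2-k3"})` stops at the round in which the key is dropped, while a run with no lost keys returns `None`.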

3. What did you see instead (Required)

  1. The checksums are not the same.

4. What did you expect to see? (Required)

  1. The checksums are always the same.
@zeminzhou zeminzhou added the type/bug Something isn't working label Aug 3, 2022
@pingyu pingyu changed the title from "cdc: the CompareAndSet map lose" to "cdc: the CompareAndSet may lose" Aug 3, 2022
@zeminzhou zeminzhou changed the title from "cdc: the CompareAndSet may lose" to "cdc: Data may be lost during leader transfer" Aug 8, 2022
zeminzhou commented

cc @pingyu @haojinming


pingyu commented Aug 8, 2022

Some additional information:

  1. A causal timestamp flush on node B is invoked after B becomes a "Candidate" (see here, and here), which is before step 3.

  2. Node A checks its leadership with a check_leader RPC request before step 2. See here.

We may conclude that nodes A and B do not agree on who the leader is. So maybe we should flush the causal timestamp only after node B "really" becomes the Leader (rather than a Candidate), or use another method to check leadership on node A.
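The first suggestion can be illustrated with a toy model. Nothing here is TiKV's implementation: `Tso`, `Node`, and `first_write_after_transfer` are hypothetical names, timestamps are small integers, and the model assumes that each node caches a batch of timestamps from a monotonic oracle and that A takes its resolved ts fresh from that oracle.

```python
class Tso:
    """Toy monotonic timestamp oracle (a stand-in, not a real PD API)."""

    def __init__(self):
        self._now = 0

    def allocate(self, count):
        """Reserve `count` timestamps and return the first one."""
        first = self._now + 1
        self._now += count
        return first


class Node:
    """Toy node that hands out causal timestamps from a cached batch."""

    def __init__(self, tso, batch=100):
        self.tso = tso
        self.batch = batch
        self.next_ts = 0
        self.flush()

    def flush(self):
        """Drop the cached batch and fetch a fresh one from the oracle."""
        self.next_ts = self.tso.allocate(self.batch)

    def write_ts(self):
        """Timestamp for the next write (batch exhaustion is not modeled)."""
        ts = self.next_ts
        self.next_ts += 1
        return ts

    def resolved_ts(self):
        # Assumption: the resolved ts is taken fresh from the oracle.
        return self.tso.allocate(1)


def first_write_after_transfer(flush_after_leader):
    tso = Tso()
    a, b = Node(tso), Node(tso)
    b.flush()                   # flush while B is only a Candidate
    resolved = a.resolved_ts()  # A still believes it is the leader
    # B "really" becomes the Leader at this point.
    if flush_after_leader:
        b.flush()               # the flush suggested in this comment
    return resolved, b.write_ts()
```

In this model, `first_write_after_transfer(False)` yields a write ts below A's resolved ts (the situation in this issue), while `first_write_after_transfer(True)` yields a write ts above it.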

pingyu pushed a commit that referenced this issue Aug 8, 2022
* resolvedts - x
* resolvedts - x
* resolvedts - x
* fix comment & ut
* add ut
* fix ut
* fix ut
* fix comment
* fix ut
* remove tmp
* fix comment
* fix ut
* fix kv ut timeout
* fix ut
* fix check
* fix ut

Signed-off-by: zeminzhou <[email protected]>