Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry on BatchCoprocessor request may fail randomly #44622

Closed
windtalker opened this issue Jun 13, 2023 · 0 comments · Fixed by #44623
Closed

Retry on BatchCoprocessor request may fail randomly #44622

windtalker opened this issue Jun 13, 2023 · 0 comments · Fixed by #44623
Labels
affects-4.0 This bug affects 4.0.x versions. affects-5.0 This bug affects 5.0.x versions. affects-5.1 This bug affects 5.1.x versions. affects-5.2 This bug affects 5.2.x versions. affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects the 5.4.x(LTS) versions. affects-6.0 affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-6.2 affects-6.3 affects-6.4 affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-6.6 affects-7.0 affects-7.1 This bug affects the 7.1.x(LTS) versions. severity/major type/bug The issue is confirmed as a bug.

Comments

@windtalker
Copy link
Contributor

Bug Report

Please answer these questions before submitting your issue. Thanks!
The root cause is when retry a batch coprocessor request, we just collect the key ranges in the failed batch coprocessor request:

func (b *batchCopIterator) retryBatchCopTask(ctx context.Context, bo *backoff.Backoffer, batchTask *batchCopTask) ([]*batchCopTask, error) {
if batchTask.regionInfos != nil {
var ranges []kv.KeyRange
for _, ri := range batchTask.regionInfos {
ri.Ranges.Do(func(ran *kv.KeyRange) {
ranges = append(ranges, *ran)
})
}
ret, err := buildBatchCopTasksForNonPartitionedTable(ctx, bo, b.store, NewKeyRanges(ranges), b.req.StoreType, false, 0, false, 0, tiflashcompute.DispatchPolicyInvalid)
return ret, err
}
// Retry Partition Table Scan
keyRanges := make([]*KeyRanges, 0, len(batchTask.PartitionTableRegions))
pid := make([]int64, 0, len(batchTask.PartitionTableRegions))
for _, trs := range batchTask.PartitionTableRegions {
pid = append(pid, trs.PhysicalTableId)
ranges := make([]kv.KeyRange, 0, len(trs.Regions))
for _, ri := range trs.Regions {
for _, ran := range ri.Ranges {
ranges = append(ranges, kv.KeyRange{
StartKey: ran.Start,
EndKey: ran.End,
})
}
}
keyRanges = append(keyRanges, NewKeyRanges(ranges))
}
ret, err := buildBatchCopTasksForPartitionedTable(ctx, bo, b.store, keyRanges, b.req.StoreType, false, 0, false, 0, pid, tiflashcompute.DispatchPolicyInvalid)
return ret, err
}

But, for buildBatchCopTasksForNonPartitionedTable and buildBatchCopTasksForPartitionedTable, it actually assumes that the input key ranges is sorted. So an unsorted input may cause wrong result in SplitKeyRangesByLocations, and TiFlash may throw errors like
Error 1105 (HY000): [FLASH:Coprocessor:BadRequest] Income key ranges is illegal for region: 326: (while doing learner read for table, logical table_id: 2593)

1. Minimal reproduce step (Required)

2. What did you expect to see? (Required)

3. What did you see instead (Required)

4. What is your TiDB version? (Required)

@windtalker windtalker added type/bug The issue is confirmed as a bug. severity/major labels Jun 13, 2023
@ti-chi-bot ti-chi-bot bot added may-affects-5.1 This bug maybe affects 5.1.x versions. may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 may-affects-6.5 may-affects-7.1 labels Jun 13, 2023
@windtalker windtalker added affects-4.0 This bug affects 4.0.x versions. affects-5.0 This bug affects 5.0.x versions. affects-5.1 This bug affects 5.1.x versions. affects-5.2 This bug affects 5.2.x versions. affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects the 5.4.x(LTS) versions. affects-6.0 affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-6.2 affects-6.3 affects-6.4 affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-6.6 affects-7.0 and removed may-affects-5.1 This bug maybe affects 5.1.x versions. may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 may-affects-6.5 may-affects-7.1 labels Jun 13, 2023
@windtalker windtalker added the affects-7.1 This bug affects the 7.1.x(LTS) versions. label Jun 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-4.0 This bug affects 4.0.x versions. affects-5.0 This bug affects 5.0.x versions. affects-5.1 This bug affects 5.1.x versions. affects-5.2 This bug affects 5.2.x versions. affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects the 5.4.x(LTS) versions. affects-6.0 affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-6.2 affects-6.3 affects-6.4 affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-6.6 affects-7.0 affects-7.1 This bug affects the 7.1.x(LTS) versions. severity/major type/bug The issue is confirmed as a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant