-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
*: calculate GC safe point based on global min start timestamp #12223
Conversation
/run-all-tests |
Great job. |
store/tikv/gcworker/gc_worker.go
Outdated
globalMinStartTime := time.Unix(sec, nsec) | ||
|
||
diff := safePoint.Sub(globalMinStartTime) | ||
maxTxnTimeUse := time.Duration(tikv.MaxTxnTimeUse)*time.Millisecond + 10*time.Second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think + 10secs
is not needed.
store/tikv/gcworker/gc_worker.go
Outdated
} | ||
physical := oracle.ExtractPhysical(globalMinStartTS) | ||
sec, nsec := physical/1e3, (physical%1e3)*1e6 | ||
globalMinStartTime := time.Unix(sec, nsec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can construct time with ns only,
time.Unix(0, physical * 1e6)
return globalMinStartTime | ||
} | ||
|
||
return safePoint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's is possible that the diff > maxTxnTimeUse
but there is another transaction startTS before the safepoint.
So we need to check maxTxnTimeUse
during kvs
iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we needn't, since for these transactions which run longer than maxTxnTimeUse
should be failed finally
Codecov Report
@@ Coverage Diff @@
## master #12223 +/- ##
================================================
+ Coverage 79.8138% 81.0751% +1.2613%
================================================
Files 460 454 -6
Lines 102268 99150 -3118
================================================
- Hits 81624 80386 -1238
+ Misses 14746 12952 -1794
+ Partials 5898 5812 -86 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if the minStartTs
is not updated or not exist, we needn't start GC
globalMinStartTime := time.Unix(0, oracle.ExtractPhysical(math.MaxUint64)*1e6) | ||
maxTxnTimeUse := time.Duration(tikv.MaxTxnTimeUse)*time.Millisecond + 10*time.Second | ||
|
||
for _, v := range kvs { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible the kvs
is empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
return globalMinStartTime | ||
} | ||
|
||
return safePoint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we needn't, since for these transactions which run longer than maxTxnTimeUse
should be failed finally
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@MyonKeminta PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Will it work for clusters without TiDB? |
store/tikv/gcworker/gc_worker.go
Outdated
return safePoint | ||
} | ||
|
||
maxTxnTimeUse := time.Duration(tikv.MaxTxnTimeUse)*time.Millisecond + 10*time.Second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why plus 10s?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is your previous suggestion. Leave 10s for commit, otherwise, the transaction may not be committed yet, and GC has already started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I didn't get it. I think it's not necessary.
No, I don't think the original way will not work either. |
Signed-off-by: Shuaipeng Yu <[email protected]>
Signed-off-by: Shuaipeng Yu <[email protected]>
Signed-off-by: Shuaipeng Yu <[email protected]>
Signed-off-by: Shuaipeng Yu <[email protected]>
Signed-off-by: Shuaipeng Yu <[email protected]>
Signed-off-by: Shuaipeng Yu <[email protected]>
Signed-off-by: Shuaipeng Yu <[email protected]>
Signed-off-by: Shuaipeng Yu <[email protected]>
Signed-off-by: Shuaipeng Yu <[email protected]>
Signed-off-by: Shuaipeng Yu <[email protected]>
Signed-off-by: Shuaipeng Yu <[email protected]>
Signed-off-by: Shuaipeng Yu <[email protected]>
Signed-off-by: Shuaipeng Yu <[email protected]>
c5c0414
to
5ae29fd
Compare
Signed-off-by: Shuaipeng Yu <[email protected]>
Fixed a bug that could cause other long transactions of the same TiDB to fail. And added a test case. PTAL @coocood @AndreMouche @MyonKeminta |
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rest LGTM
return safePoint | ||
} | ||
|
||
safePointTS := variable.GoTimeToTS(safePoint) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use oracle.ComposeTS
? The same to other usages of variable.GoTimeToTS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not need to GetPhysical()
.
err = spkv.Put(fmt.Sprintf("%s/%s", domain.ServerMinStartTSPath, "b"), "1") | ||
c.Assert(err, IsNil) | ||
sp = s.gcWorker.calSafePointByMinStartTS(now) | ||
c.Assert(sp, Equals, zeroTime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you may at least need to add a case that saves a normal time to etcd, to check if calSafePointByMinStartTs
parses the time correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add some integration tests.
for _, info := range pl { | ||
if info.CurTxnStartTS < minStartTS { | ||
if info.CurTxnStartTS > startTSLowerLimit && info.CurTxnStartTS < minStartTS { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any test that covers these changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
noop... Any suggestions? Maybe we need some intagration test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry but I'm not familiar with that part of code. Just thinking it will be nice if this is covered by tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test case added, PTAL @MyonKeminta .
Signed-off-by: Shuaipeng Yu <[email protected]>
Signed-off-by: Shuaipeng Yu <[email protected]>
Signed-off-by: Shuaipeng Yu <[email protected]>
Signed-off-by: Shuaipeng Yu <[email protected]>
Signed-off-by: Shuaipeng Yu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Your auto merge job has been accepted, waiting for 12397 |
/run-all-tests |
What problem does this PR solve?
We do not need to increase GC life time or GC run interval to avoid the transaction aborted when it took too much time.
What is changed and how it works?
Check List
Tests
Code changes
Related changes