stale lock during tidb rolling restart results in resolved ts lag and cdc lag increase #52108
Comments
/remove-type question
/severity major
/assign @bb7133
/assign @YangKeao
After some experiments and an investigation of the logic behind TiDB's graceful shutdown, here is a brief introduction to the shutdown routine and how to avoid this issue. The shutdown process has three stages:
If we want to avoid leaking locks, we have two options:
The second option is relatively hard to achieve. However, the first one should handle most cases.
Finally, no matter which option you choose, make sure to also increase …
After fixing the auto-commit issues, I found that the background async-commit goroutines are not waited for. Therefore, if most of the workload consists of async-commit transactions, TiDB may exit too early, killing the background async-commit goroutines and leaking their locks. (Committing secondary keys also happens in the background, so a workload with many transactions that write more than one key can cause similar issues.) I've submitted PRs that wait for the goroutines used for async commit and for committing secondary keys: ref tikv/client-go#1432 and #55608.
For most cases (transactions that last less than 15s), locks will no longer leak after #55608 is merged. Therefore, I'll close this issue.
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
3 CDC changefeeds running to sync 4000 tables.
2. What did you expect to see? (Required)
CDC lag should stay below 10s.
3. What did you see instead (Required)
TiKV resolved ts lag increases, and CDC lag increases as a result.
The CDC log indicates that CDC tries to resolve locks.
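A simplified model of why a stale lock shows up as growing lag, assuming that a region's resolved ts cannot advance past the start ts of any lock that is still held. The `resolvedTS` function and the plain integer timestamps are hypothetical (real TSO values encode physical and logical parts); this is not TiKV's resolver code.

```go
package main

import "fmt"

// resolvedTS is a hypothetical, simplified model: the resolved ts starts
// from a candidate timestamp and is capped by the smallest start ts among
// locks that are still outstanding in the region.
func resolvedTS(candidate uint64, lockStartTS []uint64) uint64 {
	rts := candidate
	for _, ts := range lockStartTS {
		if ts < rts {
			rts = ts
		}
	}
	return rts
}

func main() {
	stale := uint64(100) // a lock leaked during the restart and never resolved
	for _, now := range []uint64{110, 200, 300} {
		rts := resolvedTS(now, []uint64{stale})
		// The lag keeps growing because the stale lock pins the resolved ts.
		fmt.Printf("now=%d resolved=%d lag=%d\n", now, rts, now-rts)
	}
}
```

Under this model, the lag grows until the stale lock is resolved, which is consistent with CDC attempting to resolve locks in the report above.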
4. What is your TiDB version? (Required)