snap_backup: snapshot backup isn't compatible with importing #46850
Comments
Applying the raft command For example, one error-prone sequence of events (ordered by wall-clock time) would be:
Here, the |
Trivially wait lightning exit will be fine because So, given the "importing context"(The state of SSTs to be ingested and the
|
So |
A lightning task may run for multiple hours. Is it acceptable for the RPO to grow because of an import? Which has higher priority, backup or import?
Given that taking a snapshot backup lasts only a short time (the CreateVolumeSnapshot request usually responds within seconds), I think it might be acceptable to temporarily stop importing (thanks to checkpoints).
LGTM, I think you can ask PM to make a final decision. Maybe let SSTImporter return some error message to let lightning restart from |
Unfortunately, not for now. It just prints errors and retries registering itself. I think an |
/component br |
@YuJuncen I think we can close this issue. |
@BornChanger let me check if lightning handles this new behaviour tomorrow |
lightning side will see a |
Lightning sees an RPC error instead of an RPC response with error fields. Although lightning can handle it as a default error, some unnecessary retries could be skipped, and we should record this error so it can be displayed to the user.
It retries from the beginning instead of from the checkpoint, which slows ingestion. This problem is severe because a backup is taken every 30 minutes.
…napshot_backup (pingcap#47001) (pingcap#47341) (pingcap#20) ref pingcap#46850 Co-authored-by: Ti Chi Robot <[email protected]>
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
2. What did you expect to see? (Required)
The restore should succeed, because the backup succeeded.
3. What did you see instead (Required)
The restored cluster (sometimes) keeps panicking due to "ingest sst not found".
4. What is your TiDB version? (Required)
current master.