Restore from S3 repo - failed with error: found a datadownload with status "InProgress" during the node-agent starting, mark it as cancel #7761
Comments
That error means that the node agent restarted in the middle of processing a datadownload.
Hmmm... no restarts visible:
Changed the DaemonSet of the node agents to debug log level.
time="2024-05-02T08:02:48Z" level=debug msg="enqueueing resources ..." controller=DataDownload logSource="pkg/util/kube/periodical_enqueue_source.go:71" resource="*v2alpha1.DataDownloadList"
The restore is already canceled, but the datadownload is still trying to "do something"...
Details of the datadownload:
kubectl -n velero get datadownloads -l velero.io/restore-name=backuptest-01-20240430120059 -o yaml
Hi @thomasklosinsky, from the debug bundle you provided, you can find out that node agent
backuptest01-20240506133312-wjdq4 1/1 Running 0 2m35s
Normal Pulled pod/backuptest01-lq75c
That means the I/O is too slow, right?
Added some space to the k8s worker VM disk and it's working now. Thanks a lot!
What steps did you take and what happened:
1 pod with a pvc mounted via ceph-csi in namespace backuptest
Ran:
velero backup create NAME --snapshot-data-move --include-namespaces backuptest << successful
kubectl delete namespace backuptest << of course, successful
velero describe backup NAME --details << all data looking good
velero repo get << all data there
velero restore create --from-backup NAME
What did you expect to happen:
I expected the pvc to be restored from S3 repo.
But the datadownload failed with this error:
found a datadownload with status "InProgress" during the node-agent starting, mark it as cancel
The following information will help us better understand what's going on:
Log from the node-agent:
time="2024-04-30T10:00:59Z" level=info msg="Restore PVC is created" logSource="pkg/exposer/generic_restore.go:103" owner=backuptest-01-20240430120059-2x42c pvc name=backuptest-01-20240430120059-2x42c source namespace=backuptest target PVC=test-pvc-2
time="2024-04-30T10:00:59Z" level=info msg="Restore is exposed" controller=datadownload datadownload=velero/backuptest-01-20240430120059-2x42c logSource="pkg/controller/data_download_controller.go:195"
time="2024-04-30T10:00:59Z" level=info msg="Reconcile backuptest-01-20240430120059-2x42c" controller=datadownload datadownload=velero/backuptest-01-20240430120059-2x42c logSource="pkg/controller/data_download_controller.go:101"
time="2024-04-30T10:01:11Z" level=info msg="Reconcile backuptest-01-20240430120059-2x42c" controller=datadownload datadownload=velero/backuptest-01-20240430120059-2x42c logSource="pkg/controller/data_download_controller.go:101"
time="2024-04-30T10:01:11Z" level=info msg="Data download is prepared" controller=datadownload datadownload=velero/backuptest-01-20240430120059-2x42c logSource="pkg/controller/data_download_controller.go:226"
time="2024-04-30T10:01:11Z" level=info msg="Reconcile backuptest-01-20240430120059-2x42c" controller=datadownload datadownload=velero/backuptest-01-20240430120059-2x42c logSource="pkg/controller/data_download_controller.go:101"
time="2024-04-30T10:01:11Z" level=info msg="Data download is in progress" controller=datadownload datadownload=velero/backuptest-01-20240430120059-2x42c logSource="pkg/controller/data_download_controller.go:285"
time="2024-04-30T10:10:55Z" level=info msg="Reconcile backuptest-01-20240430120059-2x42c" controller=datadownload datadownload=velero/backuptest-01-20240430120059-2x42c logSource="pkg/controller/data_download_controller.go:101"
time="2024-04-30T10:10:55Z" level=info msg="Data download is in progress" controller=datadownload datadownload=velero/backuptest-01-20240430120059-2x42c logSource="pkg/controller/data_download_controller.go:285"
time="2024-04-30T10:10:55Z" level=info msg="Data download is being canceled" controller=datadownload datadownload=velero/backuptest-01-20240430120059-2x42c logSource="pkg/controller/data_download_controller.go:287"
time="2024-04-30T10:10:55Z" level=warning msg="Async fs backup data path canceled" controller=DataDownload datadownload=backuptest-01-20240430120059-2x42c logSource="pkg/controller/data_download_controller.go:398"
time="2024-04-30T10:10:55Z" level=info msg="Reconcile backuptest-01-20240430120059-2x42c" controller=datadownload datadownload=velero/backuptest-01-20240430120059-2x42c logSource="pkg/controller/data_download_controller.go:101"
time="2024-04-30T10:10:55Z" level=info msg="Reconcile backuptest-01-20240430120059-2x42c" controller=datadownload datadownload=velero/backuptest-01-20240430120059-2x42c logSource="pkg/controller/data_download_controller.go:101"
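To read a log like this more easily, it can help to pull out the timestamps and measure how long the datadownload sat in progress before the cancel arrived. The following is only a rough sketch, not part of Velero: the function names `parse_line` and `time_in_progress` are made up for illustration, and it assumes well-formed logrus text lines with the exact message strings shown in the log above.

```python
import shlex
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Two lines copied from the node-agent log above.
SAMPLE = [
    'time="2024-04-30T10:01:11Z" level=info msg="Data download is in progress" '
    'controller=datadownload datadownload=velero/backuptest-01-20240430120059-2x42c '
    'logSource="pkg/controller/data_download_controller.go:285"',
    'time="2024-04-30T10:10:55Z" level=info msg="Data download is being canceled" '
    'controller=datadownload datadownload=velero/backuptest-01-20240430120059-2x42c '
    'logSource="pkg/controller/data_download_controller.go:287"',
]


def parse_line(line: str) -> Dict[str, str]:
    """Split one key=value log line into a dict (hypothetical helper)."""
    fields = {}
    # shlex keeps quoted values such as msg="..." together as one token.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip bare words that carry no '='
            fields[key] = value
    return fields


def time_in_progress(lines: List[str]) -> Optional[timedelta]:
    """Gap between the first 'in progress' message and the last cancel message."""
    started = canceled = None
    for line in lines:
        fields = parse_line(line)
        stamp = fields.get("time")
        if not stamp:
            continue
        when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
        msg = fields.get("msg")
        if msg == "Data download is in progress" and started is None:
            started = when
        elif msg == "Data download is being canceled":
            canceled = when
    if started is None or canceled is None:
        return None
    return canceled - started


if __name__ == "__main__":
    print(time_in_progress(SAMPLE))
```

For the two sample lines the gap is 9 minutes 44 seconds (10:01:11 to 10:10:55), so the download ran for almost ten minutes before it was canceled. The script only measures that gap; it says nothing about why the cancel happened.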
Environment:
velero version: 1.13.2
velero client config get features: NOT SET
kubectl version: 1.30.0
/etc/os-release: Ubuntu 22.04 LTS