Backups with errors aren't marked as partially complete #509

Closed
carlpett opened this issue May 21, 2018 · 8 comments

Comments

@carlpett
Contributor

carlpett commented May 21, 2018

I had a permission issue with my cloud credentials which resulted in Ark not being able to snapshot volumes. However, without going through the logs, it seems like there is no indication of this failure: ark backup get lists it as Completed, and ark backup describe my-backup only has a field for Validation errors, which is empty (reasonably enough).

Shouldn't there at least be some indication that errors happened? Since I didn't see any errors, I restored the backup and ended up in a semi-broken state, because the PV/PVC objects weren't created.

@ncdc
Contributor

ncdc commented May 21, 2018 via email

@carlpett
Contributor Author

There were two issues, actually. Some setup info for context:
We're running on Azure, using acs-engine to create our clusters (currently k8s v1.9.2). I've set up the Ark storage account in a resource group k8s-infra, and created a service principal k8s-ark. Clusters run in a separate resource group per environment, e.g. k8s-dev, k8s-staging, etc.

First, I had not given the service principal access to the right resource groups. (I can't follow the guide entirely, since I can't give subscription-wide access to service accounts, and I made a manual mistake here.) This caused a bunch of 403s, but the backup was still reported as successful.

Then, I had misconfigured the Secret with the environment variables, so AZURE_RESOURCE_GROUP pointed to k8s-infra rather than the resource group of the cluster. This caused 404s, since it couldn't find the PVCs. This didn't cause the backup to fail either.

I'll sanitize the logs and post them as a gist in a few minutes.

@carlpett
Contributor Author

Here's the output from the first scenario: https://gist.github.com/carlpett/dbb1d336461a6de8207facdd7685b2c1

@ncdc
Contributor

ncdc commented May 22, 2018

@carlpett I'm surprised this showed up as Completed given the snapshot errors. We'll see if we can reproduce and track down where the issue is.

@carlpett
Contributor Author

@ncdc From a quick read through the code, could the issue be that any returned error is ignored here? (Line 214 in the snippet below.)

https://github.com/heptio/ark/blob/dc8c66b30525e54df4d8e177f84b54655cc762f1/pkg/backup/item_backupper.go#L195-L215
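
For readers who don't want to open the link, here is a minimal sketch of the kind of pattern being suspected. It is not the actual Ark code: the Snapshotter interface, the function names, and the "CompletedWithErrors" phase are all hypothetical, made up for illustration. It only shows how a snapshot error that is logged but not returned lets a backup end up as Completed, while returning the error lets the caller report a different state.

```go
package main

import (
	"errors"
	"fmt"
)

// Snapshotter is a hypothetical stand-in for a volume snapshot service.
type Snapshotter interface {
	Snapshot(volumeID string) error
}

// deniedSnapshotter always fails, mimicking a 403 from the cloud provider.
type deniedSnapshotter struct{}

func (deniedSnapshotter) Snapshot(volumeID string) error {
	return errors.New("403: not authorized to snapshot " + volumeID)
}

// backupItemIgnoringError shows the suspected bug pattern: the snapshot
// error is logged but never returned, so the caller sees success.
func backupItemIgnoringError(s Snapshotter, volumeID string) error {
	if err := s.Snapshot(volumeID); err != nil {
		fmt.Println("error taking snapshot:", err) // only visible in the logs
	}
	return nil
}

// backupItemReturningError propagates the snapshot failure to the caller.
func backupItemReturningError(s Snapshotter, volumeID string) error {
	if err := s.Snapshot(volumeID); err != nil {
		return fmt.Errorf("snapshot of %s failed: %w", volumeID, err)
	}
	return nil
}

// phaseFor derives a hypothetical backup phase from the collected errors.
func phaseFor(errs []error) string {
	if len(errs) > 0 {
		return "CompletedWithErrors"
	}
	return "Completed"
}

// runBackup backs up every volume with the given function, collects the
// errors it returns, and reports the resulting phase.
func runBackup(backupItem func(Snapshotter, string) error, s Snapshotter, volumes []string) string {
	var errs []error
	for _, v := range volumes {
		if err := backupItem(s, v); err != nil {
			errs = append(errs, err)
		}
	}
	return phaseFor(errs)
}

func main() {
	volumes := []string{"pv-1", "pv-2"}
	fmt.Println(runBackup(backupItemIgnoringError, deniedSnapshotter{}, volumes))
	fmt.Println(runBackup(backupItemReturningError, deniedSnapshotter{}, volumes))
}
```

With the first variant the failures are only visible in the logs, which matches what was observed above; with the second, the caller has enough information to surface them in the backup status.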

@ncdc
Contributor

ncdc commented May 22, 2018

@carlpett this is it! Thank you! We'll get this fixed in master asap (unless you want to do a PR?).

@carlpett
Contributor Author

Nice! I'll gladly make a PR :)

@ncdc
Contributor

ncdc commented May 22, 2018

Somewhat related: #511

