Support creating backups in kind clusters #4962

Closed
jsanda opened this issue Jun 6, 2022 · 12 comments

jsanda commented Jun 6, 2022

Describe the problem/challenge you have
I am unable to back up PVs in kind clusters. I understand that Restic backups are not supported out of the box with hostPath volumes. The project I work on (as well as many others) uses kind clusters extensively for development and testing. kind uses local-path-provisioner for dynamic volume provisioning, and the volumes it provisions are hostPath volumes.

I have Velero installed with the Restic daemonset, and MinIO set up as the storage provider. Here is an example command line I used for creating a backup:

velero backup create backup-4 --include-namespaces k8ssandra-operator --include-resources PersistentVolumeClaims,PersistentVolumes --snapshot-volumes=false

The backup completes without error, but the contents of the PV are not backed up.

Here is the relevant part of the logs:

time="2022-06-03T21:30:24Z" level=info msg="Getting items for resource" backup=velero/backup-4 group=config.k8ssandra.io/v1beta1 logSource="pkg/backup/item_collector.go:170" resource=clientconfigs
time="2022-06-03T21:30:24Z" level=info msg="Skipping resource because it's excluded" backup=velero/backup-4 group=config.k8ssandra.io/v1beta1 logSource="pkg/backup/item_collector.go:207" resource=clientconfigs
time="2022-06-03T21:30:24Z" level=info msg="Collected 1 items matching the backup spec from the Kubernetes API (actual number of items backed up may be more or less depending on velero.io/exclude-from-backup annotation, plugins returning additional related items to back up, etc.)" backup=velero/backup-4 logSource="pkg/backup/backup.go:264" progress=
time="2022-06-03T21:30:24Z" level=info msg="Processing item" backup=velero/backup-4 logSource="pkg/backup/backup.go:340" name=server-data-test-dc1-default-sts-0 namespace=k8ssandra-operator progress= resource=persistentvolumeclaims
time="2022-06-03T21:30:24Z" level=info msg="Backing up item" backup=velero/backup-4 logSource="pkg/backup/item_backupper.go:122" name=server-data-test-dc1-default-sts-0 namespace=k8ssandra-operator resource=persistentvolumeclaims
time="2022-06-03T21:30:24Z" level=info msg="Executing custom action" backup=velero/backup-4 logSource="pkg/backup/item_backupper.go:311" name=server-data-test-dc1-default-sts-0 namespace=k8ssandra-operator resource=persistentvolumeclaims
time="2022-06-03T21:30:24Z" level=info msg="Executing PVCAction" backup=velero/backup-4 cmd=/velero logSource="pkg/backup/backup_pv_action.go:49" pluginName=velero
time="2022-06-03T21:30:24Z" level=info msg="Backing up item" backup=velero/backup-4 logSource="pkg/backup/item_backupper.go:122" name=pvc-0d0d7ac0-7cb6-478a-b537-3bf466dedff5 namespace= resource=persistentvolumes
time="2022-06-03T21:30:24Z" level=info msg="Executing takePVSnapshot" backup=velero/backup-4 logSource="pkg/backup/item_backupper.go:395" name=pvc-0d0d7ac0-7cb6-478a-b537-3bf466dedff5 namespace= resource=persistentvolumes
time="2022-06-03T21:30:24Z" level=info msg="Backup has volume snapshots disabled; skipping volume snapshot action." backup=velero/backup-4 logSource="pkg/backup/item_backupper.go:398" name=pvc-0d0d7ac0-7cb6-478a-b537-3bf466dedff5 namespace= resource=persistentvolumes
time="2022-06-03T21:30:24Z" level=info msg="Backed up 2 items out of an estimated total of 2 (estimate will change throughout the backup)" backup=velero/backup-4 logSource="pkg/backup/backup.go:380" name=server-data-test-dc1-default-sts-0 namespace=k8ssandra-operator progress= resource=persistentvolumeclaims
time="2022-06-03T21:30:24Z" level=info msg="Backed up a total of 2 items" backup=velero/backup-4 logSource="pkg/backup/backup.go:405" progress=
time="2022-06-03T21:30:24Z" level=info msg="Setting up backup store to persist the backup" backup=velero/backup-4 logSource="pkg/controller/backup_controller.go:663"
time="2022-06-03T21:30:25Z" level=info msg="Backup completed" backup=velero/backup-4 controller=backup logSource="pkg/controller/backup_controller.go:673"

I was expecting the warning in backupper.go about hostPath volumes not being supported to be logged.

Here is the spec of the PV to confirm it is a hostPath volume:

spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 5Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: server-data-test-dc1-default-sts-0
    namespace: k8ssandra-operator
    resourceVersion: "33849"
    uid: 0d0d7ac0-7cb6-478a-b537-3bf466dedff5
  hostPath:
    path: /var/local-path-provisioner/pvc-0d0d7ac0-7cb6-478a-b537-3bf466dedff5_k8ssandra-operator_server-data-test-dc1-default-sts-0
    type: DirectoryOrCreate
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - k8ssandra-0-worker3
  persistentVolumeReclaimPolicy: Delete
  storageClassName: standard
  volumeMode: Filesystem

Describe the solution you'd like
I would like backups to be supported in kind clusters. We don't use kind for production deployments, but we use it extensively for dev/testing. I have been working on a prototype to integrate Velero into my project. I did my initial dev/testing with GKE, and now I am blocked since backups don't work with kind.

Anything else you would like to add:
See #3053 for related discussion.

Environment:

  • Velero version (use velero version): v1.8.1
  • Kubernetes version (use kubectl version): 1.22.7
  • Kubernetes installer & version: N/A
  • Cloud provider or hardware configuration: N/A
  • OS (e.g. from /etc/os-release): Ubuntu 21.10

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" at the top right of this comment to vote.

  • 👍 for "The project would be better with this feature added"
  • 👎 for "This feature will not enhance the project in a meaningful way"
@Lyndon-Li
Contributor

Please add the --default-volumes-to-restic option to your backup command.


jsanda commented Jun 6, 2022

I ran velero backup create backup-6 --include-namespaces k8ssandra-operator --include-resources PersistentVolumeClaims,PersistentVolumes --snapshot-volumes=false --default-volumes-to-restic=true.

The log output doesn't appear any different:

time="2022-06-06T13:06:46Z" level=info msg="Backing up item" backup=velero/backup-6 logSource="pkg/backup/item_backupper.go:122" name=pvc-0d0d7ac0-7cb6-478a-b537-3bf466dedff5 namespace= resource=persistentvolumes
time="2022-06-06T13:06:46Z" level=info msg="Executing takePVSnapshot" backup=velero/backup-6 logSource="pkg/backup/item_backupper.go:395" name=pvc-0d0d7ac0-7cb6-478a-b537-3bf466dedff5 namespace= resource=persistentvolumes
time="2022-06-06T13:06:46Z" level=info msg="Backup has volume snapshots disabled; skipping volume snapshot action." backup=velero/backup-6 logSource="pkg/backup/item_backupper.go:398" name=pvc-0d0d7ac0-7cb6-478a-b537-3bf466dedff5 namespace= resource=persistentvolumes
time="2022-06-06T13:06:46Z" level=info msg="Backed up 2 items out of an estimated total of 2 (estimate will change throughout the backup)" backup=velero/backup-6 logSource="pkg/backup/backup.go:380" name=server-data-test-dc1-default-sts-0 namespace=k8ssandra-operator progress= resource=persistentvolumeclaims
time="2022-06-06T13:06:46Z" level=info msg="Backed up a total of 2 items" backup=velero/backup-6 logSource="pkg/backup/backup.go:405" progress=
time="2022-06-06T13:06:46Z" level=info msg="Setting up backup store to persist the backup" backup=velero/backup-6 logSource="pkg/controller/backup_controller.go:663"
time="2022-06-06T13:06:46Z" level=info msg="Backup completed" backup=velero/backup-6 controller=backup logSource="pkg/controller/backup_controller.go:673"


jsanda commented Jun 7, 2022

Based on this block in item_backupper.go, it looks like Restic is used only when Pods are in the list of resource types to be backed up:

	var (
		backupErrs            []error
		pod                   *corev1api.Pod
		resticVolumesToBackup []string
	)

	if groupResource == kuberesource.Pods {
		// pod needs to be initialized for the unstructured converter
		pod = new(corev1api.Pod)
		if err := runtime.DefaultUnstructuredConverter.FromUnstructured(obj.UnstructuredContent(), pod); err != nil {
			backupErrs = append(backupErrs, errors.WithStack(err))
			// nil it on error since it's not valid
			pod = nil
		} else {
			// Get the list of volumes to back up using restic from the pod's annotations. Remove from this list
			// any volumes that use a PVC that we've already backed up (this would be in a read-write-many scenario,
			// where it's been backed up from another pod), since we don't need >1 backup per PVC.
			for _, volume := range restic.GetPodVolumesUsingRestic(pod, boolptr.IsSetToTrue(ib.backupRequest.Spec.DefaultVolumesToRestic)) {
				if found, pvcName := ib.resticSnapshotTracker.HasPVCForPodVolume(pod, volume); found {
					log.WithFields(map[string]interface{}{
						"podVolume": volume,
						"pvcName":   pvcName,
					}).Info("Pod volume uses a persistent volume claim which has already been backed up with restic from another pod, skipping.")
					continue
				}

				resticVolumesToBackup = append(resticVolumesToBackup, volume)
			}

			// track the volumes that are PVCs using the PVC snapshot tracker, so that when we backup PVCs/PVs
			// via an item action in the next step, we don't snapshot PVs that will have their data backed up
			// with restic.
			ib.resticSnapshotTracker.Track(pod, resticVolumesToBackup)
		}
	}

resticVolumesToBackup is only updated inside that if block. Based on this analysis, I created another backup that includes Pod resources with this command:

velero backup create backup-8 --include-namespaces k8ssandra-operator --include-resources Pods --snapshot-volumes=false --default-volumes-to-restic=true --selector "cassandra.datastax.com/cluster=test"

Now I see some of the expected log output:

time="2022-06-07T00:09:28Z" level=info msg="Initializing restic repository" controller=restic-repo logSource="pkg/controller/restic_repository_controller.go:158" name=k8ssandra-operator-default-crk7g namespace=velero
time="2022-06-07T00:09:31Z" level=warning msg="Volume server-data in pod k8ssandra-operator/test-dc1-default-sts-0 is a hostPath volume which is not supported for restic backup, skipping" backup=velero/backup-8 logSource="pkg/restic/backupper.go:156" name=test-dc1-default-sts-0 namespace=k8ssandra-operator resource=pods

If someone could offer me a bit of guidance, I would be happy to contribute a PR for this!

@Lyndon-Li
Contributor

@jsanda Yes, the Restic backup is formally called PodVolumeBackup, so it works on pods; this is the expected behavior.
For the backup command, people usually want to back up all the resources in a namespace rather than just the pods, so the --include-namespaces option is typically used instead of --include-resources Pods.
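
For example, a namespace-scoped backup that relies on the pod volume (Restic) backup would look roughly like this (the backup name here is only illustrative):

velero backup create backup-ns --include-namespaces k8ssandra-operator --default-volumes-to-restic=true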

Going back to the original problem of this issue:

In practice, we don't use issues to discuss development or PRs; if you want to contribute to Velero, please join the "velero-dev" Slack channel.

Thanks.


jsanda commented Jun 7, 2022

I am not interested in supporting hostPath in general, only for the purposes of dev/testing. This isn't exclusive to kind: if I were to use k3d, for example, I would hit the same issue since it also uses local-path-provisioner.

Other volumes do exist under /var/lib/kubelet/pods. PVs created by local-path-provisioner (which are hostPath) are stored under /var/local-path-provisioner. I tried following the instructions in #2767 for configuring the hostPath volume in the restic daemonset to point to /var/local-path-provisioner. This causes the restic pods to fail on startup with this error:

An error occurred: unexpected directory structure for host-pods volume, ensure that the host-pods volume corresponds to the pods subdirectory of the kubelet root directory.

#3053 was closed as a duplicate of #2767. Here's why I think this is a different issue: it seems to me that the restic server needs volume mounts for both /var/lib/kubelet/pods and /var/local-path-provisioner, but it looks like there can only be one host-pods volume (see the sketch below).
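
To make that concrete, here is a rough sketch of what the restic DaemonSet volumes would need to look like, assuming the default v1.8 layout (host-pods mounted at /host_pods). The host-local-path volume name and mount path are hypothetical, and simply adding such a mount would not be enough on its own, since (per the startup error above) the daemon treats only the single host-pods volume as the kubelet pods directory:

# excerpt from the "restic" DaemonSet pod spec (sketch, not a working fix)
containers:
- name: restic
  volumeMounts:
  - name: host-pods
    mountPath: /host_pods          # standard kubelet pods directory
  - name: host-local-path          # hypothetical second mount for local-path-provisioner PVs
    mountPath: /host_local_path
volumes:
- name: host-pods
  hostPath:
    path: /var/lib/kubelet/pods
- name: host-local-path            # hypothetical
  hostPath:
    path: /var/local-path-provisioner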

If my analysis is anywhere near correct I will happily move the discussion to the velero-dev channel :) If my analysis is wrong, then it doesn't look like I will be able to use velero with kind :(

@Lyndon-Li
Contributor

@jsanda Got it. The problem is similar to #3053/#2767, which require supporting a non-standard mount path. However, it is more complicated than those and cannot be solved by their solution:

  • There are multiple PVs
  • Some PVs have standard mount path /var/lib/kubelet/pods
  • Some PVs have non-standard mount path /var/local-path-provisioner

Therefore, Velero needs to support searching both /var/lib/kubelet/pods and /var/local-path-provisioner at the same time for Restic backups.

The requirement is clear now, so we can keep this issue open.
However, we won't simply fix it by adding /var/local-path-provisioner to the search paths, because local-path-provisioner is only one case; other cases may use different paths. Therefore, when we fix it, we will consider a comprehensive fix.

@reasonerjt added the env/kind, Restic (Relates to the restic integration), and backlog labels on Jun 12, 2022
alekc commented Feb 3, 2023

Just to add to this:

local-path-provisioner itself supports dynamic path mapping per node (https://github.com/rancher/local-path-provisioner#customize-the-configmap), so you cannot simply rely on /var/local-path-provisioner (see the sketch below).
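
For reference, that per-node mapping lives in local-path-provisioner's ConfigMap, roughly as below; the metadata here assumes kind's default deployment (namespace local-path-storage, ConfigMap local-path-config), and the worker-specific entry and its path are made up purely for illustration:

apiVersion: v1
kind: ConfigMap
metadata:
  name: local-path-config        # name/namespace as assumed for kind's default deployment
  namespace: local-path-storage
data:
  config.json: |
    {
      "nodePathMap": [
        {
          "node": "DEFAULT_PATH_FOR_NON_LISTED_NODES",
          "paths": ["/var/local-path-provisioner"]
        },
        {
          "node": "k8ssandra-0-worker3",
          "paths": ["/data/local-path"]
        }
      ]
    }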

Having said that, is there a reason we cannot rely on the PersistentVolume data in the hostPath case? For example, this is an excerpt from one of my PVs that I would like to back up:

apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: cluster.local/local-path-provisioner
  creationTimestamp: "2023-02-02T22:49:03Z"
  finalizers:
  - kubernetes.io/pv-protection
  name: pvc-b687054c-0ce8-4443-aeed-0cdacc7fcf86
  resourceVersion: "313243510"
  uid: 232120db-6396-4dee-a598-89526910d771
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 5Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: home-assistant
    namespace: hass
    resourceVersion: "313243426"
    uid: b687054c-0ce8-4443-aeed-0cdacc7fcf86
  hostPath:
    path: /4t/k8s-data/local-path-prov/pvc-b687054c-0ce8-4443-aeed-0cdacc7fcf86_hass_home-assistant
    type: DirectoryOrCreate

As you can see, the file location is clearly defined in spec.hostPath.path; could that path be used for the backup? @Lyndon-Li

@Lyndon-Li added the PodVolume label and removed the Restic (Relates to the restic integration) label on Feb 3, 2023
@Lyndon-Li
Contributor

@alekc
For hostPath PVs, we can solve it by checking the hostPath in the PV's spec, though this works for hostPath PVs only.

stale bot commented Apr 7, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

The stale bot added the staled label on Apr 7, 2023
@Lyndon-Li removed the staled label on Apr 7, 2023
stale bot commented Jun 10, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

The stale bot added the staled label on Jun 10, 2023
@Lyndon-Li removed the staled label on Jun 11, 2023
@github-actions

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days. If a Velero team member has requested log or more information, please provide the output of the shared commands.

@github-actions

This issue was closed because it has been stalled for 14 days with no activity.
