Allow setting the /spec/accessModes of a PVC created for the CSI snapshot data movement #7747

Closed
Borrelhapje opened this issue Apr 27, 2024 · 14 comments · Fixed by #8109
Comments

@Borrelhapje

Borrelhapje commented Apr 27, 2024

Describe the problem/challenge you have

I'm using rook-ceph as the storage backend. It can take snapshots in O(1) time, but creating a new PVC from such a snapshot takes much longer if the PVC needs write capabilities (rook-ceph then copies the complete data to a new PV). I just tried creating a new PVC from a VolumeSnapshot with /spec/accessModes set to ReadOnlyMany, and that is instant because rook-ceph just serves the snapshot. However, there seems to be no option to set the PVC accessModes in the Backup CRD.

Describe the solution you'd like

I'd like to be able to specify which accessModes the volumes created by the CSI data movement plugin have (similar to #7700, which asks for setting the storage class for performance reasons). If I understand the CSI data movement setup correctly, it should not need write access to a PVC whose data is being sent to S3.

Anything else you would like to add:

One could set ReadOnlyMany as the default value; however, not all CSI providers may support it, so I think it's fine to default to the accessModes of the original volume.
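For illustration, here is a minimal sketch (hypothetical names and storage class) of the kind of intermediate PVC I'd like the data mover to be able to create from the snapshot; with ReadOnlyMany, rook-ceph can bind it without copying any data:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: velero-datamover-example     # hypothetical name
  namespace: velero
spec:
  accessModes:
    - ReadOnlyMany                   # instead of inheriting the source PVC's modes
  storageClassName: rook-ceph-block  # hypothetical storage class
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: example-snapshot           # hypothetical snapshot name
  resources:
    requests:
      storage: 10Gi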

Environment:

  • Velero version (use velero version): 1.13.0 (velero-plugin-for-csi 0.7.1, velero-plugin-for-aws 1.9.2)
  • Kubernetes version (use kubectl version): 1.30
  • Kubernetes installer & version: kubeadm
  • Cloud provider or hardware configuration: debian bookworm
  • OS (e.g. from /etc/os-release):

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" at the upper right of this comment to vote.

  • 👍 for "The project would be better with this feature added"
  • 👎 for "This feature will not enhance the project in a meaningful way"
@Lyndon-Li Lyndon-Li self-assigned this Apr 28, 2024
@Lyndon-Li
Contributor

Lyndon-Li commented Apr 28, 2024

The PVC created from the snapshot is indeed used in a read-only way. Let me check whether we can add something like this: use ReadOnlyMany first and, if that fails, fall back to ReadWriteOnce.

@reasonerjt
Contributor

Let's combine this requirement with #7700 so that we design a way to allow users to set the attributes of the PVC created by the data mover.

@reasonerjt
Contributor

The design is merged, but per discussion we want to leave it in the backlog due to limited bandwidth; if we have more time later, we'll handle it in v1.15.

@shubham-pampattiwar
Collaborator

@reasonerjt @Lyndon-Li We need this feature for OADP as well; I can work on this for 1.15.
cc: @weshayutin @sseago

@Lyndon-Li Lyndon-Li removed the backlog label Jul 25, 2024
@Lyndon-Li Lyndon-Li added this to the v1.15 milestone Jul 25, 2024
@Lyndon-Li
Contributor

Lyndon-Li commented Jul 25, 2024

@shubham-pampattiwar
Thanks for the help. I have assigned the issue to you and moved it back to the 1.15 milestone.
Just be reminded that the changes conflict with the data mover micro-service work (or rather, the changes should be made on top of the new data mover micro-service code), so please wait until we have a stable build after the data mover micro-service changes land.

@shubham-pampattiwar
Collaborator

@Lyndon-Li Yeah, makes sense. I will start working on this feature once the data mover micro-service changes are merged.

@realjump

@Borrelhapje

If you need a quick workaround for the described situation, you could possibly try the following:

We are currently facing the same issue in the context of backups of a very large application with hundreds of PVs. As a quick workaround until the functionality is available in Velero, you could use Kubernetes dynamic admission control: a MutatingWebhook that mutates the accessModes to ReadOnlyMany for all new PVCs in the Velero namespace.

We tested this in our case via Kyverno and a corresponding ClusterPolicy and were able to avoid copying the snapshot data to a new PVC, even with the current Velero version.

@sseago
Collaborator

sseago commented Jul 29, 2024

@realjump I can see this working on backup, of course -- but for the PVCs created on restore for data movement -- wouldn't those need to be writable?

@realjump

realjump commented Jul 29, 2024

@sseago: Good point. Yes, I think the policy must be deleted before a possible restore; it should only serve as a workaround for the backup, since with many PVs the bottleneck seems to be copying the whole snapshot data before the final DataUpload takes place.
The restore is not a problem in our case, as DataDownload works relatively efficiently in conjunction with parallelism for the new PVs.

@larssb

larssb commented Jul 29, 2024

@realjump Oh, that's nice! Would you be so kind to share the Kyverno mutating policy you used?

Thank you very much.

@realjump

@larssb

For example, you could do something like this; you might have to adapt it to your purposes, but it should work as-is in the default case.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: mutate-pvc-accessmode
spec:
  rules:
    - name: mutate-pvc-accessmode
      match:
        resources:
          kinds:
            - PersistentVolumeClaim
          namespaces:
            - velero
      mutate:
        patchStrategicMerge:
          spec:
            accessModes: ["ReadOnlyMany"]

@Borrelhapje
Author

Thanks for this feature! I just tried it and it works very well.
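
For anyone finding this later: below is a minimal sketch of the node-agent backupPVC configuration that enables this, as I understand the Velero 1.15 data-movement docs; the ConfigMap name, data key, and storage-class key are assumptions and may differ in your setup (see the node-agent configuration docs for how to point the node-agent at the ConfigMap).

apiVersion: v1
kind: ConfigMap
metadata:
  name: node-agent-config   # assumed name
  namespace: velero
data:
  # The node-agent reads this JSON. "rook-ceph-block" stands for the storage class
  # of the source PVCs; readOnly asks for a read-only (ReadOnlyMany) backup PVC.
  node-agent-config.json: |
    {
      "backupPVC": {
        "rook-ceph-block": {
          "readOnly": true
        }
      }
    }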

@larssb

larssb commented Nov 14, 2024

Oh yeah! It's finally here. Going to upgrade and get going on v1.15 ASAP.

@larssb

larssb commented Dec 16, 2024

Agree this works. Wonderful!
