Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apply backupPVCConfig to backupPod volume spec #8141

Merged

Conversation

shubham-pampattiwar
Copy link
Collaborator

Thank you for contributing to Velero!

Please add a summary of your change

  • Modified createBackupPod function to accept backupPVCConfig and make changes wherever applicable to pod spec
    • Pod VolumeSource PVC
  • Modified MakePodPVCAttachment to accept readOnly volume boolean and apply the flag to PersistentVolumeFilesystem type volumeMounts
  • Added more unit tests to the csi exposer module

Please indicate you've done the following:

  • Accepted the DCO. Commits without the DCO will delay acceptance.
  • Created a changelog file or added /kind changelog-not-required as a comment on this pull request.
  • Updated the corresponding documentation in site/content/docs/main.

Copy link

codecov bot commented Aug 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 59.07%. Comparing base (934b3ea) to head (f6e2b01).
Report is 13 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8141   +/-   ##
=======================================
  Coverage   59.07%   59.07%           
=======================================
  Files         364      364           
  Lines       30298    30324   +26     
=======================================
+ Hits        17899    17915   +16     
- Misses      10956    10967   +11     
+ Partials     1443     1442    -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Lyndon-Li
Copy link
Contributor

@shubham-pampattiwar
Do you see any problem without this PR? As far as I understand, the readOnly setting for the pod's volume is more like a clarification, it won't block the write IO.

@shubham-pampattiwar
Copy link
Collaborator Author

@shubham-pampattiwar Do you see any problem without this PR? As far as I understand, the readOnly setting for the pod's volume is more like a clarification, it won't block the write IO.

@Lyndon-Li yes there were some defects identified in e2e testing by @msfrucht and these changes helped fixed those.

@msfrucht
Copy link
Contributor

msfrucht commented Aug 22, 2024

@shubham-pampattiwar Do you see any problem without this PR? As far as I understand, the readOnly setting for the pod's volume is more like a clarification, it won't block the write IO.

I found this issue in our end-to-end testing.

One cannot mount the device if the only access mode of the PVC is ReadOnlyMany as the only available option and Pod volumes and volumeMount is not read-only on selinux enabled clusters.

Pod spec without read-only

Read-only PVC

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-pvc-snapshot-restore
  namespace: michael
spec:
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 10Gi
  volumeName: pvc-da4a500d-02fe-4e90-94c9-d6b17f3ea77c
  storageClassName: ocs-storagecluster-ceph-rbd
  volumeMode: Filesystem
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: test-pvc-snapshot
  dataSourceRef:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: test-pvc-snapshot
kind: Pod
apiVersion: v1
metadata:
  name: no-read-only-mount
  namespace: michael
spec:
  restartPolicy: Always
  serviceAccountName: default
  imagePullSecrets:
    - name: default-dockercfg-fw8mq
  priority: 0
  schedulerName: default-scheduler
  enableServiceLinks: true
  terminationGracePeriodSeconds: 30
  preemptionPolicy: PreemptLowerPriority
  nodeName: 10.240.0.4
  securityContext:
    seLinuxOptions:
      level: 's0:c26,c20'
    runAsNonRoot: true
    fsGroup: 1000690000
    seccompProfile:
      type: RuntimeDefault
  containers:
    - resources: {}
      terminationMessagePath: /dev/termination-log
      name: http
      securityContext:
        capabilities:
          drop:
            - ALL
        runAsUser: 1000690000
        allowPrivilegeEscalation: false
      ports:
        - containerPort: 8080
          protocol: TCP
      imagePullPolicy: IfNotPresent
      volumeMounts:
        - name: test-pvc-no-read-only
          mountPath: /mnt/test
      terminationMessagePolicy: File
      image: 'registry.redhat.io/rhel8/httpd-24@sha256:20e24126039c99a8424966fa9c1e47ddb089d7804f714d18235fea8d7b1475bb'
  serviceAccount: default
  volumes:
    - name: test-pvc-no-read-only
      persistentVolumeClaim:
        claimName: test-pvc-snapshot-restore

Pod is stuck in ContainerCreating due to the volume mount.

containerStatuses:
   - name: busyboxy
     state:
       waiting:
         reason: ContainerCreating
     lastState: {}
     ready: false
     restartCount: 0
     image: 'busybox: latest'
     imageID: ''
     started: false
 qosClass: BestEffort

@Lyndon-Li Yes, it is taken as a suggestion by Kubernetes, not a requirement The failure arises in Kubernetes and Openshift clusters with selinux enabled.

Error: relabel failed /var/data/kubelet/pods/27feba75-95ff-46d3-87cd-aca518c18a59/volumes/kubernetes.io~csi/pvc-56820518-52a7-4e1f-a9a3-49cf0283987d/mount: lsetxattr /var/data/kubelet/pods/27feba75-95ff-46d3-87cd-aca518c18a59/volumes/kubernetes.io~csi/pvc-56820518-52a7-4e1f-a9a3-49cf0283987d/mount: read-only file system

Kubernetes systems and Openshift with selinux relabeling on mount fail without the datamover pod set to mount volumes as read-only.

There are mechanisms to bypass the relabeling, but all of them are implementation specific to the Kubernetes distribution.

Setting the Pod to make use of that the volume is read-only is not.

@Lyndon-Li
Copy link
Contributor

Lyndon-Li commented Aug 23, 2024

@shubham-pampattiwar @msfrucht

This is a good catch.
Could you test one more thing --- make the PVC as ReadWriteOnce but set the pod volume as readOnly.
The reason is that literally the backupPod should go with readOnly volume all the time since the backupPod should not do any change to the snapshot volume.
If the test passes as well, we should adopt the best approach --- always setting the backupPod volume as readOnly instead of setting it according to the PVC's accessMode.

@msfrucht
Copy link
Contributor

msfrucht commented Aug 26, 2024

@Lyndon-Li That configuration has several different forms.

  1. The volumeMount for volumeMode: Filesystem can be set as readOnly true or false.
  2. The volume can be set as readOnly true or false.
  3. volumeDevices for volumeMode: Block does not have a readOnly setting.

From our experience with our own backup and restore datamovers on IBM Fusion that taking a ReadWriteOnce, ReadWriteMany, and ReadWriteOncePod, can be mounted as read-only. And the resulting writes to the volume will fail with the read-only error as expected. The shallow-copy behavior is only triggered in CSI drivers when they find the ROX mode and a VolumeSnapshot or PVC is specified as the dataSourceRef.

Setting the volume as readOnly in the Pod volumes prevents the multiple container in a pod situation where one container would mount the volume readOnly as true and the other container as false. When that is performed kubelet will return an error and refuse to start the container though the mount succeeds. It is irrelevant in single container pods.

Subject PVC restored from VolumeSnapshot.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-pvc-snapshot-restore
  namespace: michael
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  volumeName: pvc-b2fa4a05-c6d4-47db-9a16-1a5cf1e820b3
  storageClassName: ocs-storagecluster-ceph-rbd
  volumeMode: Filesystem
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: test-pvc-snapshot
  dataSourceRef:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: test-pvc-snapshot
status:
  phase: Bound
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 10Gi

For ReadWriteMany, volumeMode: Filesystem tests the original PVC was removed and replaced with one from storageclass "ocs-storagecluster-cephfs".

For volumeMode: Block. The RBD PVC in volumeMode: Filesystem was restored as volumeMode: Block from snapshot. This is how "volume image" backups are generally performed as a way to prevent "high file count, small file size" volumes backups and restores from stalling from file system overhead.

And the Pod volumeMounts replaced with volumeDevices. In default Openshift security settings the devicePath must be in /dev. Higher levels of SELinux permissions from SecurityContextConstraints (similar to the now removed PodSecurityPolicy Kubernetes <1.25) objects allow for attachment to other locations.

      volumeDevices:
        - devicePath: /dev/test-pvc
          name: test-pvc

Subject Pod with selinux enabled. Kube API projected volume api access token is not shown as it is not related to this test. ServiceAccount default doesn't have any RBAC permissions anyway. Shown in Test 1 volumeMode: Filesystem with volumeMounts readOnly configuration.

kind: Pod
apiVersion: v1
metadata:
  name: example
  namespace: michael
spec:
  restartPolicy: Always
  serviceAccountName: default
  imagePullSecrets:
    - name: default-dockercfg-fw8mq
  priority: 0
  schedulerName: default-scheduler
  enableServiceLinks: true
  terminationGracePeriodSeconds: 30
  preemptionPolicy: PreemptLowerPriority
  nodeName: 10.240.0.4
  securityContext:
    seLinuxOptions:
      level: 's0:c26,c20'
    runAsNonRoot: true
    fsGroup: 1000690000
    seccompProfile:
      type: RuntimeDefault
  containers:
    - resources: {}
      terminationMessagePath: /dev/termination-log
      name: httpd
      securityContext:
        capabilities:
          drop:
            - ALL
        runAsUser: 1000690000
        allowPrivilegeEscalation: false
      ports:
        - containerPort: 8080
          protocol: TCP
      imagePullPolicy: IfNotPresent
      volumeMounts:
        - name: test-pvc
          readOnly: true
          mountPath: /mnt/test
      terminationMessagePolicy: File
      image: 'registry.redhat.io/rhel8/httpd-24@sha256:20e24126039c99a8424966fa9c1e47ddb089d7804f714d18235fea8d7b1475bb'
  serviceAccount: default
  volumes:
    - name: test-pvc
      persistentVolumeClaim:
        claimName: test-pvc-snapshot-restore
  dnsPolicy: ClusterFirst

Configuration PVC volumeMode: Filesystem AccessModes: ReadWriteOnce

Tests.

  1. volumeMount: readOnly

Pod starts and runs and is in status running with volume mounted. File writes fail due to read-only filesystem.

sh-4.4$ echo "the cow jumped over the moon" > /mnt/test/test-file-write.txt
sh: /mnt/test/test-file-write.txt: Read-only file system
  1. volumeMount: readOnly volume: readOnly

Pod mounts the volume and starts and is in running state. File writes fail due to read-only filesystem.

sh-4.4$ echo "the cow jumped over the moon" > /mnt/test/test-file-write-2.txt
sh: /mnt/test/test-file-write-2.txt: Read-only file system
  1. volume: readOnly

Pod mounts the volume and start is in running state. File writes fail due to read-only filesystem.

sh-4.4$ echo "the cow jumped over the moon" > /mnt/test/file-write-test-3.txt
sh: /mnt/test/file-write-test-3.txt: Read-only file system

This test is repeated for PVC accessModes configurations ReadWriteOnce, ReadWriteMany, and [ReadWriteOnce, ReadWriteMany].

For volumeDevices there is just one test.

  1. volumes readOnly true
sh-4.4$ dd if=/dev/urandom of=/dev/test-pvc count=100 bs=512 
dd: failed to open '/dev/test-pvc': Operation not permitted

This operation succeeds with readOnly removed.

sh-4.4$ dd if=/dev/urandom of=/dev/test-pvc count=100 bs=512 
100+0 records in
100+0 records out
51200 bytes (51 kB, 50 KiB) copied, 0.000462672 s, 111 MB/s
sh-4.4$ 

The test is repeated on volume accessMode configuration ReadWriteOnce, ReadWriteMany, and [RWO, RWX].

@Lyndon-Li
Copy link
Contributor

Lyndon-Li commented Aug 27, 2024

@msfrucht Thanks a lot for the detailed tests.
@shubham-pampattiwar
Considering this fact (and I believe it is true for all envs) datamovers on IBM Fusion that taking a ReadWriteOnce, ReadWriteMany, and ReadWriteOncePod, can be mounted as read-only, I think we should directly set volumeMount as readOnly for backupPod, without checking the mode of the backupPVC.

@msfrucht
Copy link
Contributor

The only combination that fails on Kubernetes without SELinux I could force were

  1. Pod has more than one container, volumes is set to readOnly. One Pod is volumeMounts[].readOnly true and the other container is false.
  2. A PVC accessMode contains: [ReadWriteOnce] and attaching to multiple Pods on different nodes. I had to set a nodeSelector to force this scenario. Kubernetes scheduling is intelligent these days and will find and use the attachment node provided no blocking constraints.

I'm sure there are others. But none that I could find with a single PVC attached to a Pod.

StorageClass volumeBindingMode: WaitForFirstConsumer can have some interesting side-effects, but they all require multiple PVCs attached to a Pod to activate.

@shubham-pampattiwar
Copy link
Collaborator Author

@Lyndon-Li @msfrucht @blackpiglet PTAL ! Updated the PR to always set VolumeMount for backuPod as ReadOnly.

@blackpiglet
Copy link
Contributor

@shubham-pampattiwar
LGTM, but please address the reported linter issue.

@shubham-pampattiwar
Copy link
Collaborator Author

@shubham-pampattiwar LGTM, but please address the reported linter issue.

Done, fixed.

blackpiglet
blackpiglet previously approved these changes Aug 29, 2024
@@ -360,6 +360,7 @@ func MakePodPVCAttachment(volumeName string, volumeMode *corev1api.PersistentVol
volumeMounts = []corev1api.VolumeMount{{
Name: volumeName,
MountPath: volumePath,
ReadOnly: true,
Copy link
Contributor

@Lyndon-Li Lyndon-Li Aug 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should not be hardcoded, MakePodPVCAttachment is a shared function and is also used by generic restore.
So we need add a readOnly parameter to this function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, updated the PR.

Signed-off-by: Shubham Pampattiwar <[email protected]>

add changelog file

Signed-off-by: Shubham Pampattiwar <[email protected]>

make backupPod volume mount always readOnly

Signed-off-by: Shubham Pampattiwar <[email protected]>

use assert.True()

Signed-off-by: Shubham Pampattiwar <[email protected]>

Add readOnly param for MakePodPVCAttachment func

lint fix

Signed-off-by: Shubham Pampattiwar <[email protected]>
@Lyndon-Li Lyndon-Li merged commit 3408ffe into vmware-tanzu:main Aug 30, 2024
45 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants