
Restore in GKE is not working as expected, folder called "mount" is created and all the content is restored inside this folder #5149

Closed
mcortinas opened this issue Jul 25, 2022 · 11 comments · Fixed by #5181
Labels: Area/CSI (Related to Container Storage Interface support), Bug, CSI Migration, Restic (Relates to the restic integration)

Comments

@mcortinas

I started working on this in the Slack community; let me share the thread: https://kubernetes.slack.com/archives/C6VCGP4MT/p1658307341547949
A previous related issue is in this other Slack thread: https://kubernetes.slack.com/archives/C6VCGP4MT/p1658003621239549

What steps did you take and what happened:
Basically, I'm taking a backup in one GKE cluster (Kubernetes on GCP) in one GCP project and restoring it in another GCP project.
The backup and restore cover a single Kubernetes namespace; I mainly want to restore Redis and Elasticsearch.
Source GKE in one GCP project:
velero backup create redis-restic --include-namespaces redis-restic -n velero-restic
Target GKE in another GCP project:
velero restore create --from-backup redis-restic -n velero-restic
Both GKE clusters share the same GCS bucket and the installation procedure described below.
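For reference, this is how I confirm the target cluster sees the same backups through the shared bucket (a sketch; run against the target cluster):

velero backup-location get -n velero-restic
velero backup get -n velero-restic
velero backup describe redis-restic --details -n velero-restic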

I saw this incorrect behavior in two examples, redis and mariadb-galera:
Example 1: redis
Example 2: mariadb-galera

What did you expect to happen:
Restore all the objects in the namespace and also all the PVs, restoring each PV from the restic repository while preserving the same hierarchy as the source; that is, all content should be restored at the root path of the PV mounted in the pod, NOT inside a newly created folder called mount.
Let me share a screenshot of one of the Redis pods; it describes my issue very well:
(screenshot)
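To double-check the layout after the restore, I run something like the following on the target cluster (a sketch; the pod name is a placeholder and /bitnami/redis/data is the data path used in this Redis example):

kubectl exec -n redis-restic <redis-pod> -- ls -la /bitnami/redis/data
# expected: the Redis data files directly here, not nested under an extra mount/ directory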

Environment:

  • Velero version (use velero version):
    Client: Version: v1.9.0, Git commit: 6021f148c4d7721285e815a3e1af761262bff029
    Server: Version: v1.9.0
  • Velero features (use velero client config get features):
    features: <NOT SET>
  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.3", GitCommit:"aef86a93758dc3cb2c658dd9657ab4ad4afc21cb", GitTreeState:"clean", BuildDate:"2022-07-13T14:30:46Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"}
    Kustomize Version: v4.5.4
    Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.12-gke.1500", GitCommit:"6c11aec6ce32cf0d66a2631eed2eb49dd65c89f8", GitTreeState:"clean", BuildDate:"2022-05-11T09:25:37Z", GoVersion:"go1.16.15b7", Compiler:"gc", Platform:"linux/amd64"}
    WARNING: version difference between client (1.24) and server (1.21) exceeds the supported minor version skew of +/-1
  • Cloud provider or hardware configuration:
    Google kubernetes Engine
  • Velero Restic Installation
    velero install \
        --use-restic \
        --provider gcp \
        --plugins velero/velero-plugin-for-gcp:v1.5.0 \
        --namespace velero-restic \
        --bucket edo-platform-lab01-velero-marc1 \
        --use-volume-snapshots=false \
        --default-volumes-to-restic \
        --secret-file ./credentials-velero
  • OS (e.g. from /etc/os-release):
    Redis pods:
    PRETTY_NAME="Debian GNU/Linux 11 (bullseye)" NAME="Debian GNU/Linux" VERSION_ID="11" VERSION="11 (bullseye)"
    K8s nodes:
    VERSION: v1.20.12-gke.1500, OS-IMAGE: Container-Optimized OS from Google, KERNEL-VERSION: 5.4.144+, CONTAINER-RUNTIME: docker://20.10.3

Restore Logs:
Let me attach the bundle file from velero debug --backup redis --restore redis-20220725112348 -n velero-restic:
bundle-2022-07-25-11-31-31.tar.gz

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
@qiuming-best added the Restic (Relates to the restic integration) label on Jul 26, 2022
@qiuming-best
Contributor

I’ve followed your steps to back up and restore the redis example, and there is no mount directory when I restore the whole redis cluster. Are there some steps I’m missing when reproducing? @mcortinas
Below is the directory that I restored:
(screenshot)

@mcortinas
Author

Seriously?! I tried twice with Redis, and I also checked once with MySQL, and I saw the mount folder every time... I always did this from one gcp_project/gke restoring to another gcp_project/gke, using the same GCS bucket with Restic... maybe I'm doing something wrong... I shared all the logs in this issue; could you help by checking the logs to see if anything looks wrong in the restic restore? Do you know if there is anything more I can share from my side?

@qiuming-best
Contributor

@mcortinas it's really strange. I also created a flag file in the /bitnami/redis/data directory. I just cannot reproduce it...

Going through the log you provided, I cannot find anything about creating a mount directory.

@mcortinas
Author

Maybe the difference is that in my scenario the source and the target are different GCP projects and GKE clusters; both are using the same GCS bucket and the same IAM roles.

@blackpiglet
Contributor

@mcortinas
Could you help check which directory the volume is mounted to, with a command like fdisk -l or mount?
We want to find out which part added mount to the path.
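For example, something like this (a sketch; the namespace and pod name are placeholders):

# on the worker node where the pod runs
mount | grep pvc-
# or, from the cluster, check what is mounted at the pod's data path
kubectl exec -n <namespace> <pod-name> -- df -h /bitnami/redis/data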

@blackpiglet
Contributor

blackpiglet commented Aug 2, 2022

@mcortinas
After checking, I think this is related to the CSI plugin version.
I checked the mount path on my TKG AWS worker node's PV mounting directory:

/dev/nvme2n1 on /var/lib/kubelet/pods/d8f08da1-35ce-4aea-96b1-0f39623c546e/volumes/kubernetes.io~csi/pvc-1b4ced18-74b5-449f-ae80-01d58ff0a58d/mount type ext4 (rw,relatime)

The PV is mounted to a sub-directory called mount by the kubelet CSI code. I think that if the provider's CSI plugin or an old Kubernetes version doesn't handle this well, it's possible that all data ends up mounted under the PV-name directory. We can also see the vol_data.json file in the screenshot; that is the description file used by the CSI mount.
I suggest upgrading your Kubernetes version to the newest one in the corresponding release.
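For reference, the layout the kubelet creates for a CSI volume on the node looks like this (a sketch; the pod UID and PV name are placeholders matching the example above):

# on the worker node
ls /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/<pv-name>/
# expected entries: mount/  vol_data.json
cat /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/<pv-name>/vol_data.json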

@blackpiglet
Contributor

blackpiglet commented Aug 3, 2022

@mcortinas
Sorry for my inaccurate reply.
I think I found the reason for your case.
This is because the Kubernetes CSI volume is now mounted to a different directory than before.
It used to be something like
/var/lib/kubelet/pods/d8f08da1-35ce-4aea-96b1-0f39623c546e/volumes/kubernetes.io~csi/pvc-1b4ced18-74b5-449f-ae80-01d58ff0a58d/, but now it is something like
/var/lib/kubelet/pods/d8f08da1-35ce-4aea-96b1-0f39623c546e/volumes/kubernetes.io~csi/pvc-1b4ced18-74b5-449f-ae80-01d58ff0a58d/mount.

Velero's Restic integration doesn't account for that yet. It still looks for
/var/lib/kubelet/pods/d8f08da1-35ce-4aea-96b1-0f39623c546e/volumes/kubernetes.io~csi/pvc-1b4ced18-74b5-449f-ae80-01d58ff0a58d/.

I think your case is that you backed up on a Kubernetes cluster that has CSI migration enabled (or uses CSI volumes), and restored into a Kubernetes cluster that doesn't have CSI migration enabled yet.
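One way to see which side each cluster is on (a sketch; the PV name is a placeholder) is to check whether the bound PV carries a CSI volume source or an in-tree one:

kubectl get pv <pv-name> -o jsonpath='{.spec.csi.driver}'               # set when the PV was provisioned directly by a CSI driver
kubectl get pv <pv-name> -o jsonpath='{.spec.gcePersistentDisk.pdName}' # set when the PV uses the in-tree GCE PD source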

I am working on a fix, and it should be included in Velero v1.10.

@reasonerjt
Contributor

Let‘s make sure this is fixed in v1.9.1

@blackpiglet added the Area/CSI (Related to Container Storage Interface support) and CSI Migration labels on Aug 5, 2022
@blackpiglet
Contributor

Some documents related to CSI provisioning and CSI migration:

  • CSI migration design, updated to add the annotation "pv.kubernetes.io/migrated-to": persistent-volume-controller
  • Dynamically provisioned PVs with an in-tree volume source have no spec.csi when CSIMigration is on: Dynamically Provisioned Volumes
  • Annotation "pv.kubernetes.io/provisioned-by": volume provisioning
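A quick way to check these markers on a PV (a sketch; the PV name is a placeholder):

kubectl get pv <pv-name> -o yaml | grep -E 'pv.kubernetes.io/(migrated-to|provisioned-by)'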

@blackpiglet
Contributor

blackpiglet commented Aug 14, 2022

Test case

---
apiVersion: v1
kind: Namespace
metadata:
  name: test
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
  namespace: test
spec:
  storageClassName: standard
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
# deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-app
  namespace: test
spec:
  selector:
    matchLabels:
      app: hello-app
  template:
    metadata:
      labels:
        app: hello-app
    spec:
      containers:
      - name: hello-app
        image: nginx
        args: [ "sleep", "3600" ]
        volumeMounts:
        - name: sdk-volume
          mountPath: /usr/share/hello/
        - name: empty
          mountPath: /usr/share/empty/
      volumes:
      - name: sdk-volume
        persistentVolumeClaim:
          claimName: my-pvc
      - name: empty
        emptyDir: {}
wget https://github.com/vmware-tanzu/velero/releases/download/v1.9.1-rc.2/velero-v1.9.1-rc.2-linux-amd64.tar.gz

tar zxvf velero-v1.9.1-rc.2-linux-amd64.tar.gz 

cp velero-v1.9.1-rc.2-linux-amd64/velero /usr/local/bin/

velero install \
    --provider gcp \
    --bucket jxun \
    --secret-file ~/Documents/credentials-velero-gcp \
    --image velero/velero:v1.9.1-rc.2 \
    --plugins velero/velero-plugin-for-gcp:v1.5.0 \
    --use-restic

velero backup create restic-csi-migration --include-namespaces=test --default-volumes-to-restic

velero restore create --from-backup restic-csi-migration --namespace-mappings=test:test1
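
To verify the result, I check the restored pod's volume layout (a sketch; the pod name is a placeholder; the restore maps namespace test to test1):

kubectl exec -n test1 <hello-app-pod> -- ls -la /usr/share/hello/
# the restored files should sit directly under /usr/share/hello/, with no extra mount/ directory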

@mcortinas
Author

Hi, this sounds great! Thank you @blackpiglet!
Apologies for my delayed answer, I was on vacation... Yes, you're right, my backup source is a new Kubernetes cluster and we want to restore into an old Kubernetes cluster...
Awesome @blackpiglet, thank you very much for your help and your time!
