
Make kopia repo cache place configurable #7725

Open
Lyndon-Li opened this issue Apr 23, 2024 · 22 comments

Comments

@Lyndon-Li
Contributor

Related to issues #7499 and #7718.
The cache policy determines the root file system disk usage in the pod where data movement is running; on the other hand, it also significantly impacts restore performance.
Therefore, besides storing the cache in the root file system, we should allow users to add a dedicated volume to hold the cache.
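For illustration, here is a minimal sketch of what a dedicated cache volume could look like on a data-mover-style pod, assuming the Kopia cache stays at its default location under /root/.cache/kopia; the pod name, image, and size limit are hypothetical, and this is not an existing Velero option.

```yaml
# Hypothetical sketch only: mount a dedicated volume over the default
# Kopia cache path so the cache no longer consumes the root file system.
apiVersion: v1
kind: Pod
metadata:
  name: example-data-mover            # hypothetical pod name
spec:
  containers:
    - name: data-mover
      image: velero/velero:main       # placeholder image
      volumeMounts:
        - name: kopia-cache
          mountPath: /root/.cache/kopia   # default Kopia cache location
  volumes:
    - name: kopia-cache
      emptyDir:
        sizeLimit: 10Gi               # caps cache disk usage; could also be a PVC
```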


This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days. If a Velero team member has requested log or more information, please provide the output of the shared commands.

@blackpiglet
Contributor

unstale

@github-actions github-actions bot removed the staled label Jul 2, 2024
@itayrin

itayrin commented Sep 3, 2024

Hi @Lyndon-Li and hi all,
I'm currently trying to use Velero 6.7.0 with CLI 13.2 and need it to work in production.
I'm experiencing the same issue described in:
#7620 (comment)

I saw that in the next version of Velero (1.15) there should be a fix that allows configuring the Kopia cache - but until then, is there a recommended workaround?
Currently, the only thing I managed to do that allows both the restore to succeed and the ephemeral storage not to explode is:
polling '/var/lib/containerd' for the locations consuming the most disk, recursively reaching the path "../root/.cache/kopia//contents", and deleting that directory.

Other things I tried which solved the disk space issue but caused the restore operations to fail:

  • Assigning ephemeral-storage requests and limits on the node-agent pod
  • Assigning a PVC to the node-agent DaemonSet in order to mount the cache directory on CSI-Ceph. I couldn't make the PVC dynamic, and I don't want to create a one-off PVC.
  • I can't access the node-agent pod, as it does not have a shell
  • AFAIK, the Velero charts can't be configured to pass Kopia the relevant flags to limit the cache

If you have any better ideas for a workaround, I would be happy to hear them, thanks.

@itayrin

itayrin commented Sep 4, 2024

Update - This solution of mine also doesn't work - it makes the restore fail with:
Operation Error: error to initialize data path: error to boost backup repository connection default-xdr-mt-kopia: error to connect backup repo: error to connect repo with storage: error to connect to repository: unable to create shared content manager: error setting up read manager caches: unable to initialize content cache: unable to create base cache: error during initial scan of contents: error listing contents: error processing directory shards: error reading directory: readdirent /root/.cache/kopia/b3adfaac80582fd3/contents: no such file or directory Progress description: Failed

@s4ndalHat

Hi, can this enhancement solve the DiskPressure problem that occurs when restoring large data (e.g. 500 GB) when we do not have 500 GB of free space on the nodes?

@Lyndon-Li
Contributor Author

Lyndon-Li commented Sep 12, 2024

Hi, can this enhancement solve the DiskPressure problem that occurs when restoring large data (e.g. 500 GB) when we do not have 500 GB of free space on the nodes?

Issue #7620 fixes it in 1.15. This issue is a further enhancement.

@s4ndalHat

@Lyndon-Li thanks for your response, waiting for v1.15 ;) May I ask if there is a planned release date for this version?

@Lyndon-Li
Contributor Author

See 1.15 roadmap https://github.com/vmware-tanzu/velero/wiki/1.15-Roadmap

@reasonerjt
Contributor

reasonerjt commented Oct 12, 2024

Tentatively moving this out of the milestone for now, because there may be more complexities.

We need a design because we need to handle the cases for the velero pod, node-agent, and data-mover pod.

@reasonerjt
Contributor

The impact is low given that we have a limit on the size of the cache.

@reasonerjt reasonerjt removed this from the v1.16 milestone Oct 31, 2024
@msfrucht
Contributor

msfrucht commented Nov 8, 2024

@reasonerjt Our systems have some workloads that cause the index cache to become very large, 10+ GB. Kopia's existing metadata and content cache limits don't cover this.

Admittedly, these workloads are unusual and approach the worst case possible for deduplication, with very high counts of unique blocks.

Combining this with a Job TTL (requires Kubernetes 1.23 at minimum) is likely required. During testing of a local implementation, I found that for Jobs without a TTL and with a PVC-backed cache, the PVC remains past Job completion. Excessive amounts of reserved storage may otherwise go unused, attached to finished maintenance jobs.
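To illustrate the combination described above, here is a hedged sketch (the name, image, and sizes are hypothetical): a maintenance-style Job with ttlSecondsAfterFinished, using a generic ephemeral volume for the cache so the PVC is deleted together with the Job's pod once the TTL cleanup runs.

```yaml
# Hypothetical sketch: Job TTL plus a generic ephemeral cache volume, so the
# cache PVC does not outlive the maintenance Job.
apiVersion: batch/v1
kind: Job
metadata:
  name: example-repo-maintenance         # hypothetical name
spec:
  ttlSecondsAfterFinished: 3600          # requires Kubernetes 1.23+
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: maintenance
          image: velero/velero:main      # placeholder image
          volumeMounts:
            - name: kopia-cache
              mountPath: /root/.cache/kopia
      volumes:
        - name: kopia-cache
          ephemeral:
            volumeClaimTemplate:
              spec:
                accessModes: ["ReadWriteOnce"]
                resources:
                  requests:
                    storage: 20Gi        # illustrative size
```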

@Lyndon-Li
Contributor Author

@shubham-pampattiwar
I am adding you as the assignee of this issue. There are several details to be covered, e.g., how to differentially assign volumes to backupPods/restorePods and how/whether to allow shared volumes among multiple backupPods/restorePods, so we need a design for it.
Feel free to add it to the 1.16 milestone if you regard this as a high priority on your side.

@shubham-pampattiwar
Collaborator

@mpryc will be assisting with this issue. Thank you @mpryc !

@Lyndon-Li
Contributor Author

Lyndon-Li commented Dec 11, 2024

@mpryc Please comment in this issue; I couldn't add you as an assignee. Please also share your plan for this issue: if you want to make some progress (e.g., pure design or design + implementation) in 1.16, we can move this issue to the 1.16 milestone.

@msfrucht
Contributor

Resolving the cache location divergence between the Restic and Kopia data movers would be a useful element of this item.

The default Restic cache location needed for PVC mounts is the environment variable VELERO_SCRATCH_DIR, or /scratch by default.
The default Kopia cache location needed for PVC mounts is whatever HOME is set to, typically /home/velero.

The --cache-dir install option will override this for Restic, but not for Kopia.
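Building only on the locations mentioned in this comment, a hedged sketch of pointing both caches at a single mounted volume via environment variables (the paths, image, and claim name are illustrative assumptions, not an existing Velero install option):

```yaml
# Hypothetical pod-spec fragment: route both caches onto one mounted volume.
# VELERO_SCRATCH_DIR is the Restic scratch location noted above; Kopia derives
# its default cache directory from HOME.
containers:
  - name: node-agent
    image: velero/velero:main          # placeholder image
    env:
      - name: VELERO_SCRATCH_DIR
        value: /cache/scratch          # Restic scratch/cache location
      - name: HOME
        value: /cache/home             # Kopia cache ends up under $HOME/.cache/kopia
    volumeMounts:
      - name: repo-cache
        mountPath: /cache
volumes:
  - name: repo-cache
    persistentVolumeClaim:
      claimName: repo-cache-pvc        # hypothetical pre-created PVC
```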

@Lyndon-Li
Contributor Author

Resolving the cache location divergence between the Restic and Kopia data movers would be a useful element of this item.

The default Restic cache location needed for PVC mounts is the environment variable VELERO_SCRATCH_DIR, or /scratch by default. The default Kopia cache location needed for PVC mounts is whatever HOME is set to, typically /home/velero.

The --cache-dir install option will override this for Restic, but not for Kopia.

Restic has been deprecated since 1.15, so we will try to cover this gap in the design, but if we eventually find that we need to do special things for the Restic path, we would probably drop it.

@mpryc
Contributor

mpryc commented Dec 12, 2024

Based on this answer from Kopia's founder, it seems feasible to share Kopia's cache across node-agents using the same persistent volume (PV).

If separate caches are required for each node-agent, the PVs would need to be created before deployment with explicit nodeAffinity, which would ensure that specific PVs are accessible only from specific nodes (pinned to node-agents).

Let me know if the above makes sense; however, I don't know whether sharing the same cache would be feasible for Restic, or whether the node affinity approach is worth exploring further.

The challenge here is that the node-agents are created using a DeploymentConfig, so we are limited in how PVs can be dynamically attached to the pods; that's why I am thinking of the above approach.
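For the per-node cache option described above, here is a hedged sketch of a pre-created PV pinned to one node via nodeAffinity (the name, size, storage class, path, and node name are all illustrative):

```yaml
# Hypothetical sketch: a local PV pinned to a single node, so only workloads
# scheduled on that node can use it for the Kopia cache.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: kopia-cache-node-a            # hypothetical name
spec:
  capacity:
    storage: 20Gi                     # illustrative size
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage     # hypothetical class
  local:
    path: /mnt/kopia-cache            # hypothetical host path
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-a              # pin to this node
```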

@Lyndon-Li
Contributor Author

Lyndon-Li commented Dec 12, 2024

@mpryc I think we only need to consider data mover for now; fs-backup, which requires node-agent changes, is lower priority. cc @reasonerjt

For the question you mentioned, the answer is yes: we can share the same volume for multiple backupRepositories, and that is actually the recommended way, as it reduces the number of PVC/PV resources created. But it does require RWX volumes; if those don't exist, we have to place the caches into separate volumes.
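For reference, a minimal sketch of the shared-cache variant, assuming an RWX-capable storage class is available (the claim name, namespace, class, and size are illustrative):

```yaml
# Hypothetical sketch: one RWX claim that multiple backupPods/restorePods
# could mount at the Kopia cache path at the same time.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-kopia-cache            # hypothetical name
  namespace: velero
spec:
  accessModes:
    - ReadWriteMany                   # required for sharing across nodes
  storageClassName: cephfs            # any RWX-capable class; illustrative
  resources:
    requests:
      storage: 50Gi                   # illustrative size
```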

@msfrucht
Contributor

msfrucht commented Dec 13, 2024

ReadWriteOnce access mode does not stop multiple Pods from attaching to the same volume. It just forces the additional Pods to be scheduled onto the same node in order for them to start.

Even with ReadWriteOnce, sharing the same volume is possible, provided one keeps track of which node is running which data mover for which backup repository. It would be a major increase in implementation difficulty.

@reasonerjt
Contributor

Based on the discussion, we don't have agreement on the scope and solution for this issue.
I'm removing the "candidate" label before the v1.16 "Feature Freeze" and leaving it in the "backlog".
It doesn't block us from continuing to discuss and write a design, though.

@mpryc
Contributor

mpryc commented Dec 19, 2024

There are two parts to this, each a rather separate option:

  1. Data mover with an option to attach external storage (as @Lyndon-Li wrote)
    Here we have two approaches; if there are more, please let me know, and also tell me whether my thinking is right:

    • Allowing users to specify an existing Persistent Volume that will be mounted to the new microservice
    • Allowing users to create a new PV and then mount it to the new microservice. This, however, might add significant complexity, especially if it involves user-specified parameters such as size, storage class, and the other sub-options that a PVC allows, and eventually handling cases where the PVC already exists (use it or bail out).
  2. Node agent using similarly mounted PVs - @Lyndon-Li, why do you think this is much more complex?

@Lyndon-Li
Contributor Author

Node agent using similarly mounted PVs - @Lyndon-Li, why do you think this is much more complex?

Whenever we attach a new volume to the node-agent, all node-agent pods are restarted. This is not acceptable if we do it outside of the controller, because once the node-agent pods restart, all fs-backup and data mover backups/restores are affected.
Therefore, we need to design a sophisticated user interaction, so that when users configure the volumes, no fs-backup or data mover backup/restore operations are running.

Additionally, I think doing this for the node-agent is lower priority, since data mover is the preferred backup method; fs-backup is not used unless data mover is unavailable.
