Option to save last checkpoint as copy instead of symlinking #18995
Comments
I didn't fully understand the motivation. What if you added a step at the end of your script that moves/copies the symlink target to the symlink location? This should give you the behaviour that you want without having to copy every in-between last checkpoint along the way.
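A minimal sketch of such an end-of-script step, assuming plain Python and only the standard library. The helper name `materialize_symlink` and the file names in the usage note are hypothetical and not part of Lightning.

```python
import os
import shutil
from pathlib import Path


def materialize_symlink(link_path):
    """Replace a symlink with a real copy of the file it points to."""
    link = Path(link_path)
    if not link.is_symlink():
        # Already a regular file (or missing): nothing to do.
        return link
    # strict=True raises FileNotFoundError if the link is broken.
    target = link.resolve(strict=True)
    # Copy next to the link first, then swap, so the link is never left half-written.
    tmp = link.with_name(link.name + ".tmp")
    shutil.copy2(target, tmp)
    os.replace(tmp, link)  # replaces the symlink itself, not its target
    return link
```

Calling `materialize_symlink("checkpoints/last.ckpt")` once after training would leave a regular file that a backup tool can upload like any other checkpoint. It does not cover the case raised later in the thread, where a copy is wanted during training.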
I'm also wondering what the challenge is here. A symbolic link is a file too, so it can be backed up. And it (normally) is a relative path, so if you download the checkpoint folder from your backup to a different location, the link will continue to just work.
There are several symlink files that can be output to an experiment folder (e.g. W&B run symlinks). Managing symlinks independently of one another is an additional overhead for certain libraries (e.g. Having an option
It would be helpful to be able to do this during training. A workaround is to either
Both of these seem more involved than adding a
Not all commands are designed to handle symlinks. For instance, I just faced an issue with Therefore, in a workflow, when we want to check if a checkpoint path exists before loading from it, this will not work!
To add to my previous comment, it is possible to get the actual path using the To get the actual path from the symlink, we can use Let's say that the user has renamed one of the parent directories in the path of last.ckpt. This means, I think it is a great option to consider this feature request. If required, I am willing to contribute to this issue, although I would first need to understand the entire workflow. Looking forward to your suggestions!
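Independent of the specific calls the comment above refers to, here is a hedged sketch of how a workflow could tell a valid link from a dangling one before loading, using only Python's `os.path`. The helper name `checkpoint_status` and its return labels are hypothetical.

```python
import os


def checkpoint_status(path):
    """Classify a checkpoint path and return (kind, resolved_path_or_None)."""
    if os.path.islink(path):
        # os.path.exists follows the link, so it is False when the target is gone,
        # e.g. after a parent directory of an absolute link target was renamed.
        if os.path.exists(path):
            return "link", os.path.realpath(path)
        return "broken-link", None
    if os.path.exists(path):
        return "file", os.path.realpath(path)
    return "missing", None
```

A loader could refuse to resume when the result is `"broken-link"` or `"missing"`, and otherwise open the resolved path.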
@bgswaroop This feature request is completed. And #19303 made the link consistently relative.
Description & Motivation
Saving `last.ckpt` as a symlink on local file systems makes a lot of sense for most workflows. However, users often back up their checkpoints to cloud storage (AWS, GCP, etc.). In these scenarios, symlinks are difficult to manage because uploading them is often all-or-nothing, i.e. we cannot choose which symlinks to upload without being highly prescriptive on upload.
Checkpoints, especially `last.ckpt`, are critical for resuming runs, fine-tuning, etc., so we often want to back them up. However, when `last.ckpt` is a symlink, the backup process to the cloud becomes much more involved.
Pitch
Add an option `save_last=copy`, where we save a copy of the last checkpoint instead of a symlink.
Alternatives
No response
Additional context
No response
cc @Borda @carmocca @awaelchli