zstash seems to archive "symlink names" but not values. #247
@TonyB9000 From https://github.com/E3SM-Project/zstash/blob/main/zstash/hpss_utils.py#L187:
If there are two hard links pointing to the same file, the result will be two separate files; we have no way to prevent that. Symbolic links get reproduced, but the file they're pointing to may be missing, as you discovered. We could add a command-line option to include the real file, if that would be useful.
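Python's standard `tarfile` module shows the same default behavior: a symlink is stored as a link entry with size 0 that records only the target's path. The sketch below is a standalone demo, not zstash code, and the file names are made up; it archives one symlink and inspects the resulting member:

```python
import os
import tarfile
import tempfile


def show_symlink_member() -> tarfile.TarInfo:
    """Archive a symlink with tarfile's default settings and return its member info."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "data.nc")
        link = os.path.join(tmp, "link.nc")
        with open(target, "wb") as f:
            f.write(b"x" * 1024)  # 1 KiB of real data in the target file
        os.symlink(target, link)

        tar_path = os.path.join(tmp, "demo.tar")
        # tarfile.open() defaults to dereference=False, so the link itself is stored.
        with tarfile.open(tar_path, "w") as tar:
            tar.add(link, arcname="link.nc")

        with tarfile.open(tar_path, "r") as tar:
            return tar.getmember("link.nc")


member = show_symlink_member()
print(member.issym(), member.size, os.path.basename(member.linkname))
# True 0 data.nc
```

The member carries no file data: the link name is archived, but not the contents it points to.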
I think I would make this the default behavior, and issue an error (or at least a warning) if a symlink points to nothing. Instead of "tar cvf tarfile targetfile", I would use the "realpath" command: tar cvf tarfile `realpath targetfile`, or similar. I can't think of any use for tarring up broken links. Granted, I don't quite know how to do this "in bulk". You might need to run a separate script:
and then
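For the warning part of this proposal, a minimal sketch of a pre-archive check in Python. `find_broken_symlinks` is a hypothetical helper, not part of zstash; it walks a directory tree and reports links whose targets do not exist:

```python
import os
from typing import List


def find_broken_symlinks(root: str) -> List[str]:
    """Return paths under root that are symlinks whose target does not exist."""
    broken = []
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk does not descend into directory symlinks by default but still
        # lists them in dirnames; broken links land in filenames. Check both.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() looks at the link itself; exists() follows it and
            # returns False when the target is missing.
            if os.path.islink(path) and not os.path.exists(path):
                broken.append(path)
    return sorted(broken)
```

A caller could pass each result to `warnings.warn` or abort, depending on whether broken links should be a warning or an error.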
Correction: I can think of one case. Suppose you have a directory containing symlinks, and you want to tar it (with other stuff) and move it to a new location on the same file system. Then it would be reasonable to tar only the links, not the actual files. But this use case is very unusual; I might make THAT the command-line option (tar links as links, not the files they refer to). Do we know why zstash was designed this way (archiving only files and hard links)? What was the rationale? The major problem is that a cursory inspection of a zstash archive (consulting only the "index.db") gives the impression that files exist, when only the symlinks exist.
Yes, that makes sense.
From a cursory search, it looks like we could run … It looks like …
(Also note that if two symlinks point to an identical file, my understanding is that you would end up with two copies of that file in the …)
@golaz Let us know if you have any input on this, thanks!
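To make the duplicate-copy concern concrete: if links are dereferenced, every path that resolves to the same underlying file would be stored separately. This small sketch (hypothetical helper `find_shared_targets`) groups paths by the `(st_dev, st_ino)` pair that `os.stat` reports after following symlinks, which catches both symlinks and hard links that share a file:

```python
import os
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_shared_targets(paths: Iterable[str]) -> List[List[str]]:
    """Group paths that refer to the same underlying file; keep groups of 2+."""
    groups: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for p in paths:
        try:
            st = os.stat(p)  # follows symlinks, so a link and its target share a key
        except FileNotFoundError:
            continue  # broken symlink: there is no file to duplicate
        groups[(st.st_dev, st.st_ino)].append(p)
    return [sorted(ps) for ps in groups.values() if len(ps) > 1]
```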
The Python function "os.path.realpath()" returns "the canonical path of the specified filename, eliminating any symbolic links encountered in the path (if they are supported by the operating system)" (https://docs.python.org/3/library/os.path.html). When it comes to cataloging the contents of a zstash archive, "zstash ls" is much simpler to use than "zstash ls -l", but only the latter will reveal an "empty" symlink. Avoiding such symlinks up front would be preferable, if possible. It looks like the hpss_utils.py function that adds files to a tar archive would need to replace each file with os.path.realpath(file).
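A minimal sketch of that replacement, written against Python's `tarfile` directly rather than zstash's actual `hpss_utils.py` code (which does more bookkeeping, such as the md5 recorded in index.db). `add_to_tar` and its `follow_symlinks` parameter are hypothetical; `follow_symlinks=False` corresponds to the "tar links as links" option suggested earlier. It resolves the link with `os.path.realpath()` but keeps the link's own path as the archive name:

```python
import os
import tarfile
import warnings


def add_to_tar(tar: tarfile.TarFile, path: str, follow_symlinks: bool = True) -> None:
    """Add path to tar; a symlink to a regular file is stored as that file's
    contents under the link's own name. Broken links are skipped with a warning."""
    if follow_symlinks and os.path.islink(path):
        real = os.path.realpath(path)
        if not os.path.exists(real):
            warnings.warn(f"skipping broken symlink: {path} -> {os.readlink(path)}")
            return
        if os.path.isfile(real):
            # Metadata (size, mode, mtime) comes from the real file, but the
            # entry is named after the link the user asked to archive.
            info = tar.gettarinfo(real, arcname=path)
            with open(real, "rb") as f:
                tar.addfile(info, f)
            return
    # Regular file, link to a directory, or follow_symlinks=False: archive as-is.
    tar.add(path, recursive=False)
```

Passing `arcname` matters: archiving `os.path.realpath(file)` alone would record the target's path instead of the link's. For whole-archive behavior, `tarfile.open(..., dereference=True)` also makes `add()` follow symlinks, although a broken link then raises `FileNotFoundError` instead of producing a warning.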
From @TonyB9000:
It turns out that with “zstash ls -l”, you can identify symlinks, as they show a file size of 0 (and “None” for the md5).
Still, if the symlink points to an actual file (on the system where the zstash archive is created), you should (I think) tar up the real file (the target), not the link.
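Building on that observation, a sketch for flagging suspected symlinks straight from an archive's `index.db`. It assumes a `files` table with `name`, `size`, and `md5` columns (check this against the actual database before relying on it), and `suspect_links` is a hypothetical helper:

```python
import sqlite3
from typing import List


def suspect_links(db_path: str) -> List[str]:
    """List entries with size 0 and no md5, the pattern `zstash ls -l` shows
    for symlinks. Assumes a `files` table with name, size and md5 columns."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT name FROM files WHERE size = 0 AND md5 IS NULL ORDER BY name"
        ).fetchall()
    finally:
        con.close()
    return [name for (name,) in rows]
```

Filtering on both conditions matters: an empty regular file also has size 0, but it presumably still has an md5 (the md5 of zero bytes).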