worktree add: `dvc add` removes valid metadata for files which are already tracked #8293
Comments
In this scenario where there is a cloud-versioned remote and no version id in the .dvc file, should If Maybe worth discussing with @dmpetrov.
I'm not sure what the behavior should be for fetch/pull with no etag or version ID, but that's a separate question from this issue.
Sorry, I missed that this was without modifying foo.
`dvc add` always generates new stages, overwrites any existing stages, and dumps a new .dvc file. For regular DVC workflows this is not a problem, but for cloud versioning/worktree remotes this causes a loss of metadata.
Consider this workflow where I add one file and then push it.
If I were to run `dvc pull` now, DVC would pull that specific version of the file (2022-09-15T07:45:55.1712903Z) from the remote.
If I now run `dvc add` without modifying `foo`, I get a .dvc file that does not contain any version/etag metadata, even though `foo` has not changed and the metadata for the version I already pushed should still be valid.
Now if I run `dvc pull`, DVC will just pull the latest version of `foo` from the remote, which may not necessarily be the version that I initially pushed (since the .dvc file no longer contains a known version_id to pull).
Note that if I had modified `foo`, this metadata-clearing behavior is expected, since the newly modified version has not been pushed (and we do not know what the version ID/etag for that new version will be until we push).
The same problem applies to directories, where individual file entries that have not changed still lose metadata on `dvc add`.
For the worktree workflow, `dvc add` cannot just remove existing stages and rewrite them. `dvc add` actually needs to check for outputs that are already tracked, and preserve the old metadata for file entries that have not changed.
For individual files, if the output hash (oid) has not changed, we need to preserve the existing metadata for the entire output.
For dirs, we need to do an actual tree merge, where we check the oids for each entry in the new tree. If we have meta for that oid in the pre-existing tree, it needs to be merged into the new one.
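
The proposed behavior could be sketched roughly as follows. This is not DVC's actual code: `Meta`, `Entry`, `merge_file`, and `merge_tree` are hypothetical names invented for illustration, and the sketch additionally assumes that directory entries are matched by path as well as oid, since a version ID normally refers to one specific remote object.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Meta:
    """Hypothetical remote metadata recorded after a push."""
    version_id: Optional[str] = None
    etag: Optional[str] = None


@dataclass(frozen=True)
class Entry:
    """Hypothetical tracked entry: a content hash plus optional remote metadata."""
    oid: str
    meta: Optional[Meta] = None


def merge_file(old: Optional[Entry], new: Entry) -> Entry:
    """Single-file output: reuse the old metadata only if the oid is unchanged."""
    if old is not None and old.oid == new.oid and old.meta is not None:
        return Entry(new.oid, old.meta)
    # New or modified content: the metadata is unknown until the next push.
    return new


def merge_tree(old: Dict[str, Entry], new: Dict[str, Entry]) -> Dict[str, Entry]:
    """Directory output: merge old metadata into the new tree entry by entry.

    Assumption of this sketch: an entry keeps its metadata only when the same
    path existed in the old tree with the same oid.
    """
    return {path: merge_file(old.get(path), entry) for path, entry in new.items()}


if __name__ == "__main__":
    old_tree = {
        "a.txt": Entry("h1", Meta("v1", "e1")),
        "b.txt": Entry("h2", Meta("v2", "e2")),
    }
    # After re-adding: a.txt is unchanged, b.txt was modified, c.txt is new.
    new_tree = {"a.txt": Entry("h1"), "b.txt": Entry("h3"), "c.txt": Entry("h4")}
    for path, entry in merge_tree(old_tree, new_tree).items():
        print(path, entry.oid, entry.meta)
```

In this sketch only `a.txt` keeps its version ID and etag; `b.txt` and `c.txt` end up without metadata until they are pushed, which matches the expected behavior for modified files described above.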