Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remote: separate worktree vs version_aware behavior #8634

Merged
merged 2 commits into from
Nov 29, 2022

Conversation

pmrowla
Copy link
Contributor

@pmrowla pmrowla commented Nov 28, 2022

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

Will close #8409

  • worktree = true now implies version_aware = true (a config error will be raised if a user explicitly configures a remote with worktree = true and version_aware = false)
  • version_aware = true (and worktree = false) will result in pushing human-readable filenames to a cloud-versioned remote. DVC will not set DELETE flags and will not update the latest version of a file on the remote if an older version of the file has has already been pushed
  • worktree = true will force dvc push to ensure that the "latest" version of a remote always reflects the pushed workspace. DVC will set DELETE flags for files that do not exist in the current workspace, and will force pushing an updated copy of a file if the latest version on the remote does not match what is in the user's workspace (even if a previously pushed version ID already exists in the .dvc file)
    • DVC will attempt to reduce pushing of duplicate copies when possible by checking etags (i.e. if the latest version of a file does not have the same version ID as the one in my .dvc file, but the etags match, DVC will not push another copy of the file)

cloud-versioned fetch/pull behavior is unchanged and essentially does the same thing regardless of whether or not worktree is set - DVC will always fetch the version IDs specified in the .dvc file.

@pmrowla pmrowla added A: data-sync Related to dvc get/fetch/import/pull/push A: cloud-versioning Related to cloud-versioned remotes labels Nov 28, 2022
@pmrowla pmrowla self-assigned this Nov 28, 2022
@pmrowla pmrowla requested a review from dberenbaum November 28, 2022 07:17
@codecov
Copy link

codecov bot commented Nov 28, 2022

Codecov Report

Base: 93.79% // Head: 94.17% // Increases project coverage by +0.37% πŸŽ‰

Coverage data is based on head (6a75f3c) compared to base (e95ae56).
Patch has no changes to coverable lines.

❗ Current head 6a75f3c differs from pull request most recent head 3e934d1. Consider uploading reports for the commit 3e934d1 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8634      +/-   ##
==========================================
+ Coverage   93.79%   94.17%   +0.37%     
==========================================
  Files         432      432              
  Lines       33141    33055      -86     
  Branches     4661     4638      -23     
==========================================
+ Hits        31084    31128      +44     
+ Misses       1602     1501     -101     
+ Partials      455      426      -29     
Impacted Files Coverage Ξ”
dvc/repo/experiments/queue/utils.py 77.41% <0.00%> (-5.51%) ⬇️
dvc/repo/experiments/queue/base.py 85.41% <0.00%> (-0.64%) ⬇️
dvc/repo/experiments/queue/workspace.py 81.45% <0.00%> (-0.30%) ⬇️
dvc/repo/experiments/refs.py 90.00% <0.00%> (-0.25%) ⬇️
dvc/repo/experiments/queue/tempdir.py 91.30% <0.00%> (-0.19%) ⬇️
dvc/repo/index.py 95.08% <0.00%> (-0.05%) ⬇️
tests/unit/test_rwlock.py 100.00% <0.00%> (ΓΈ)
dvc/repo/experiments/remove.py 95.23% <0.00%> (ΓΈ)
dvc/repo/experiments/queue/tasks.py 96.07% <0.00%> (ΓΈ)
tests/func/experiments/test_show.py 98.81% <0.00%> (ΓΈ)
... and 30 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

β˜” View full report at Codecov.
πŸ“’ Do you have feedback about the report comment? Let us know in this issue.

@dberenbaum
Copy link
Collaborator

  • DVC will attempt to reduce pushing of duplicate copies when possible by checking etags (i.e. if the latest version of a file does not have the same version ID as the one in my .dvc file, but the etags match, DVC will not push another copy of the file)

Not a blocker, but a possible optimization: If there is a version ID in the .dvc file, it implies that this data has already been pushed. Can we assume no push is needed without checking the etags?

@dberenbaum
Copy link
Collaborator

  • DVC will set DELETE flags for files that do not exist in the current workspace,

I don't see DELETE flags being set for files on the cloud but missing in the workspace:

# Set up remote
$ dvc remote add -f -d s3versioned s3://dave-sandbox-versioning/worktree-dir
$ dvc remote modify s3versioned version_aware true
$ dvc remote modify s3versioned worktree true
# Make directory files
$ mkdir dir
$ echo image1 > dir/image1
$ echo image2 > dir/image2
# Track and push
$ dvc add dir
$ dvc push
$ git add .
$ git commit -m "add and push dir"
# Remove a file from dir
$ rm dir/image1
$ dvc add dir
# Push updates
$ dvc push
Everything is up to date.
$ aws s3 ls s3://dave-sandbox-versioning/worktree-dir/dir/
2022-11-28 10:27:14          7 image1
2022-11-28 10:27:14          7 image2

@pmrowla
Copy link
Contributor Author

pmrowla commented Nov 29, 2022

Not a blocker, but a possible optimization: If there is a version ID in the .dvc file, it implies that this data has already been pushed. Can we assume no push is needed without checking the etags?

Should have clarified, the etag check is only done with worktree = true. For the version_aware push without worktree, we make the assumption based on whether or not the .dvc file already has a version_id

I'll double check what's going on with deletes

@pmrowla
Copy link
Contributor Author

pmrowla commented Nov 29, 2022

@dberenbaum delete issue should be resolved now

@pmrowla pmrowla merged commit 25b1ef9 into iterative:main Nov 29, 2022
@pmrowla pmrowla deleted the worktree-push-diff branch November 29, 2022 13:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: cloud-versioning Related to cloud-versioned remotes A: data-sync Related to dvc get/fetch/import/pull/push
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cloud versioning: clarify config options
2 participants