Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

output: support version ID #8223

Merged
merged 1 commit into from
Sep 1, 2022
Merged

output: support version ID #8223

merged 1 commit into from
Sep 1, 2022

Conversation

pmrowla
Copy link
Contributor

@pmrowla pmrowla commented Sep 1, 2022

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

related to #8164

This PR does not address import behavior. When using dvc import-url azure://container/file?versionid=1234, the pre-existing behavior will still occur, so the resulting import will be permanently pinned to azure://container/file?versionid=1234. dvc update will not check for a new version of azure://container/file.

The only import related change is that the version_id will be captured in the .dvc file like:

md5: b332f7591398ef3d5e32c97f2ff689f9
frozen: true
deps:
- etag: '0x8DA79D118EC7839'
  size: 2
  version_id: '2022-08-09T06:33:06.7654238Z'
  path: azure://test-container/dir/file?versionid=2022-08-09T06:33:06.7654238Z
outs:
- md5: 6654c734ccab8f440ff0825eb443dc7f
  size: 2
  path: file

(whereas the old .dvc file would just contain path: azure://container/dir/file?versionid=2022-08-09T06:33:06.7654238Z and no version_id field)

@pmrowla pmrowla self-assigned this Sep 1, 2022
@pmrowla pmrowla requested a review from efiop September 1, 2022 07:03
@pmrowla pmrowla added A: data-management Related to dvc add/checkout/commit/move/remove enhancement Enhances DVC labels Sep 1, 2022
dvc/output.py Outdated Show resolved Hide resolved
dvc/output.py Outdated Show resolved Hide resolved
Comment on lines +507 to +497
def changed_meta(self) -> bool:
if self.fs.version_aware and self.meta.version_id:
return self.meta.version_id == self.get_meta().version_id
return False
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently unused but will be needed for updating imports

@pmrowla pmrowla force-pushed the output-version branch 2 times, most recently from 4064bb6 to a124f08 Compare September 1, 2022 07:30
@pmrowla pmrowla marked this pull request as draft September 1, 2022 10:18
@pmrowla pmrowla force-pushed the output-version branch 2 times, most recently from 2e97e3c to 64d1ed8 Compare September 1, 2022 10:38
@pmrowla
Copy link
Contributor Author

pmrowla commented Sep 1, 2022

After discussion with @efiop we concluded that

  • explicitly versioned paths should stay explicitly versioned paths
  • explicitly unversioned paths should stay unversioned
  • output.fs_path will contain whatever path the user gave us
  • we should treat captured version IDs purely as metadata (and should never be merging path+version_id in Output behavior)

This means that these two cases in a .dvc file are not the same thing:

path: s3://bucket/foo
version_id: 1234
path: s3://bucket/foo?versionid=1234
version_id: 1234

In the first case, I am tracking the explicit path s3://bucket/foo, and at the time I tracked it the version ID was 1234. changed_meta() will return True if the file is ever updated on S3, since S3 will then return a new "latest version ID" for foo.

For calls like out.exists() we would end up checking that foo still exists in the bucket. We would not be checking to see whether or not version 1234 of foo still exists. So in the event that a user deleted foo (but did not purge old versions), out.exists() will return False.

In the second case, I am tracking the explicit path s3://bucket/foo?versionid=1234. changed_meta() should never return True for this case, since I am tracking an explicitly version-pinned file (foo?versionid=1234).

For calls like out.exists() we would check that the specific version 1234 still exists in the bucket. If that version has been purged, out.exists() would return False, even a current version of foo is still present in the bucket.

@dberenbaum this seems like something which will not be obvious to users at all but we will eventually have to document/explain it, please take a look and make sure you're on the same page for now

@pmrowla pmrowla marked this pull request as ready for review September 1, 2022 10:46
Copy link
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @pmrowla ! πŸ™

@efiop efiop merged commit 660c17f into iterative:main Sep 1, 2022
@pmrowla pmrowla deleted the output-version branch September 2, 2022 03:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: data-management Related to dvc add/checkout/commit/move/remove enhancement Enhances DVC
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants