-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] import-url
: cloud versioned imports
#8164
Conversation
Thanks, @pmrowla! Some questions and my thoughts on them:
|
@pmrowla Do you expect we can get this merged with the questions above resolved this sprint (Sep 6)? |
This currently only supports files and not directories (how to handle/store dirs requires more discussion).
We cannot enable this by default right now, because we have to explicitly enable versioning when we connect with fsspec's API. (Currently fsspec does not support any way to detect whether or not an s3 or azure bucket has versioning support enabled, and it relies on the caller telling it that the bucket supports versioning). So right now, in this PR we only use fsspec in cloud-version aware mode if we are using a DVC remote that has the What we could do is assume that if the user passes an explicit versioned path ( One option would be to add something like an optional
This is not currently implemented in this PR (due to the fsspec explicit version-aware issue from the last question), but in theory this would work without configuring a DVC remote if either:
i.e. for azure we could support the following without using the DVC
If the user does not have the right s3/az CLI environment to access the bucket (and wants to specify some non-default AWS profile or azure connection string), they would have to use the DVC remote syntax to provide specific credentials. All of this is the same as existing behavior/limitations for
I'm not sure that whether or not there is a cloud version ID matters here. This is basically the same as the If we intend to support both push/pull to and from original source location and push/pull to and from DVC remotes, we need to decide which is the default and how to tell DVC when you don't want the default One thing to remember is that we do not support push/pull to a DVC remote for DVC repo imports (#4527). DVC-import data always comes from the original source repo. I'm assuming that whatever we decide to do, ideally we want to move toward consistent/unified import/import-url behavior, so if we support both "original source" and DVC remotes for import-url, we would probably want a solution that addresses #4527 for regular import as well.
This PR uses One thing to consider is that in the potential future "DVC without Git" scenario, there will probably be actual "DVC revisions/revs" (for an entire "cloud versioned DVC repo" state) which would be a separate concept from individual file version-IDs within the same bucket. |
We can go with a
To me, it's another signal that this shouldn't depend on a remote-specific config option.
If it works without a remote as long as the appropriate environment is set (like |
With the current set of PR's this works without DVC remotes if the user provides a versioned path ( |
fbdc582
to
e895699
Compare
dvc/dependency/base.py
Outdated
current = self.version_id | ||
fs_path = self.fs.path.version_path(self.fs_path, None) | ||
updated = self.fs.info(fs_path)["version_id"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record: discussed that we can capture this during object build()
in meta
. Also might want to add something like a no_hash=True
flag to build
, to avoid computing hashes.
6a55756
to
00c96fc
Compare
this PR is going to be split into separate PRs for the base output changes and import specific changes |
β I have followed the Contributing to DVC checklist.
π If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Thank you for the contribution - we'll try to review it as soon as possible. π