pull: clones repositories for imported files #9738
Comments
Thanks for the issue @peper0. The current behavior of |
@dberenbaum what about the |
Yes, ideally cc @efiop |
If you are open to a hacky workaround for now, you could make a dvc stage that does |
@dberenbaum Yes, that's the direction that I'm going to migrate. But it has considerable drawbacks, like no support for |
@peper0 If your stage cmd looks like |
Description
`dvc pull` clones repositories from which files were imported, even though they are cached (have `cache: true` implicitly or explicitly).
Reproduce
At step 5 the repository is being cloned.
Expected
I expect data to be pushed to the remote in `dvc push` and pulled from the remote in `dvc pull`, since the data is cached by default, without accessing the git repository it was imported from (unless `dvc update` is called).
This is a big problem, since the git repo may not be accessible when `dvc pull` is called (e.g. when it is called by a CI server). Moreover, it takes a lot of time if data is imported from several repositories with some large ones among them.
In my understanding, outputs are synced with the source repository only in `dvc update` and `dvc import`, not in `dvc pull` or `dvc repro`. Therefore I don't see why the repo would need to be accessible when calling `dvc pull`.
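To make the "cached import" case concrete, here is a rough sketch of how one could spot such entries. It assumes that an import's `.dvc` file has already been parsed into a dict (e.g. with a YAML loader) and that it has the layout I understand imports to use: a `deps` entry carrying a `repo` mapping with a `url`, and `outs` entries where a missing `cache` key means `cache: true`. The function name and the example URL are made up for illustration; this is not DVC's own code.

```python
from typing import Any


def cached_import_sources(dvcfile: dict[str, Any]) -> list[str]:
    """Return the source repo URLs of an import whose outputs are all cached.

    `dvcfile` is the parsed content of a .dvc file (assumed layout, see above).
    An empty list means the file is not an import or has an uncached output.
    """
    repo_urls = [
        dep["repo"]["url"]
        for dep in dvcfile.get("deps", [])
        if isinstance(dep.get("repo"), dict) and "url" in dep["repo"]
    ]
    outs = dvcfile.get("outs", [])
    # A missing `cache` key is treated as the implicit default, cache: true.
    all_cached = bool(outs) and all(out.get("cache", True) for out in outs)
    return repo_urls if all_cached else []


# Hypothetical import: the URL and paths are placeholders, not from this issue.
example_import = {
    "frozen": True,
    "deps": [
        {"path": "data/raw", "repo": {"url": "https://example.com/org/registry.git"}}
    ],
    "outs": [{"path": "raw"}],  # no `cache` key, so cached implicitly
}

print(cached_import_sources(example_import))
```

For entries like this one, the expectation described above is that `dvc pull` could fetch the data from the DVC remote alone, without touching the listed git repository.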
Environment information
Output of `dvc doctor`: