dvc status: import stages takes long time #9304

Closed
ostromann opened this issue Apr 4, 2023 · 2 comments
Labels
- A: status (Related to the dvc diff/list/status)
- awaiting response (we are waiting for your reply, please respond! :))
- optimize (Optimizes DVC)
- performance (improvement over resource / time consuming tasks)

Comments

@ostromann

Bug Report

dvc status: import stages take a very long time

Description

We have a dataset repository, which contains a processed version of OpenImages. Overall, we have 17 archive files of ~2 GB size each, containing training, validation and test data. We also have a model repository which imports these archive files and some additional metadata files.

Example dvc import file test.tar.gz.dvc:

md5: 44575b5d08348d3ff3d7fe3618828004
frozen: true
deps:
- path: data/processed/tar/test.tar.gz
  repo:
    url: ssh://git@<our_gitlab>/openimages.git
    rev: v7-1.0.0
    rev_lock: 36552956a6d044029e02a7066f461bff04ab40eb
outs:
- md5: b5edd31da7bdcca297733ea3964a76c7
  size: 2055081698
  path: test.tar.gz

We have 17 of these files (15 for train, 1 val, 1 test).

Running dvc status takes about a minute because of these import stages. As the log shows, repeatedly checking out the data repo is what consumes the time.

We do think we want to have individual imports in order to allow for running tests on subsets of the data. If we instead import the entire tar directory, for example, dvc status takes about 10 seconds, which I find acceptable.

Example dvc import file tar.dvc:

md5: 24933d28dffbf53a2c564e240f2748b3
frozen: true
deps:
- path: data/processed/tar
  repo:
    url: ssh://git@<our_gitlab>/openimages.git
    rev: v7-1.0.0
    rev_lock: 36552956a6d044029e02a7066f461bff04ab40eb
outs:
- md5: 4b5ad079448147f4ec6c77e245ee89cc.dir
  size: 35954704921
  nfiles: 55

Running dvc status -c takes a few seconds in either case.

Reproduce

Have one data repo with several large files. Import these files into another repo. Run dvc status.
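To put numbers on the comparison, the two commands can be timed from the model repo. The following is a minimal sketch: `time_command` is a hypothetical helper written for this report, not part of DVC, and it only assumes that `dvc` is on the PATH and that the script runs inside the repo containing the imports.

```python
import subprocess
import time


def time_command(cmd, cwd=None):
    """Run a command and return (elapsed_seconds, return_code)."""
    start = time.perf_counter()
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    return elapsed, result.returncode


if __name__ == "__main__":
    # Compare the slow local check with the cloud check mentioned above.
    for cmd in (["dvc", "status"], ["dvc", "status", "-c"]):
        try:
            elapsed, code = time_command(cmd)
        except FileNotFoundError:
            print("dvc was not found on PATH; skipping")
            break
        print(f"{' '.join(cmd)}: {elapsed:.1f} s (exit code {code})")
```

Running it once with individual file imports and once with a single directory import should show the difference described in this report.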

Expected

I would expect the check to be faster, e.g. by collecting imports from the same repo first and checking out the data repo only once.
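The grouping idea could look roughly like the sketch below. It is only an illustration of the proposal, not how DVC implements `status`: `group_imports_by_repo` is a hypothetical helper, and the input is assumed to be the already parsed contents of the `.dvc` import files as plain dictionaries. The second path is made up by analogy with the first.

```python
from collections import defaultdict


def group_imports_by_repo(import_stages):
    """Group import dependencies by (repo URL, locked revision).

    Each group could then be checked against a single checkout of the
    data repo instead of one checkout per import stage.
    """
    groups = defaultdict(list)
    for stage in import_stages:
        for dep in stage.get("deps", []):
            repo = dep.get("repo")
            if repo is None:
                continue  # a regular dependency, not an import
            key = (repo["url"], repo.get("rev_lock"))
            groups[key].append(dep["path"])
    return dict(groups)


# Parsed .dvc files, shaped like the test.tar.gz.dvc example above.
REPO = {
    "url": "ssh://git@<our_gitlab>/openimages.git",
    "rev": "v7-1.0.0",
    "rev_lock": "36552956a6d044029e02a7066f461bff04ab40eb",
}
stages = [
    {"deps": [{"path": "data/processed/tar/test.tar.gz", "repo": REPO}]},
    {"deps": [{"path": "data/processed/tar/validation.tar.gz", "repo": REPO}]},
]

for (url, rev_lock), paths in group_imports_by_repo(stages).items():
    print(f"{url}@{rev_lock}: {len(paths)} path(s), one checkout needed")
```

With all imports pinned to the same `rev_lock`, everything falls into one group, so the clone and pull would happen once rather than once per file.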

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.50.0 (pip)
-------------------------
Platform: Python 3.9.16 on Linux-4.15.0-197-generic-x86_64-with-glibc2.27
Subprojects:
        dvc_data = 0.44.1
        dvc_objects = 0.21.1
        dvc_render = 0.3.1
        dvc_task = 0.2.0
        scmrepo = 0.1.15
Supports:
        http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2023.1.0, boto3 = 1.24.59)
Cache types: reflink, hardlink, symlink
Cache directory: btrfs on /dev/bcache0
Caches: local
Remotes: s3
Workspace directory: btrfs on /dev/bcache0
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/c48378976d67136a7899849e475ea512

Additional Information (if any):

Output of dvc status

$ dvc status -v
2023-04-04 08:58:24,633 DEBUG: v2.50.0 (pip), CPython 3.9.16 on Linux-4.15.0-197-generic-x86_64-with-glibc2.27
2023-04-04 08:58:24,633 DEBUG: command: <my_home>/.conda/envs/py3.9/bin/dvc status -v
2023-04-04 08:58:25,037 WARNING: stage: 'create_tfrecords' is frozen. Its dependencies are not going to be shown in the status output.
2023-04-04 08:58:25,117 DEBUG: built tree 'object b806472486703f4823be96f5440eb212.dir'                                                                                                                        
2023-04-04 08:58:25,175 DEBUG: built tree 'object b806472486703f4823be96f5440eb212.dir'                                                                                                                        
2023-04-04 08:58:25,179 DEBUG: built tree 'object 93df3acddf9fd38998e589ca82940281.dir'                                                                                                                        
2023-04-04 08:58:25,182 DEBUG: built tree 'object 1a9444c33cda52b9bb97b449e130f4f1.dir'                                                                                                                        
2023-04-04 08:58:25,241 DEBUG: built tree 'object b806472486703f4823be96f5440eb212.dir'                                                                                                                        
2023-04-04 08:58:25,245 DEBUG: built tree 'object fbc47a1b0fe2d81053866d27647c2fce.dir'                                                                                                                        
2023-04-04 08:58:25,249 DEBUG: built tree 'object 307f8833e78b20ffdd18bd65b3938db5.dir'                                                                                                                        
2023-04-04 08:58:25,256 DEBUG: built tree 'object 1a9444c33cda52b9bb97b449e130f4f1.dir'                                                                                                                        
2023-04-04 08:58:25,263 DEBUG: built tree 'object 307f8833e78b20ffdd18bd65b3938db5.dir'                                                                                                                        
2023-04-04 08:58:25,269 DEBUG: built tree 'object 93df3acddf9fd38998e589ca82940281.dir'                                                                                                                        
2023-04-04 08:58:25,274 DEBUG: built tree 'object 5e50969ddab7056c3515afb45115f201.dir'                                                                                                                        
2023-04-04 08:58:25,282 DEBUG: built tree 'object fbc47a1b0fe2d81053866d27647c2fce.dir'                                                                                                                        
2023-04-04 08:58:25,287 DEBUG: built tree 'object 3a2c1028ca0f0e6e28fe46eb2d60a6e8.dir'                                                                                                                        
2023-04-04 08:58:25,293 DEBUG: Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
2023-04-04 08:58:25,293 DEBUG: erepo: git clone 'ssh://git@<our_gitlablab>openimages.git' to a temporary dir
2023-04-04 08:58:29,197 DEBUG: Creating external repo ssh://git@<our_gitlablab>[email protected]                                                                              
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>[email protected]
2023-04-04 08:58:29,197 DEBUG: erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
DEBUG:dvc.external_repo:erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
2023-04-04 08:58:32,623 DEBUG: Computed stage: 'data/cropped_data/class-descriptions-filtered.csv.dvc' md5: 'af670068a3c161b1590d11aaf7a20f71'
DEBUG:dvc.stage:Computed stage: 'data/cropped_data/class-descriptions-filtered.csv.dvc' md5: 'af670068a3c161b1590d11aaf7a20f71'
2023-04-04 08:58:32,629 DEBUG: Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
2023-04-04 08:58:32,755 DEBUG: Creating external repo ssh://git@<our_gitlablab>[email protected]
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>[email protected]
2023-04-04 08:58:32,756 DEBUG: erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
DEBUG:dvc.external_repo:erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
2023-04-04 08:58:36,900 DEBUG: Computed stage: 'data/cropped_data/tar/train_0.tar.gz.dvc' md5: '1b49b258899c34e0616d436b44a670e8'
DEBUG:dvc.stage:Computed stage: 'data/cropped_data/tar/train_0.tar.gz.dvc' md5: '1b49b258899c34e0616d436b44a670e8'
2023-04-04 08:58:36,906 DEBUG: Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
2023-04-04 08:58:36,999 DEBUG: Creating external repo ssh://git@<our_gitlablab>[email protected]
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>[email protected]
2023-04-04 08:58:37,000 DEBUG: erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
DEBUG:dvc.external_repo:erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
2023-04-04 08:58:39,192 DEBUG: Computed stage: 'data/cropped_data/tar/train_1.tar.gz.dvc' md5: 'bcc59c0501e4745de19b68e971eb797a'
DEBUG:dvc.stage:Computed stage: 'data/cropped_data/tar/train_1.tar.gz.dvc' md5: 'bcc59c0501e4745de19b68e971eb797a'
2023-04-04 08:58:39,197 DEBUG: Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
2023-04-04 08:58:39,327 DEBUG: Creating external repo ssh://git@<our_gitlablab>[email protected]
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>[email protected]
2023-04-04 08:58:39,328 DEBUG: erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
DEBUG:dvc.external_repo:erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
2023-04-04 08:58:41,478 DEBUG: Computed stage: 'data/cropped_data/tar/train_2.tar.gz.dvc' md5: 'b6702704ed87f91d28fcf2b68372eded'
DEBUG:dvc.stage:Computed stage: 'data/cropped_data/tar/train_2.tar.gz.dvc' md5: 'b6702704ed87f91d28fcf2b68372eded'
2023-04-04 08:58:41,484 DEBUG: Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
2023-04-04 08:58:41,575 DEBUG: Creating external repo ssh://git@<our_gitlablab>[email protected]
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>[email protected]
2023-04-04 08:58:41,575 DEBUG: erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
DEBUG:dvc.external_repo:erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
2023-04-04 08:58:43,720 DEBUG: Computed stage: 'data/cropped_data/tar/train_3.tar.gz.dvc' md5: 'a83261f83c51621fb70f57468f3ab12c'
DEBUG:dvc.stage:Computed stage: 'data/cropped_data/tar/train_3.tar.gz.dvc' md5: 'a83261f83c51621fb70f57468f3ab12c'
2023-04-04 08:58:43,726 DEBUG: Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
2023-04-04 08:58:43,858 DEBUG: Creating external repo ssh://git@<our_gitlablab>[email protected]
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>[email protected]
2023-04-04 08:58:43,858 DEBUG: erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
DEBUG:dvc.external_repo:erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
2023-04-04 08:58:45,973 DEBUG: Computed stage: 'data/cropped_data/tar/train_4.tar.gz.dvc' md5: '36763fb7184f7c1883383cd6fa2fa39a'
DEBUG:dvc.stage:Computed stage: 'data/cropped_data/tar/train_4.tar.gz.dvc' md5: '36763fb7184f7c1883383cd6fa2fa39a'
2023-04-04 08:58:45,976 DEBUG: Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
2023-04-04 08:58:46,073 DEBUG: Creating external repo ssh://git@<our_gitlablab>[email protected]
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>[email protected]
2023-04-04 08:58:46,073 DEBUG: erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
DEBUG:dvc.external_repo:erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
2023-04-04 08:58:48,257 DEBUG: Computed stage: 'data/cropped_data/tar/train_5.tar.gz.dvc' md5: '0c93eb5909af82f36b860c4ccabb11cd'
DEBUG:dvc.stage:Computed stage: 'data/cropped_data/tar/train_5.tar.gz.dvc' md5: '0c93eb5909af82f36b860c4ccabb11cd'
2023-04-04 08:58:48,261 DEBUG: Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
2023-04-04 08:58:48,397 DEBUG: Creating external repo ssh://git@<our_gitlablab>[email protected]
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>[email protected]
2023-04-04 08:58:48,397 DEBUG: erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
DEBUG:dvc.external_repo:erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
2023-04-04 08:58:50,581 DEBUG: Computed stage: 'data/cropped_data/tar/train_6.tar.gz.dvc' md5: '35c4a9cdcadc71cb62d59b77427a1155'
DEBUG:dvc.stage:Computed stage: 'data/cropped_data/tar/train_6.tar.gz.dvc' md5: '35c4a9cdcadc71cb62d59b77427a1155'
2023-04-04 08:58:50,587 DEBUG: Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
2023-04-04 08:58:50,683 DEBUG: Creating external repo ssh://git@<our_gitlablab>[email protected]
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>[email protected]
2023-04-04 08:58:50,683 DEBUG: erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
DEBUG:dvc.external_repo:erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
2023-04-04 08:58:52,852 DEBUG: Computed stage: 'data/cropped_data/tar/train_7.tar.gz.dvc' md5: '3365050b21fec0a5dddba4bb7fc4fc29'
DEBUG:dvc.stage:Computed stage: 'data/cropped_data/tar/train_7.tar.gz.dvc' md5: '3365050b21fec0a5dddba4bb7fc4fc29'
2023-04-04 08:58:52,857 DEBUG: Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
2023-04-04 08:58:52,991 DEBUG: Creating external repo ssh://git@<our_gitlablab>[email protected]
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>[email protected]
2023-04-04 08:58:52,992 DEBUG: erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
DEBUG:dvc.external_repo:erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
2023-04-04 08:58:55,161 DEBUG: Computed stage: 'data/cropped_data/tar/train_8.tar.gz.dvc' md5: '9071dd0874bafeb79cee0a668d9856b2'
DEBUG:dvc.stage:Computed stage: 'data/cropped_data/tar/train_8.tar.gz.dvc' md5: '9071dd0874bafeb79cee0a668d9856b2'
2023-04-04 08:58:55,166 DEBUG: Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
2023-04-04 08:58:55,260 DEBUG: Creating external repo ssh://git@<our_gitlablab>[email protected]
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>[email protected]
2023-04-04 08:58:55,260 DEBUG: erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
DEBUG:dvc.external_repo:erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
2023-04-04 08:58:57,402 DEBUG: Computed stage: 'data/cropped_data/tar/train_9.tar.gz.dvc' md5: 'c0a61bef3d1c1809736b29b1bc1b6d79'
DEBUG:dvc.stage:Computed stage: 'data/cropped_data/tar/train_9.tar.gz.dvc' md5: 'c0a61bef3d1c1809736b29b1bc1b6d79'
2023-04-04 08:58:57,406 DEBUG: Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
2023-04-04 08:58:57,538 DEBUG: Creating external repo ssh://git@<our_gitlablab>[email protected]
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>[email protected]
2023-04-04 08:58:57,539 DEBUG: erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
DEBUG:dvc.external_repo:erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
2023-04-04 08:58:59,639 DEBUG: Computed stage: 'data/cropped_data/tar/train_a.tar.gz.dvc' md5: '537cc122f80040d1bbb9207e65648760'
DEBUG:dvc.stage:Computed stage: 'data/cropped_data/tar/train_a.tar.gz.dvc' md5: '537cc122f80040d1bbb9207e65648760'
2023-04-04 08:58:59,644 DEBUG: Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
2023-04-04 08:58:59,730 DEBUG: Creating external repo ssh://git@<our_gitlablab>[email protected]
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>[email protected]
2023-04-04 08:58:59,730 DEBUG: erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
DEBUG:dvc.external_repo:erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
2023-04-04 08:59:01,838 DEBUG: Computed stage: 'data/cropped_data/tar/train_b.tar.gz.dvc' md5: '0be3517c90880d99c4f1648fc17c5ba9'
DEBUG:dvc.stage:Computed stage: 'data/cropped_data/tar/train_b.tar.gz.dvc' md5: '0be3517c90880d99c4f1648fc17c5ba9'
2023-04-04 08:59:01,844 DEBUG: Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
2023-04-04 08:59:01,979 DEBUG: Creating external repo ssh://git@<our_gitlablab>[email protected]
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>[email protected]
2023-04-04 08:59:01,980 DEBUG: erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
DEBUG:dvc.external_repo:erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
2023-04-04 08:59:04,081 DEBUG: Computed stage: 'data/cropped_data/tar/train_c.tar.gz.dvc' md5: '85965b955290a6ffeb5e95e8a434e4bd'
DEBUG:dvc.stage:Computed stage: 'data/cropped_data/tar/train_c.tar.gz.dvc' md5: '85965b955290a6ffeb5e95e8a434e4bd'
2023-04-04 08:59:04,087 DEBUG: Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
2023-04-04 08:59:04,179 DEBUG: Creating external repo ssh://git@<our_gitlablab>[email protected]
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>[email protected]
2023-04-04 08:59:04,179 DEBUG: erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
DEBUG:dvc.external_repo:erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
2023-04-04 08:59:06,392 DEBUG: Computed stage: 'data/cropped_data/tar/train_d.tar.gz.dvc' md5: 'f6f47b9bf6ad777bd13a92ac3e1e13b8'
DEBUG:dvc.stage:Computed stage: 'data/cropped_data/tar/train_d.tar.gz.dvc' md5: 'f6f47b9bf6ad777bd13a92ac3e1e13b8'
2023-04-04 08:59:06,398 DEBUG: Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
2023-04-04 08:59:06,490 DEBUG: Creating external repo ssh://git@<our_gitlablab>[email protected]
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>[email protected]
2023-04-04 08:59:06,491 DEBUG: erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
DEBUG:dvc.external_repo:erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
2023-04-04 08:59:08,642 DEBUG: Computed stage: 'data/cropped_data/tar/train_e.tar.gz.dvc' md5: 'c184aa61f6854e56019b3f002f3d8aca'
DEBUG:dvc.stage:Computed stage: 'data/cropped_data/tar/train_e.tar.gz.dvc' md5: 'c184aa61f6854e56019b3f002f3d8aca'
2023-04-04 08:59:08,646 DEBUG: Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
2023-04-04 08:59:08,743 DEBUG: Creating external repo ssh://git@<our_gitlablab>[email protected]
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>[email protected]
2023-04-04 08:59:08,743 DEBUG: erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
DEBUG:dvc.external_repo:erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
2023-04-04 08:59:11,177 DEBUG: Computed stage: 'data/cropped_data/tar/train_f.tar.gz.dvc' md5: 'a113cf590c2cfae92f21ce4f3676a7ad'
DEBUG:dvc.stage:Computed stage: 'data/cropped_data/tar/train_f.tar.gz.dvc' md5: 'a113cf590c2cfae92f21ce4f3676a7ad'
2023-04-04 08:59:11,183 DEBUG: Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
2023-04-04 08:59:11,276 DEBUG: Creating external repo ssh://git@<our_gitlablab>[email protected]
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>[email protected]
2023-04-04 08:59:11,277 DEBUG: erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
DEBUG:dvc.external_repo:erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
2023-04-04 08:59:13,364 DEBUG: Computed stage: 'data/cropped_data/tar/validation.tar.gz.dvc' md5: '4516c1bdbee4d36788b375a3a87b3ce0'
DEBUG:dvc.stage:Computed stage: 'data/cropped_data/tar/validation.tar.gz.dvc' md5: '4516c1bdbee4d36788b375a3a87b3ce0'
2023-04-04 08:59:13,370 DEBUG: Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
2023-04-04 08:59:13,517 DEBUG: Creating external repo ssh://git@<our_gitlablab>[email protected]
DEBUG:dvc.external_repo:Creating external repo ssh://git@<our_gitlablab>[email protected]
2023-04-04 08:59:13,517 DEBUG: erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
DEBUG:dvc.external_repo:erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
2023-04-04 08:59:15,609 DEBUG: Computed stage: 'data/cropped_data/tar/test.tar.gz.dvc' md5: '44575b5d08348d3ff3d7fe3618828004'
DEBUG:dvc.stage:Computed stage: 'data/cropped_data/tar/test.tar.gz.dvc' md5: '44575b5d08348d3ff3d7fe3618828004'
2023-04-04 08:59:15,615 DEBUG: built tree 'object 5e50969ddab7056c3515afb45115f201.dir'                                                                                                                        
DEBUG:dvc_data.hashfile.build:built tree 'object 5e50969ddab7056c3515afb45115f201.dir'
2023-04-04 08:59:15,622 DEBUG: built tree 'object 3a2c1028ca0f0e6e28fe46eb2d60a6e8.dir'                                                                                                                        
DEBUG:dvc_data.hashfile.build:built tree 'object 3a2c1028ca0f0e6e28fe46eb2d60a6e8.dir'
2023-04-04 08:59:15,748 DEBUG: built tree 'object d62c891ced2a46a088a80ac03a4f01f3.dir'                                                                                                                        
DEBUG:dvc_data.hashfile.build:built tree 'object d62c891ced2a46a088a80ac03a4f01f3.dir'
2023-04-04 08:59:15,756 DEBUG: built tree 'object d62c891ced2a46a088a80ac03a4f01f3.dir'                                                                                                                        
DEBUG:dvc_data.hashfile.build:built tree 'object d62c891ced2a46a088a80ac03a4f01f3.dir'
2023-04-04 08:59:15,770 DEBUG: Analytics is disabled.
DEBUG:dvc.analytics:Analytics is disabled.
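The per-stage cost can be read off the timestamps in the log above. The sketch below is one way to do that, assuming the default log line layout shown here; `stage_durations` is a hypothetical helper, and the sample is six timestamped lines excerpted from the log.

```python
import re
from datetime import datetime

# Matches the timestamped lines; the duplicated "DEBUG:dvc...." lines are skipped.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) DEBUG: (.*)$")


def stage_durations(log_text):
    """Return {stage_file: seconds} from the first 'Creating external repo'
    line of a stage to its 'Computed stage' line."""
    durations = {}
    start = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line.rstrip())
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        message = match.group(2)
        if message.startswith("Creating external repo") and start is None:
            start = stamp
        elif message.startswith("Computed stage:") and start is not None:
            stage_file = message.split("'")[1]
            durations[stage_file] = (stamp - start).total_seconds()
            start = None
    return durations


SAMPLE = """\
2023-04-04 08:58:36,906 DEBUG: Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
2023-04-04 08:58:37,000 DEBUG: erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
2023-04-04 08:58:39,192 DEBUG: Computed stage: 'data/cropped_data/tar/train_1.tar.gz.dvc' md5: 'bcc59c0501e4745de19b68e971eb797a'
2023-04-04 08:58:39,197 DEBUG: Creating external repo ssh://git@<our_gitlablab>openimages.git@36552956a6d044029e02a7066f461bff04ab40eb
2023-04-04 08:58:39,328 DEBUG: erepo: git pull 'ssh://git@<our_gitlablab>openimages.git'
2023-04-04 08:58:41,478 DEBUG: Computed stage: 'data/cropped_data/tar/train_2.tar.gz.dvc' md5: 'b6702704ed87f91d28fcf2b68372eded'
"""

for stage_file, seconds in stage_durations(SAMPLE).items():
    print(f"{stage_file}: {seconds:.3f} s")
```

For these two stages the result is roughly 2.3 seconds each, nearly all of it between the `git pull` line and the `Computed stage` line, which matches the observation that the repeated repo checks dominate the runtime.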

Output of dvc status -c

2023-04-04 14:32:18,132 DEBUG: v2.50.0 (pip), CPython 3.9.16 on Linux-4.15.0-197-generic-x86_64-with-glibc2.27
2023-04-04 14:32:18,132 DEBUG: command: <my_home>/.conda/envs/py3.9/bin/dvc status -v -c
2023-04-04 14:32:18,551 DEBUG: Preparing to collect status from 'model-repo'
2023-04-04 14:32:18,552 DEBUG: Collecting status from 'model-repo'
2023-04-04 14:32:18,554 DEBUG: Querying 6 oids via object_exists                                                                                                                                               
2023-04-04 14:32:18,793 DEBUG: Querying 0 oids via object_exists                                                                                                                                               
2023-04-04 14:32:18,962 DEBUG: Estimated remote size: 4096 files                                                                                                                                               
2023-04-04 14:32:18,962 DEBUG: Querying '37' oids via traverse                                                                                                                                                 
2023-04-04 14:32:21,778 DEBUG: Preparing to collect status from <our_shared_cache>
2023-04-04 14:32:21,778 DEBUG: Collecting status from <our_shared_cache>
Cache and remote 's3_remote' are in sync.                                                                                                                                                                     
2023-04-04 14:32:21,783 DEBUG: Analytics is disabled.
aguschin added the performance, optimize, and A: status labels on Apr 5, 2023
@aguschin
Contributor

aguschin commented Apr 5, 2023

Thanks for reporting this! I'll take a look at why it happens.

@dberenbaum
Collaborator

> We do think we want to have individual imports in order to allow for running tests on subsets of the data.

Could you explain more? What operations are you doing at a granular level that won't work if you track all of data/processed/tar as one output? It's a valid request regardless; I'm just trying to get more info and wondering if there are any other workarounds.

dberenbaum added the awaiting response label on Jul 4, 2023
efiop closed this as completed on Jul 17, 2023
efiop closed this as not planned on Jul 17, 2023