How can I access the model files from Studio model registry? #75
Comments
Thanks for reaching out @haimat - I'll start discussing whether we can support something like this. I've reached out to some colleagues to see if we can help you out right now; we'll get back to you. |
If you have pushed your model file to the DVC remote, you can download the trained model from any git revision with https://dvc.org/doc/command-reference/get:
$ dvc get ${GIT_REPO} ${MODEL_PATH} --rev ${GIT_REV}
For example:
# 10-bigrams-experiment is a git tag present in the repo
$ dvc get https://github.com/iterative/example-get-started model.pkl --rev 10-bigrams-experiment
When you register a new model version through Studio UI, it actually creates a git tag in the repo, so you can use it as
For example, in the public demo project https://github.com/iterative/demo-bank-customer-churn which has some model versions registered:
$ dvc get https://github.com/iterative/demo-bank-customer-churn .mlem/model/lightgbm-model --rev [email protected]
In the command above, |
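The `dvc get` pattern above can be wrapped in a small helper. This is only a sketch: the function name `fetch_model` and the `DRY_RUN` switch are made up for illustration and are not part of DVC. The only DVC call used is `dvc get <repo> <path> --rev <rev>`, as shown in the comment above.

```shell
#!/bin/sh
# fetch_model REPO PATH REV  (hypothetical helper, not a DVC command)
# Downloads PATH from REPO at git revision REV (branch, tag or commit).
# With DRY_RUN=1 the command is only printed, not executed.
fetch_model() {
    repo=$1
    path=$2
    rev=$3
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf 'dvc get %s %s --rev %s\n' "$repo" "$path" "$rev"
    else
        dvc get "$repo" "$path" --rev "$rev"
    fi
}

# Example from the thread: a git tag in the public example repository.
DRY_RUN=1
fetch_model https://github.com/iterative/example-get-started model.pkl 10-bigrams-experiment
```

Unsetting `DRY_RUN` makes the helper run the real download, which requires DVC to be installed and read access to the repository's DVC remote.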
@daavoo Thanks, that could be a workaround for now indeed. I tried it right away but got a Python When I am in the DVC project's root and run Note: In my outs:
- training/${dataset}
- models/${dataset}/yolov8-${dataset}-${training.image_size}.pt
I was under the impression that with this, DVC automatically tracks the |
Yes, it should be pushed. Can you try |
@daavoo Thanks, that helped. Now I get this:
$ dvc list . --recursive --dvc-only | grep models
models/ski-defects-small-labels/yolov8-ski-defects-small-labels-320.pt
This file is from my local DVC experiment, I |
@haimat could you please share what your CI looks like?
do you run it locally (after a pass on CI)?
I would check the |
@daavoo @shcheklein So it seems there may be a bug in DVC, or I am missing something. First I checked in the GitHub commit related to the model I added via Studio:
- path: models/ski-defects-small-labels/yolov8-ski-defects-small-labels-1280.pt
  md5: 40ab7fd5fa07b74f81a78bcef65685c6
  size: 87709448
This is the file I am looking for, so it has been part of this commit in
$ tree 40
40
├── 1b37696cbe1766a18296d520870b28
├── 41331c2ea39b415839c8af4d7e5624
├── 498e0aa229e6280cde397b34d79cc7
├── 7c42ee007c3db6a6547fec9af975c9
├── 82ada49323a148f50ba61be9f047ab
├── ab7fd5fa07b74f81a78bcef65685c6 <-- here it is
├── b100f4f999d7390851b9c13b620651
├── b5687a07b2f783d7c14bf1f0796835
├── c4f58f48c45904762f0ab8169b3f08
├── c8e0ea5a384b60fbb40b25ccaf3940
├── d15429e27f54d32be07b649468ac61
├── ddcbeb2970d186ba1adce6f0a3d1d1
└── e8fee04ba173d176c5fb24e03d3c92
0 directories, 13 files
However, from my local machine I still get this error:
$ dvc get https://github.com/<company>/<repo.git> models/ski-defects-small-labels/yolov8-ski-defects-small-labels-1280.pt --rev [email protected]#1
ERROR: unexpected error - : ('models', 'ski-defects-small-labels', 'yolov8-ski-defects-small-labels-1280.pt')
$ dvc get https://github.com/<company>/<repo.git> models/ski-defects-small-labels/yolov8-ski-defects-small-labels-1280.pt --rev [email protected]#1 --show-url
ERROR: failed to show URL - The path 'models/ski-defects-small-labels/yolov8-ski-defects-small-labels-1280.pt' does not exist in the target repository 'https://github.com/<company>/<repo.git>' neither as a DVC output nor as a Git-tracked file.: [Errno 2] No such file or directory: '/models/ski-defects-small-labels/yolov8-ski-defects-small-labels-1280.pt'
Is this a bug or a feature? |
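The manual check against the remote above relies on the layout visible in the `tree 40` listing: the first two characters of the MD5 form the directory name and the remaining characters form the file name. A minimal sketch of that mapping follows, assuming this layout holds for the remote in question (other DVC versions may organize the remote differently). The function name `dvc_cache_relpath` is hypothetical.

```shell
#!/bin/sh
# dvc_cache_relpath MD5  (hypothetical helper)
# Maps an MD5 from dvc.lock to "<first two chars>/<remaining chars>",
# the layout seen in the `tree 40` listing in this thread.
dvc_cache_relpath() {
    md5=$1
    prefix=$(printf '%s' "$md5" | cut -c1-2)
    rest=$(printf '%s' "$md5" | cut -c3-)
    printf '%s/%s\n' "$prefix" "$rest"
}

# MD5 of the 1280 model as recorded in the commit.
dvc_cache_relpath 40ab7fd5fa07b74f81a78bcef65685c6
```

For the MD5 above this prints `40/ab7fd5fa07b74f81a78bcef65685c6`, which matches the entry marked in the listing, so the object itself is present on the remote.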
@haimat it looks like a bug, but let's try to double check a few things:
Also, what version of DVC do you use? (share I've tried to reproduce it with my demo repo, all of these:
worked fine for me. Could you please try to run them (the last one might require S3 access but if it fails on trying to get to S3 I think it's fine). |
@shcheklein Sure, so first I have only one version for this model registered:
That commit
And that is also the one with this tag
Then here is the verbose output of
$ dvc get -v https://github.com/<company>/<repo.git> models/ski-defects-small-labels/yolov8-ski-defects-small-labels-1280.pt --rev [email protected]#1
2023-03-01 11:15:35,155 DEBUG: v2.43.2 (pip), CPython 3.10.6 on Linux-5.15.0-60-generic-x86_64-with-glibc2.35
2023-03-01 11:15:35,155 DEBUG: command: /home/mfb/.local/bin/dvc get -v https://github.com/<company>/<repo.git> models/ski-defects-small-labels/yolov8-ski-defects-small-labels-1280.pt --rev [email protected]#1
2023-03-01 11:15:35,266 DEBUG: Creating external repo https://github.com/<company>/<repo.git>@[email protected]#1
2023-03-01 11:15:35,266 DEBUG: erepo: git clone 'https://github.com/<company>/<repo.git>' to a temporary dir
2023-03-01 11:15:36,411 DEBUG: erepo: using shallow clone for branch '[email protected]#1'
2023-03-01 11:15:36,565 DEBUG: Removing '/data/head/dev/.aRF7iimTs2Fg8f6iY4ksnH'
2023-03-01 11:15:36,565 ERROR: unexpected error - : ('models', 'ski-defects-small-labels', 'yolov8-ski-defects-small-labels-1280.pt')
------------------------------------------------------------
Traceback (most recent call last):
File "/home/mfb/.local/lib/python3.10/site-packages/dvc_data/index/index.py", line 219, in info
entry = self[key]
File "/home/mfb/.local/lib/python3.10/site-packages/dvc_data/index/index.py", line 274, in __getitem__
return self._trie[key]
File "/home/mfb/.local/lib/python3.10/site-packages/sqltrie/pygtrie.py", line 41, in __getitem__
return self._trie[key]
File "/home/mfb/.local/lib/python3.10/site-packages/pygtrie.py", line 937, in __getitem__
node, _ = self._get_node(key_or_slice)
File "/home/mfb/.local/lib/python3.10/site-packages/pygtrie.py", line 630, in _get_node
raise KeyError(key)
KeyError: ('models', 'ski-defects-small-labels', 'yolov8-ski-defects-small-labels-1280.pt')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/mfb/.local/lib/python3.10/site-packages/dvc/cli/__init__.py", line 207, in main
ret = cmd.do_run()
File "/home/mfb/.local/lib/python3.10/site-packages/dvc/cli/command.py", line 40, in do_run
return self.run()
File "/home/mfb/.local/lib/python3.10/site-packages/dvc/commands/get.py", line 33, in run
return self._get_file_from_repo()
File "/home/mfb/.local/lib/python3.10/site-packages/dvc/commands/get.py", line 39, in _get_file_from_repo
Repo.get(
File "/home/mfb/.local/lib/python3.10/site-packages/dvc/repo/get.py", line 73, in get
fs.get(
File "/home/mfb/.local/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 626, in get
return get_file(from_info, to_info)
File "/home/mfb/.local/lib/python3.10/site-packages/dvc_objects/fs/callbacks.py", line 69, in func
return wrapped(path1, path2, **kw)
File "/home/mfb/.local/lib/python3.10/site-packages/dvc_objects/fs/callbacks.py", line 41, in wrapped
res = fn(*args, **kwargs)
File "/home/mfb/.local/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 613, in get_file
self.fs.get_file(rpath, lpath, **kwargs)
File "/home/mfb/.local/lib/python3.10/site-packages/fsspec/spec.py", line 859, in get_file
with self.open(rpath, "rb", **kwargs) as f1:
File "/home/mfb/.local/lib/python3.10/site-packages/fsspec/spec.py", line 1135, in open
f = self._open(
File "/home/mfb/.local/lib/python3.10/site-packages/dvc/fs/dvc.py", line 274, in _open
return dvc_fs.open(dvc_path, mode=mode)
File "/home/mfb/.local/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 193, in open
return self.fs.open(path, mode=mode, **kwargs)
File "/home/mfb/.local/lib/python3.10/site-packages/dvc_data/fs.py", line 72, in open
fs, fspath = self._get_fs_path(path, **kwargs)
File "/home/mfb/.local/lib/python3.10/site-packages/dvc_data/fs.py", line 46, in _get_fs_path
info = self.info(path)
File "/home/mfb/.local/lib/python3.10/site-packages/dvc_data/fs.py", line 111, in info
info = self.index.info(key)
File "/home/mfb/.local/lib/python3.10/site-packages/dvc_data/index/index.py", line 223, in info
raise FileNotFoundError from exc
FileNotFoundError
------------------------------------------------------------
2023-03-01 11:15:36,617 DEBUG: link type reflink is not available ([Errno 95] no more link types left to try out)
2023-03-01 11:15:36,617 DEBUG: Removing '/data/head/.2UE8UkWtnULiS2CWf8HTnq.tmp'
2023-03-01 11:15:36,617 DEBUG: Removing '/data/head/.2UE8UkWtnULiS2CWf8HTnq.tmp'
2023-03-01 11:15:36,617 DEBUG: Removing '/data/head/.2UE8UkWtnULiS2CWf8HTnq.tmp'
2023-03-01 11:15:36,617 DEBUG: Removing '/data/head/dev/.dvc/cache/.J5X4StrSDfEHCjwHMDcXvM.tmp'
2023-03-01 11:15:36,643 DEBUG: Version info for developers:
DVC version: 2.43.2 (pip)
-------------------------
Platform: Python 3.10.6 on Linux-5.15.0-60-generic-x86_64-with-glibc2.35
Subprojects:
dvc_data = 0.37.3
dvc_objects = 0.19.1
dvc_render = 0.1.0
dvc_task = 0.1.11
dvclive = 1.4.0
scmrepo = 0.1.7
Supports:
http (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
https (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
ssh (sshfs = 2023.1.0)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme0n1p2
Caches: local
Remotes: ssh, ssh
Workspace directory: ext4 on /dev/nvme0n1p2
Repo: dvc, git
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2023-03-01 11:15:36,645 DEBUG: Analytics is enabled.
2023-03-01 11:15:36,687 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpw7hnridq']'
2023-03-01 11:15:36,689 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpw7hnridq']'
To your third question: no, it is not a monorepo, it's just this single project there. My root
# Add patterns of files dvc should ignore, which could improve
# the performance. Learn more at
# https://dvc.org/doc/user-guide/dvcignore
/data/workspace
/training
No other
$ find . -name ".dvcignore"
./.dvcignore
DVC version info is listed above. |
@haimat thanks a lot! I'll try to reproduce it. One quick question - is there something specific about the DVC config in this repo |
No, nothing; the only config is our default remote. It's our first DVC project, and we don't have anything fancy here, at least not that I am aware of. |
@shcheklein btw. I'm happy to send you our DVC configs via email if that helps. |
@haimat thanks. yes, that would be helpful - my email is ivan at iterative.ai (or shcheklein at gmail). One line looks suspicious to me |
Also, could you try to run:
I've tried to reproduce the same layout that you have (still can't reproduce the bug though ⌛ ) |
@shcheklein I tried your last
$ dvc get https://github.com/shcheklein/hackathon models/ski-defects-small-labels/yolov8-ski-defects-small-labels-1280.pt --rev [email protected]#1
ERROR: unexpected error - Forbidden: An error occurred (403) when calling the HeadObject operation: Forbidden
As for the link types, no I don't think that I have ever changed the link type in the repo. I checked the |
I think, in this case, it is just a misleading debug message from DVC |
@daavoo Yes, I'm trying to understand why it is happening though. I don't see it on my machine with a file that exists or with a file that doesn't exist. Anyway, @haimat, thanks. There is something specific about this repo structure or remote storage, since everything else works fine. Two things to try:
Thanks for staying with me on this. |
@shcheklein Sure, here are the results:
ad 1) This works, however, with one error:
$ dvc pull
WARNING: Output 'models/ski-defects-small-labels/yolov8-ski-defects-small-labels-320.pt'(stage: 'training') is missing version info. Cache for it will not be collected. Use `dvc repro` to get your pipeline up to date.
WARNING: No file hash info found for '/tmp/dvc/models/ski-defects-small-labels/yolov8-ski-defects-small-labels-320.pt'. It won't be created.
A data/ski-defects/
A data/ski-defects-small-labels/
A data/_yolo_config.yaml
A data/ski-defects-sample/
A training/ski-defects-small-labels/
A evaluation/results.json
6 files added and 1 file failed
ERROR: failed to pull data from the cloud - Checkout failed for following targets:
/tmp/dvc/models/ski-defects-small-labels/yolov8-ski-defects-small-labels-320.pt
Is your cache up to date?
Interestingly it is referring to the "-320.pt" version here, not the "-1280.pt" version of the model. Seems maybe something happened with the
ad 2) Does not work either. |
hmm, interesting. I'll try to do templating. It might be the reason. Good thing that we simplified it to |
@shcheklein Yes in the
- path: models/ski-defects-small-labels/yolov8-ski-defects-small-labels-1280.pt
  md5: 40ab7fd5fa07b74f81a78bcef65685c6
  size: 87709448
That is also the file MD5 that I can see on the DVC remote file system, as mentioned earlier (40ab7fd5fa07b74f81a78bcef65685c6). The 320 version was part of some (local) experiments with DVC. Quite possibly at some point I also DVC- and Git-pushed that file, but as said above, the checked-out project version from Git tag |
Okay, I was able to reproduce this. I think the reason might be that in the params file you have 320. Could you please confirm that? It's a bit of a question what is the right behavior here (and we'll discuss this), but from the DVC perspective there is a pipeline that is not up to date if params file is different from the Could you please confirm this? |
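To illustrate why the value in the params file matters here: the model output is declared earlier in the thread as `models/${dataset}/yolov8-${dataset}-${training.image_size}.pt`, so a different image size resolves to a different file path. The sketch below only mimics that substitution in plain shell. It is not how DVC performs templating, and the function name `model_out_path` is hypothetical.

```shell
#!/bin/sh
# model_out_path DATASET IMAGE_SIZE  (hypothetical helper)
# Mimics the templated output path from the thread:
#   models/${dataset}/yolov8-${dataset}-${training.image_size}.pt
model_out_path() {
    dataset=$1
    image_size=$2
    printf 'models/%s/yolov8-%s-%s.pt\n' "$dataset" "$dataset" "$image_size"
}

# The same dataset with two image sizes gives two different outputs.
model_out_path ski-defects-small-labels 320
model_out_path ski-defects-small-labels 1280
```

With 320 in the params file the pipeline output is the `-320.pt` file, which is the one `dvc list` and `dvc pull` reported, while the registered model version points at the `-1280.pt` file.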
@shcheklein No, at the moment - and also when this model tag was created on GitHub - we have "1280" in the params file, not "320". Now I am a bit surprised; I was under the impression that these parameters are meant to be changed on purpose and that is why we have this
But let me explain my use case, maybe I am not using DVC the way it is supposed to be used. Say we have a dataset "ski-defects" and we need to train an object detection model for it. So our workflow is to first create a smaller sample dataset out of it, called "ski-defects-sample", then use that to locally experiment with various training parameters. Often we also train with smaller image sizes locally first, e.g. 320 pixels, in order to train faster. Then later, when we are happy with the local training results, we want to train the "real" model with the full dataset and a larger image size, e.g. 1280, on our server. Note that the results of the training, i.e. the trained model, should always end up in the
So I thought now with DVC+CML I could do exactly that - first I would define all required training parameters (like dataset name and image size) in the
If not the way I have set up my DVC project, how else am I supposed to achieve what I have described above with DVC and CML? |
@haimat what you are describing is totally fine, and this is exactly how DVC, CML, other tools can be used. It's expected and absolutely fine to change Could do If that is not the case, it means I'll try to reproduce again and find another possible reason. |
@shcheklein The more I think about all this, the more I guess it might be related to the various things I tried when I was playing around with DVC (this is our first project where we use DVC). I am willing to start from scratch though, so maybe that solves all these issues. What would be your recommended way to get rid of DVC completely in the GitHub repo, and start over from scratch with all the DVC yaml files I have now? |
remove Btw, as @daavoo pointed out to me in our private conversation, it's not very clear if you actually need / want to template the model file name. It can stay the same across multiple commits, tags, branches.
This will also work better if you have the same name (it's the same model after all, just different params). We are now incorporating metrics and plots into the Studio Model Registry page, so it will be possible to find the best model right there, or it's now possible via the project page (table with commits, metrics, etc). |
@shcheklein I have followed your instructions and created a new GitHub and DVC project. Unfortunately I cannot test that now with regard to this issue, since I have hit another issue with our SSH remote for DVC: iterative/dvc-ssh#33 |
@shcheklein Eureka! After everything discussed in the other issues here and setting up the project from scratch again, I can confirm that I was able to
So this issue is solved from my perspective, thanks for your support 👍 I have a few other issues and thoughts regarding the model registry and its usage in Studio in general. Should I post them here or do you want me to create another issue for that? |
@haimat hey, it's great to hear that! Let's create an issue, it's fine even if it's just feedback / thoughts. We would appreciate that a lot! We can close this one! I'm looking into some edge cases, and we'll try to improve the error message handling. |
Our workflow is as follows: We train our models for DVC projects using CML on our GitHub runner. After training, the new model file is moved to the repo's /model folder, which is tracked by DVC. So we always have the latest model version after training on our DVC remote. Now I was expecting to access these artifacts from the "Models" section in Studio. However, I have now learned that I cannot directly access these trained model files from within Studio. This is a bummer, because it would be one of the main reasons for us to use Studio. We want it to a) track our experiments, and b) register the models that result from these experiments. I was expecting to have a feature in Studio like "sort all experiments by this or that metric, then download the trained model from the best experiment".
Which brings me to two questions: