-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Show missing data for pushed experiments #81
Comments
It would be useful to have a way to show what data is missing from the remote for any given experiment. I don't think we could show this for all experiments in the DVC CLI, but possibly in VS Code and definitely in Studio it would be useful to see. @mattseddon Let me know if I understood your request correctly. It might be too expensive to check the remote for every local experiment in VS Code by default. On the other hand, Studio has a more static state and might be able to show missing data for every experiment/commit. |
Definite value add/collaboration multiplier for Studio. This would be another way to address iterative/vscode-dvc#1769 by using the three products. |
IMO DVC/VS Code can reflect remote status for the current workspace but I would only expect Studio to show it for all experiments and commits. |
@tapadipti @shcheklein Should we move this to a Studio feature request? |
yes, not sure what is the user scenario and idea in Studio tbh, but I'm fine to move it |
@mattseddon First raised the issue, but IMO the idea would be to show some indicator for files which can't be found in the remote. It could look similar to how we show commits that failed to parse, except for specific file columns. By showing in Studio what's missing from the commit, we make it more obvious to users that something is missing and what it is. For example, the recent issue you were helping with @shcheklein where the user updated the |
From what I know, Studio does not try to access the remote files, except for files with metrics/params. @iterative/studio-backend can confirm. Is this issue about Studio checking access to all the files and displaying whether or not each file was found? |
Yes, I think it should be like running |
How important is this going to be - do we expect the user to want to download those files at some point? |
I don't think it's p1, but I agree with @mattseddon above:
Re:
Not from Studio. I think it saves the user hassle in knowing which commits are "complete" and whether they or their colleague forgot to push something, or which old commits have been garbage collected. I mentioned above that I think it would have been useful for #75 (comment), for example. There was confusion about whether model 320 or model 1280 was expected (only model 1280 was pushed). With this warning in Studio, the user would have been alerted that DVC saw model 320 as missing in the latest commit, and while they probably would have still been confused why DVC expected model 320, it would have saved time in narrowing down the problem. |
This would basically take care of the "pushed" columns in this request: https://discord.com/channels/485586884165107732/563406153334128681/1085582627806920714. It also relates to iterative/dvc#5369 (checking that everything is up to date without having to pull anything). |
I have continued working on this (me = Le_Olleaux on Discord). There are several complications to this matter, when it comes to frozen stages, import stages, sub-pipelines etc.. But, using a similar approach like the It checks for clean and pushed deps and outs, changed commands, and invalidates stages downstream to put them into a "grey/unknown" state. It supports multiple pipelines, specifying targets, ignoring deps-checking on import stages (which can be slow). Adding a simple check and proper return statements should enable a Right now, we are using this tool a as pre-commit hook (https://pre-commit.com/) to auto-generate documentation about our pipeline states on each commit. I think it is a feature that belongs to DVC rather than Studio. |
@ostromann Could you please add that comment to iterative/dvc#5369? It is in our plans to add something like this DVC, so it would be great to hear your thoughts.
Yes, sorry for any confusion. There's no intention to limit this functionality to Studio. The reason I mentioned it here is because of my comment above:
In Studio, it can be helpful to show a similar status check for every commit in your repo, so that you are assured that each commit is "clean" without having to pull and run a DVC command, and you can quickly spot issues with older commits that have some "unclean" status. |
I think this is important for model registry artifacts. Right now, I can register a model version and copy the |
Thanks to @dberenbaum for pointing me to this. I have a similar need for cache in https://github.com/iterative/studio/pull/6541, so thinking that we could introduce
to suit both needs in CLI and API. |
@dberenbaum it might not fall under this issue but we need the same thing for data, right? Should be covered under https://github.com/iterative/studio/issues/5050 / iterative/dvc#9074?
Originally posted by @mattseddon in https://github.com/iterative/studio/issues/4940#issuecomment-1457461091
The text was updated successfully, but these errors were encountered: