Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove: granular support #5772

Closed
daavoo opened this issue Apr 6, 2021 · 3 comments
Closed

remove: granular support #5772

daavoo opened this issue Apr 6, 2021 · 3 comments
Labels
question I have a question?

Comments

@daavoo
Copy link
Contributor

daavoo commented Apr 6, 2021

After implementing and reviewing #5757 we come to the conclusion that we should further discuss about how to approach adding "granular support" to dvc remove or even if it is needed.


Context

Given the following example yaml:

train:
  cmd: python train.py data.csv
  deps:
    - data.csv
    - train.py
  outs:
    - logs
    - model.h5
$ ls
dvc.lock  dvc.yaml  data.csv  datacsv.dvc  logs  model.h5  train.py

With #5757 you can now run (previously raised StageFileDoesNotExistError):

dvc remove model.h5

And successfully remove only that output file.

$ ls
dvc.lock  dvc.yaml  data.csv  data.csv.dvc  logs  train.py

Howeve, in the same scenario, if we have several output files under logs:

$ ls logs
logs.csv logs.json

And run:

dvc remove logs.json

The current behavior (after #5757) will remove the full logs directory. See test case:

def test_remove_file_in_subdir_as_target(tmp_dir, scm, dvc):


Discussion

Given the previous context, the question that arise is:

  • Is this (from an user perspective) the expected behavior?

And further:

If this is the expected behavior, should we:

A) Leave the implementation as it is.
B) Add a warning to inform users that might not expect the bevaiour
C) other ideas

On the other hand, if this is not the expected behavior, should we:

A) Prevent the user from running the command (i.e. raising an exception)
B) Add support for granular remove, so that dvc remove logs.json only removes that specific file and not the full folder.
C) other ideas

@daavoo
Copy link
Contributor Author

daavoo commented Apr 6, 2021

Given the previous context, the question that arise is:

* Is this (from an user perspective) the expected behavior?

My personal opinion is that this is not the behavior I would expect.

On the other hand, if this is not the expected behavior, should we:

A) Prevent the user from running the command (i.e. raising an exception)
B) Add support for granular remove, so that dvc remove logs.json only removes that specific file and not the full folder.
C) other ideas

I think i would go with A and also include, as part of the exception message, an example command to explicitly remove the full directory (i.e. "You can run dvc remove logs in order to remove the full directory")

@skshetry
Copy link
Member

skshetry commented Apr 6, 2021

cc'ing @dberenbaum

@skshetry skshetry added the question I have a question? label Apr 6, 2021
@dberenbaum
Copy link
Collaborator

Agree with @daavoo. Including a suggestion in the exception message is definitely a good idea. The message probably also needs to explain the issue first (something like "DVC is tracking logs and cannot remove a subset of files within that directory").

daavoo added a commit to daavoo/dvc that referenced this issue Apr 8, 2021
@mattseddon mattseddon closed this as not planned Won't fix, can't repro, duplicate, stale Mar 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question I have a question?
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants