Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli: add: add an --output argument #5284

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 14 additions & 0 deletions dvc/command/add.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,12 @@ def run(self):
if len(self.args.targets) > 1 and self.args.file:
raise RecursiveAddingWhileUsingFilename()

if len(self.args.targets) != 1 and self.args.output:
logger.error(
"cannot add more than one file when using --output"
)
return 1

self.repo.add(
self.args.targets,
recursive=self.args.recursive,
Expand All @@ -26,6 +32,7 @@ def run(self):
external=self.args.external,
glob=self.args.glob,
desc=self.args.desc,
output=self.args.output,
)

except DvcException:
Expand Down Expand Up @@ -83,6 +90,13 @@ def add_parser(subparsers, parent_parser):
"This doesn't affect any DVC operations."
),
)
parser.add_argument(
"-o",
"--output",
type=str,
metavar="<path>",
help="Output path for the file, if it lives outside the repo.",
)
Comment on lines +93 to +99
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will clash with --out being introduced in #5198

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be --out, but it won't clash otherwise. They are actually the same thing :)

Copy link
Contributor

@jorgeorpinel jorgeorpinel Jan 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

p.s. looking at this to try to understand iterative/dvc.org/pull/2101

Can someone explain what this does with an example, perhaps?

It's also not obvious to me how this is a step towards #4657 which doesn't seem to involve adding external data (but this does).

Thanks!

Copy link
Contributor

@jorgeorpinel jorgeorpinel Jan 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed your answer @efiop but my questions remain because

They are actually the same thing

--out (from #5198) is not stand-alone, it can only be used along with --to-remote.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW should this maybe be a stand-alone, general option that add can always use (whether the target is external or not)? As disucssed in #5198 (review) πŸ™‚

parser.add_argument(
"targets", nargs="+", help="Input files/directories to add.",
).complete = completion.FILE
Expand Down
18 changes: 18 additions & 0 deletions dvc/exceptions.py
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,24 @@ def __init__(self, output, repo=None):
)


class OutputNotCreatedError(DvcException):
"""Thrown if the specified output file could not be created.

Args:
output (unicode): path to the file
"""

def __init__(self, output):
from dvc.utils import relpath

self.output = output
super().__init__(
"Unable to create output file '{path}'".format(
path=relpath(self.output)
)
)


class StagePathAsOutputError(DvcException):
"""Thrown if directory that stage is going to be saved in is specified as
an output of another stage.
Expand Down
17 changes: 17 additions & 0 deletions dvc/repo/add.py
Original file line number Diff line number Diff line change
@@ -1,12 +1,14 @@
import logging
import os
import shutil
from typing import TYPE_CHECKING

import colorama

from ..exceptions import (
CacheLinkError,
OutputDuplicationError,
OutputNotCreatedError,
OverlappingOutputPathsError,
RecursiveAddingWhileUsingFilename,
)
Expand All @@ -30,6 +32,7 @@ def add(
recursive=False,
no_commit=False,
fname=None,
output=None,
**kwargs,
):
from dvc.utils.collections import ensure_list
Expand All @@ -38,6 +41,20 @@ def add(
raise RecursiveAddingWhileUsingFilename()

targets = ensure_list(targets)

if output:
try:
dname = os.path.dirname(output)

if dname and not os.path.isdir(dname):
os.makedirs(dname)

shutil.copy2(targets[0], output)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be really slow with big files/dirs. We could be able to add it without this uneeded copy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, you are checking that there is only one target in CLI, but not in API (they could be used separately), so targets[0] needs some reassurance.


targets = [output]
except OSError:
raise OutputNotCreatedError(output)

link_failures = []
stages_list = []
num_targets = len(targets)
Expand Down
35 changes: 35 additions & 0 deletions tests/func/test_add.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
import os
import shutil
import stat
import tempfile
import textwrap
import time

Expand Down Expand Up @@ -59,6 +60,40 @@ def test_add(tmp_dir, dvc):
}


def test_add_output(tmp_dir, dvc):
(fd, fname) = tempfile.mkstemp()
os.write(fd, b"foo")
os.close(fd)

ret = main(["add", "--output", "foo", "bar", "baz"])
assert ret == 1

md5, _ = file_md5(fname)
stage = dvc.add(fname, output="foo")[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are actually implementing straight-to-cache functionality #4520 for local files here πŸ™‚ We have a WIP for a general straight-to-remote, where we also use -o in add, they are in no way contradictory though.


assert stage is not None

assert isinstance(stage, Stage)
assert os.path.isfile(stage.path)
assert len(stage.outs) == 1
assert len(stage.deps) == 0
assert stage.cmd is None
assert stage.outs[0].hash_info == HashInfo("md5", md5)
assert stage.md5 is None

assert load_yaml("foo.dvc") == {
"outs": [
{
"md5": "acbd18db4cc2f85cedef654fccc4a4d8",
"path": "foo",
"size": 3,
}
],
}

os.unlink(fname)
Copy link
Contributor

@efiop efiop Jan 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe let's just use tmp_path fixture or smth like that to avoid doing this manually and using file descriptors?



@pytest.mark.skipif(os.name == "nt", reason="can't set exec bit on Windows")
def test_add_executable(tmp_dir, dvc):
tmp_dir.gen("foo", "foo")
Expand Down