Skip to content
This repository has been archived by the owner on Dec 18, 2023. It is now read-only.

iroh get command #180

Closed
10 of 15 tasks
faassen opened this issue Oct 3, 2022 · 9 comments
Closed
10 of 15 tasks

iroh get command #180

faassen opened this issue Oct 3, 2022 · 9 comments
Assignees

Comments

@faassen
Copy link
Contributor

faassen commented Oct 3, 2022

Discussion proposal

iroh-ctl get <ipfs-path> [<path>]

  • ipfs-path can be a true IPFS path, or a CID
  • path is a filesystem path and is optional. If it’s not supplied, path is set to the CID.

Download file or directory from IPFS into path. If path already exists and is a file then it’s overwritten with the new downloaded file. If path already exists and is a directory, the command fails with an error. If path already exists, is a file and the downloaded data is a directory, that’s an error

Differences with Kubo:

  • ipfs get has an --output flag, instead we have a second optional path argument.
  • ipfs get downloads into --output if that output path already exists and is a directory. We consider that an error. The motivation is that it’s while it mirrors the behavior of Unix cp, we can’t fully mirror cp because we don’t always have the source file name or directory, and it risks creating an accidental nested directory structure when you repeatedly download.
  • ipfs get silently overwrites the directory if no --output was given. Instead of silently overwriting, we choose to present the user with an error so that the behavior of the omitted path is identical to when you supply one.
  • We don’t support compression of downloaded content; you can do that with a separate tool.

Test cases

  • get with CID -> ok
  • get with something that looks like a CID but isn't -> error
  • get with IPFS path that points to a CID -> ok
  • get with IPFS path that doesn't point to a CID -> error
  • get wrapped file
  • get unwrapped file
  • get unwrapped file, overwrites previous
  • get directory without explicit path
  • get directory with explicit path
  • get directory without explicit path, previous file/directory already there -> error
  • get directory with explicit path, previous file/directory already there -> error
  • get a file with an ipfs path: iroh get cid/a/b where b is a file
  • get a directory with an ipfs path iroh get cid/a/b where b is a directory
  • fail to get a file/directory with an ipfs path iroh get cid/a/b where b doesn't exist

Questions

  • get with CID that doesn't exist -> not sure how to test this; I tried a made up CID with Kubo and it's just hanging looking for it. Can we determine a CID doesn't exist?
@faassen faassen changed the title iroh get command iroh-ctl get command Oct 3, 2022
@faassen faassen changed the title iroh-ctl get command iroh get command Oct 4, 2022
@faassen
Copy link
Contributor Author

faassen commented Oct 4, 2022

We want progress bars, but we focus on adding this to the #179 first.

@b5
Copy link
Member

b5 commented Oct 5, 2022

Looks great to me. I'd like to advocate for one flag that will help users see just how terrible network fetching is:

--force-fetch  ignore local cache & fetch all data from the network

Often new users to IPFS will be confused by the wild swings in fetching time, and then can't reproduce their problem once they actually manage to resolve the content once (b/c cache is permanent). Adding this flag makes that conversation easier. Also, iroh developer friendliness.

@b5
Copy link
Member

b5 commented Oct 5, 2022

Full help text, updated on this branch:

iroh-get
Fetch IPFS content and write it to disk

USAGE:
    iroh get [OPTIONS] <ipfs-path> [output]

ARGS:
    <ipfs-path>    CID or CID/with/path/qualifier to get
    <output>       filesystem path to write to. Default: $CID

OPTIONS:
        --force-fetch    ignore local cache & fetch all content from the network
    -h, --help           Print help information


Download file or directory specified by <ipfs-path> from IPFS into [path]. If
path already exists and is a file then it's overwritten with the new downloaded
file. If path already exists and is a directory, the command fails with an
error. If path already exists, is a file and the downloaded data is a directory,
that's an error.

By default, the output will be written to './<ipfs-path>'.

If <ipfs-path> is already present in the iroh store, no network call will
be made.

@faassen
Copy link
Contributor Author

faassen commented Oct 11, 2022

I had a bunch of test cases I've moved into the main issue text as a task list here. My comment ended with this:

Any other tests require a refactoring: the API should not write files but instead deliver a stream of files which can then be mocked, and then the CLI actually writes the files itself. The code that actually writes the files too should also be accessible in the API, because that's a common use case for the API. The code to get the stream and write the files already lives in the API, so it should be possible to rewrite it.

Update This refactoring is starting to work now, so I can write trycmd test cases

@faassen
Copy link
Contributor Author

faassen commented Oct 12, 2022

@b5 regarding --force-fetch, I don't know how to control this yet so I'm leaving that out for now.

@b5
Copy link
Member

b5 commented Oct 12, 2022

regarding --force-fetch, I don't know how to control this yet so I'm leaving that out for now.

Absolutely. Feel free to punt on any flags that seem even remotely complex for now. We can break them out into separate issues later, and either they make the release, or they don't.

@faassen
Copy link
Contributor Author

faassen commented Oct 13, 2022

It turns out that getting unwrapped files overwriting an existing file without erroring out (like any other scenario) is harder to implement than I anticipating, and it makes me question whether it's worthwhile and whether we shouldn't simply fail in that case too. What do you think @b5?

The reason it's harder to implement is because we get a stream and we don't know in advance whether it contains just a single file or a bunch files and directories. Only in the form case when the only element is a file, should we overwrite. But knowing that requires consuming the stream. So it turns out it's easier to either always overwrite or never. I choose never overwrite as always overwrite is more dangerous and there are some tricky issues (do we remove files and directories that don't exist in what we're downloading?).

@b5
Copy link
Member

b5 commented Oct 13, 2022

It turns out that getting unwrapped files overwriting an existing file without erroring out (like any other scenario) is harder to implement than I anticipating

Yep, let's totally punt & just error out.

@faassen
Copy link
Contributor Author

faassen commented Oct 14, 2022

Clap derive by default produces this help text:

Usage: iroh get <IPFS_PATH> [OUTPUT]

Arguments:
  <IPFS_PATH>  CID or CID/with/path/qualifier to get
  [OUTPUT]     filesystem path to write to. Default: $CID

Options:
  -h, --help     Print help information
  -V, --version  Print version information

One difference here is that in our spec we have ipfs-path as opposed to IPFS_PATH. I'm looking for a way to control how clap derive generates this but no luck yet.

faassen referenced this issue in n0-computer/iroh Oct 14, 2022
This focuses on creating tests for the `iroh get` CLI command. The Iroh API is also improved along the way.

This required more work in the underlying API to make it more testable. What's tested here is everything *except* the actual IPFS behavior -- the focus is on the behavior of the CLI and API and its interactions with the filesystem.

 ## trycmd tests

In `iroh/tests/cmd` you can see quite a few `.trycmd` files. These describes the command-line interaction. Stuff behind `$ is a command, there's a special `? failed` indicator if the command is considered to have failed, and the output of the command is shown.

New are the `.out` and `.in` directories with the same names as the `.trycmd` files. These describe the filesystem before the command runs (may be missing), and the filesystem after the command has run. This way we can describe the effects of the `iroh get` command - directories and files are supposed to be created. We can also test failure scenarios where we refuse to overwrite a directory that already exists. #269 tracks various test cases.

## `get_stream`

The `get` high level CLI method has been removed from the mockable `Api` trait (read on to see where it went). Instead, a more low level but still useful API method `get_stream` has been added to the `Api` trait. This gets a stream of relative paths and `OutType`, describing the directories and files you can create. 

The big difference with what was there before is that it returns relative paths and doesn't calculate final destination paths -- that's up to the user of the API.

## No `async_trait` macro for `Api` trait

The interactions between the `Api` trait, `mockall` macro and `async_trait` were getting so hairy I couldn't figure out how to express things anymore once I wanted to add `get_stream`. For my sanity and also to learn better how this really works underneath, I've rewritten the `Api` trait to describe itself explicitly in terms of `(Local)BoxFuture` and `(Local)_BoxStream`. It's more verbose but functionally equivalent and I could express what I wanted. 

## `ApiExt` trait

The `ApiExt` trait is a trait that implements the high level `get` command. It puts everything together: it handles various error conditions, accesses the stream and then writes the stream to the filesystem. It's basically `iroh get`.

The `ApiExt` trait is solely intended to contain default trait methods, and is automatically available when the `Api` contract is fulfilled (if you `use` it).

Factored out `save_get_stream` from `getadd.rs` to be solely concerned with turning a stream into files and directories on the files system. That makes it possible to test its behavior in isolation.

## test fixture

The `get` test fixture now mocks `get_stream` and returns a fake stream made from a `Vec`. This defines the actual stuff that the CLI writes to disk.

## relative_path

Now depend on the [`relative-path` crate](https://crates.io/crates/relative-path) because what the stream returns are clearly relative paths, and we want to force the user to do something with them before being able to actually write stuff.
@b5 b5 closed this as completed Oct 21, 2022
@dignifiedquire dignifiedquire transferred this issue from n0-computer/iroh Feb 17, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
Archived in project
Development

No branches or pull requests

2 participants