Skip to content
This repository has been archived by the owner on Mar 10, 2020. It is now read-only.

API proposal for ipfs.files.add #20

Closed
wants to merge 2 commits into from

Conversation

hackergrrl
Copy link
Contributor

@hackergrrl hackergrrl commented May 20, 2016

This is my first draft at merging the wildly different js-ipfs-api and js-ipfs add APIs. Notable things:

  1. I dropped support for adding files from the local filesystem and for adding urls. I've written up a plan for making this change. These are nice utilities, but they don't belong in core.
  2. Both object tuple formats (path, stream) and (path, content) are included here. It'd be nice if we could merge them into one style though. Maybe if content could be a Buffer or Readable?
  3. I made the return object format consistent with what js-ipfs already does. This will require a change to js-ipfs or unixfs-engine.
  4. We also accept a Buffer or a Readable stream.
  5. If no argument is passed in, a Writeable stream is returned. I feel mixed about this. I understand the convenience of being able to pipe directly into ipfs.files.add(), but this style is confusing in the way it overloads the API call. I'm on the preference of dropping it but letting folks develop their own utilities for it if they're so inclined. This also gets ambiguous when I don't provide data or callback -- do I get back a Promise OR a Writeable? :D

ping @diasdavid @nginnever @dignifiedquire

@haadcode
Copy link
Contributor

haadcode commented May 21, 2016

I would like the add to be:
ipfs.files.add('/tmp/myfile.txt') --> 'Qm...Foo'.

It'd be also fine to return a DAGNode, ie ipfs.files.add(path).toJSON().Hash --> 'Qm...Foo`.

If we want to give fine-grained options to define how/what the data is, being able to pass it as @noffle suggested sounds good.

  • ipfs.files.add(path) is consistent with go-ipfs CLI and does exactly what I would expect it to do
  • As a user, I shouldn't have to care how the file is read on the system level, I'm hooking up to a file system so it should function as such and abstract the system level functionality away from me
  • ipfs.files.addFiles(path) adds redundancy in the naming. It'd work better as ipfs.addFiles(path). Or perhaps as convenience method like: ipfs.files.addDirectory(path).
  • Currently ipfs.add(file) returns the full path of the file as an array of hashes and imo this is undesirable default behaviour. If a full path is needed, it should be an option, like recursive is now.

- a `Buffer` instance
- a readable stream

If no value `data` is provided, a Readable stream will be returned, which
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Readable/Duplex

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops -- I meant Writable. What's the use of this being a Duplex here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to get back the DAGNodes correspondent to the files (and directories) that you are adding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, it's a transform that's happening over the API! Got it. Thanks. :)

@hackergrrl
Copy link
Contributor Author

Thanks for weighing in, @haadcode! Responses:

I would like the add to be:
ipfs.files.add('/tmp/myfile.txt') --> 'Qm...Foo'.

We're really keen on having IPFS Core only expose basic operations that all IPFS nodes will be able to support. Adding files from the local filesystem is something that e.g. the browser can't do, so I think it doesn't belong in core. That's not to say however that APIs on top couldn't add this functionality!

It'd be also fine to return a DAGNode, ie ipfs.files.add(path).toJSON().Hash --> 'Qm...Foo`.

Pinging @diasdavid for thoughts on returning first-class objects vs JSON -- I'm not sure what I think yet.

  • ipfs.files.add(path) is consistent with go-ipfs CLI and does exactly what I would expect it to do

I think we set up a bad expectation here: that CLI should be 1:1 with Core. CLI is a specific interface into the broader world of Core, since CLI has access to things like the local FS, pipes, a shell interpreter, etc. Because of this Core cannot (and shouldn't) match CLI's capabilities. In the case of ipfs adding a file, I think it makes more sense for the CLI to do the glue work of turning the files/dirs into streams and passing them into Core. I think @nginnever did this in unixfs-engine importer?

  • As a user, I shouldn't have to care how the file is read on the system level, I'm hooking up to a file system so it should function as such and abstract the system level functionality away from me

How should the API abstract this on the browser, though? Treat file paths as index-db paths? What about future IPFS nodes that have no storage like this at all?

  • ipfs.files.addFiles(path) adds redundancy in the naming. It'd work better as ipfs.addFiles(path). Or perhaps as convenience method like: ipfs.files.addDirectory(path).

Sure! I'm down for either. :)

  • Currently ipfs.add(file) returns the full path of the file as an array of hashes and imo this is undesirable default behaviour. If a full path is needed, it should be an option, like recursive is now.

Sorry, could you try to explain this again with some more context (maybe an example)? I'm not sure I understand.

@daviddias
Copy link
Contributor

ipfs-core won't make reads (or writes) on the filesystem. Files are passed to ipfs-core, by application code or through the http-api, if using the daemon. This is the same pattern used in go-ipfs.


If no `callback` is passed, a promise is returned.


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If no data is passed, then, a duplex stream is returned, that pretty much is the duplex stream returned by unixfs-engine Importer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if no data is passed and no callback is passed? The possible return matrix starts to get messy -- you'd want a stream and a promise back. ;)

Copy link
Contributor

@daviddias daviddias May 21, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be a misuse of the API.

Note that we need the ability to return a duplex stream to be able to 'add files as they come', through the http-api or through the cli.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can have the callback/promise provide a stream if no data is passed, but we can't have the function actually return a stream ever, since we went down the road of always returning promises.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's return the duplex stream on the callback on the interface-ipfs-core. This does not require changes on unixfs-engine. Sounds good?

@daviddias
Copy link
Contributor

👍 for making the returned objects be first class DAGNodes :)

@hackergrrl
Copy link
Contributor Author

Revisions applied -- just our convo re DAGNodes to resolve.

(Aside: I really wish we could simplify this API method: 4 possible input types, 2 types of return values, and both callback and promise control flows.)

error if the operation was not successful. If no value `data` is provided, `res`
will be a Duplex stream, to which tuples like the above two object formats can
be written and [DAGNode][] objects will be outputted. Otherwise, `res` will be
an array of [DAGNode][]s.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Give an example please.

stream.write({path: <path to file>, stream: <readable stream>})
// write as many as you want
stream.end()
stream.on('data', function (fileDAGNode) {
})

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

@daviddias
Copy link
Contributor

Good progress :) Let's have tests :)

@hackergrrl
Copy link
Contributor Author

be written and [DAGNode][] objects will be outputted. Otherwise, `res` will be
an array of [DAGNode][]s.

If no `callback` is passed, a promise is returned.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the Promise resolve to? An array of DAGNodes?

We should add examples for the returns

ipfs.files.add(path).then((res) => console.log(res)) // --> [DAGNode1, DAGNode2, ...] ???

@haadcode
Copy link
Contributor

Good progress!

My thoughts atm:

  • I still think we really shouldn't limit the input arguments to an array of objects, each of the form.... We should be able to pass a path (or an array of paths). I understand the argument that the browsers don't have a file system but we also have the Node.js version and js-ipfs-api that both can support it and on browser, you can add files with the array of objects structure and have control over where the "file" data comes from. It's a lot more intuitive: ipfs, add some *files*, from this path as opposed to ipfs, add some *data*, that you should handle as files, and they come in like this....
  • I agree different types of result values is not ideal. However a callback is not a return value and as such should be fine if we always return a Promise and the callback's result argument is always a stream. This way the user has access to both: an actual result (array from the Promise) or a stream for further processing.
  • We should try to combine the input argument's object notation to contain only one type of object: { path: '/path', content: <buffer or stream> } and let the implementation handle the check which one it is.

@daviddias
Copy link
Contributor

daviddias commented May 24, 2016

I still think we really shouldn't limit the input arguments to an array of objects, each of the form.... We should be able to pass a path (or an array of paths). I understand the argument that the browsers don't have a file system but we also have the Node.js version and js-ipfs-api that both can support it and on browser, you can add files with the array of objects structure and have control over where the "file" data comes from. It's a lot more intuitive: ipfs, add some files, from this path as opposed to ipfs, add some data, that you should handle as files, and they come in like this....

The result from the discussion we had (although I see not document) is that there will be, inside js-ipfs-api, extra API calls like:

ipfs.cli.add - which takes an input path, just like the CLI

These calls will only be available on the js-ipfs-api and are not part of the interface-core-spec

I agree different types of result values is not ideal. However a callback is not a return value and as such should be fine if we always return a Promise and the callback's result argument is always a stream. This way the user has access to both: an actual result (array from the Promise) or a stream for further processing.

Agree. However, it is fairly easy to distinguish between a call where data was already passed and a call where data wasn't (meaning that it wants the stream)

We should try to combine the input argument's object notation to contain only one type of object: { path: '/path', content: } and let the implementation handle the check which one it is.

Agree and I think this is what it is at the moment, did I miss something?

@haadcode
Copy link
Contributor

ipfs.cli.add - which takes an input path, just like the CLI

Will this be available in Node.js version of js-ipfs? I think it should.

These calls will only be available on the js-ipfs-api and are not part of the interface-core-spec

I think the namespacing here is what makes it confusing for me. If we move the add under ipfs.cli.add and regardless have ipfs.files.add, people will use the latter as it makes sense intuitively whereas its input arguments imo don't. I think there might be a conflict of perspectives here: I'm looking at it from a higher-level, app developer perspective and perhaps what is proposed here is akin more to a library/module developer perspective. While both important, the top-level API needs to have as simple and intuitive UX as possible. The way I read it atm, with the proposed api, is that I'm adding arbitrary data to ipfs via the files api, not files. A file has a path, a file is located somewhere, so the intuitive thinking says I need to tell the api which file it is I want to add. Do you see what I mean?

Agree. The 'return value' (aka result) should be (and is) always the same, in this case. It should be the stream that will emit the several DAGNodes.

Wait, what's your understanding of the value the returned Promise resolves to? A stream or an array?

We should try to combine the input argument's object notation to contain only one type of object: { path: '/path', content: } and let the implementation handle the check which one it is.
Agree and I think this is what it is at the moment, did I miss something?

It's not reflected (yet?) in the PR, perhaps I'm looking at the wrong PR?

@haadcode
Copy link
Contributor

Discussion today:

dignifiedquire> I think it would be better to have two different methods, than to try and shoehorn everything into one
daviddias> I'm also happy to have two different methods
dignifiedquire> just have a method `createAddStream`
dignifiedquire> then everybody knows what to expect
haad> was thinking the same.
daviddias> Ok, let's do that.
daviddias> Can someone type that while I'm at the subway? So that we don't loose this decision? :)

This sounds really good to me. Separate the stream and Promise control flows and it'll simplify everything nicely.

@hackergrrl
Copy link
Contributor Author

hackergrrl commented May 24, 2016

Cool! Lots of thinking happened here while I was sleeping.

Yes: let's have functions that do one single thing than be a grab bag of inputs and outputs! Here's what I've aggregated from your comments:

  1. break add into add and createAddStream (@dignifiedquire)
  2. have one form of tuple input ({ path, content }) rather than supporting stream and content keys (@haadcode)

I've made these revisions here.


@haadcode and @diasdavid re ipfs.cli.add etc for file adding: I think the point of confusion here is on what interface-ipfs-core is providing. Here are two things this module is:

  1. an interface for core ipfs functions
  2. a way to use either a local JS node or a remote go/js node over HTTP with a single interface

Here is something it is not:

  1. a drop-in replacement for all of the functionality in js-ipfs-api

It's unfortunate, but js-ipfs-api does more things than what "Core IPFS" does, so that overflow needs to go somewhere in its API.

So, and niceties that js-ipfs-api provides (like reading files from the FS) needs to be exposed from a higher level API than Core. Maybe this means a new module that builds atop interface-ipfs-core, like js-node-ipfs-helper that can do all of these local FS helper operations.

@dignifiedquire
Copy link
Contributor

Re part 1: Yes that was what we meant


Re part 2:

I think it's a really good notion of having a module that wraps around both ipfs-core as well as js-ipfs-api to provide additional functionality, like interacting with the file system.

path: `test-folder/${name}`,
content: fs.readFileSync(path.join(base, name))
})
const dirs = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if there was a test for an empty directory by adding one in data/test-folder/empty so that it can check this addition to js-ipfs pr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Delegating this to #23

for (let i = 0; i < res.length; i++) {
console.log('added', res[i].Hash, res[i].Name)
}
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still use this file?

@daviddias
Copy link
Contributor

The tests should cover adding a files:

  • file with less than a block size
  • file bigger than a block size
  • file bigger than a block size, but that is not a multiple of the block size (size % 256KiB !== 0)
  • file that is 'big' (between 10 and 20Mb)
  • directory with files
  • directory with files and empty dirs
  • directory with other directories that have files and other empty dirs (nested dirs)

@daviddias
Copy link
Contributor

Note for the future: @noffle open branches on the main repos, so that others can collaborate without having to PR your PR in order to make a change :)

@hackergrrl
Copy link
Contributor Author

@diasdavid I filed #23 for the tests you mention. I'm going to focus on getting this code in before expanding the test suite.

@hackergrrl
Copy link
Contributor Author

open branches on the main repos

I get used to the general github workflow, where you don't tend to have origin write access. 😀

@hackergrrl
Copy link
Contributor Author

hackergrrl commented Jun 1, 2016

All feedback has been addressed!

@diasdavid:

  • merge this PR
  • npm version minor
  • npm publish

With this, I can update the dep in my js-ipfs and js-ipfs-api PRs 🎉

@daviddias
Copy link
Contributor

Awesome work! SO CLOSE :D

It feels to me that postponing the tests that will check that this API operates as expect, might be a decision that will quickly backfire.

Anything against pushing this tests ahead? All of them are almost created over js-ipfs, js-ipfs-api and js-ipfs-unixfs-engine already, just need to be ported here.

```

If no `callback` is passed, a promise is returned.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add: example:

@hackergrrl hackergrrl mentioned this pull request Jun 1, 2016
7 tasks
@hackergrrl
Copy link
Contributor Author

hackergrrl commented Jun 1, 2016

@diasdavid done: these tests are all now present! All of them except for empty dirs were already present.

@daviddias
Copy link
Contributor

👍 @noffle, rad :D

Going to jump into a plane down, wanna squash the commits?

Also replaces the old javascript test data with Project Gutenberg prose.
Also returns object wrapping path+node on ipfs.files.add.
@hackergrrl
Copy link
Contributor Author

Squashed and ready.

const testfileBigPath = path.join(__dirname, './data/15mb.random')
testfileBig = fs.createReadStream(testfileBigPath, { bufferSize: 128 })
} else {
testfile = require('raw!./data/testfile.txt')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now use the browserify transform to support fs.readFileSync on the transpilation process as well. We don't need to use the require('raw

@hackergrrl
Copy link
Contributor Author

@diasdavid This is code I copied out of the existing repos, so these tests never ran in the browser previously either. Agreed 100% that we should make them do so, but given a) what work this PR is blocking, and b) that by not running in them in the browser we're no worse off than we were before (in fact, we're still better off, because there are most tests here than before!), I suggest we merge this now and come back to your those tests later. In fact, I've created a tracking issue for it. SGTM?

@daviddias daviddias mentioned this pull request Jun 5, 2016
3 tasks
@daviddias
Copy link
Contributor

Working on merging this. Made a branch on this repo so that we can all contribute to: #26 , closing this one.

@daviddias daviddias closed this Jun 5, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants