Skip to content
This repository has been archived by the owner on Aug 12, 2020. It is now read-only.

chore: use raw nodes for leaf data #214

Merged
merged 1 commit into from
Jun 12, 2018
Merged

Conversation

achingbrain
Copy link
Collaborator

go uses raw unixfs nodes for leaf data whereas this module uses file nodes - that causes CIDs to differ between the two implementations. I've added a rawLeafNodes option (false by default) that if set to true will cause this module to also use raw nodes for leaf data.

It also seems that setting the reduceSingleLeafToSelf option to false in order to disable that optimisation was being ignored if the optimisation was possible so I made it true by default and respected if explicitly set to false.

@ghost ghost assigned achingbrain May 18, 2018
@ghost ghost added the status/in-progress In progress label May 18, 2018
@achingbrain
Copy link
Collaborator Author

Requires the following to be merged:

ipfs/js-ipfs-unixfs#24
ipfs/js-ipfs-unixfs#25

@achingbrain achingbrain requested review from pgte and daviddias May 29, 2018 10:18
@daviddias
Copy link
Contributor

daviddias commented May 29, 2018

@achingbrain I remember that go-ipfs used file for raw nodes when I implemented this and then things might have changed with the introduction of IPLD to IPLD raw nodes. I know that there are incoherence between regular ipfs add, ipfs files write and when using CIDv1 vs CIDv0.

Can you create a series of interop tests at http://github.com/ipfs/interop to ensure that these things are well captured?

@achingbrain
Copy link
Collaborator Author

@diasdavid aha, I didn't know about that repo. Is it run as part of CI anywhere?

@achingbrain
Copy link
Collaborator Author

@diasdavid

I've added a test here: ipfs/interop#22 - of course, it won't pass for js-ipfs until the mfs functionality has been merged and released.

I found that ipfs add results in file leaf nodes but cat data.file | ipfs files write --create /somepath results in raw leaf nodes.

I asked in the go-ipfs repo about this behaviour and it seems to be because go-ipfs exclusively uses trickle-dags for mfs.

@daviddias
Copy link
Contributor

I asked in the go-ipfs repo about this behaviour and it seems to be because go-ipfs exclusively uses trickle-dags for mfs.

This is a massive loss for deduping. @achingbrain see https://github.com/ipfs/js-ipfs-unixfs-engine/issues/119 and ipfs/kubo#3576

@achingbrain
Copy link
Collaborator Author

I see what you mean, but logically if a file is split up into a bunch of chunks, each chunk is just raw data and not a file so probably shouldn't be marked as such.

That said, it's just a file because it's a named stream of bytes stored in a directory so it could all just be raw and interpreted as a file if we got to the node by following a named DAGLink.

E.g. to extend the example from the ipld spec with a type field:

{
  "files": {
    "cat.jpg": { // give links properties wrapping them in another object
      "link": {"/": "/ipfs/QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k"},
      "mode": 0755,
      "owner": "jbenet",
      "type": "file" // or "symlink" or "directory", etc
    }
  }
}

🤷‍♂️ Just thinking out loud.


ipfs/kubo#3576 has been open for well over a year. We can't really enforce interop without replicating go's behaviour, even if it's wrong. How do you want to proceed?

@daviddias
Copy link
Contributor

@achingbrain target interop first and outline a plan for migration.

@achingbrain
Copy link
Collaborator Author

In that case we're going to have to take the hit on deduping...

achingbrain added a commit to ipfs/js-ipfs that referenced this pull request May 30, 2018
ipfs-inactive/js-ipfs-unixfs-engine#214 changes the algorithm used when adding files to be more
consistent with the go implementation. This means some generated hashes will change.

This PR updates those hashes in our test suite.
achingbrain added a commit to ipfs/js-ipfs that referenced this pull request May 30, 2018
ipfs-inactive/js-ipfs-unixfs-engine#214 changes the algorithm used when adding files to be more
consistent with the go implementation. This means some generated hashes will change.

This PR updates those hashes in our test suite.
achingbrain added a commit to ipfs/js-ipfs that referenced this pull request May 30, 2018
ipfs-inactive/js-ipfs-unixfs-engine#214 changes the algorithm used when adding files to be more
consistent with the go implementation. This means some generated hashes will change.

This PR updates those hashes in our test suite.
achingbrain added a commit to ipfs/js-ipfs that referenced this pull request May 30, 2018
ipfs-inactive/js-ipfs-unixfs-engine#214 changes the algorithm used when adding files to be more
consistent with the go implementation. This means some generated hashes will change.

This PR updates those hashes in our test suite.
achingbrain added a commit to ipfs/js-ipfs that referenced this pull request May 30, 2018
ipfs-inactive/js-ipfs-unixfs-engine#214 changes the algorithm used when adding files to be more
consistent with the go implementation. This means some generated hashes will change.

This PR updates those hashes in our test suite.
achingbrain added a commit to ipfs/js-ipfs that referenced this pull request May 30, 2018
ipfs-inactive/js-ipfs-unixfs-engine#214 changes the algorithm used when adding files to be more
consistent with the go implementation. This means some generated hashes will change.

This PR updates those hashes in our test suite.
@alanshaw
Copy link
Contributor

alanshaw commented Jun 8, 2018

ping @achingbrain what's left to do on this one - flip the rawLeafNodes option default? Code looks sensible, would be nice to have another test or two 😉

@achingbrain achingbrain force-pushed the use-raw-nodes-for-leaves branch 2 times, most recently from 7744b68 to 61e5e4e Compare June 11, 2018 18:25
go uses `raw` unixfs nodes for leaf data whereas this module uses `file` nodes, that causes CIDs to differ between the two implementations
@achingbrain achingbrain force-pushed the use-raw-nodes-for-leaves branch from 61e5e4e to 9d44a75 Compare June 12, 2018 09:46
@achingbrain
Copy link
Collaborator Author

@alanshaw from what I can see the go implementation uses different importers depending on what you are doing.

ipfs add marks leaf nodes as file (which jsipfs has replicated), ipfs files add marks them as raw.

Until someone decides which is correct we should probably leave the defaults as they are.

@achingbrain
Copy link
Collaborator Author

I've added some tests and updated the readme too. The Mac Jenkins slaves are preventing the build from passing as they are out of disk space so I'm going to merge this.

@achingbrain achingbrain merged commit 269b35a into master Jun 12, 2018
@ghost ghost removed the status/in-progress In progress label Jun 12, 2018
@achingbrain achingbrain deleted the use-raw-nodes-for-leaves branch June 12, 2018 09:58
achingbrain added a commit to ipfs/js-ipfs that referenced this pull request Jun 12, 2018
ipfs-inactive/js-ipfs-unixfs-engine#214 changes the algorithm used when adding files to be more
consistent with the go implementation. This means some generated hashes will change.

This PR updates those hashes in our test suite.
achingbrain added a commit to ipfs/js-ipfs that referenced this pull request Jun 12, 2018
ipfs-inactive/js-ipfs-unixfs-engine#214 changes the algorithm used when adding files to be more
consistent with the go implementation. This means some generated hashes will change.

This PR updates those hashes in our test suite.
alanshaw pushed a commit to ipfs/js-ipfs that referenced this pull request Jun 18, 2018
ipfs-inactive/js-ipfs-unixfs-engine#214 changes the algorithm used when adding files to be more
consistent with the go implementation. This means some generated hashes will change.

This PR updates those hashes in our test suite.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants