Skip to content
This repository has been archived by the owner on Feb 8, 2023. It is now read-only.

Blocks should be addressed by Multihash, not CID #259

Closed
Stebalien opened this issue Aug 31, 2017 · 13 comments
Closed

Blocks should be addressed by Multihash, not CID #259

Stebalien opened this issue Aug 31, 2017 · 13 comments

Comments

@Stebalien
Copy link
Member

Given a block of data, it's possible to interpret it different ways assuming different codecs. Currently, in bitswap, the blockstore, and everywhere else for that matter, we track blocks using CIDs instead of using the data's multihash.

That is:

type Block interface {
	// Currently, this is Cid() *cid.Cid
	Multihash() mh.Multihash
	RawData() []byte
}

type Node interface {
	Block

	Cid() *cid.Cid
	// etc...
}

Use-cases/References:

Unfortunately, fixing this will be non trivial. This will likely end up being part of #255.

@Stebalien
Copy link
Member Author

@daviddias
Copy link
Member

Note that CID has multihash inside of it. It is not about not passing CID around, it is ensuring that we compare multihashes

@Stebalien
Copy link
Member Author

Note: Unless I'm mistaken, we should consider switching provider records to using multihashes (without breaking backwards compat too much).

@kevina
Copy link

kevina commented Aug 31, 2017

@Stebalien the block interface should still take CID's (just like @diasdavid said we should still use CIDs on the wire)

@Stebalien
Copy link
Member Author

Note that CID has multihash inside of it. It is not about not passing CID around, it is ensuring that we compare multihashes.

@kevina, @diasdavid I disagree. For bitswap, we should send CIDs because we'll need to do that for DAGSwap (IPLD selectors). For blockstores, blocks really shouldn't understand the concept of CIDs because they're blocks (not at the IPLD level). Yes, we could continue to use CIDs but that's mixing abstraction levels and will tend to lead to bugs (see the associated go-ipfs issue: https://github.com/ipfs/go-ipfs/issues/4189)

@Stebalien
Copy link
Member Author

Nevermind. It's actually quite useful to have blocks that carry information about their codec/CID without actually being able to interpret the underlying data.

@Stebalien
Copy link
Member Author

The datastore (in go, at least) is the layer that uses raw multihashes and doesn't understand CIDs.

@whyrusleeping
Copy link
Member

@Stebalien Yeah, I agree conceptually, but in practice there is near zero overlap. Blocks with the same multihash but different cids are non-existant in practice (please call me out on this if anyone disagrees).

@Stebalien
Copy link
Member Author

@whyrusleeping this is going to change drastically if we switch to base32 by default beacause base32 CIDs will be CIDv1, not CIDv0 (CIDv0, according to @diasdavid, is base58btc only by spec).

@Stebalien
Copy link
Member Author

The reason I closed this bug is that it's still useful to have blocks that carry CID information that can't (or doesn't need to be at this point in time) actually be interpreted.

@daviddias
Copy link
Member

daviddias commented Sep 1, 2017

@whyrusleeping if the multibase or the multicodec is different, then the multihash will be the same.

During the Lisbon Treaty, we were seriously concerned about block duplication (for example importing git blocks as raw and with git multicodec). But now, as @Stebalien points out, if we change the defaults to base32, we will see a ton of blocks being referenced by a new CID that will have the same multihash. Update: See my update below #259 (comment)

Also, if we don't compare by multihash it means that a CIDv0 would never match a CIDv1 for the same block and there for would miss fetching, even though the only things that change is the link and not the content.

@whyrusleeping
Copy link
Member

whyrusleeping commented Sep 1, 2017 via email

@daviddias
Copy link
Member

daviddias commented Sep 1, 2017

(and I sincerely hope the js code doesn't keep them encoded internally).

We use its Buffer format which is version + multicodec + multihash for storing them on the repo:

But you are right, multibase is not an issue. I knew there was something and led myself to believe that multibase was part of the mix too. The multicodec and cid version are the ones creating different keys (for blocks in repo) for the same content.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants