Author: Daniel Martí (@mvdan)
It was suggested that I should capture my perspective as I get up to speed on the IPLD stack, so that we can possibly identify shortcomings with the current material, or topics which can cause problems.
I should note that I was already familiar with hashing, Git, type systems and data structures, encodings like JSON and Protobuf, and compatibility between different programming languages. So the "block" and "data model" layers of IPLD were relatively easy to understand.
The docs are somewhat scattered and unfinished, which does make it a little extra confusing to get started. In chronological order, I read:
-
https://hackmd.io/LHTTmGSWSvem4Wz2h_a39g?both, Eric's "terse primer". Seems to try to cover everything, though it does seem like quite a lot of information to take in all at once.
-
https://ipld.github.io/docs/, sourced at https://github.com/ipld/docs. Seems aimed at getting started with tutorials in JS. I should note that I first read this before Mikeal's "new intro" added on September 2nd 2020.
-
https://github.com/ipld/specs, which seems to contain all the formal specs, but also includes a pretty decent README.
I think all three should probably be unified into two halves:
-
A high-level introduction to IPLD, max 3-4 pages. Probably extending Mikeal's new intro with some material from Eric's primer?
-
The set of spec documents, with a README to classify and introduce each of the groups or layers. I think the specs repo already does a decent job at this.
I already raised some of these as Slack threads or HackMD comments, but for the sake of keeping record, I'm listing the most basic or important ones here.
-
It is said that a link is unlike a URL, since it is merely a hash of data that doesn't statically say where to fetch the data from. So... how would one ever actually fetch data via a link?
-
Out of the three layers (blocks, data model, schemas), Schemas have been by far the hardest to wrap my head around. I think an introduction should contain a very brief example, including how it actually looks like when mapped to the data model and encoded into a block.
The following are more such points, but focused around schemas, once I got to that part of the spec:
-
My first read about ADLs left me very confused, in particular how they're different than Schemas. I found the "Mapped to the Data Model" introduction to ADLs much easier to understand, as it shows reasonable examples.
-
Why are multi-block data structures a separate definition in the spec, and not just part of Schemas?
-
Are blocks generally filled with data nearly completely, or is it normal to have them relatively empty?
-
Wouldn't removing the first byte from a very long multi-block List mean that every block would need to be modified to shift all bytes forward by one? I assume and hope not, but the spec doesn't really give pointers.
-
Since data in blocks is encoded from the data model, how would Iknow if a particular data model value fits in a single block? What about a shema value?
-
Would IPLD be much different if the data model was an internal detail, and not exposed to the user? I imagine that, most of the time, one would interact with schemas and not the data model.
-
When docs say "the IPLD type system", is that in terms of Schemas, or the Data Model, or both? Answer: The schema intro later says that "types" are for schemas, and "kinds" for the data model. That should probably be sooner in the schema spec.
-
For new IPLD team members in the future, it's probably best if their first week is focused on the basics alone - blocks, hashing, linking, encodings, and the data model. That's enough for some realistic demos using IPLD, and can be learned in half a day, allowing the person to start contributing without multiple full days of reading. I should clarify that Eric did give me a data model issue to work on during my first week, but I never picked up on the "you don't need to read about schemas for now" nudge.
-
The specs README hierarchy presents these three concepts in order: multi-block collections, schemas, and ADLs. The docs do explain why schemas and ADLs are different, but not why multi-block collections are also a separate thing. Eric mentioned that multi-block collections are pretty much ADLs; so why are they introduced before schemas?
-
The schema authoring guide talks about "component specifiers" and "component specifications", but never seems to define them.