This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

Merge pull request #241 from ipld/library-design-recommendations
Library design recommendations.
warpfork authored May 4, 2020
2 parents 2014318 + 965b927 commit 819fcb0
Showing 3 changed files with 235 additions and 0 deletions.
12 changes: 12 additions & 0 deletions README.md
@@ -113,6 +113,18 @@ In many cases these specifications are not intended to drive new implementations
Documents labelled "Specification" in this repository will also be labelled with a descriptor that indicates the category and status.
e.g. _"Status: Prescriptive - Draft"_ or _"Status: Descriptive - Final"_.

## Design documentation & Library recommendations

Included in this repository are some documents which chronicle our process in developing these specs,
as well as some documents which are advisory to library authors (but not specifications, per se):

- [design/...](/design) -- gathers all such documents
- [design/history/...](/design/history) -- gathers research work and pre-spec content and notes
- [design/libraries/...](/design/libraries) -- gathers recommendations for library authors

These documents may be useful to read for those who want to participate more deeply in the
design and specification processes (as well as implementation processes!) of IPLD.

## Contributing & Discussion

Suggestions, contributions, criticisms are welcome.
10 changes: 10 additions & 0 deletions design/libraries/README.md
@@ -0,0 +1,10 @@
Library Design Guidance
=======================

This directory contains some documentation of recommendations for
library authors who want to make IPLD libraries in a new language
(or, perhaps for readers who want to understand an existing library better).

Some of the information expressed here is opinion more so than specification;
what counts as good ergonomics may vary wildly per language, so take these as
recommendations rather than strictures.
213 changes: 213 additions & 0 deletions design/libraries/nodes-and-kinds.md
@@ -0,0 +1,213 @@
Nodes and Kinds
===============

**Preface: purpose of this document:**

This document is intended for developers of new (or renovating) IPLD libraries.
It contains design suggestions based on the experience of building (and rebuilding)
IPLD libraries in various languages, and reflecting on the lessons learned.
It also contains both notes on practical limitations we've found for implementations,
and reflections on how to express things clearly within the type systems of the
host language a library is implemented in (whatever that may be).

**Preface: limitations of this document:**

Since this document is aimed at _new libraries_, it's also implicitly expecting
that the new library might be in a _new language_.
We can't presume to know precisely what that language will enable or encourage!
Therefore, there will be limits to how transferable the advice in this document may be.
We do expect that the best way to express IPLD concepts may vary based on
the language a library is created in. We accept this and try to write this
document anyway, and make it as useful as it can be.

These guidelines are written with particular attention to the limitations that
are typical of strongly typed languages. (Some of the phrasing used reflects
this -- we refer to "types", "enumerations", "interfaces", "packages", etc.
However, these concepts can still translate even to languages with varying
amounts of compile-time type checking, and indeed even to those with none.
While the concepts are certainly not identical across all languages,
we hope that they're close enough to be meaningful to a thoughtful reader.)

We expect that common concepts for IPLD libraries will emerge across many languages,
and hope that some vocabulary for these concepts is something we can share.
Loosely typed and untyped languages may need to interpret these guidelines
appropriately while extracting the key concepts; but even among languages with
stricter concepts of compile-time type checking, the meaning of "interface"
can vary greatly -- _all_ readers will need to be ready to use their best judgement.

---

Cornerstone Types
=================

Your IPLD library should have two cornerstone types:

1. `Node`;
2. `Kind`.

`Node` should be an interface -- the membership should be open
(aka, it should be possible for other packages to implement it).

`Kind` should be an enumeration -- a fixed set of named members,
which should not be extendable.


Kind
----

`Kind` maps very directly onto the definition of
[Data Model Kinds](../../data-model-layer/data-model.md#kinds).

`Kind` does not include the Schema layer's concept of "struct", etc.

`Kind` must be an enum, **and not a sum type**. Attempting to implement
`Kind` as a sum type conflates it with `Node`.
(It may be tempting to combine `Kind` and `Node` into a single
sum type definition if you're only looking at the Data Model layer,
but it is a mistake: both Schema types and Advanced Layouts require
the ability to add more implementations of `Node`, so this conflation
will cause cataclysmic problems and force a painful refactor
as soon as you get to implementing those systems.
See the [different implementors of Node](#different-implementors-of-node)
section, later in this document, for more information on this.)
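
As a minimal sketch of the "closed enumeration" idea, here is how `Kind` might look in Go. The constant names, the `KindInvalid` zero value, and the `String` helper are assumptions made for illustration; only the nine kinds themselves come from the Data Model.

```go
package main

// Kind enumerates the Data Model kinds. The set is deliberately closed:
// Schema types and ADLs add new Node implementations, not new Kinds.
type Kind uint8

const (
	KindInvalid Kind = iota // zero value, so an unset Kind is detectable
	KindNull
	KindBool
	KindInt
	KindFloat
	KindString
	KindBytes
	KindList
	KindMap
	KindLink
)

// kindNames is indexed by Kind; it exists mainly for error messages.
var kindNames = [...]string{
	KindInvalid: "invalid",
	KindNull:    "null",
	KindBool:    "bool",
	KindInt:     "int",
	KindFloat:   "float",
	KindString:  "string",
	KindBytes:   "bytes",
	KindList:    "list",
	KindMap:     "map",
	KindLink:    "link",
}

// String returns a human-readable name for the kind.
func (k Kind) String() string {
	if int(k) < len(kindNames) {
		return kindNames[k]
	}
	return "invalid"
}
```

Note that nothing here carries data: `Kind` only names a category, which is what keeps it separate from `Node`.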


Node
----

`Node` is a monomorphized interface for handling data -- in other words,
we make all data look and act like a `Node`, so that we can write all of our
functions against the `Node` interface, and have that work for any sort of data.

`Node` has functions for examining any of the
[Data Model Kinds](../../data-model-layer/data-model.md#kinds).
For example, this means `Node` must be able to
do a key lookup for a map kind,
provide an iterator for a list kind,
or be convertible to a primitive if it's an integer kind.

`Node` is generally implemented by making an interface with the superset of all
these methods needed for the various kinds of data.
Some programming languages may also have a pattern-matching faculty which
may make this nicer; feel free to use it (but mind the caveats issued in the
[Kind](#kind) section above, and the
[different implementors of Node](#different-implementors-of-node) section below:
the membership of `Node` must remain *open*;
you do *not* want to use a sum type with a closed list of concrete members here,
or it will cause other roadblocks later that *will* force a redesign).
For languages where this is most straightforwardly implemented by a single
interface containing the superset of all necessary methods, many of the methods
will error if the `Node` refers to information of the wrong kind for that method;
this is fine.

`Node` should be clear about what sets of methods are valid for acting on it.
Typically, this is done by a `Node.Kind()` method, which should return
a member of the [Kind](#kind) enum.
This information is useful for anyone writing functions which use the `Node`
interface, because it's much more pleasant (and fast) to check the Kind and
know which methods can be expected to work than it is to have to probe every
method individually for failure.
(Again, programming languages with pattern-matching faculties may find
a cleverer way for their compiler and type system to support this.)
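
The following Go sketch illustrates the "superset interface plus `Kind()`" approach described above. It is heavily trimmed (three kinds, four methods), and every name in it (`ErrWrongKind`, `LookupByString`, `AsInt`, `AsString`, `intNode`, `stringNode`, `Describe`) is hypothetical rather than taken from any particular library.

```go
package main

import (
	"errors"
	"fmt"
)

// Kind is trimmed to the members this sketch uses.
type Kind uint8

const (
	KindInvalid Kind = iota
	KindInt
	KindString
	KindMap
)

// ErrWrongKind is returned by methods that do not apply to a node's kind.
var ErrWrongKind = errors.New("method not valid for this kind")

// Node carries the superset of methods needed across kinds (trimmed here).
type Node interface {
	Kind() Kind
	LookupByString(key string) (Node, error) // valid for KindMap
	AsInt() (int64, error)                   // valid for KindInt
	AsString() (string, error)               // valid for KindString
}

// intNode is a hypothetical Node holding a single integer.
type intNode int64

func (n intNode) Kind() Kind                          { return KindInt }
func (n intNode) LookupByString(string) (Node, error) { return nil, ErrWrongKind }
func (n intNode) AsInt() (int64, error)               { return int64(n), nil }
func (n intNode) AsString() (string, error)           { return "", ErrWrongKind }

// stringNode is a hypothetical Node holding a single string.
type stringNode string

func (n stringNode) Kind() Kind                          { return KindString }
func (n stringNode) LookupByString(string) (Node, error) { return nil, ErrWrongKind }
func (n stringNode) AsInt() (int64, error)               { return 0, ErrWrongKind }
func (n stringNode) AsString() (string, error)           { return string(n), nil }

// Describe checks Kind once, then calls only the methods valid for that kind,
// instead of probing each method for failure.
func Describe(n Node) string {
	switch n.Kind() {
	case KindInt:
		v, _ := n.AsInt()
		return fmt.Sprintf("int(%d)", v)
	case KindString:
		v, _ := n.AsString()
		return fmt.Sprintf("string(%q)", v)
	default:
		return "unhandled kind"
	}
}
```

A caller that skips the `Kind()` check still gets a well-defined result: `intNode(7).AsString()` returns `ErrWrongKind` rather than panicking.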

### different implementors of Node

Though the methods on the `Node` interface are defined as those necessary for
examining data of the [Data Model Kinds](../../data-model-layer/data-model.md#kinds),
**`Node` is not only implemented by the Data Model**:

- Yes, `Node` is implemented by types that just hold basic Data Model info;
- `Node` is also implemented by [Advanced Data Layouts](../../schemas/advanced-layouts.md) --
  - consider a HAMT that spans many separately-serialized chunks of data; it should still be usable as if it's a regular map.
- `Node` is also implemented by [Schema-typed Nodes](../../schemas/) --
  - both if implemented by a single implementation that evaluates rules at runtime (so, a finite count of implementing types, known at core library compile time)...
  - or if handled by codegen/macros (unknown count / open set of implementors of `Node`; not known at core library compile time; may be created in other packages that import the core, rather than core importing them!).

Even further, some libraries may choose to make even more various
implementations of `Node` for optimizing performance of specific tasks:
for example, a `Node` which implements basic Data Model "map" semantics,
but using some internal algorithm for memory layout which is known to be
efficient for certain workloads;
or for another example, a `Node` which is particularly efficient for handling
data of one particular serialization codec, and keeps a lazy-loading skip-tree
over the serialized bytes.
Clearly, neither of these should be the default implementation a library uses,
but just as clearly, both of them should be usable transparently.

With all seven (?! indeed, *seven*) of these different stories,
we can consider it conclusive that the `Node` interface should be ready
to support many, many diverse implementors.
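
To make the open-membership point concrete, here is a small Go sketch in which one function works unchanged over two unrelated map implementations. The `Node` interface is trimmed to two methods (with `Kind()` omitted for brevity), and `plainMap`, `shardedMap`, `newShardedMap`, and `Title` are hypothetical names. `shardedMap` is only a toy stand-in for an ADL such as a HAMT: it picks a shard by key length rather than by a real hash.

```go
package main

import "errors"

var (
	ErrWrongKind = errors.New("method not valid for this kind")
	ErrNotFound  = errors.New("key not found")
)

// Node is trimmed to the two methods this sketch needs.
type Node interface {
	LookupByString(key string) (Node, error) // valid for maps
	AsString() (string, error)               // valid for strings
}

type stringNode string

func (s stringNode) LookupByString(string) (Node, error) { return nil, ErrWrongKind }
func (s stringNode) AsString() (string, error)           { return string(s), nil }

// plainMap is a basic Data Model map held in one Go map.
type plainMap map[string]Node

func (m plainMap) LookupByString(key string) (Node, error) {
	if v, ok := m[key]; ok {
		return v, nil
	}
	return nil, ErrNotFound
}
func (m plainMap) AsString() (string, error) { return "", ErrWrongKind }

// shardedMap spreads entries over several chunks. Callers still see
// ordinary map semantics through the Node interface.
type shardedMap struct{ shards []plainMap }

// newShardedMap distributes entries over shardCount shards (must be >= 1).
func newShardedMap(shardCount int, entries map[string]Node) shardedMap {
	m := shardedMap{shards: make([]plainMap, shardCount)}
	for i := range m.shards {
		m.shards[i] = plainMap{}
	}
	for k, v := range entries {
		m.shards[m.shardFor(k)][k] = v
	}
	return m
}

// shardFor uses key length as a toy hash; a real HAMT would hash the key.
func (m shardedMap) shardFor(key string) int { return len(key) % len(m.shards) }

func (m shardedMap) LookupByString(key string) (Node, error) {
	return m.shards[m.shardFor(key)].LookupByString(key)
}
func (m shardedMap) AsString() (string, error) { return "", ErrWrongKind }

// Title is written once against Node and works for either implementation.
func Title(n Node) (string, error) {
	v, err := n.LookupByString("title")
	if err != nil {
		return "", err
	}
	return v.AsString()
}
```

Had `Node` been a closed sum type, `shardedMap` could not have been added without editing the core definition.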

### a default implementation of Node

As an IPLD library author, you may be tempted to make a single, "default"
implementation of `Node`.

Feel free to do so; but be cautious of giving it special privileges.
Try implementing it in a separate package from your core interfaces: this will
be a good exercise to make sure other implementations can later do the same.
(You will likely create this basic default node implementation quite early
when implementing a new IPLD library, so building it in a way that forces
design choices you'll need later anyway will save you from discovering
the need for some costly refactors later!)


Nodes vs NodeBuilders
---------------------

If you choose to pursue a distinction between mutable and immutable data
in the design of your library, it may be useful to create two separate
interfaces, one for each phase of the data's lifecycle.
These might be called "Node" (for the immutable data)
and "NodeBuilder" (for the mutating/building phase of the data's life).

It is not necessary to have distinct interfaces for this;
a library can also opt to have a mutable concept of "node".
Immutable interfaces can be particularly well-suited to IPLD data, though;
it's worth considering them.
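
One possible shape for such a split, sketched in Go under simplifying assumptions: the data is just a list of strings, and the method names (`AppendString`, `Build`, `Length`, `ElementAt`) and the `NewListBuilder` constructor are hypothetical. The point illustrated is that the builder is the only mutable object, and that a built `Node` cannot be changed through it afterwards.

```go
package main

import "errors"

var ErrOutOfRange = errors.New("index out of range")

// Node is the immutable, read-only view (trimmed to a list of strings).
type Node interface {
	Length() int
	ElementAt(i int) (string, error)
}

// NodeBuilder is the mutable phase; Build produces an immutable Node.
type NodeBuilder interface {
	AppendString(s string)
	Build() Node
}

type listNode struct{ items []string }

func (n listNode) Length() int { return len(n.items) }

func (n listNode) ElementAt(i int) (string, error) {
	if i < 0 || i >= len(n.items) {
		return "", ErrOutOfRange
	}
	return n.items[i], nil
}

type listBuilder struct{ items []string }

func (b *listBuilder) AppendString(s string) { b.items = append(b.items, s) }

// Build copies the slice so later builder mutations cannot reach the Node.
func (b *listBuilder) Build() Node {
	frozen := make([]string, len(b.items))
	copy(frozen, b.items)
	return listNode{items: frozen}
}

// NewListBuilder returns an empty builder for list nodes.
func NewListBuilder() NodeBuilder { return &listBuilder{} }
```

The copy in `Build` is one way to guarantee immutability; a library could instead invalidate the builder after `Build` to avoid the allocation.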


Higher level functions
----------------------

Almost all features should be implemented to take `Node` arguments,
and return `Node` values.

Traversals and walks can be implemented in this way: e.g.
`func walk(start Node, visitorFn func(visited Node))`.
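
A minimal Go sketch of such a walk follows, assuming a trimmed `Node` interface that only knows about integers and lists. The names `Length`, `ElementAt`, `AsInt`, `intNode`, `listNode`, and `SumInts` are hypothetical, and `Walk` returns an error here (unlike the signature above) so that a failing child lookup is not silently dropped.

```go
package main

import "errors"

type Kind uint8

const (
	KindInvalid Kind = iota
	KindInt
	KindList
)

var ErrWrongKind = errors.New("method not valid for this kind")

// Node is trimmed to what a list/int traversal needs.
type Node interface {
	Kind() Kind
	Length() int                   // child count for lists; 0 otherwise
	ElementAt(i int) (Node, error) // valid for KindList
	AsInt() (int64, error)         // valid for KindInt
}

type intNode int64

func (n intNode) Kind() Kind                  { return KindInt }
func (n intNode) Length() int                 { return 0 }
func (n intNode) ElementAt(int) (Node, error) { return nil, ErrWrongKind }
func (n intNode) AsInt() (int64, error)       { return int64(n), nil }

type listNode []Node

func (n listNode) Kind() Kind            { return KindList }
func (n listNode) Length() int           { return len(n) }
func (n listNode) AsInt() (int64, error) { return 0, ErrWrongKind }
func (n listNode) ElementAt(i int) (Node, error) {
	if i < 0 || i >= len(n) {
		return nil, errors.New("index out of range")
	}
	return n[i], nil
}

// Walk visits start and then every descendant, depth first, in list order.
func Walk(start Node, visitorFn func(visited Node)) error {
	visitorFn(start)
	if start.Kind() != KindList {
		return nil
	}
	for i := 0; i < start.Length(); i++ {
		child, err := start.ElementAt(i)
		if err != nil {
			return err
		}
		if err := Walk(child, visitorFn); err != nil {
			return err
		}
	}
	return nil
}

// SumInts is built purely on Walk and the Node interface.
func SumInts(root Node) (int64, error) {
	var total int64
	err := Walk(root, func(n Node) {
		if n.Kind() == KindInt {
			v, _ := n.AsInt()
			total += v
		}
	})
	return total, err
}
```

Because `Walk` only touches the interface, it would work equally well over any other `Node` implementation that reports `KindList`.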

Selectors can be implemented in this way.
(Continue with the idea above for traversals.)

Transformations can be implemented in this way.
(Continue with the idea above for traversals.)

Codecs themselves can be implemented this way:
marshalling is a traversal over nodes, so `func marshal(obj Node) -> bytes`,
and unmarshalling is something like `func unmarshal(bytes, NodeBuilder) -> Node`.

(Note that if your library has a `Node`/`NodeBuilder` split for immutability purposes,
then of course any operation that builds new nodes,
such as transformations or codecs during unmarshalling,
will have a `NodeBuilder` parameter.
If your library has a mutable `Node`, these function signatures might appear differently.)
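
As an illustration of "marshalling is a traversal over nodes", the Go sketch below serializes a `Node` tree into a toy JSON-like text format. The format, the trimmed `Node` interface, and all names (`Marshal`, `marshalInto`, `ErrUnsupportedKind`, `intNode`, `listNode`) are assumptions for this example only; it is not a real IPLD codec, and the unmarshalling direction is omitted.

```go
package main

import (
	"bytes"
	"errors"
	"strconv"
)

type Kind uint8

const (
	KindInvalid Kind = iota
	KindInt
	KindList
)

var (
	ErrWrongKind       = errors.New("method not valid for this kind")
	ErrUnsupportedKind = errors.New("kind not supported by this codec")
)

type Node interface {
	Kind() Kind
	Length() int                   // child count for lists; 0 otherwise
	ElementAt(i int) (Node, error) // valid for KindList
	AsInt() (int64, error)         // valid for KindInt
}

type intNode int64

func (n intNode) Kind() Kind                  { return KindInt }
func (n intNode) Length() int                 { return 0 }
func (n intNode) ElementAt(int) (Node, error) { return nil, ErrWrongKind }
func (n intNode) AsInt() (int64, error)       { return int64(n), nil }

type listNode []Node

func (n listNode) Kind() Kind            { return KindList }
func (n listNode) Length() int           { return len(n) }
func (n listNode) AsInt() (int64, error) { return 0, ErrWrongKind }
func (n listNode) ElementAt(i int) (Node, error) {
	if i < 0 || i >= len(n) {
		return nil, errors.New("index out of range")
	}
	return n[i], nil
}

// Marshal writes n as text: ints in decimal, lists as comma-separated
// elements inside square brackets.
func Marshal(n Node) ([]byte, error) {
	var buf bytes.Buffer
	if err := marshalInto(&buf, n); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// marshalInto recurses over the tree using only Node methods.
func marshalInto(buf *bytes.Buffer, n Node) error {
	switch n.Kind() {
	case KindInt:
		v, err := n.AsInt()
		if err != nil {
			return err
		}
		buf.WriteString(strconv.FormatInt(v, 10))
		return nil
	case KindList:
		buf.WriteByte('[')
		for i := 0; i < n.Length(); i++ {
			if i > 0 {
				buf.WriteByte(',')
			}
			child, err := n.ElementAt(i)
			if err != nil {
				return err
			}
			if err := marshalInto(buf, child); err != nil {
				return err
			}
		}
		buf.WriteByte(']')
		return nil
	default:
		return ErrUnsupportedKind
	}
}
```

The codec never inspects concrete types, so any `Node` implementation that reports these kinds can be marshalled by it.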

By defining all these functions in terms of `Node`, they can be used the same
in any of the various contexts described in the
[different implementors of Node](#different-implementors-of-node) section:

- traversals/selectors/transforms/etc work over various codecs (trivially,
by transitive property).
- traversals/selectors/transforms/etc work regardless of in-memory layouts
that may vary per `Node` implementation.
- traversals/selectors/transforms/etc work transparently over ADLs!
- traversals/selectors/transforms/etc work transparently over schemas!

It is also useful to note that by implementing these features over the `Node`
interface, rather than *in* the `Node` interface, it becomes much easier
to implement several variants of, e.g., a traversal library
(perhaps you'll discover two different ways to go about it,
one with better ergonomics, and one with better performance?);
and it also requires much less code per `Node` implementation if things
like traversals are implemented from the outside.
