Added links to specs, created an explicit identity spec for clarification. Closes multiformats#76
sg495 · Oct 20, 2021 · b2cec76
1 parent bf71203
Showing 2 changed files with 58 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -85,6 +85,23 @@ proquint,          p,    pro-quint https://arxiv.org/html/0901.4016 (see RFC),
 
 **NOTE:** Multibase-prefixes are encoding agnostic: "z" is "z", not 0x7a ("z" encoded as ASCII/UTF-8). For example, in UTF-32, "z" would be `[0x7a, 0x00, 0x00, 0x00]`. In particular, the multibase code 0x00 listed for the identity encoding is the non-printable ASCII/UTF-8 character with codepoint 0x00, while the multibase code 0 listed for base2 is the ASCII/UTF-8 character "0" (which has codepoint 0x30).
 
+## Specifications
+
+Below is a list of specs for the underlying base encodings:
+
+- `identity` [identity RFC](rfcs/identity.md)
+- `base2` [base2 RFC](rfcs/Base2.md)
+- `base8` [base8 RFC](rfcs/Base8.md), similar to [rfc4648](https://datatracker.ietf.org/doc/html/rfc4648.html)
+- `base10` [base10 RFC](rfcs/Base10.md)
+- `base36` [base36 RFC](rfcs/Base36.md)
+- `base16*` [rfc4648](https://datatracker.ietf.org/doc/html/rfc4648.html)
+- `base32*` (except for `base32z`) [rfc4648](https://datatracker.ietf.org/doc/html/rfc4648.html)
+- `base32z` [human-oriented base32 spec](https://philzimmermann.com/docs/human-oriented-base-32-encoding.txt)
+- `base64*` [rfc4648](https://datatracker.ietf.org/doc/html/rfc4648.html)
+- `base58btc` https://datatracker.ietf.org/doc/html/draft-msporny-base58-02
+- `base58flickr` https://datatracker.ietf.org/doc/html/draft-msporny-base58-02, but using alphabet `123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ`
+- `proquint` [proquint RFC](rfcs/PRO-QUINT.md), which is the [original spec](https://arxiv.org/html/0901.4016) with an added prefix for legibility
+
 ## Reserved
 
 The following codes are _reserved_ for backwards compatibility with existing systems.
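
Review note on the README hunk above: the difference between the identity prefix (codepoint 0x00) and the base2 prefix (the character "0", codepoint 0x30) can be checked with a few lines of Python. This is only a sketch and not part of the commit; the names `IDENTITY_PREFIX`, `BASE2_PREFIX`, and `describe_prefix` are hypothetical and introduced here for illustration.

```python
from typing import Tuple

# Hypothetical constants for illustration; not taken from the repository.
IDENTITY_PREFIX = "\x00"  # non-printable character with codepoint 0x00
BASE2_PREFIX = "0"        # printable digit zero, codepoint 0x30


def describe_prefix(ch: str) -> Tuple[int, bytes, bytes]:
    """Return (codepoint, UTF-8 bytes, UTF-32 little-endian bytes) of a prefix."""
    return ord(ch), ch.encode("utf-8"), ch.encode("utf-32-le")


# The prefix is a character; its byte representation depends on the encoding.
print(describe_prefix(IDENTITY_PREFIX))
print(describe_prefix(BASE2_PREFIX))
print(describe_prefix("z"))
```

The same prefix character yields different bytes under UTF-8 and UTF-32, which is the sense in which the note calls prefixes encoding agnostic.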

diff --git a/rfcs/identity.md b/rfcs/identity.md
@@ -0,0 +1,41 @@
+# Identity
+
+The multibase identity prefix is the non-printable ASCII/UTF-8 character with codepoint 0x00. Note that this is different from the multibase prefix 0 listed for base2, which is the ASCII/UTF-8 character "0" with codepoint 0x30.
+
+
+## Encoding
+
+A byte array `b` is encoded as the Unicode string `s` whose UTF-8 bytes are the byte array `b` prefixed with a single zero byte.
+
+Below is a minimal implementation in Python, for clarification:
+
+```py
+def encode_identity(b: bytes) -> str:
+    utf8_bytes = b"\x00"+b
+    return utf8_bytes.decode("utf-8")
+```
+
+## Decoding
+
+A Unicode string `s` is decoded by obtaining its UTF-8 bytes and dropping the leading byte. The UTF-8 byte array must be non-empty and the leading byte must be zero.
+
+Below is a minimal implementation in Python, for clarification:
+
+```py
+def decode_identity(s: str) -> bytes:
+    utf8_bytes = s.encode("utf-8")
+    if not utf8_bytes or utf8_bytes[0] != 0:
+        raise ValueError("String not identity-encoded.")
+    return utf8_bytes[1:]
+```
+
+## Examples
+
+```py
+>>> encode_identity(bytes([0x31, 0x63, 0x57]))
+'\x001cW'
+>>> decode_identity("\x001cW")
+b'1cW'
+>>> list(decode_identity("\x001cW"))
+[49, 99, 87] # [0x31, 0x63, 0x57]
+```
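
Review note on `rfcs/identity.md`: the two minimal implementations in the spec can be exercised as a round trip. The sketch below repeats them unchanged in behavior and adds a hypothetical helper, `round_trips`, which is not part of the commit. It also illustrates a consequence of the definition as written: a byte array that is not valid UTF-8 has no corresponding Unicode string, so `encode_identity` raises `UnicodeDecodeError` for it.

```python
def encode_identity(b: bytes) -> str:
    # Same logic as the spec's minimal implementation.
    return (b"\x00" + b).decode("utf-8")


def decode_identity(s: str) -> bytes:
    # Same logic as the spec's minimal implementation.
    utf8_bytes = s.encode("utf-8")
    if not utf8_bytes or utf8_bytes[0] != 0:
        raise ValueError("String not identity-encoded.")
    return utf8_bytes[1:]


def round_trips(b: bytes) -> bool:
    """Hypothetical helper: True if b survives encoding followed by decoding."""
    try:
        return decode_identity(encode_identity(b)) == b
    except UnicodeDecodeError:
        # b is not valid UTF-8, so no string has b as its UTF-8 bytes.
        return False


print(round_trips(bytes([0x31, 0x63, 0x57])))  # valid UTF-8 input
print(round_trips(b""))                        # empty payload, prefix only
print(round_trips(b"\xff"))                    # 0xff is never valid in UTF-8
```

Whether non-UTF-8 input should be rejected or handled differently is not stated in the spec text shown here; the sketch only reports what the given implementation does.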