Added links to specs, created an explicit identity spec for clarification

Closes multiformats#76
sg495 committed Oct 20, 2021
1 parent bf71203 commit b2cec76
Showing 2 changed files with 58 additions and 0 deletions.
17 changes: 17 additions & 0 deletions README.md
@@ -85,6 +85,23 @@ proquint, p, pro-quint https://arxiv.org/html/0901.4016 (see RFC),

**NOTE:** Multibase-prefixes are encoding agnostic: "z" is "z", not 0x7a ("z" encoded as ASCII/UTF-8). For example, in UTF-32, "z" would be `[0x7a, 0x00, 0x00, 0x00]`. In particular, the multibase code 0x00 listed for the identity encoding is the non-printable ASCII/UTF-8 character with codepoint 0x00, while the multibase code 0 listed for base2 is the ASCII/UTF-8 character "0" (which has codepoint 0x30).
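
The distinction in the note above can be checked directly. The snippet below is a small illustration, not part of the spec; it assumes little-endian UTF-32 without a byte-order mark (Python's `utf-32-le` codec) to reproduce the byte sequence quoted in the note, and the variable names are chosen here for illustration only.

```python
# The same character yields different bytes under different text encodings.
z_utf8 = "z".encode("utf-8")        # the single byte 0x7a
z_utf32 = "z".encode("utf-32-le")   # assumption: little-endian, no byte-order mark

# The identity prefix and the base2 prefix are two distinct characters.
identity_prefix = "\x00"  # non-printable character, codepoint 0x00
base2_prefix = "0"        # printable digit, codepoint 0x30

print(list(z_utf8), list(z_utf32))
print(hex(ord(identity_prefix)), hex(ord(base2_prefix)))
```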

## Specifications

Below is a list of specs for the underlying base encodings:

- `identity` [identity RFC](rfcs/identity.md)
- `base2` [base2 RFC](rfcs/Base2.md)
- `base8` [base8 RFC](rfcs/Base8.md), similar to [rfc4648](https://datatracker.ietf.org/doc/html/rfc4648.html)
- `base10` [base10 RFC](rfcs/Base10.md)
- `base36` [base36 RFC](rfcs/Base36.md)
- `base16*` [rfc4648](https://datatracker.ietf.org/doc/html/rfc4648.html)
- `base32*` (except for `base32z`) [rfc4648](https://datatracker.ietf.org/doc/html/rfc4648.html)
- `base32z` [human-oriented base32 spec](https://philzimmermann.com/docs/human-oriented-base-32-encoding.txt)
- `base64*` [rfc4648](https://datatracker.ietf.org/doc/html/rfc4648.html)
- `base58btc` https://datatracker.ietf.org/doc/html/draft-msporny-base58-02
- `base58flickr` https://datatracker.ietf.org/doc/html/draft-msporny-base58-02, but using alphabet `123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ`
- `proquint` [proquint RFC](rfcs/PRO-QUINT.md), which is the [original spec](https://arxiv.org/html/0901.4016) with an added prefix for legibility
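
To illustrate how `base58flickr` differs from `base58btc` only in its alphabet, here is a minimal sketch. It assumes the big-integer conversion described in the linked base58 draft, with leading zero bytes mapped to the first alphabet character. The names `base58_encode`, `BTC_ALPHABET` and `FLICKR_ALPHABET` are hypothetical, and the base58btc alphabet is taken from the draft, not from this README.

```python
# Alphabets: base58btc as given in the base58 draft, base58flickr as listed above.
BTC_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
FLICKR_ALPHABET = "123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"


def base58_encode(b: bytes, alphabet: str) -> str:
    """Hypothetical helper: base58-encode b with the given 58-character alphabet."""
    # Interpret the bytes as one big-endian integer and emit base-58 digits.
    n = int.from_bytes(b, "big")
    digits = []
    while n > 0:
        n, r = divmod(n, 58)
        digits.append(alphabet[r])
    # Each leading zero byte is preserved as one leading "zero digit".
    zeros = len(b) - len(b.lstrip(b"\x00"))
    return alphabet[0] * zeros + "".join(reversed(digits))


data = bytes([0x00, 0x00, 0x28, 0x7F, 0xB4, 0xCD])
as_btc = base58_encode(data, BTC_ALPHABET)
as_flickr = base58_encode(data, FLICKR_ALPHABET)
print(as_btc, as_flickr)
```

Both alphabets contain the same 58 characters; base58flickr places the lowercase letters before the uppercase ones, so the two outputs differ only by a character-for-character substitution.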

## Reserved

The following codes are _reserved_ for backwards compatibility with existing systems.
41 changes: 41 additions & 0 deletions rfcs/identity.md
@@ -0,0 +1,41 @@
# Identity

The multibase identity prefix is the non-printable ASCII/UTF-8 character with codepoint 0x00. Note that this is different from the multibase prefix 0 listed for base2, which is the ASCII/UTF-8 character "0" with codepoint 0x30.


## Encoding

A byte array `b` is encoded as the Unicode string `s` whose UTF-8 bytes are the byte array `b` prefixed with a single zero byte.

Below is a minimal implementation in Python, for clarification:

```py
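def encode_identity(b: bytes) -> str:
    # Prepend the zero byte that serves as the identity prefix.
    utf8_bytes = b"\x00" + b
    # Raises UnicodeDecodeError if the prefixed bytes are not valid UTF-8.
    return utf8_bytes.decode("utf-8")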
```

## Decoding

A Unicode string `s` is decoded by obtaining its UTF-8 bytes and dropping the leading byte. The UTF-8 byte array must be non-empty and the leading byte must be zero.

Below is a minimal implementation in Python, for clarification:

```py
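def decode_identity(s: str) -> bytes:
    utf8_bytes = s.encode("utf-8")
    # Reject empty input and any string whose first byte is not the zero prefix.
    if not utf8_bytes or utf8_bytes[0] != 0:
        raise ValueError("String not identity-encoded.")
    # Drop the prefix byte to recover the payload.
    return utf8_bytes[1:]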
```
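
The two rejection conditions stated above (empty input, or a leading byte other than zero) can be exercised as follows. This is only a sketch: `decode_identity` is repeated so the snippet runs on its own, and `is_identity_encoded` is a hypothetical helper introduced here, not part of the spec.

```python
def decode_identity(s: str) -> bytes:
    utf8_bytes = s.encode("utf-8")
    if not utf8_bytes or utf8_bytes[0] != 0:
        raise ValueError("String not identity-encoded.")
    return utf8_bytes[1:]


def is_identity_encoded(s: str) -> bool:
    """Hypothetical helper: True if decode_identity accepts s."""
    try:
        decode_identity(s)
    except ValueError:
        return False
    return True


checks = {
    "empty string": is_identity_encoded(""),          # no prefix byte at all
    "base2 prefix": is_identity_encoded("01100011"),  # starts with "0" (0x30), not 0x00
    "prefix only": is_identity_encoded("\x00"),       # valid: encodes the empty byte array
}
print(checks)
```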

## Examples

```py
>>> encode_identity(bytes([0x31, 0x63, 0x57]))
'\x001cW'
>>> decode_identity("\x001cW")
b'1cW'
>>> list(decode_identity("\x001cW"))
[49, 99, 87] # [0x31, 0x63, 0x57]
```
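
As a further illustration, the sketch below checks that decoding inverts encoding for a payload that is valid UTF-8, and shows what the minimal Python implementation does with a payload that is not. Both functions are repeated so the snippet is self-contained; the variable names are hypothetical, and the behaviour for non-UTF-8 payloads is an observation about this Python code, not a statement the spec makes.

```python
def encode_identity(b: bytes) -> str:
    return (b"\x00" + b).decode("utf-8")


def decode_identity(s: str) -> bytes:
    utf8_bytes = s.encode("utf-8")
    if not utf8_bytes or utf8_bytes[0] != 0:
        raise ValueError("String not identity-encoded.")
    return utf8_bytes[1:]


# Round trip for a payload that happens to be valid UTF-8.
payload = "h\u00e9llo".encode("utf-8")
round_tripped = decode_identity(encode_identity(payload))

# A payload that is not valid UTF-8 has no corresponding Unicode string,
# so this implementation raises instead of returning a result.
try:
    encode_identity(b"\xff")
    non_utf8_accepted = True
except UnicodeDecodeError:
    non_utf8_accepted = False

print(round_tripped == payload, non_utf8_accepted)
```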
