forked from multiformats/multibase
Added links to specs, created an explicit identity spec for clarification
Closes multiformats#76
Showing 2 changed files with 58 additions and 0 deletions.
@@ -0,0 +1,41 @@
# Identity

The multibase identity prefix is the non-printable ASCII/UTF-8 character with codepoint 0x00. Note that this is different from the multibase prefix `0` listed for base2, which is the ASCII/UTF-8 character "0" with codepoint 0x30.
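
The difference between the two prefixes is easy to miss in printed output, so the following sketch makes it explicit. The constant names and the `is_identity_encoded` helper are hypothetical and exist only for this illustration.

```python
# Hypothetical names, introduced only for this illustration.
IDENTITY_PREFIX = "\x00"  # codepoint 0x00: the identity prefix
BASE2_PREFIX = "0"        # codepoint 0x30: the base2 prefix


def is_identity_encoded(s: str) -> bool:
    """Return True if `s` starts with the identity prefix (codepoint 0x00)."""
    return s.startswith(IDENTITY_PREFIX)


print(hex(ord(IDENTITY_PREFIX)))       # 0x0
print(hex(ord(BASE2_PREFIX)))          # 0x30
print(is_identity_encoded("\x001cW"))  # True
print(is_identity_encoded("0110001"))  # False: starts with "0" (0x30)
```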

## Encoding

A byte array `b` is encoded as the Unicode string `s` whose UTF-8 bytes are `b` prefixed with a single zero byte.

Below is a minimal implementation in Python, for clarification:

```py
def encode_identity(b: bytes) -> str:
    # Prepend the identity prefix (a single zero byte) to the payload.
    utf8_bytes = b"\x00" + b
    # Interpret the prefixed bytes as UTF-8 to obtain the encoded string.
    return utf8_bytes.decode("utf-8")
```
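
One consequence of the definition above is that `b` must itself be valid UTF-8, since Python's `bytes.decode("utf-8")` raises `UnicodeDecodeError` otherwise. The spec text here does not say how an implementation should handle such input, so the `try_encode_identity` wrapper below is a hypothetical sketch that only makes the failure visible.

```python
from typing import Optional


def encode_identity(b: bytes) -> str:
    # Same logic as the minimal implementation in the spec.
    return (b"\x00" + b).decode("utf-8")


def try_encode_identity(b: bytes) -> Optional[str]:
    """Hypothetical wrapper: return the encoded string, or None if `b` is not valid UTF-8."""
    try:
        return encode_identity(b)
    except UnicodeDecodeError:
        return None


print(repr(try_encode_identity(b"hello")))             # '\x00hello'
print(repr(try_encode_identity(bytes([0xFF, 0xFE]))))  # None: 0xFF never occurs in UTF-8
```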

## Decoding

A Unicode string `s` is decoded by obtaining its UTF-8 bytes and dropping the leading byte. The UTF-8 byte array must be non-empty and its leading byte must be zero; otherwise `s` is not identity-encoded.

Below is a minimal implementation in Python, for clarification:

```py
def decode_identity(s: str) -> bytes:
    utf8_bytes = s.encode("utf-8")
    # Reject empty input and input that lacks the leading zero byte.
    if not utf8_bytes or utf8_bytes[0] != 0:
        raise ValueError("String not identity-encoded.")
    # Drop the identity prefix to recover the original bytes.
    return utf8_bytes[1:]
```
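
The two rejection conditions can be exercised as follows. This is a small sketch that repeats `decode_identity` so it runs on its own; the `is_rejected` helper is hypothetical.

```python
def decode_identity(s: str) -> bytes:
    # Same logic as the minimal implementation in the spec.
    utf8_bytes = s.encode("utf-8")
    if not utf8_bytes or utf8_bytes[0] != 0:
        raise ValueError("String not identity-encoded.")
    return utf8_bytes[1:]


def is_rejected(s: str) -> bool:
    """Hypothetical helper: True if decoding `s` raises ValueError."""
    try:
        decode_identity(s)
    except ValueError:
        return True
    return False


print(is_rejected(""))      # True: the UTF-8 byte array is empty
print(is_rejected("01cW"))  # True: leading byte is 0x30 ("0"), not 0x00
print(is_rejected("\x00"))  # False: prefix only, decodes to b""
```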

## Examples

```py
>>> encode_identity(bytes([0x31, 0x63, 0x57]))
'\x001cW'
>>> decode_identity("\x001cW")
b'1cW'
>>> list(decode_identity("\x001cW"))
[49, 99, 87] # [0x31, 0x63, 0x57]
```
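
As a further check, encoding followed by decoding should return the original bytes. The sketch below repeats both functions so it is self-contained; the sample inputs are arbitrary and assume UTF-8-valid payloads.

```python
def encode_identity(b: bytes) -> str:
    return (b"\x00" + b).decode("utf-8")


def decode_identity(s: str) -> bytes:
    utf8_bytes = s.encode("utf-8")
    if not utf8_bytes or utf8_bytes[0] != 0:
        raise ValueError("String not identity-encoded.")
    return utf8_bytes[1:]


# Arbitrary sample payloads, all valid UTF-8 (an assumption of this sketch).
samples = [b"", b"1cW", "h\u00e9llo".encode("utf-8"), b"\x00\x00"]

# True when every sample survives an encode/decode round trip unchanged.
round_trip_ok = all(decode_identity(encode_identity(b)) == b for b in samples)
print(round_trip_ok)
```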