-
Notifications
You must be signed in to change notification settings - Fork 334
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CIP-0021: Canonical CBOR serialisation #101
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,104 @@ | ||
--- | ||
CIP: 0021 | ||
Title: Canonical CBOR serialisation | ||
Authors: Gabriel Kerekes <[email protected]>, Rafael Korbas <[email protected]>, Jan Mazak <[email protected]> | ||
Status: Draft | ||
Type: Standards | ||
Created: 2021-06-15 | ||
License: CC-BY-4.0 | ||
--- | ||
|
||
## Abstract | ||
|
||
This CIP defines a canonical format for CBOR serialisation in Cardano by extending the suggestions for canonical CBOR from the [CBOR specification RFC](https://datatracker.ietf.org/doc/html/rfc7049#section-3.9). | ||
|
||
## Motivation | ||
|
||
While Cardano nodes are quite lenient in accepting minor variations in transaction serialisation format allowed by the CDDL specification, certain tools might require additional restrictions ensuring interoperability. This is especially important for tools working with hardware wallets which need to reserialise transactions (due to their memory limitations and certain security considerations). In order to ensure consistency between the transaction body and all the witness signatures, we need to specify a canonical format for CBOR serialisation. | ||
|
||
This is particularly relevant with the introduction of multisig. If the party initially creating the transaction uses e.g. cardano-cli to create the transaction, then other involved multisig parties might not be able to sign the transaction using HW wallets, because HW wallets might order some elements in the transaction differently, ending up with a different transaction hash. The same applies when the transaction is initially created by a HW wallet and requires signatures from parties using cardano-cli or even some custom third-party tool. | ||
|
||
For example, in order to sign a transaction with a HW wallet, cardano-hw-cli takes the serialised transaction body from cardano-cli, parses it and transforms it into input parameters for the HW wallet. Without canonical CBOR serialisation it might happen that the HW wallet will calculate a different transaction hash than cardano-cli and the returned witness signature would thus be invalid for the transaction created by cardano-cli. | ||
|
||
Another motivation for this CIP is that without canonical CBOR serialisation, specifically canonical map key order, HW wallets would not be able to verify that multi-asset policies and asset_names are included only once in a given output or policy respectively, as they don't have enough memory to store all the previously passed tokens for the sake of deduplication. This might then cause the user to sign a transaction with the same token included twice, although at the protocol level, only one use of the token would be accounted for. | ||
|
||
## Specification | ||
|
||
This specification is a modification of the canonical CBOR suggestions taken from the [CBOR specification RFC](https://datatracker.ietf.org/doc/html/rfc7049#section-3.9). Parts not mentioned in this CIP should follow the canonical CBOR suggestions from the CBOR RFC. | ||
|
||
### Map key ordering | ||
|
||
All keys in a map should be ordered based on their real, logical value, not their CBOR representation. Number type keys (uints, ints and floats) should be ordered based on their numerical value. Byte string and text string keys should be ordered based on their lexicographic order. If the keys are of different types then the order described below should be maintained. | ||
|
||
**Ordering of keys with different types** | ||
|
||
The order of types is based on the CBOR major type’s 3 bit header identificator: | ||
|
||
1. unsigned integer - major type 0 | ||
2. negative integer - major type 1 | ||
3. byte string - major type 2 | ||
4. text string - major type 3 | ||
5. floating-point numbers - major type 7 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Will there ever be a need for placing floating-point numbers into a map? Perhaps there's a general consensus to avoid floating-point numbers in CBOR? (see pool registration certificate margin) |
||
|
||
I.e. first come the uints, then the ints, byte strings, text strings and floating-point numbers. | ||
|
||
Arrays, maps and tagged items are left out of this specification as including them might unnecessarily complicate the specification or its implementation. Also, these elements aren’t currently used as keys in maps and are unlikely to be used in the future. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We decided to leave arrays, maps and tagged items out of the spec, but there might a use case for them - especially tagged items. Is this perhaps somehow implemented in cardano-cli? How would cardano-cli handle serialising a map with tags as keys? Perhaps @dcoutts you might have some pointers on how to deal with this? |
||
|
||
**Examples of correctly ordered keys** | ||
|
||
- `"aa", "b", "c", "cccc", "d"` | ||
- `h'0000', h'01', h'02020202', h'03'` | ||
- `1, 2, 3, 4, 100, 1000` | ||
- `1, 100, -100, -200, h'01', h'0202020202', "aa", "b", 0.7, 2.4` | ||
|
||
## Rationale | ||
|
||
### Why not follow the canonical CBOR suggestions completely? | ||
|
||
The canonical CBOR suggestions order the keys based on their CBOR representations with the rather weird caveat that shorter keys come first regardless of their lexicographic ordering. Lexicographic ordering is more natural and it might be simpler for many applications to sort based on the real, logical value of the key rather than the serialised representation. | ||
|
||
### Keys with mixed types | ||
|
||
We mustn’t forget that the CBOR map keys can be of different types. The type ordering can be chosen arbitrarily but it seems most natural to choose the ordering based on the CBOR types’ 3 bit header identificator. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't really feel strongly about any of those arguments. Since there already exists a specification for producing canonical CBOR serialization, coming up with a new one does not sound to me like a good idea? The main argument seems to be that it is "most natural to choose ordering based on the CBOR types". It certainly is appealing to the human brain but encoded data are processed by programs anyway. How does the proposal makes it simpler for many applications 🤔 ? |
||
|
||
## Backwards compatibility | ||
|
||
Canonical CBOR serialisation defined in this CIP is not compatible with canonical CBOR serialisation defined by the CBOR RFC. We recommend that all tools migrate to the canonical CBOR serialisation defined by this CIP in order to ensure interoperability across the ecosystem. | ||
|
||
## Test vectors | ||
|
||
### Valid test vector | ||
|
||
**Text strings are correctly ordered lexicographically** | ||
|
||
`"aa", "b", "c", "cccc", "d"` | ||
|
||
**Byte strings are correctly ordered lexicographically** | ||
|
||
`h'0000', h'01', h'02020202', h'03'` | ||
|
||
**Unsigned integers are correctly ordered** | ||
|
||
`1, 2, 3, 4, 100, 1000` | ||
|
||
**Multiple types are correctly ordered** | ||
|
||
`1, 100, -100, -200, h'01', h'0202020202', "aa", "b", 0.7, 2.4` | ||
|
||
### Invalid test vectors | ||
|
||
**Text strings are ordered following canonical CBOR from RFC** | ||
|
||
`"b", "c", "d", "aa", "cccc"` | ||
|
||
**Types are not in the correct order - bytes come first instead of uints** | ||
|
||
`h'01', h'0202020202', 1, 100, -100, -200, "aa", "b", 0.7, 2.4` | ||
|
||
**Types are not in the correct order - strings are in an incorrect order** | ||
|
||
`1, 100, -100, -200, h'01', h'0202020202', "b", "aa", 0.7, 2.4` | ||
|
||
## Copyright | ||
|
||
This CIP is licensed under [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/legalcode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ordering based on the 3 bit header would result in positive numbers being ordered before negative numbers which doesn't seem too intuitive and it might add unnecessary logic to the ordering. Perhaps the ordering should simply be something like: First come the number types ordered by their numerical values (regardless of their type) and then the byte strings and the text strings.