cardano-foundation · gabrielKerekes · Jun 14, 2021 · gabrielKerekes · Jun 15, 2021 · gabrielKerekes
diff --git a/CIP-0021/CIP-0021.md b/CIP-0021/CIP-0021.md
@@ -0,0 +1,104 @@
+---
+CIP: 0021
+Title: Canonical CBOR serialisation
+Authors: Gabriel Kerekes <[email protected]>, Rafael Korbas <[email protected]>, Jan Mazak <[email protected]>
+Status: Draft
+Type: Standards
+Created: 2021-06-15
+License: CC-BY-4.0
+---
+
+## Abstract
+
+This CIP defines a canonical format for CBOR serialisation in Cardano by extending the suggestions for canonical CBOR from the [CBOR specification RFC](https://datatracker.ietf.org/doc/html/rfc7049#section-3.9).
+
+## Motivation
+
+While Cardano nodes are quite lenient in accepting minor variations in the transaction serialisation format allowed by the CDDL specification, certain tools might require additional restrictions to ensure interoperability. This is especially important for tools working with hardware (HW) wallets, which need to reserialise transactions because of their memory limitations and certain security considerations. In order to ensure consistency between the transaction body and all the witness signatures, we need to specify a canonical format for CBOR serialisation.
+
+This is particularly relevant with the introduction of multisig. If the party that initially creates the transaction uses e.g. cardano-cli, then the other multisig parties might not be able to sign it using HW wallets, because HW wallets might order some elements in the transaction differently and end up with a different transaction hash. The same applies when the transaction is initially created by a HW wallet and requires signatures from parties using cardano-cli or a custom third-party tool.
+
+For example, in order to sign a transaction with a HW wallet, cardano-hw-cli takes the serialised transaction body from cardano-cli, parses it and transforms it into input parameters for the HW wallet. Without canonical CBOR serialisation, the HW wallet might calculate a different transaction hash than cardano-cli, and the returned witness signature would then be invalid for the transaction created by cardano-cli.
+
+Another motivation for this CIP is that without canonical CBOR serialisation, specifically canonical map key order, HW wallets would not be able to verify that multi-asset policies and asset_names are included only once in a given output or policy respectively, as they don't have enough memory to store all the previously passed tokens for the sake of deduplication. This might then cause the user to sign a transaction with the same token included twice, although at the protocol level, only one use of the token would be accounted for.
+
+## Specification
+
+This specification is a modification of the canonical CBOR suggestions taken from the [CBOR specification RFC](https://datatracker.ietf.org/doc/html/rfc7049#section-3.9). Parts not mentioned in this CIP should follow the canonical CBOR suggestions from the CBOR RFC.
+
+### Map key ordering
+
+All keys in a map should be ordered based on their real, logical value, not their CBOR representation. Number type keys (uints, ints and floats) should be ordered based on their numerical value. Byte string and text string keys should be ordered based on their lexicographic order. If the keys are of different types then the order described below should be maintained.
+
+**Ordering of keys with different types**
+
+Types are ordered by the 3-bit major type identifier in the CBOR initial byte:
+
+1. unsigned integer - major type 0
+2. negative integer - major type 1
+3. byte string - major type 2
+4. text string - major type 3
+5. floating-point numbers - major type 7
+
+I.e. the unsigned integers come first, followed by the negative integers, byte strings, text strings and floating-point numbers.
+
+Arrays, maps and tagged items are left out of this specification as including them might unnecessarily complicate the specification or its implementation. Also, these elements aren’t currently used as keys in maps and are unlikely to be used in the future.
+
+**Examples of correctly ordered keys**
+
+- `"aa", "b", "c", "cccc", "d"`
+- `h'0000', h'01', h'02020202', h'03'`
+- `1, 2, 3, 4, 100, 1000`
+- `1, 100, -100, -200, h'01', h'0202020202', "aa", "b", 0.7, 2.4`
+
+## Rationale
+
+### Why not follow the canonical CBOR suggestions completely?
+
+The canonical CBOR suggestions order keys by their CBOR representation, with the caveat that shorter keys come first regardless of their lexicographic ordering. Lexicographic ordering is more natural, and many applications might find it simpler to sort by the real, logical value of the key rather than by its serialised representation.
+
+### Keys with mixed types
+
+CBOR map keys can be of different types, so the ordering between types must also be defined. The type ordering could be chosen arbitrarily, but ordering by the 3-bit major type identifier of each CBOR type seems the most natural choice.
+
+## Backwards compatibility
+
+Canonical CBOR serialisation defined in this CIP is not compatible with canonical CBOR serialisation defined by the CBOR RFC. We recommend that all tools migrate to the canonical CBOR serialisation defined by this CIP in order to ensure interoperability across the ecosystem.
+
+## Test vectors
+
+### Valid test vectors
+
+**Text strings are correctly ordered lexicographically**
+
+`"aa", "b", "c", "cccc", "d"`
+
+**Byte strings are correctly ordered lexicographically**
+
+`h'0000', h'01', h'02020202', h'03'`
+
+**Unsigned integers are correctly ordered**
+
+`1, 2, 3, 4, 100, 1000`
+
+**Multiple types are correctly ordered**
+
+`1, 100, -100, -200, h'01', h'0202020202', "aa", "b", 0.7, 2.4`
+
+### Invalid test vectors
+
+**Text strings are ordered following canonical CBOR from the RFC (shorter keys first)**
+
+`"b", "c", "d", "aa", "cccc"`
+
+**Types are not in the correct order - bytes come first instead of uints**
+
+`h'01', h'0202020202', 1, 100, -100, -200, "aa", "b", 0.7, 2.4`
+
+**Types are in the correct order, but text strings are not ordered lexicographically**
+
+`1, 100, -100, -200, h'01', h'0202020202', "b", "aa", 0.7, 2.4`
+
+## Copyright
+
+This CIP is licensed under [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/legalcode).
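
The map key ordering proposed in the diff above can be sketched in Python as follows. This is a minimal illustration, not part of the CIP: the names `cip21_sort_key`, `is_canonical_order` and `rfc7049_text_key` are hypothetical, and keys are modelled as plain Python `int`, `bytes`, `str` and `float` values rather than decoded CBOR items. Two assumptions are worth flagging. First, the prose orders numbers "by numerical value", while the test vectors list `-100` before `-200`; the sketch follows the test vectors and orders negative integers by magnitude. Second, text strings are compared by Unicode code point, which the CIP does not spell out.

```python
def cip21_sort_key(key):
    """Return a tuple that sorts keys by CBOR major type, then by logical value."""
    if isinstance(key, bool):
        # bool is a subclass of int in Python; it is not a key type covered here.
        raise TypeError("bool keys are not covered by this sketch")
    if isinstance(key, int):
        if key >= 0:
            return (0, key)       # major type 0: unsigned integer
        # ASSUMPTION: follow the test vectors (-100 sorts before -200),
        # i.e. order negative integers by magnitude.
        return (1, -1 - key)      # major type 1: negative integer
    if isinstance(key, bytes):
        return (2, key)           # major type 2: byte-wise lexicographic
    if isinstance(key, str):
        # ASSUMPTION: code point order, the default for Python str comparison.
        return (3, key)           # major type 3: text string
    if isinstance(key, float):
        return (7, key)           # major type 7: floating-point number
    raise TypeError(f"unsupported key type: {type(key).__name__}")


def is_canonical_order(keys):
    """Check that keys are strictly increasing under cip21_sort_key.

    Only the previous key is remembered, so a memory-constrained device
    could run the same check while streaming; duplicates are rejected
    because the comparison is strict.
    """
    previous = None
    for key in keys:
        current = cip21_sort_key(key)
        if previous is not None and not previous < current:
            return False
        previous = current
    return True


def rfc7049_text_key(text):
    """RFC 7049 section 3.9 ordering restricted to text string keys.

    Shorter encodings sort first, then byte-wise order. For keys of one
    major type this is equivalent to comparing (length, bytes) of the payload.
    """
    encoded = text.encode("utf-8")
    return (len(encoded), encoded)


mixed = [1, 100, -100, -200, b"\x01", b"\x02" * 5, "aa", "b", 0.7, 2.4]
texts = ["b", "c", "d", "aa", "cccc"]

cip_sorted = sorted(texts, key=cip21_sort_key)
rfc_sorted = sorted(texts, key=rfc7049_text_key)
```

Here `cip_sorted` reproduces the valid text string vector, while `rfc_sorted` keeps the shorter-first order that the first invalid test vector illustrates. Returning a tuple keyed on the major type avoids ever comparing values of different Python types, since the first tuple element already decides the order in that case.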