CIP-0021: Canonical CBOR serialisation #101

104 changes: 104 additions & 0 deletions CIP-0021/CIP-0021.md
@@ -0,0 +1,104 @@
---
CIP: 0021
Title: Canonical CBOR serialisation
Authors: Gabriel Kerekes <[email protected]>, Rafael Korbas <[email protected]>, Jan Mazak <[email protected]>
Status: Draft
Type: Standards
Created: 2021-06-15
License: CC-BY-4.0
---

## Abstract

This CIP defines a canonical format for CBOR serialisation in Cardano by extending the canonical CBOR suggestions in [Section 3.9 of the CBOR specification, RFC 7049](https://datatracker.ietf.org/doc/html/rfc7049#section-3.9).

## Motivation

While Cardano nodes are quite lenient in accepting minor variations in transaction serialisation format allowed by the CDDL specification, certain tools might require additional restrictions ensuring interoperability. This is especially important for tools working with hardware wallets which need to reserialise transactions (due to their memory limitations and certain security considerations). In order to ensure consistency between the transaction body and all the witness signatures, we need to specify a canonical format for CBOR serialisation.

This is particularly relevant with the introduction of multisig. If the party initially creating the transaction uses e.g. cardano-cli to create the transaction, then other involved multisig parties might not be able to sign the transaction using HW wallets, because HW wallets might order some elements in the transaction differently, ending up with a different transaction hash. The same applies when the transaction is initially created by a HW wallet and requires signatures from parties using cardano-cli or even some custom third-party tool.

For example, in order to sign a transaction with a HW wallet, cardano-hw-cli takes the serialised transaction body from cardano-cli, parses it and transforms it into input parameters for the HW wallet. Without canonical CBOR serialisation, the HW wallet might calculate a different transaction hash than cardano-cli, and the returned witness signature would then be invalid for the transaction created by cardano-cli.
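
The effect of reordering on the hash can be illustrated with a minimal sketch. It assumes the transaction hash is a Blake2b-256 digest of the serialised bytes; the two byte strings are hand-written toy CBOR maps rather than real transaction bodies, and `tx_hash` is a hypothetical helper name.

```python
import hashlib

# Two hand-written CBOR encodings of the same logical map {0: 1, 1: 2}.
# 0xa2 is the header of a map with two entries; small unsigned integers
# encode as a single byte each.
body_a = bytes.fromhex("a200010102")  # key 0 first, then key 1
body_b = bytes.fromhex("a201020001")  # same pairs, key 1 first


def tx_hash(serialised_body: bytes) -> str:
    """Hex-encoded Blake2b-256 digest of the serialised bytes (assumed hash)."""
    return hashlib.blake2b(serialised_body, digest_size=32).hexdigest()


hash_a = tx_hash(body_a)
hash_b = tx_hash(body_b)

# Same logical content, different byte order, different hash.
print(hash_a == hash_b)  # prints False
```

A signature produced over one of these hashes does not verify against the other, which is the failure mode described above.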

Another motivation for this CIP is that without canonical CBOR serialisation, specifically canonical map key order, HW wallets would not be able to verify that multi-asset policies and asset_names are included only once in a given output or policy respectively, as they don't have enough memory to store all the previously passed tokens for the sake of deduplication. This might then cause the user to sign a transaction with the same token included twice, although at the protocol level, only one use of the token would be accounted for.
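
With a canonical key order, a memory-constrained device could detect duplicates by remembering only the previous key, since any repeated or out-of-order key breaks a strictly increasing sequence. The following is a minimal sketch of that idea, assuming keys arrive one at a time as byte strings (as policy IDs and asset names do); `keys_strictly_increasing` is a hypothetical name, not part of any wallet firmware.

```python
from typing import Iterable, Optional


def keys_strictly_increasing(keys: Iterable[bytes]) -> bool:
    """Return True if every key is lexicographically greater than the previous one.

    Only the previous key is kept in memory, so the check works on a stream.
    """
    previous: Optional[bytes] = None
    for key in keys:
        if previous is not None and key <= previous:
            return False  # duplicate or out-of-order key
        previous = key
    return True


ordered = [b"\x00\x00", b"\x01", b"\x02\x02\x02\x02", b"\x03"]
repeated = [b"\x01", b"\x01", b"\x02"]

print(keys_strictly_increasing(ordered))   # prints True
print(keys_strictly_increasing(repeated))  # prints False
```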

## Specification

This specification is a modification of the canonical CBOR suggestions taken from the [CBOR specification RFC](https://datatracker.ietf.org/doc/html/rfc7049#section-3.9). Parts not mentioned in this CIP should follow the canonical CBOR suggestions from the CBOR RFC.

### Map key ordering

All keys in a map should be ordered by their real, logical value, not by their CBOR representation. Numeric keys (unsigned integers, negative integers and floats) should be ordered by their numerical value. Byte string and text string keys should be ordered lexicographically. If the keys are of different types, the order of types described below should be maintained.

**Ordering of keys with different types**

The order of types is based on the CBOR major type’s 3-bit header identifier:

1. unsigned integer - major type 0
Review comment (Contributor, Author):

This ordering based on the 3 bit header would result in positive numbers being ordered before negative numbers which doesn't seem too intuitive and it might add unnecessary logic to the ordering. Perhaps the ordering should simply be something like: First come the number types ordered by their numerical values (regardless of their type) and then the byte strings and the text strings.

2. negative integer - major type 1
3. byte string - major type 2
4. text string - major type 3
5. floating-point numbers - major type 7
Review comment (Contributor, Author):

Will there ever be a need for placing floating-point numbers into a map? Perhaps there's a general consensus to avoid floating-point numbers in CBOR? (see pool registration certificate margin)


I.e. first come the unsigned integers, then the negative integers, byte strings, text strings and floating-point numbers.
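
The type grouping above can be sketched as a small check. This is an illustration only: it maps Python values to the major types listed above and verifies that the types appear in non-decreasing order, without checking the order of keys within a type. The names `major_type` and `types_in_order` are hypothetical.

```python
from typing import List, Union

Key = Union[int, float, bytes, str]


def major_type(key: Key) -> int:
    """Return the CBOR major type used to group a map key."""
    if isinstance(key, bool):
        raise TypeError("booleans are not covered by this sketch")
    if isinstance(key, int):
        return 0 if key >= 0 else 1  # unsigned vs. negative integer
    if isinstance(key, bytes):
        return 2
    if isinstance(key, str):
        return 3
    if isinstance(key, float):
        return 7
    raise TypeError(f"unsupported key type: {type(key).__name__}")


def types_in_order(keys: List[Key]) -> bool:
    """Check only that the major types of the keys never decrease."""
    ranks = [major_type(k) for k in keys]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))


mixed = [1, 100, -100, -200, b"\x01", b"\x02" * 5, "aa", "b", 0.7, 2.4]
bytes_first = [b"\x01", b"\x02" * 5, 1, 100, -100, -200, "aa", "b", 0.7, 2.4]

print(types_in_order(mixed))        # prints True
print(types_in_order(bytes_first))  # prints False
```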

Arrays, maps and tagged items are left out of this specification as including them might unnecessarily complicate the specification or its implementation. Also, these elements aren’t currently used as keys in maps and are unlikely to be used in the future.
Review comment (Contributor, Author):

We decided to leave arrays, maps and tagged items out of the spec, but there might be a use case for them - especially tagged items. Is this perhaps somehow implemented in cardano-cli? How would cardano-cli handle serialising a map with tags as keys? Perhaps @dcoutts you might have some pointers on how to deal with this?


**Examples of correctly ordered keys**

- `"aa", "b", "c", "cccc", "d"`
- `h'0000', h'01', h'02020202', h'03'`
- `1, 2, 3, 4, 100, 1000`
- `1, 100, -100, -200, h'01', h'0202020202', "aa", "b", 0.7, 2.4`

## Rationale

### Why not follow the canonical CBOR suggestions completely?

The canonical CBOR suggestions order the keys based on their CBOR representations with the rather weird caveat that shorter keys come first regardless of their lexicographic ordering. Lexicographic ordering is more natural and it might be simpler for many applications to sort based on the real, logical value of the key rather than the serialised representation.
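
The difference between the two orderings can be seen on a handful of short text string keys. This sketch assumes that "lexicographic" means bytewise comparison of the UTF-8 encoding, and it models the RFC 7049 rule as "shorter first, then bytewise", which matches the encoded form for same-type strings shorter than 24 bytes (their CBOR header is a single byte). The function names are hypothetical.

```python
from typing import Tuple

keys = ["d", "cccc", "b", "aa", "c"]


def cip_order(key: str) -> bytes:
    """Sort key for plain lexicographic ordering of the string's bytes."""
    return key.encode("utf-8")


def rfc7049_order(key: str) -> Tuple[int, bytes]:
    """Sort key for the RFC 7049 rule: shorter keys first, then bytewise."""
    encoded = key.encode("utf-8")
    return (len(encoded), encoded)


cip_sorted = sorted(keys, key=cip_order)
rfc_sorted = sorted(keys, key=rfc7049_order)

print(cip_sorted)  # prints ['aa', 'b', 'c', 'cccc', 'd']
print(rfc_sorted)  # prints ['b', 'c', 'd', 'aa', 'cccc']
```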

### Keys with mixed types

We mustn’t forget that the CBOR map keys can be of different types. The type ordering can be chosen arbitrarily but it seems most natural to choose the ordering based on the CBOR types’ 3-bit header identifier.
Review comment (Member):

I don't really feel strongly about any of those arguments. Since there already exists a specification for producing canonical CBOR serialization, coming up with a new one does not sound to me like a good idea?

The main argument seems to be that it is "most natural to choose ordering based on the CBOR types". It certainly is appealing to the human brain but encoded data are processed by programs anyway. How does the proposal make it simpler for many applications 🤔 ?


## Backwards compatibility

Canonical CBOR serialisation defined in this CIP is not compatible with canonical CBOR serialisation defined by the CBOR RFC. We recommend that all tools migrate to the canonical CBOR serialisation defined by this CIP in order to ensure interoperability across the ecosystem.

## Test vectors

### Valid test vectors

**Text strings are correctly ordered lexicographically**

`"aa", "b", "c", "cccc", "d"`

**Byte strings are correctly ordered lexicographically**

`h'0000', h'01', h'02020202', h'03'`

**Unsigned integers are correctly ordered**

`1, 2, 3, 4, 100, 1000`

**Multiple types are correctly ordered**

`1, 100, -100, -200, h'01', h'0202020202', "aa", "b", 0.7, 2.4`

### Invalid test vectors

**Text strings are ordered following canonical CBOR from RFC**

`"b", "c", "d", "aa", "cccc"`

**Types are not in the correct order - bytes come first instead of uints**

`h'01', h'0202020202', 1, 100, -100, -200, "aa", "b", 0.7, 2.4`

**Types are in the correct order, but text strings are not ordered lexicographically**

`1, 100, -100, -200, h'01', h'0202020202', "b", "aa", 0.7, 2.4`

## Copyright

This CIP is licensed under [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/legalcode).