feat: [ADR-072] Sign Mode Unified #22714

Open
wants to merge 4 commits into
base: main
Changes from 3 commits
349 changes: 349 additions & 0 deletions docs/architecture/adr-076-sign-mode-unified.md
@@ -0,0 +1,349 @@
# ADR 075: Sign Mode Unified
Contributor

⚠️ Potential issue

Fix inconsistency in ADR number

The ADR number in the title (075) doesn't match the filename (076). Please ensure consistency between the title and filename.

-# ADR 075: Sign Mode Unified
+# ADR 076: Sign Mode Unified

Member

What does this sign mode unify? I think both Amino JSON and Textual are designed to be self-descriptive documents. The big selling point of this is using on-chain state to mitigate the limitations of the other approach. I would use a great new name for great new tech.

Contributor Author

Name ideas are welcome! 😃

## Changelog

* December 2nd 2024: Initial Draft (Zondax AG)

## Status

DRAFT Not Implemented

## Abstract

This ADR introduces `SIGN_MODE_UNIFIED`, a new signing mode for the Cosmos SDK. This mode aims to provide a simpler and more future-proof signing process, focusing on ease of maintenance and an improved user and developer experience. At the core of this new mode is a concept for storing message specifications in a verifiable and efficient manner: *runtime metadata*. This metadata contains the definitions of all data types for each message of each module, down to their underlying primitive types. Signers can then use this metadata to accurately decode and display the intended transaction fields to the user in a human-readable format, avoiding blind signing on any Cosmos-based chain.

The signer app does not need to store this metadata. Using a chunked and merkleized version of the metadata, the chunks that describe the data types involved in a transaction's messages can be delivered to the signer on demand. The signer recomputes the root hash of the tree from those chunks plus the proofs covering the remaining chunks, which verifies the metadata's integrity. What the user ends up signing is the `metadata root hash + blob`, so a validator can verify the signed transaction by checking that the metadata root hash it contains matches the one stored locally on the node.

## Context

## Challenges with Current sign modes

1. **Complex codebase:**
Maintaining multiple sign modes in parallel increases maintenance overhead and codebase complexity.

2. **Inconsistent signing processes across different Cosmos chains:**
Not all Cosmos chains use the same sign mode, which creates inconsistencies in the ecosystem.

3. **Blind signing:**
When wallets cannot decode a transaction or smart contract call, it leads to blind signing. Blind signing exposes users to potential security risks, as they may unknowingly sign malicious transactions.

4. **Cross-language Types Inconsistency:**
Implementing a separate renderer in every language used to build signers can lead to inconsistencies in type interpretation.

5. **Upgradeability:**
When new types are added to the system, wallets need to synchronize their renderers to match these new types. This process can lead to delays in supporting new types of transactions.


## Main components

Key components of `SIGN_MODE_UNIFIED` include:

* *Metadata struct*: A JSON file containing the definitions of all data types for each message of each module, down to primitive types.
Member

Do I understand correctly that this is referring to a JSON document, independent of its encoding (i.e. the in-memory data structure)? If so, I suggest avoiding the term "JSON file", as it suggests something on disk, encoded as some series of bytes. The difference matters a lot, especially as Go by default randomizes map entry order and we have a lot of genesis files where mismatch and confusion are caused by different encodings of the same document.

Contributor Author

Yes, thanks for pointing that out. This is referring to a .json document meant to be stored on disk. This is the document that, as the ADR mentions, should be created from the .proto files.

The in-memory representation of this struct uses this document as its source, but then encodes it (with the proposed codecs: JSON, CBOR, etc.) to produce a canonical representation and later build the merkle tree out of it.

Will enhance the docs to reinforce this idea.

* *Metadata Digest*: A compact representation of the complete metadata set used for transaction construction. When hashed (digest hash), this struct serves as the root of the merkle tree, enabling efficient verification of metadata integrity.
Member

A "digest" is usually some sort of summary or even the hash which I think is misleading here. What we have here is a serialization of the document above. I think "Metadata Serialization" or "Metadata file" (or similar) would be better.

* *Merkleized Structure*: The metadata is organized into a merkle tree, allowing for efficient proofs and verification.

### Metadata file
Member

@kocubinski Dec 2, 2024

This is a powerful abstraction and to me seems like a viable pathway to completely decouple a wallet or execution environment from protobuf. Did I get that right? If so we should push that as a big pro! I think proto has been troublesome on front ends, hardware wallets and VMs.

Contributor Author

Yes, that's right, the wallet / signers should be completely decoupled from the protobuf definition files


The metadata file is a JSON file that contains all the type and message definitions used in the SDK. It is meant to be generated at build time from the protobuf definition files and stored in the node. Chunks of its encoded version are sent to the signer, which uses them to decode and display the transaction's fields without storing all type and message definitions locally. As a result, signers only need to support the primitive types and their string representations, which ensures consistent and reliable type interpretation.
Contributor

Does this mean that there is a new "handshake" from the signer to the node they are broadcasting to?

They now must request the relevant metadata for their transaction?

If so, does this mean anything for offline signing?

Contributor Author

@raynaudoe Jan 6, 2025

The signer requests the metadata chunks from the node on demand.

> They now must request the relevant metadata for their transaction?

Yes, the signer asks for the chunks involved in the tx and the proofs needed to rebuild the tree hash.

> If so, does this mean anything for offline signing?

For offline signing, one option is for the wallet to store the full metadata file locally, so that the entire signing and metadata verification happens offline. The "downside" is that this metadata must be updated every time it changes (a simple call to a node should be enough IMO). If the signer uses outdated metadata to sign, the worst-case scenario is that the tx is rejected at validation time.


Since all nodes should have a copy of this file built from the codebase, it is a requirement that this file be created using the existing protobuf definition files. Ideally, this should be done on every new SDK release through a CI workflow. This process will require building a tool to parse protobuf files into metadata definition types.

#### Formal definition

The structure of the metadata file follows this format:

$$
M = (v, R, P)
$$

$$
R = (r_0, \dots, r_n)
$$

$$
P = (p_0, \dots, p_n)
$$

where:

* $M$ is the main metadata structure.
* $v$ is a string representing the version of the metadata structure.
* $R$ is a sequence of type definitions $r_i$.
* $P$ is a sequence of module metadata $p_i$.

Each type definition $r_i$ has the following structure:

$$r_i = (\mathit{id}, n, D)$$

where:

* $\mathit{id}$ is an integer representing the unique identifier for this type.
* $n$ is a string representing the name of the type.
* $D$ is the type definition, which can be one of the following:
  * Primitive: for basic types like String, Int, etc.
  * Composite: for struct-like types with multiple fields.
  * Vec: for vector types.

Each module metadata $p_i$ has the following structure:

$$p_i = (n, M_p)$$

$$M_p = (m_0, \dots, m_k)$$

where:

* $n$ is a string representing the name of the module.
* $M_p$ is the module's sequence of message definitions $m_i$ (distinct from the top-level structure $M$).

Each message definition $m_i$ has the following structure:

$$m_i = (n, u, T)$$

where:

* $n$ is a string representing the name of the message.
* $u$ is a string representing the type URL of the message.
* $T$ is the type definition of the message, following the same structure as the type definitions in $R$.
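
As a rough illustration of the formal definition, the tuples above could be modelled in Go as follows. This is only a sketch: the struct layout, the `Kind` enum, and the `Leaves` helper are hypothetical names that do not appear in the ADR, and the walk assumes there are no recursive type definitions.

```go
package main

import "fmt"

// Kind enumerates the three definition variants listed above.
type Kind int

const (
	Primitive Kind = iota
	Composite
	Vec
)

// Field references another type by id (hypothetical layout).
type Field struct {
	Name   string
	TypeID int
}

// TypeDef models a type definition (id, n, D).
type TypeDef struct {
	ID     int
	Name   string
	Kind   Kind
	Prim   string  // primitive name, used when Kind == Primitive
	Fields []Field // used when Kind == Composite
	Elem   int     // element type id, used when Kind == Vec
}

// Message models a message definition (n, u, T).
type Message struct {
	Name, TypeURL string
	Type          TypeDef
}

// Module models module metadata: a name plus its message definitions.
type Module struct {
	Name     string
	Messages []Message
}

// Metadata models M = (v, R, P).
type Metadata struct {
	Version string
	Types   []TypeDef
	Modules []Module
}

// Leaves walks a definition down to its primitive types and returns one
// "path: primitive" line per leaf. It assumes non-recursive definitions.
func Leaves(types map[int]TypeDef, path string, d TypeDef) []string {
	switch d.Kind {
	case Primitive:
		return []string{path + ": " + d.Prim}
	case Vec:
		return Leaves(types, path+"[]", types[d.Elem])
	default: // Composite
		var out []string
		for _, f := range d.Fields {
			out = append(out, Leaves(types, path+"."+f.Name, types[f.TypeID])...)
		}
		return out
	}
}

func main() {
	types := map[int]TypeDef{
		1: {ID: 1, Name: "String", Kind: Primitive, Prim: "String"},
		2: {ID: 2, Name: "Coin", Kind: Composite, Fields: []Field{{"denom", 1}, {"amount", 1}}},
		3: {ID: 3, Name: "Coins", Kind: Vec, Elem: 2},
	}
	msgSend := TypeDef{Kind: Composite, Fields: []Field{{"from_address", 1}, {"to_address", 1}, {"amount", 3}}}
	for _, line := range Leaves(types, "MsgSend", msgSend) {
		fmt.Println(line)
	}
}
```

Resolving every field down to a primitive is what would let a signer render any message while only understanding the primitive types.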

#### Example

Metadata file example for the `Bank.MsgSend` message:

```json
{
  "version": "1.0.0",
  "modules": [
    {
      "name": "Bank",
      "messages": [
        {
          "name": "MsgSend",
          "typeUrl": "/cosmos.bank.v1beta1.MsgSend",
          "type": {
            "Struct": {
              "fields": [
                {
                  "name": "from_address",
                  "type": 1
                },
                {
                  "name": "to_address",
                  "type": 1
                },
                {
                  "name": "amount",
                  "type": {
                    "Vec": 2
                  }
                }
              ]
            }
          }
        }
      ]
    }
  ],
  "types": [
    {
      "id": 1,
      "typeName": "String",
      "type": {
        "def": {
          "Primitive": "String"
        }
      }
    },
    {
      "id": 2,
      "typeName": "Coin",
      "type": {
        "def": {
          "Composite": {
            "fields": [
              {
                "name": "denom",
                "type": 1
              },
              {
                "name": "amount",
                "type": 1
              }
            ]
          }
        }
      }
    }
  ]
}
```
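
In the example above, a field's `type` is either a bare type id (`1`) or a wrapper object (`{"Vec": 2}`). One possible way to read that shape with Go's `encoding/json` is a custom `UnmarshalJSON`, sketched below. The `TypeRef`, `Field`, `Message`, `Metadata`, and `Parse` names are assumptions made for illustration, and only the `modules` part of the document is decoded.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TypeRef is either a bare type id (e.g. 1) or a wrapper such as {"Vec": 2}.
type TypeRef struct {
	ID    int
	IsVec bool
}

// UnmarshalJSON accepts both encodings of a type reference.
func (r *TypeRef) UnmarshalJSON(b []byte) error {
	// A bare integer decodes directly into the id.
	if err := json.Unmarshal(b, &r.ID); err == nil {
		return nil
	}
	// Otherwise expect the {"Vec": <id>} wrapper.
	var w struct{ Vec *int }
	if err := json.Unmarshal(b, &w); err != nil {
		return err
	}
	if w.Vec == nil {
		return fmt.Errorf("unsupported type reference: %s", b)
	}
	r.ID, r.IsVec = *w.Vec, true
	return nil
}

type Field struct {
	Name string  `json:"name"`
	Type TypeRef `json:"type"`
}

type Message struct {
	Name    string `json:"name"`
	TypeURL string `json:"typeUrl"`
	Type    struct {
		Struct struct {
			Fields []Field `json:"fields"`
		} `json:"Struct"`
	} `json:"type"`
}

type Metadata struct {
	Version string `json:"version"`
	Modules []struct {
		Name     string    `json:"name"`
		Messages []Message `json:"messages"`
	} `json:"modules"`
}

// sample is the "modules" part of the MsgSend example, minified.
const sample = `{"version":"1.0.0","modules":[{"name":"Bank","messages":[{"name":"MsgSend","typeUrl":"/cosmos.bank.v1beta1.MsgSend","type":{"Struct":{"fields":[{"name":"from_address","type":1},{"name":"to_address","type":1},{"name":"amount","type":{"Vec":2}}]}}}]}]}`

// Parse decodes a metadata document into the structs above.
func Parse(data []byte) (Metadata, error) {
	var md Metadata
	err := json.Unmarshal(data, &md)
	return md, err
}

func main() {
	md, err := Parse([]byte(sample))
	if err != nil {
		panic(err)
	}
	for _, f := range md.Modules[0].Messages[0].Type.Struct.Fields {
		fmt.Printf("%s -> type %d (vec=%v)\n", f.Name, f.Type.ID, f.Type.IsVec)
	}
}
```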

### Metadata digest

The metadata digest is a compact representation of the complete metadata set. The hash of this digest serves as the metadata hash, which is included in the transaction's sign doc. Here's the proposed structure for the Hash type and the MetadataDigest in Go:

```go
type Hash [32]byte

type MetadataDigest struct {
	TreeRootHash Hash              // The Merkle root hash of the metadata tree
	SpecVersion  uint32            // A version number for the metadata specification
	Props        map[string]string // A map of properties for the metadata
}
```

The `Props` field stores additional chain-specific information about the metadata. For example, it can hold token symbols and decimals.
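
Because Go randomizes map iteration order, hashing `Props` needs a deterministic serialization. The sketch below shows one way this could look, under two loud assumptions: it uses a hypothetical length-prefixed byte layout with sorted keys (the ADR proposes CBOR for the real encoding), and it uses SHA-256 from the standard library as a stand-in because the ADR does not name the digest hash function.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
)

type Hash [32]byte

type MetadataDigest struct {
	TreeRootHash Hash
	SpecVersion  uint32
	Props        map[string]string
}

// putString appends a 4-byte big-endian length followed by the bytes of s.
func putString(buf []byte, s string) []byte {
	buf = binary.BigEndian.AppendUint32(buf, uint32(len(s)))
	return append(buf, s...)
}

// Hash serializes the digest with sorted property keys and hashes the result.
// Assumption: SHA-256 and this byte layout are placeholders, not the ADR's spec.
func (d MetadataDigest) Hash() Hash {
	buf := append([]byte{}, d.TreeRootHash[:]...)
	buf = binary.BigEndian.AppendUint32(buf, d.SpecVersion)

	keys := make([]string, 0, len(d.Props))
	for k := range d.Props {
		keys = append(keys, k)
	}
	sort.Strings(keys) // fixed order regardless of map iteration

	buf = binary.BigEndian.AppendUint32(buf, uint32(len(keys)))
	for _, k := range keys {
		buf = putString(buf, k)
		buf = putString(buf, d.Props[k])
	}
	return sha256.Sum256(buf)
}

func main() {
	d := MetadataDigest{
		SpecVersion: 1,
		Props:       map[string]string{"symbol": "ATOM", "decimals": "6"},
	}
	fmt.Printf("digest hash: %x\n", d.Hash())
}
```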

### Metadata merkle tree

The metadata merkle tree is a crucial component of the `SIGN_MODE_UNIFIED` design. It allows for efficient verification of metadata integrity without requiring offline signers to store the full metadata. The tree is constructed as a complete binary merkle tree using `blake3` as the hashing function.

#### Construction Process

1. **Prepare the leaves**: The initial data (leaves of the merkle tree) are the encoded type information. For `Enumeration` types, variants are ordered by their `index`.

2. **Sort the leaves**: The leaves are sorted by their unique identifiers so that the tree root is deterministic.

3. **Build the tree**: The tree is built from the bottom up, combining pairs of hashes to form parent nodes until a single root hash is obtained.

4. **Handle empty trees**: If there are no nodes left in the list (i.e., the initial data set was empty), an all-zeros hash `[32]byte{0}` is used to represent the empty tree.

The resulting `merkleTreeRoot` is the last node left in the list of nodes, representing the root of the entire merkle tree.
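
The construction steps above can be sketched in Go as follows. Several details here are assumptions rather than part of the ADR: SHA-256 stands in for `blake3` (which is not in the Go standard library), the `Leaf` type and `MerkleRoot` function are hypothetical names, and an unpaired node at the end of a level is promoted unchanged, which may differ from the exact tree shape the final specification chooses.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

type Hash [32]byte

// Leaf is one encoded type definition, keyed by its unique identifier.
type Leaf struct {
	ID      int
	Encoded []byte
}

// hashPair hashes the concatenation of two child hashes.
func hashPair(l, r Hash) Hash {
	buf := make([]byte, 0, 64)
	buf = append(buf, l[:]...)
	buf = append(buf, r[:]...)
	return sha256.Sum256(buf)
}

// MerkleRoot sorts the leaves by id, hashes them, and folds the list
// bottom-up until one node is left. An empty input yields the all-zeros hash.
func MerkleRoot(leaves []Leaf) Hash {
	if len(leaves) == 0 {
		return Hash{}
	}
	sorted := append([]Leaf(nil), leaves...) // do not mutate the caller's slice
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].ID < sorted[j].ID })

	level := make([]Hash, len(sorted))
	for i, l := range sorted {
		level[i] = sha256.Sum256(l.Encoded)
	}
	for len(level) > 1 {
		var next []Hash
		for i := 0; i+1 < len(level); i += 2 {
			next = append(next, hashPair(level[i], level[i+1]))
		}
		if len(level)%2 == 1 {
			// Assumption: an unpaired node moves up one level unchanged.
			next = append(next, level[len(level)-1])
		}
		level = next
	}
	return level[0]
}

func main() {
	leaves := []Leaf{
		{ID: 2, Encoded: []byte("Coin")},
		{ID: 1, Encoded: []byte("String")},
	}
	fmt.Printf("root: %x\n", MerkleRoot(leaves))
}
```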

#### Purpose and Benefits

* **Efficient Verification**: The merkle tree structure allows for efficient verification of metadata integrity. Only the necessary chunks and their corresponding proofs need to be provided to offline wallets for transaction decoding.

* **Reduced Storage Requirements**: Offline signers don't need to store the full metadata. They can verify the integrity of the metadata using just the root hash and the relevant chunks.

* **Security**: The use of cryptographic hashes in the tree structure ensures the integrity of the metadata, making it tamper-evident.
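
To make the verification benefit concrete, here is a minimal sketch of how a signer might check one received chunk against a known root hash. The `ProofStep` layout and the `VerifyChunk` and `leafHash` names are hypothetical, and SHA-256 again stands in for `blake3`.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

type Hash [32]byte

// ProofStep is one sibling hash on the path from a leaf to the root.
type ProofStep struct {
	Sibling Hash
	Left    bool // true when the sibling is the left child
}

// leafHash hashes an encoded metadata chunk.
func leafHash(chunk []byte) Hash { return sha256.Sum256(chunk) }

// hashPair hashes the concatenation of two child hashes.
func hashPair(l, r Hash) Hash {
	buf := make([]byte, 0, 64)
	buf = append(buf, l[:]...)
	buf = append(buf, r[:]...)
	return sha256.Sum256(buf)
}

// VerifyChunk recomputes the root from a chunk and its proof and compares
// it with the expected root hash.
func VerifyChunk(chunk []byte, proof []ProofStep, root Hash) bool {
	h := leafHash(chunk)
	for _, s := range proof {
		if s.Left {
			h = hashPair(s.Sibling, h)
		} else {
			h = hashPair(h, s.Sibling)
		}
	}
	return h == root
}

func main() {
	// Four-leaf tree built by hand for the example.
	chunks := [][]byte{[]byte("String"), []byte("Coin"), []byte("MsgSend"), []byte("MsgVote")}
	left := hashPair(leafHash(chunks[0]), leafHash(chunks[1]))
	right := hashPair(leafHash(chunks[2]), leafHash(chunks[3]))
	root := hashPair(left, right)

	// Proof for chunk 2: its right sibling, then the left subtree.
	proof := []ProofStep{{Sibling: leafHash(chunks[3])}, {Sibling: left, Left: true}}
	fmt.Println("chunk verified:", VerifyChunk(chunks[2], proof, root))
}
```

The signer only needs the chunk, a logarithmic number of sibling hashes, and the root, which is why the full metadata never has to be stored on the device.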

### Encoding

`SIGN_MODE_UNIFIED` proposes using CBOR (Concise Binary Object Representation) for transaction encoding, offering several critical advantages:

* **Deterministic Encoding**: CBOR's canonical form ensures consistent merkleization of transaction data, which is essential for verification and proof generation.
* **Human-readable Diagnostic Notation**: While CBOR is binary, it provides a diagnostic notation format that maintains transparency and reduces blind signing risks.
Member

Human readability was a design goal (and powerful feature) of AminoJSON (and Textual). The user is presented with a literal representation of the tx bytes which are being signed over. From https://www.rfc-editor.org/rfc/rfc8949.html

JSON: {_ "a": 1, "b": [_ 2, 3]}
CBOR: 0xbf61610161629f0203ffff

I don't think we can say that CBOR is human readable; its use in a wallet UX would be more akin to blind signing. I'm not a security expert, but isn't this a bit of a step backwards for verifiability?

Contributor Author

Hey, thanks for pointing that out, true!
We are proposing CBOR because it enables metadata merkleization, a central feature of this new sign mode. While CBOR is not human-readable in its raw form, any UI can easily decode it to present the data to users. Blind signing would only occur if a wallet implements a "lazy" UI that doesn't decode the CBOR string, which would likely make such a wallet less appealing to users.

This approach doesn't represent a step backwards in terms of verifiability, because:

  • CBOR transactions can always be decoded for human readability. We rely on widely-tested codec libraries for this, just as we do with other formats.
  • Since the transaction contains the metadata's root hash, the signer can verify that the decoded transaction chunks comply with the metadata sent by the node. This provides a "first-order" verification.
  • Finally, since the sign doc contains the root hash (meaning the user is signing both tx_blob + metadata_root_hash), validators can verify that the transaction was displayed (and signed) using the consensual metadata tree, providing a "second-order" verification.

Member

> We are proposing CBOR because it enables metadata merkleization, a central feature of this new sign mode.

What specifically about CBOR enables metadata merkleization as opposed to say JSON? It's simply an encoding format.

Member

The original question here was about human readability. Given the amount of detail that the previous spec for Textual (or even Amino JSON) put into human readability, I find "CBOR transactions can always be decoded for human readability" to be a pretty underwhelming answer.

* **Type Safety**: CBOR's strong typing system prevents type confusion and ensures consistent cross-language interpretation.
* **Efficiency**: Offers compact binary encoding while maintaining full data model compatibility with JSON.
* **Universal Support**: Widely supported across programming languages with mature libraries.
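
To illustrate what deterministic encoding means in practice, the sketch below hand-encodes a string-to-string map following the core deterministic rules of RFC 8949 (shortest-form lengths, map keys sorted bytewise by their encoded form). It is a toy encoder written for this example, not a proposal for the SDK, which would presumably rely on an established CBOR library; `head`, `text`, and `EncodeMap` are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

// head appends a CBOR initial byte with the shortest-form argument n.
func head(buf []byte, major byte, n uint64) []byte {
	switch {
	case n < 24:
		return append(buf, major<<5|byte(n))
	case n <= 0xff:
		return append(buf, major<<5|24, byte(n))
	case n <= 0xffff:
		return binary.BigEndian.AppendUint16(append(buf, major<<5|25), uint16(n))
	case n <= 0xffffffff:
		return binary.BigEndian.AppendUint32(append(buf, major<<5|26), uint32(n))
	default:
		return binary.BigEndian.AppendUint64(append(buf, major<<5|27), n)
	}
}

// text encodes s as a CBOR text string (major type 3).
func text(s string) []byte {
	return append(head(nil, 3, uint64(len(s))), s...)
}

// EncodeMap encodes m as a definite-length CBOR map (major type 5) whose
// keys are sorted by the bytewise order of their encodings.
func EncodeMap(m map[string]string) []byte {
	type kv struct{ k, v []byte }
	items := make([]kv, 0, len(m))
	for k, v := range m {
		items = append(items, kv{text(k), text(v)})
	}
	sort.Slice(items, func(i, j int) bool { return bytes.Compare(items[i].k, items[j].k) < 0 })

	out := head(nil, 5, uint64(len(items)))
	for _, it := range items {
		out = append(out, it.k...)
		out = append(out, it.v...)
	}
	return out
}

func main() {
	coin := map[string]string{"denom": "uatom", "amount": "10"}
	fmt.Printf("%x\n", EncodeMap(coin))
}
```

Note that `denom` sorts before `amount` here because the shorter key has the smaller length byte; two encoders that follow the same rules produce identical bytes, and therefore identical hashes.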

### Signing process

The signing process steps are the following:

1. The transaction is built, including the root hash and transaction data.
2. The signer receives the transaction and starts receiving the corresponding chunks of the CBOR encoded metadata file (types and proofs).
3. The signer verifies the integrity of the received chunks by computing the root hash of the merkle tree from the chunks and comparing it against the tree root hash in the metadata digest.
4. The signer decodes the transaction fields using the received chunks and displays them to the user.
5. The user reviews the transaction and signs it.
6. Once the transaction is broadcast, the validator compares the metadata hash included in the signed transaction against the one stored locally in the node to verify that all the messages were signed with the consensual metadata structure.



```mermaid
sequenceDiagram
participant Wallet
participant Node
participant User
participant Validator

Wallet->>Node: Request root hash
Node-->>Wallet: Send root hash
Wallet->>Wallet: Build Transaction
Wallet->>Node: Request Metadata Chunks
Node-->>Wallet: Send Metadata Chunks
Wallet->>Wallet: Verify Chunks
Wallet->>User: Display Transaction Details
User->>Wallet: Approve and Sign
Wallet->>Node: Broadcast Signed Transaction
Node->>Validator: Forward Transaction
Validator->>Validator: Verify Metadata Root Hash
Validator->>Validator: Compare with local values
```
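
The last two steps of the flow can be sketched as below. This is an illustration under stated assumptions: the sign bytes are taken to be the plain concatenation `metadata hash || tx blob` (the ADR only says the user signs `metadata root hash + blob`), Ed25519 from the Go standard library stands in for whatever key type the chain uses, and `SignBytes`, `Sign`, `ValidateTx`, and `demoKey` are hypothetical names.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"errors"
	"fmt"
)

type Hash [32]byte

// SignBytes builds the payload the user signs (assumed layout: hash || blob).
func SignBytes(metadataHash Hash, txBlob []byte) []byte {
	out := make([]byte, 0, len(metadataHash)+len(txBlob))
	out = append(out, metadataHash[:]...)
	return append(out, txBlob...)
}

// Sign is the signer side: it commits to both the metadata hash and the tx.
func Sign(priv ed25519.PrivateKey, metadataHash Hash, txBlob []byte) []byte {
	return ed25519.Sign(priv, SignBytes(metadataHash, txBlob))
}

// ValidateTx is the validator side: the claimed metadata hash must match the
// node's local one, and the signature must cover hash and blob together.
func ValidateTx(local, claimed Hash, txBlob, sig []byte, pub ed25519.PublicKey) error {
	if claimed != local {
		return errors.New("metadata hash mismatch")
	}
	if !ed25519.Verify(pub, SignBytes(claimed, txBlob), sig) {
		return errors.New("invalid signature")
	}
	return nil
}

// demoKey generates a throwaway key pair for the example.
func demoKey() (ed25519.PublicKey, ed25519.PrivateKey) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	return pub, priv
}

func main() {
	pub, priv := demoKey()
	local := Hash{0xAB} // metadata hash known to the node
	blob := []byte("encoded tx")

	sig := Sign(priv, local, blob)
	fmt.Println("current metadata:", ValidateTx(local, local, blob, sig, pub))

	stale := Hash{0xCD} // signer used outdated metadata
	fmt.Println("stale metadata:  ", ValidateTx(local, stale, blob, Sign(priv, stale, blob), pub))
}
```

A transaction signed against outdated metadata fails the first check, which matches the rejection-at-validation behaviour described for offline signers.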

## Decision
Member

Were any alternatives considered? The simplest that comes to mind is a constrained JSON spec. Given that proto is the de facto IDL (it remains so in this design, since the JSON metadata is derived from it), proto reflect could drive marshaling pretty easily. The downside is that clients would still have a strong dependency on a proto toolchain, I think.


We will implement `SIGN_MODE_UNIFIED` as a new sign mode for the Cosmos SDK. This decision involves the following key actions:

* The implementation will include all the necessary code to build and support the new sign mode, including:
  * Implementation of the MetadataDigest structure and its associated hash calculation function.
  * Development of the Merkle tree construction and verification logic for the metadata.
  * Integration of the new sign mode into the existing transaction signing and verification pipeline.
* A tool will be created to build the metadata file from the existing protobuf files. This tool will:
  * Parse all relevant protobuf files in the Cosmos SDK.
  * Generate a structured JSON metadata file that includes all type and message definitions.
* A CI job will be implemented to automatically run this metadata generation tool.

* The transaction signing process will be updated to include the MetadataDigest hash in the sign doc.
* The Ledger HW signing process will be adapted to support this new sign mode.
* The transaction verification process will be modified to validate the metadata hash against the locally stored metadata on validator nodes.
* Comprehensive documentation for `SIGN_MODE_UNIFIED` will be created, including guides for wallet developers and chain developers on how to implement and use the new sign mode.
* A migration strategy will be implemented to allow for a smooth, progressive transition from existing sign modes to `SIGN_MODE_UNIFIED`.
* Finally, all legacy sign modes (`SIGN_MODE_LEGACY_AMINO`, `SIGN_MODE_DIRECT`, `SIGN_MODE_TEXTUAL`) will be deprecated and removed from the SDK.

## Alternatives

### Encoding options

We've chosen CBOR (Concise Binary Object Representation) as our encoding format primarily because it provides deterministic encoding, allowing us to build a merkle tree with consistent root hashes across different signer implementations.

An alternative approach that prioritizes human readability would be to use plain JSON along with a specification defining "canonical" JSON encoding rules. This would achieve deterministic encoding across different signer implementations while maintaining readability. However, this approach would require all wallet implementations of this sign mode to ensure their JSON encoder complies with the canonical encoding rules, otherwise transactions would be rejected by validators. A complete suite for testing canonical JSON would be needed.

Below is a table comparing the two options:

| Feature | CBOR | JSON + SPEC |
| --------------------------- | --------------------------------- | --------------------------------------------- |
| **Data Size** | Compact binary format | Larger but optimized with canonical encoding |
| **Parsing Speed** | Faster due to binary encoding | Slower due to text processing |
| **Storage Requirements** | Lower | Higher but manageable with optimized structure|
| **Human Readability** | Not human-readable | Human-readable |
| **Ecosystem Support** | Requires specific libraries/tools | Widely supported with standardization efforts |
| **Security** | Less prone to injection attacks | Prevents injection with strict canonical rules|
| **Interoperability** | Standardized for consistent use | High due to canonical JSON specification |
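
For the JSON alternative, the following sketch shows roughly what a canonicalization step could look like with Go's standard library. It leans on the fact that `encoding/json` writes map keys in sorted order and emits no insignificant whitespace. It is not a complete canonical JSON specification (number formatting and string escaping rules would still need to be pinned down), and `Canonicalize` is a hypothetical name.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Canonicalize re-encodes a JSON document with sorted object keys and no
// insignificant whitespace. Numbers are kept as their original literals.
func Canonicalize(in []byte) ([]byte, error) {
	dec := json.NewDecoder(bytes.NewReader(in))
	dec.UseNumber() // avoid float64 round-trips changing number literals

	var v interface{}
	if err := dec.Decode(&v); err != nil {
		return nil, err
	}

	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false) // keep <, >, & as written
	if err := enc.Encode(v); err != nil {
		return nil, err
	}
	// Encode appends a trailing newline; strip it for a stable byte string.
	return bytes.TrimRight(buf.Bytes(), "\n"), nil
}

func main() {
	a, _ := Canonicalize([]byte(`{ "b": 1, "a": [2, 3] }`))
	b, _ := Canonicalize([]byte(`{"a":[2,3],"b":1}`))
	fmt.Println(string(a))
	fmt.Println("identical:", bytes.Equal(a, b))
}
```

Every wallet implementation would have to reproduce exactly these bytes, which is why the text above calls for a complete canonical JSON test suite.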



## Consequences

### Backwards Compatibility

**Preservation of Existing Sign Modes:**
* Existing sign modes remain supported during the migration period so that legacy wallets and clients continue to function without interruption.

### Positive

* **Enhanced User Experience:**
  * `SIGN_MODE_UNIFIED` provides a more intuitive and secure signing experience for users, especially on hardware wallets, by enabling the display of human-readable transaction details.

* **Improved Security:**
  * Reduces the risk of blind signing by allowing users to verify transaction contents before signing.

* **Consistency Across Platforms:**
  * Standardizes the signing process across different wallets and platforms, ensuring uniform behavior and reducing inconsistencies.

* **Reduced Maintenance Overhead:**
  * Consolidates multiple sign modes into a unified approach, decreasing the complexity and maintenance efforts on the codebase.

* **Versioned Metadata:**
  * The metadata will be versioned, allowing for better control and management of changes over time.

### Negative

* **Development Effort:**
  * Requires significant development effort to implement the new sign mode, including updates to multiple components such as transaction builders, signers, validators, and HW wallet apps.

* **Tooling and Documentation Updates:**
  * Requires updates to existing tools, documentation, and developer guides to accommodate the new sign mode, which may involve additional resources and time.
Member

Not a show stopper but I think it's worth mentioning that the sign doc is not human readable anymore

Contributor Author

@raynaudoe Dec 20, 2024

Added plain JSON + spec as an alternative to CBOR to keep human readability.


### Neutral

* **Minimal Performance Impact:**
  * The construction and verification of the merkle tree are optimized to have minimal impact on transaction processing performance.

* **HW wallet apps:**
  * Existing HW wallet apps will need to be updated to support the new sign mode.


## References

https://polkadot-fellows.github.io/RFCs/approved/0078-merkleized-metadata.html