Write an ADR about network properties and etcd #1852

Merged
merged 2 commits into from
Feb 14, 2025
4 changes: 2 additions & 2 deletions docs/adr/2022-03-28_017-udp-networking.md
@@ -3,12 +3,12 @@ slug: 17
title: |
17. Use UDP protocol for Hydra networking
authors: []
-tags: [Proposed]
+tags: [Superseded]
---

## Status

-Proposed
+Superseded (as never implemented) by [ADR 32](/adr/32)

## Context

4 changes: 2 additions & 2 deletions docs/adr/2023-09-09_027-network-resilience.md
@@ -3,12 +3,12 @@ slug: 27
title: |
27. Network failures model
authors: [abailly, pgrange]
-tags: [Accepted]
+tags: [Superseded]
---

## Status

-Accepted
+Superseded by [ADR 32](/adr/32)

## Context

Binary file added docs/adr/2024-09-19-etcd-network-draft.jpg
74 changes: 74 additions & 0 deletions docs/adr/2025-02-12_032-network-properties-etcd.md
@@ -0,0 +1,74 @@
---
slug: 32
title: |
32. Network layer properties, implementation using etcd
authors: [ch1bo]
tags: [Accepted]
---

## Status

Accepted

## Context

- The communication primitive of `broadcast` is introduced in [ADR 6](/adr/6). The original protocol design in the [paper](https://eprint.iacr.org/2020/299.pdf) and that ADR implicitly assume a **reliable broadcast**.

- [ADR 27](/adr/27) further specifies that the `hydra-node` should be tolerant to the _fail-recovery_ failure model, and takes the decision to implement a _reliable broadcast_ by persisting outgoing messages and using a _vector clock_ and heartbeat mechanism, over a dumb transport layer.
  - The current transport layer in use is a simple _FireForget_ protocol over TCP connections implemented using `ouroboros-framework`.
  - [ADR 17](/adr/17) proposed to use UDP instead.
  - Either this design or its implementation was discovered to be wrong, because the system did not survive fault injection tests with moderate packet drops.

- This [research paper](https://arxiv.org/pdf/1707.01873) explores various consensus protocols used in the blockchain space and reminds us of the correspondence between consensus and broadcasts:

> the form of consensus relevant for blockchain is technically known as atomic broadcast

It also states that (back then):

> The most important and most prominent way to implement atomic broadcast (i.e., consensus) in distributed systems prone to t < n/2 node crashes is the family of protocols known today as Paxos and Viewstamped Replication (VSR).
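
The `t < n/2` bound quoted above can be made concrete with a little arithmetic. The following is a minimal sketch, not part of any Hydra or etcd API (the function names `quorum` and `max_crash_faults` are made up for illustration), of the majority size and the number of crashed nodes a cluster of `n` members can tolerate under that bound.

```python
def quorum(n: int) -> int:
    """Smallest number of members that forms a strict majority of n."""
    if n < 1:
        raise ValueError("a cluster needs at least one member")
    return n // 2 + 1


def max_crash_faults(n: int) -> int:
    """Largest t with t < n/2, i.e. crashes that still leave a majority alive."""
    return n - quorum(n)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"n={n}: quorum={quorum(n)}, tolerated crashes={max_crash_faults(n)}")
```

For example, a cluster of three members has a quorum of 2 and tolerates one crash, while a cluster of two members tolerates none.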

## Decision

- We realize that the way the off-chain protocol is specified in the paper, the `broadcast` abstraction required from the `Network` interface is a so-called _uniform reliable broadcast_. Hence, any implementation of `Network` needs to satisfy the following **properties**:

1. **Validity**: If a correct process p broadcasts a message m, then p eventually delivers m.
2. **No duplication**: No message is delivered more than once.
3. **No creation**: If a process delivers a message m with sender s, then m was previously broadcast by process s.
4. **Agreement**: If a message m is delivered by some correct process, then m is eventually delivered by every correct process.

See also Module 3.3 in [Introduction to Reliable and Secure Distributed Programming](https://www.distributedprogramming.net) by Cachin et al., or [Self-stabilizing Uniform Reliable Broadcast by Oskar Lundström](https://arxiv.org/abs/2001.03244).

- Use [`etcd`](https://etcd.io/) as a proxy to achieve reliable broadcast via its [raft](https://raft.github.io/) consensus
  - Raft is an evolution of Paxos and similar to VSR
  - Over-satisfies requirements as it provides "Uniform total order" (satisfies [atomic broadcast](https://en.m.wikipedia.org/wiki/Atomic_broadcast) properties)
  - Each `hydra-node` runs an `etcd` instance to realize its `Network` interface
  - See the following architecture diagram which also contains some notes on `Network` interface properties:

![](./2024-09-19-etcd-network-draft.jpg)

- We supersede [ADR 17](/adr/17) and [ADR 27](/adr/27) decisions on how to implement `Network` with the current ADR.
  - Drop existing implementation using `Reliability` layer for now
  - Could be revisited, as in theory it would satisfy properties if implemented correctly?
  - Uniform reliable broadcast = only deliver when seen by everyone = not what we had implemented?
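
The four properties listed in this decision can be turned into a check over the recorded trace of a finished run, which is one way a test suite could compare `Network` implementations. The sketch below is an illustration only: the trace shape (`broadcasts`, `deliveries`, `correct`) and the function name are assumptions, not Hydra code. Because validity and agreement are "eventually" properties, the check is only meaningful once the run has quiesced. The `uniform` flag is an addition of this sketch: it applies the stricter textbook reading of agreement, where a delivery by any process, even one that later crashes, obliges every correct process to deliver as well.

```python
from collections import Counter

Message = tuple[str, str]  # (sender, payload)


def violated_properties(
    broadcasts: dict[str, list[str]],
    deliveries: dict[str, list[Message]],
    correct: set[str],
    uniform: bool = False,
) -> set[str]:
    """Return the names of the broadcast properties a finished run violates."""
    violated: set[str] = set()

    # Validity: a correct process delivers every message it broadcast itself.
    for p in correct:
        own = deliveries.get(p, [])
        if any((p, payload) not in own for payload in broadcasts.get(p, [])):
            violated.add("validity")

    for delivered in deliveries.values():
        # No duplication: a process delivers each message at most once.
        if any(count > 1 for count in Counter(delivered).values()):
            violated.add("no duplication")
        # No creation: every delivered message was broadcast by its sender.
        if any(payload not in broadcasts.get(sender, []) for sender, payload in delivered):
            violated.add("no creation")

    # Agreement: messages delivered by a correct process (or by any process,
    # if uniform=True) must be delivered by every correct process.
    sources = set(deliveries) if uniform else correct
    must_deliver: set[Message] = set()
    for p in sources:
        must_deliver.update(deliveries.get(p, []))
    if any(not must_deliver <= set(deliveries.get(p, [])) for p in correct):
        violated.add("agreement")

    return violated


if __name__ == "__main__":
    broadcasts = {"alice": ["tx1"], "bob": ["tx2"]}
    deliveries = {
        "alice": [("alice", "tx1"), ("bob", "tx2")],
        "bob": [("alice", "tx1"), ("bob", "tx2")],
        "carol": [("alice", "tx1")],  # carol never saw tx2
    }
    print(violated_properties(broadcasts, deliveries, {"alice", "bob", "carol"}))
```

In the example run, `carol` is counted as correct but never delivers `tx2`, so the check reports an agreement violation. Treating `carol` as crashed (leaving her out of `correct`) makes the same trace pass.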

## Consequences

- Crash tolerance of fewer than `n/2` failing nodes, as a majority of the cluster must remain available

- Using `etcd` as-is adds a run-time dependency on that binary.
  - Docker image users should not see any different UX

- Introspectability of the network, as the `etcd` cluster is queryable, could improve the debugging experience

- Persisted state for networking changes: there will be no `acks`, only the `etcd` Write Ahead Log (WAL) and a last seen revision.

- Can keep the same user experience on configuration
  - Full, static topology by listing everyone as `--peer`
  - Simpler configuration via [peer discovery](https://etcd.io/docs/v3.5/op-guide/clustering/#discovery) possible

- `PeerConnected` semantics need to change to an overall `HydraNetworkConnected`
  - We can only submit / receive messages when connected to the majority cluster

- `etcd` has a few features out-of-the-box we could lean into, e.g.
  - use TLS to secure peer connections
  - separate advertised and binding addresses
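
To illustrate how a full, static topology, separate advertised and binding addresses, and peer TLS could map onto `etcd` command-line flags, here is a hedged sketch that builds an argument list. The flags themselves (`--initial-cluster`, `--listen-peer-urls`, `--initial-advertise-peer-urls`, `--peer-cert-file` and so on) are standard `etcd` options, but the `Peer` and `PeerTls` types, the `etcd_args` function and the mapping from `--peer` entries to member names are assumptions made for this example, not how `hydra-node` is known to invoke `etcd`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Peer:
    name: str  # etcd member name (hypothetical: one per `--peer` entry)
    host: str  # address other members use to reach this peer
    port: int


@dataclass(frozen=True)
class PeerTls:
    cert_file: str
    key_file: str
    trusted_ca_file: str


def etcd_args(
    me: Peer,
    bind_host: str,
    others: list[Peer],
    data_dir: str,
    tls: Optional[PeerTls] = None,
) -> list[str]:
    """Build an etcd argument list for a static cluster of `me` plus `others`."""
    scheme = "https" if tls else "http"

    def url(host: str, port: int) -> str:
        return f"{scheme}://{host}:{port}"

    # Every member lists the full topology, sorted for a deterministic result.
    members = sorted([me, *others], key=lambda p: p.name)
    cluster = ",".join(f"{p.name}={url(p.host, p.port)}" for p in members)

    args = [
        "etcd",
        "--name", me.name,
        "--data-dir", data_dir,
        # Binding address: where this instance listens for peer traffic.
        "--listen-peer-urls", url(bind_host, me.port),
        # Advertised address: what the other members are told to connect to.
        "--initial-advertise-peer-urls", url(me.host, me.port),
        "--initial-cluster", cluster,
        "--initial-cluster-state", "new",
    ]
    if tls:
        args += [
            "--peer-cert-file", tls.cert_file,
            "--peer-key-file", tls.key_file,
            "--peer-trusted-ca-file", tls.trusted_ca_file,
            "--peer-client-cert-auth",
        ]
    return args


if __name__ == "__main__":
    alice = Peer("alice", "alice.example", 5001)
    bob = Peer("bob", "bob.example", 5001)
    print(" ".join(etcd_args(alice, "0.0.0.0", [bob], "/tmp/etcd-alice")))
```

Keeping the binding host separate from the advertised host is what allows a node to listen on all interfaces while telling its peers to connect through a public name or a forwarded port.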