This repository has been archived by the owner on Aug 28, 2024. It is now read-only.
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
feat: L1 batch QC database (BFT-476) (matter-labs#2340)
## What ❔

- [x] Add an `l1_batches_consensus` table to hold [L1 batch Quorum Certificates](https://github.com/matter-labs/era-consensus/blob/177881457f392fca990dbb3df1695737d90fd0c7/node/libs/roles/src/attester/messages/batch.rs#L67) from Attesters
- [x] Add attesters to the config
- [x] Implement methods in `PersistentBatchStore`
  - [x] `persisted`
  - [x] `last_batch`
  - [x] `last_batch_qc`
  - [x] `get_batch`
  - [x] `get_batch_qc`
  - [x] `store_qc`
  - [ ] `queue_next_batch` - _not going to implement for now_
  - [ ] assign `SyncBatch::proof` - _not going to implement for now_
- [x] Add tests for all new methods in `ConsensusDal` and the `PersistentBatchStore`

### Caveat

I implemented the updating of `persisted` with a loop that polls the database for newly available `SyncBatch` records, even if they have no proof. This inevitably triggers the gossiping of batch statuses and the pulling of `SyncBatch` between peers. For this reason `queue_next_batch` just drops the data, since we can't do anything with it without the proof yet. Returning an error or panicking would stop the consensus tasks.

I ended up disabling `persisted` by leaving its dummy implementation in place, because when it is enabled the full node tests run forever, printing the following logs in a loop:

```console
❯ RUST_LOG=info zk test rust test_full_nodes --no-capture
...
2024-07-03T14:22:57.882784Z INFO in{addr=[::1]:53082}: zksync_consensus_network: 191: new connection
2024-07-03T14:22:57.883457Z INFO in{addr=[::1]:53082}:gossip: zksync_consensus_network::gossip::runner: 383: peer = node:public:ed25519:068ffa0b3fedbbe5c2a6da3defd26e0d084248f12bfe98db85f7785b0b08b63e
2024-07-03T14:22:57.883764Z INFO out{addr="[::1]:52998"}:gossip: zksync_consensus_network::gossip::runner: 416: peer = node:public:ed25519:7710ed90aad9f5859dfba06e13fb4e6fb0fe4d686f81f9d819464ad1fdc371bd
2024-07-03T14:22:57.886204Z INFO in{addr=[::1]:53082}:gossip: zksync_consensus_network::rpc: 222: message too large: max = 10240B, got 13773B
2024-07-03T14:22:57.886280Z INFO out{addr="[::1]:52998"}:gossip: zksync_consensus_network::rpc: 222: message too large: max = 10240B, got 13773B
2024-07-03T14:22:57.886633Z INFO in{addr=[::1]:53082}:gossip: zksync_consensus_network::rpc: 222: canceled
...
2024-07-03T14:22:57.888143Z INFO out{addr="[::1]:52998"}:gossip: zksync_consensus_network::rpc: 222: disconnected
...
2024-07-03T14:22:57.888390Z INFO zksync_consensus_network: 216: [::1]:53082: gossip.run_inbound_stream(): push_batch_store_state.: end of stream
2024-07-03T14:22:57.888446Z INFO zksync_consensus_network: 158: gossip.run_outbound_stream("[::1]:52998"): push_batch_store_state.: end of stream
```

So in the tests the message size (13773 B) exceeds the maximum (10240 B). I think the limit is [hardcoded here](https://github.com/matter-labs/era-consensus/blob/decb988eb9e1a45fd5171d2cc540a360d9ca5f1f/node/actors/network/src/gossip/runner.rs#L109). Since this functionality isn't expected to work yet, I think we can disable it for now.

## Why ❔

The workflow of signing and submitting L1 batch certificates will be like this:

1. Data is inserted into the `l1_batches` table.
2. If the node is one of the Attesters, it picks up the batch, signs it and sends it to the gossip layer via matter-labs/era-consensus#137
3. The consensus collects votes about the L1 batch, and when the threshold is reached it saves the quorum certificate into Postgres.
4. The node monitors the Main Node (later L1) for new batch QCs and upserts them into the database (the QC can be different from what a particular node inserted based on gossip). This way a node which has been down for a period of time can backfill any QCs it missed. It is assumed that the Main Node API only serves QCs that have no gaps following them, i.e. they are final. If it were L1, it wouldn't allow submissions with gaps, and this simulates that semantic.
5. The last height that doesn't have any gaps following it is used as a floor for what needs to be (re)signed and gossiped.

This PR supports the above workflow up to step 3.

## Checklist

<!-- Check your PR fulfills the following items. -->
<!-- For draft PRs check the boxes as you complete them. -->

- [x] PR title corresponds to the body of PR (we generate changelog entries from PRs).
- [x] Tests for the changes have been added / updated.
- [x] Documentation comments have been added / updated.
- [x] Code has been formatted via `zk fmt` and `zk lint`.

---------

Co-authored-by: Bruno França <[email protected]>
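
For illustration only, step 3 of the workflow (collect votes, then persist a QC once the threshold is reached) could be sketched roughly as below. This is not the code from this PR or from `era-consensus`: `VoteCollector`, `BatchQc`, `AttesterId`, `BatchNumber` and the in-memory `qcs` map (standing in for the `l1_batches_consensus` table) are all hypothetical names, signatures are reduced to attester IDs, and the threshold is simply a count passed in by the caller.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for an attester's public key.
type AttesterId = u32;
/// Hypothetical stand-in for an L1 batch number.
type BatchNumber = u64;

/// Simplified quorum certificate: the batch number plus the attesters that voted for it.
/// A real QC would carry signatures; this sketch only tracks who voted.
#[derive(Debug, Clone, PartialEq)]
struct BatchQc {
    number: BatchNumber,
    signers: BTreeSet<AttesterId>,
}

/// Collects votes per batch and records a QC once `threshold` distinct attesters have voted.
struct VoteCollector {
    /// Assumed to be a plain vote count supplied by the caller.
    threshold: usize,
    /// Votes for batches that are not certified yet.
    votes: BTreeMap<BatchNumber, BTreeSet<AttesterId>>,
    /// In-memory stand-in for the `l1_batches_consensus` table.
    qcs: BTreeMap<BatchNumber, BatchQc>,
}

impl VoteCollector {
    fn new(threshold: usize) -> Self {
        Self {
            threshold,
            votes: BTreeMap::new(),
            qcs: BTreeMap::new(),
        }
    }

    /// Registers a vote. Returns the QC only if this vote completed the quorum.
    fn add_vote(&mut self, number: BatchNumber, attester: AttesterId) -> Option<BatchQc> {
        // Votes for an already certified batch are ignored.
        if self.qcs.contains_key(&number) {
            return None;
        }
        let signers = self.votes.entry(number).or_default();
        // A set makes duplicate votes from the same attester harmless.
        signers.insert(attester);
        if signers.len() < self.threshold {
            return None;
        }
        // Quorum reached: move the votes into a QC and "persist" it.
        let signers = self.votes.remove(&number).unwrap_or_default();
        let qc = BatchQc { number, signers };
        self.qcs.insert(number, qc.clone());
        Some(qc)
    }
}
```

With a threshold of 2, a first vote and a duplicate of it return `None`, a vote from a second attester returns the QC, and any later vote for the same batch is ignored because the batch is already certified. Keeping pending votes separate from stored QCs mirrors the split between gossip state and what ends up in Postgres, but the actual storage layout is defined by the migration in this PR, not by this sketch.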