-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Added a JSON RPC to simulating L1 for consensus attestation (BFT-495) #2480
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
aakoshh
reviewed
Jul 24, 2024
aakoshh
previously approved these changes
Jul 25, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, the new comments explaining the fallbacks are very helpful 👍
RomanBrodetski
previously approved these changes
Jul 26, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one of the unit tests is failing though
RomanBrodetski
approved these changes
Jul 26, 2024
pompon0
changed the title
feat: Added a JSON RPC to simulating L1 for consensus attestation
feat: Added a JSON RPC to simulating L1 for consensus attestation (BFT-495)
Jul 26, 2024
8 tasks
aakoshh
added a commit
to matter-labs/era-consensus
that referenced
this pull request
Jul 31, 2024
## What ❔ Poll the main node for which batch height to vote on. - [x] Return an `Option` rather than a `Vec` of `BatchQC` from `BatchVotes::find_quorum`: we decided not to implement voting on multiple heights for now, which makes it just confusing that a vector is returned. - [x] Create an `AttestationStatusClient` trait with a method to poll the next `BatchNumber` to vote on, which can be injected as a `dyn` dependency - [x] Create an `AttestationStatusWatch` that contains the next `BatchNumber` to vote on; this is updated by a single background task and listened to by multiple subscribers - [x] Create an `AttestationStatusRunner` which is supposed to be started along with the `BatchStore` in `zksync-era` and polls the client; the point of this is to push out the poll interval configuration to the edge and make the `AttestationStatusWatch` the integration point - [x] Change `AttesterRunner` to wait for the `AttestationStatusWatch` to change, then the payload to appear, then cast the vote; it doesn't need to worry about reverts because the node will restart if that happens - [x] Change `run_batch_qc_finder` to wait for the `AttestationStatusWatch` to change, then wait for the next available QC, and save it; this might produce gaps which is normal on the external nodes, but undersireable on the main node - on the main node it only produces gaps if the attesters vote on higher heights despite the main node telling them not to, which if they are majority honest should not happen, and if they are not then what can we do anyway - [x] Rename `PersistentBatchStore::earliest_batch_number_to_sign` to `next_batch_to_attest` in accordance with the new method on `consensus_dal`. - [ ] ~~Initialise `BatchNumber::min_batch_number` to by asking the main node where to resume from~~ This initialisation isn't necessary because 1) we decided that we'll only allow 1 vote per attester, so there is no attack vector of casting votes from batch 0 and 2) `run_batch_qc_finder` first waits for the API status to appear and then prunes all preceding data, so an older QC will not be saved. It was also undesirable to have to initialise from an API that might return nothing (or errors) for an unknown amount of time. ### Poll interval The `AttesterStatusRunner` polls the API at fixed intervals without any exponential backoff. I thought 5s would be a reasonable default value. With 60s batches this seems fine because the *next* batch to sign will be signalled as soon as the current batch QC is inserted into the database, which is well ahead of time of when the next batch and its commitment will be available. If we think we'll need to catch up with historical batch signatures it might be a bit slow. We generally still have to account for gossiping around the votes though, which in a large enough network could be on the order of several seconds as well. ### Potential deadlock There is a pathological sequence of events that could prevent the feature from starting on a fresh database with no prior batch QCs: 1. Say we start from batch 0, and a QC is gathered, but saving it to the database (which is an async process) takes a long time on the main node. 2. Say it takes so long that batch 1 and batch 2 are created, but none of them have a QC yet. 3. Say we have two attesters A and B, and A polled the API when it showed batch 2, and the main node set its minimum batch number in the vote registry to batch 2 as well. 4. Now the QC for batch 0 is saved to the database and the API indicates the next one to vote on is batch 1. 5. Attester B casts its vote on batch 1 but it goes ignored by the main node vote registry because its looking for a quorum with at least batch number 2. 6. Having missed an attestation the main node will never save a QC and never move on until it's restarted. Note that even if the registry minimum batch number was adjusted _down_ that might happen _after_ the vote for batch 1 has already been discarded, and because new votes are only pushed once through gossip there is no guarantee that it will get it again. The solution is coming in the form of fixing the starting point of gossip to be where the genesis starts, even if that means filling a potentially long history of batches in the beginning. See matter-labs/zksync-era#2539 ## Why ❔ We want to collect batch QCs without gaps, and for now put the main node in charge of what to vote on. The API to serve this information is in matter-labs/zksync-era#2480 and this PR is the follow up to make polling that information part of the signing process.
6 tasks
github-merge-queue bot
pushed a commit
that referenced
this pull request
Aug 1, 2024
🤖 I have created a release *beep* *boop* --- ## [24.13.0](core-v24.12.0...core-v24.13.0) (2024-07-31) ### Features * Add recovery tests to zk_supervisor ([#2444](#2444)) ([0c0d10a](0c0d10a)) * Added a JSON RPC to simulating L1 for consensus attestation ([#2480](#2480)) ([c6b3adf](c6b3adf)) * added dropping all attester certificates when doing hard fork ([#2529](#2529)) ([5acd686](5acd686)) * **configs:** Do not panic if config is only partially filled ([#2545](#2545)) ([db13fe3](db13fe3)) * **eth-sender:** Make eth-sender tests use blob txs + refactor of eth-sender tests ([#2316](#2316)) ([c8c8334](c8c8334)) * Introduce more tracing instrumentation ([#2523](#2523)) ([79d407a](79d407a)) * Remove unused VKs, add docs for BWG ([#2468](#2468)) ([2fa6bf0](2fa6bf0)) * Revisit base config values ([#2532](#2532)) ([3fac8ac](3fac8ac)) * Server 10k gwei limit on gas price and 1M limit on pubdata price ([#2460](#2460)) ([be238cc](be238cc)) * **vlog:** Implement otlp guard with force flush on drop ([#2536](#2536)) ([c9f76e5](c9f76e5)) * **vlog:** New vlog interface + opentelemtry improvements ([#2472](#2472)) ([c0815cd](c0815cd)) * **zk_toolbox:** add test upgrade subcommand to zk_toolbox ([#2515](#2515)) ([1a12f5f](1a12f5f)) * **zk_toolbox:** use configs from the main repo ([#2470](#2470)) ([4222d13](4222d13)) ### Bug Fixes * **contract verifier:** Fix config values ([#2510](#2510)) ([3729468](3729468)) * fixed panic propagation ([#2525](#2525)) ([e0fc58b](e0fc58b)) * **proof_data_handler:** Unlock jobs on transient errors ([#2486](#2486)) ([7c336b1](7c336b1)) * **prover:** Parallelize circuit metadata uploading for BWG ([#2520](#2520)) ([f49720f](f49720f)) * VM performance diff: don't show 0 as N/A ([#2276](#2276)) ([2fa2249](2fa2249)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Co-authored-by: zksync-era-bot <[email protected]>
github-merge-queue bot
pushed a commit
that referenced
this pull request
Aug 1, 2024
## What ❔ Injects an `AttestationStatusWatch` into the `Executor` which on the main node is backed by an `AttestationStatusRunner` polling the `BatchStore::next_batch_to_attest` and external nodes call the `MainNodeApi::fetch_attestation_status`. TODO: - [x] Rebase after #2539 is merged - [x] Update the `era-consensus` dependency once matter-labs/era-consensus#161 is merged and `0.1.0-rc.5` is published ### Test failure - pruning the main node The following test never finished: ```shell zk test rust -p zksync_node_consensus tests::test_with_pruning::case_0 --no-capture ``` A little extra logging revealed that the `AttesterStatusRunner` never gets initialised, because the blocks get pruned earlier than it could read the result, ie. `ConsensusDal::batch_of_block(genesis.first_block)` probably returns `None` because it's not found, and never will because it's pruned. We discussed in #2480 that the main node is not supposed to be pruned, which is why the SQL that looks for what to attest on doesn't look for pruning records any more. Yet this test prunes the main node, and if I just try to prune the external node instead it panics because it doesn't have that block yet. Either the SQL should take pruning into account after all, or we have to figure out a way to wait in the test until the main node executor is running, or change the test to prune the external node; I did the latter. ## Why ❔ This is coordinating attesters to cast their votes on the batch which the main node tells them to do to produce a quorum certificate for all L1 batches without gaps. ## Checklist <!-- Check your PR fulfills the following items. --> <!-- For draft PRs check the boxes as you complete them. --> - [x] PR title corresponds to the body of PR (we generate changelog entries from PRs). - [x] Tests for the changes have been added / updated. - [x] Documentation comments have been added / updated. - [x] Code has been formatted via `zk fmt` and `zk lint`.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
It just checks what is the last L1 batch certificate stored locally and returns its number (actually, the next number to avoid problems with 0).
ENs will query the main node for this information to determine what is the number of the next batch that they should vote for.