fix(tee_verifier): correctly initialize storage for re-execution #3017
Previously, the storage for VM re-execution was initialized only from `WitnessInputMerklePaths`. This, however, misses the storage values for slots that are only read or written by rolled-back transactions. With this commit, the TEE verifier uses `WitnessStorageState` of `VMRunWitnessInputData` to initialize the storage. This requires waiting for the BasicWitnessInputProducer to complete, so the TEE verifier input producer can be removed. The input for the TEE verifier is now collected in the `proof_data_handler`, which makes it possible to remove the whole job queue for the TEE verifier input producer.

Co-authored-by: Patrick Beza <[email protected]>
Signed-off-by: Harald Hoyer <[email protected]>
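To make the failure mode described in this commit message concrete, here is a minimal sketch in Rust. It is a simplified model, not the verifier's code: `Tx`, `Slot`, and all three functions are hypothetical names, and the model assumes that the Merkle paths only cover slots touched by transactions that were not rolled back, while the witness storage state covers every slot touched during execution.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a storage slot key.
type Slot = u32;

/// Hypothetical, simplified transaction: the slots it touches and whether it was rolled back.
struct Tx {
    touched: Vec<Slot>,
    rolled_back: bool,
}

/// Assumption: Merkle paths only cover slots touched by transactions that were NOT rolled back.
fn slots_from_merkle_paths(txs: &[Tx]) -> BTreeSet<Slot> {
    txs.iter()
        .filter(|tx| !tx.rolled_back)
        .flat_map(|tx| tx.touched.iter().copied())
        .collect()
}

/// Assumption: the witness storage state covers every slot touched during execution.
fn slots_from_witness_state(txs: &[Tx]) -> BTreeSet<Slot> {
    txs.iter()
        .flat_map(|tx| tx.touched.iter().copied())
        .collect()
}

/// Slots that re-execution would touch but that the initialized storage does not contain.
/// Re-execution replays every transaction, including the rolled-back ones.
fn missing_slots(available: &BTreeSet<Slot>, txs: &[Tx]) -> BTreeSet<Slot> {
    txs.iter()
        .flat_map(|tx| tx.touched.iter().copied())
        .filter(|slot| !available.contains(slot))
        .collect()
}
```

In this model, a batch with one committed transaction touching slots 1 and 2 and one rolled-back transaction touching slots 2 and 3 leaves slot 3 unavailable when the storage comes from the Merkle paths alone, and leaves nothing missing when it comes from the witness storage state.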
simplify empty `tee_proofs` case, but pre-filter with `tee_type` to exclude other TEE techs. Signed-off-by: Harald Hoyer <[email protected]>
correct leftovers from debug/merge Signed-off-by: Harald Hoyer <[email protected]>
I've left my concerns about the current implementation. If there's a doc that explains what's going on, that'd be great; I could then provide better input. Otherwise, writing it down (maybe as part of the PR?) would help move this forward.
We should probably remove this function altogether:
zksync-era/core/lib/dal/src/tee_proof_generation_dal.rs
Lines 199 to 229 in 3a97168
pub async fn insert_tee_proof_generation_job(
    &mut self,
    batch_number: L1BatchNumber,
    tee_type: TeeType,
) -> DalResult<()> {
    let batch_number = i64::from(batch_number.0);
    let query = sqlx::query!(
        r#"
        INSERT INTO
            tee_proof_generation_details (
                l1_batch_number, tee_type, status, created_at, updated_at
            )
        VALUES
            ($1, $2, $3, NOW(), NOW())
        ON CONFLICT (l1_batch_number, tee_type) DO NOTHING
        "#,
        batch_number,
        tee_type.to_string(),
        TeeProofGenerationJobStatus::Unpicked.to_string(),
    );
    let instrumentation = Instrumented::new("insert_tee_proof_generation_job")
        .with_arg("l1_batch_number", &batch_number)
        .with_arg("tee_type", &tee_type);
    instrumentation
        .clone()
        .with(query)
        .execute(self.storage)
        .await?;
    Ok(())
}
You are now inserting new entries in the `lock_batch_for_proving` function instead:
zksync-era/core/lib/dal/src/tee_proof_generation_dal.rs
Lines 78 to 104 in 3a97168
        INSERT INTO
            tee_proof_generation_details (
                l1_batch_number, tee_type, status, created_at, updated_at, prover_taken_at
            )
        SELECT
            l1_batch_number,
            $1,
            $2,
            NOW(),
            NOW(),
            NOW()
        FROM
            upsert
        ON CONFLICT (l1_batch_number, tee_type) DO
        UPDATE
        SET
            status = $2,
            updated_at = NOW(),
            prover_taken_at = NOW()
        RETURNING
            l1_batch_number
        "#,
        tee_type.to_string(),
        TeeProofGenerationJobStatus::PickedByProver.to_string(),
        TeeProofGenerationJobStatus::Unpicked.to_string(),
        processing_timeout,
        min_batch_number
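The difference between the two conflict clauses quoted above can be illustrated with an in-memory model. This is only a sketch: the `HashMap` stands in for the `tee_proof_generation_details` table keyed by `(l1_batch_number, tee_type)`, and `Status`, `Key`, `new_table`, `insert_do_nothing`, and `upsert` are hypothetical names that do not exist in the DAL.

```rust
use std::collections::HashMap;

/// Simplified stand-in for `TeeProofGenerationJobStatus` (only the two variants quoted above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Unpicked,
    PickedByProver,
}

/// Stand-in for the `(l1_batch_number, tee_type)` conflict key.
type Key = (u32, &'static str);

/// Creates an empty in-memory "table".
fn new_table() -> HashMap<Key, Status> {
    HashMap::new()
}

/// Models `ON CONFLICT ... DO NOTHING`: an existing row is left untouched.
fn insert_do_nothing(table: &mut HashMap<Key, Status>, key: Key, status: Status) {
    table.entry(key).or_insert(status);
}

/// Models `ON CONFLICT ... DO UPDATE SET status = ...`: the row is created if
/// absent and overwritten if present.
fn upsert(table: &mut HashMap<Key, Status>, key: Key, status: Status) {
    table.insert(key, status);
}
```

Because the upsert creates the row when none exists, a separate insert step beforehand adds nothing in this model, which is the reasoning behind removing `insert_tee_proof_generation_job`.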
Update
You may consider merging my PR (#3037), which addresses the above-mentioned issue.
This makes handling large tables a lot more performant Signed-off-by: Harald Hoyer <[email protected]>
Rubber stamping, as long as @slowli's comments have been addressed, good with me.
I think |
## What ❔

With this commit, the TEE verifier uses `WitnessStorageState` of `VMRunWitnessInputData` to initialize the storage. This requires waiting for the BasicWitnessInputProducer to complete, so the TEE verifier input producer can be removed. The input for the TEE verifier is now collected in the `proof_data_handler`, which makes it possible to remove the whole job queue for the TEE verifier input producer.

## Why ❔

Previously, the storage for VM re-execution was initialized only from `WitnessInputMerklePaths`. This, however, misses the storage values for slots that are only read or written by rolled-back transactions. As a result, verification failed for blocks that would normally pass.

## Checklist

- [x] PR title corresponds to the body of PR (we generate changelog entries from PRs).
- [x] Tests for the changes have been added / updated.
- [ ] Documentation comments have been added / updated.
- [x] Code has been formatted via `zk_supervisor fmt` and `zk_supervisor lint`.

---------

Signed-off-by: Harald Hoyer <[email protected]>
Co-authored-by: Patrick Beza <[email protected]>
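The requirement to wait for the BasicWitnessInputProducer can be pictured as an eligibility check performed when a prover asks for a batch. The sketch below is an assumption-heavy illustration, not the actual query: the selection subquery is not part of the excerpt quoted in the review, so the rule here is only guessed from the bound parameters (`Unpicked`, `PickedByProver`, `processing_timeout`, `min_batch_number`). `Batch`, `TeeStatus`, `is_eligible`, and `next_batch` are hypothetical names, and time is modeled as plain seconds.

```rust
/// Hypothetical, simplified TEE job status; `taken_at_secs` models `prover_taken_at`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TeeStatus {
    Unpicked,
    PickedByProver { taken_at_secs: u64 },
}

/// Hypothetical view of a batch as seen when a prover requests work.
struct Batch {
    number: u32,
    /// Assumption: true once the basic witness inputs for this batch exist.
    witness_inputs_ready: bool,
    /// `None` means no TEE row exists yet for this batch and TEE type.
    tee_status: Option<TeeStatus>,
}

/// Assumed rule: the batch is new enough, its witness inputs are ready, and it
/// is either untouched, unpicked, or picked so long ago that the job timed out.
fn is_eligible(batch: &Batch, now_secs: u64, min_batch_number: u32, timeout_secs: u64) -> bool {
    if batch.number < min_batch_number || !batch.witness_inputs_ready {
        return false;
    }
    match batch.tee_status {
        None | Some(TeeStatus::Unpicked) => true,
        Some(TeeStatus::PickedByProver { taken_at_secs }) => {
            now_secs.saturating_sub(taken_at_secs) > timeout_secs
        }
    }
}

/// Returns the lowest eligible batch number, if any.
fn next_batch(
    batches: &[Batch],
    now_secs: u64,
    min_batch_number: u32,
    timeout_secs: u64,
) -> Option<u32> {
    batches
        .iter()
        .filter(|b| is_eligible(b, now_secs, min_batch_number, timeout_secs))
        .map(|b| b.number)
        .min()
}
```

Under this model no separate producer has to enqueue jobs: a batch becomes selectable as soon as its witness inputs exist, and a job that stays picked for longer than the timeout becomes selectable again.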
🤖 I have created a release *beep* *boop*

---

## [25.0.0](core-v24.29.0...core-v25.0.0) (2024-10-23)

### ⚠ BREAKING CHANGES

* **contracts:** integrate protocol defense changes ([#2737](#2737))

### Features

* Add CoinMarketCap external API ([#2971](#2971)) ([c1cb30e](c1cb30e))
* **api:** Implement eth_maxPriorityFeePerGas ([#3135](#3135)) ([35e84cc](35e84cc))
* **api:** Make acceptable values cache lag configurable ([#3028](#3028)) ([6747529](6747529))
* **contracts:** integrate protocol defense changes ([#2737](#2737)) ([c60a348](c60a348))
* **external-node:** save protocol version before opening a batch ([#3136](#3136)) ([d6de4f4](d6de4f4))
* Prover e2e test ([#2975](#2975)) ([0edd796](0edd796))
* **prover:** Add min_provers and dry_run features. Improve metrics and test. ([#3129](#3129)) ([7c28964](7c28964))
* **tee_verifier:** speedup SQL query for new jobs ([#3133](#3133)) ([30ceee8](30ceee8))
* vm2 tracers can access storage ([#3114](#3114)) ([e466b52](e466b52))
* **vm:** Return compressed bytecodes from `push_transaction()` ([#3126](#3126)) ([37f209f](37f209f))

### Bug Fixes

* **call_tracer:** Flat call tracer fixes for blocks ([#3095](#3095)) ([30ddb29](30ddb29))
* **consensus:** preventing config update reverts ([#3148](#3148)) ([caee55f](caee55f))
* **en:** Return `SyncState` health check ([#3142](#3142)) ([abeee81](abeee81))
* **external-node:** delete empty unsealed batch on EN initialization ([#3125](#3125)) ([5d5214b](5d5214b))
* Fix counter metric type to be Counter. ([#3153](#3153)) ([08a3fe7](08a3fe7))
* **mempool:** minor mempool improvements ([#3113](#3113)) ([cd16083](cd16083))
* **prover:** Run for zero queue to allow scaling down to 0 ([#3115](#3115)) ([bbe1919](bbe1919))
* restore instruction count functionality ([#3081](#3081)) ([6159f75](6159f75))
* **state-keeper:** save call trace for upgrade txs ([#3132](#3132)) ([e1c363f](e1c363f))
* **tee_prover:** add zstd compression ([#3144](#3144)) ([7241ae1](7241ae1))
* **tee_verifier:** correctly initialize storage for re-execution ([#3017](#3017)) ([9d88373](9d88373))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: zksync-era-bot <[email protected]>
🤖 I have created a release *beep* *boop*

---

## [16.6.0](prover-v16.5.0...prover-v16.6.0) (2024-10-31)

### Features

* (DB migration) Rename recursion_scheduler_level_vk_hash to snark_wrapper_vk_hash ([#2809](#2809)) ([64f9551](64f9551))
* Add initial version prover_autoscaler ([#2993](#2993)) ([ebf9604](ebf9604))
* added seed_peers to consensus global config ([#2920](#2920)) ([e9d1d90](e9d1d90))
* attester committees data extractor (BFT-434) ([#2684](#2684)) ([92dde03](92dde03))
* Bump crypto and protocol deps ([#2825](#2825)) ([a5ffaf1](a5ffaf1))
* **circuit_prover:** Add circuit prover ([#2908](#2908)) ([48317e6](48317e6))
* **consensus:** Support for syncing blocks before consensus genesis over p2p network ([#3040](#3040)) ([d3edc3d](d3edc3d))
* **da-clients:** add secrets ([#2954](#2954)) ([f4631e4](f4631e4))
* gateway preparation ([#3006](#3006)) ([16f2757](16f2757))
* Integrate tracers and implement circuits tracer in vm2 ([#2653](#2653)) ([87b02e3](87b02e3))
* Move prover data to /home/popzxc/workspace/current/zksync-era/prover/data ([#2778](#2778)) ([62e4d46](62e4d46))
* Prover e2e test ([#2975](#2975)) ([0edd796](0edd796))
* **prover:** add CLI option to run prover with max allocation ([#2794](#2794)) ([35e4cae](35e4cae))
* **prover:** Add endpoint to PJM to get queue reports ([#2918](#2918)) ([2cec83f](2cec83f))
* **prover:** Add error to panic message of prover ([#2807](#2807)) ([6e057eb](6e057eb))
* **prover:** Add min_provers and dry_run features. Improve metrics and test. ([#3129](#3129)) ([7c28964](7c28964))
* **prover:** Add scale failure events watching and pods eviction. ([#3175](#3175)) ([dd166f8](dd166f8))
* **prover:** Add sending scale requests for Scaler targets ([#3194](#3194)) ([767c5bc](767c5bc))
* **prover:** Add support for scaling WGs and compressor ([#3179](#3179)) ([c41db9e](c41db9e))
* **prover:** Autoscaler sends scale request to appropriate agents. ([#3150](#3150)) ([bfedac0](bfedac0))
* **prover:** Extract keystore into a separate crate ([#2797](#2797)) ([e239260](e239260))
* **prover:** Optimize setup keys loading ([#2847](#2847)) ([19887ef](19887ef))
* **prover:** Refactor WitnessGenerator ([#2845](#2845)) ([934634b](934634b))
* **prover:** Update witness generator to zkevm_test_harness 0.150.6 ([#3029](#3029)) ([2151c28](2151c28))
* **prover:** Use query macro instead string literals for queries ([#2930](#2930)) ([1cf959d](1cf959d))
* **prover:** WG refactoring [#3](#3) ([#2942](#2942)) ([df68762](df68762))
* **prover:** WitnessGenerator refactoring [#2](#2) ([#2899](#2899)) ([36e5340](36e5340))
* Refactor metrics/make API use binaries ([#2735](#2735)) ([8ed086a](8ed086a))
* Remove prover db from house keeper ([#2795](#2795)) ([85b7346](85b7346))
* **tee:** use hex serialization for RPC responses ([#2887](#2887)) ([abe0440](abe0440))
* **utils:** Rework locate_workspace, introduce Workspace type ([#2830](#2830)) ([d256092](d256092))
* vm2 tracers can access storage ([#3114](#3114)) ([e466b52](e466b52))
* **vm:** Do not panic on VM divergence ([#2705](#2705)) ([7aa5721](7aa5721))
* **vm:** EVM emulator support – base ([#2979](#2979)) ([deafa46](deafa46))
* **vm:** Extract batch executor to separate crate ([#2702](#2702)) ([b82dfa4](b82dfa4))
* **zk_toolbox:** `zk_supervisor prover` subcommand ([#2820](#2820)) ([3506731](3506731))
* **zk_toolbox:** Add external_node consensus support ([#2821](#2821)) ([4a10d7d](4a10d7d))
* **zk_toolbox:** Add SQL format for zk supervisor ([#2950](#2950)) ([540e5d7](540e5d7))
* **zk_toolbox:** deploy legacy bridge ([#2837](#2837)) ([93b4e08](93b4e08))
* **zk_toolbox:** Redesign zk_toolbox commands ([#3003](#3003)) ([114834f](114834f))
* **zkstack_cli:** Build dependencies at zkstack build time ([#3157](#3157)) ([724d9a9](724d9a9))

### Bug Fixes

* allow compilation under current toolchain ([#3176](#3176)) ([89eadd3](89eadd3))
* **api:** Return correct flat call tracer ([#2917](#2917)) ([218646a](218646a))
* count SECP256 precompile to account validation gas limit as well ([#2859](#2859)) ([fee0c2a](fee0c2a))
* Fix Doc lint. ([#3158](#3158)) ([c79949b](c79949b))
* ignore unknown fields in rpc json response ([#2962](#2962)) ([692ea73](692ea73))
* **prover:** Do not exit on missing watcher data. ([#3119](#3119)) ([76ed6d9](76ed6d9))
* **prover:** fix setup_metadata_to_setup_data_key ([#2875](#2875)) ([4ae5a93](4ae5a93))
* **prover:** Run for zero queue to allow scaling down to 0 ([#3115](#3115)) ([bbe1919](bbe1919))
* **tee_verifier:** correctly initialize storage for re-execution ([#3017](#3017)) ([9d88373](9d88373))
* **vm:** Prepare new VM for use in API server and fix divergences ([#2994](#2994)) ([741b77e](741b77e))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).