Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimize memory consumption for verifier subprocess #16201

Merged

Conversation

dkijania
Copy link
Member

@dkijania dkijania commented Oct 9, 2024

Implementation of https://www.notion.so/o1labs/RAM-savvy-Mina-node-114e79b1f91080968c47e815124a6279 regarding verifier/prover key duplication. Removed duplicated key generation in verifier process and reusing prover verification key (through GRPC) in verifier. Steps are pretty simple:

  • Start prover process
  • Let prover process generate the proving key and return the verification key back to main thread (via an RPC response)
  • Start a verifier process and initiailize Mina node with the verification key returned by the prover process

Testing:

Apart from full suite of integration tests i did tests locally (also in order to generate memory usage comparison before and after changed). For this purpose I used single node (seed without bp keys). Bootstrapped it and observed memory usage for mina and both sub-processes (verifier, prover). Before my changes prover and verifier were using proving keys in a lazy way. Currently prover works more eagerly, that's why mina takes up to 2 GB when bootstrapping, then GC frees memory leaving on only:

  • 0.8 GB for prover
  • 0.3 GB for main mina process
  • 0.2 GB of verifier

To compare previously memory usage distribution was similar to:

  • 0.2 GB for prover
  • 0.3 GB for main mina process
  • 0.2 GB of verifier

Before first usage of prover/verifier cryptographic material

Also what is worth mentioning that due to passing verification key from prover to verifier we gained ~ 15 seconds in bootstrap speed. In order to reach ~1 second we need to refactor :

Pickles.Side_loaded.srs_precomputation () ;

And bake those computation into rust binary. As Matthew mentioned , we can compute it at compile time and completely remove 10s form verifier boatload leaving only 1 sec.


Another test was to use local mina network in order to simulate small network method usage. I used network of:

  • 1 seed
  • 2 whales bp
  • 2 fish bp
  • 1 passive node
  • snark coordinator with 2 snark workers

Below some raw data of memory usage:
metrics_new.csv
metrics_compatible.csv

And some graphs:

image

Above graph show that verifiers processes are taking up to 0.9 GB after they are forced to be evaluated. Mina processes are taking around ~250 to 300 Mb of residential memory size. Only two provers (for mina whales) are taking up to 2.0 GB as they are asked to init calculation for produced blocks.

Laser focus on verifiers:

image

After optimization:

image

Here we can observe quite big spike of memory for seed (up to 1.7 GB) as it was first node to calculate proof data. Provers are acting quite the same as before change for active block producers. Inactive bp (seeds,fishes,passive nodes are taking up to 0.9 as they eagerly loaded proof data). This behavior could be tweak as well and as a result we could skip launching provers for non-block producers (seeds, archive node feeders) save ~ 1 GB of memory
The major change is that more main and verifier processes are taking around 250 MB.

More detailed look at verifiers processes:

image

And finally verifiers memory comparison:

image

@@ -960,12 +960,30 @@ module For_tests = struct
let consensus_constants = precomputed_values.consensus_constants
end

module T = Transaction_snark.Make (struct
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although it is located in the For_unit_tests module, I suspect it can be initialized even by the production code if it's somehow referenced by some function somewhere.

I'm not sure if making the module is an expensive operation, but I'd suggest:

  1. Making it singleton for usage in unit tests (this would also reduce the code duplication)
  2. Measure the time to initialize it, and if it's non-negligble, surround with some laziness

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously initialization would happen from within a separate process, only when it's launched. Now it will happen if the module that contains this is somehow referenced by some code that is invoked in initialization, and it's possible that this is executed every time.

Once again, I'm not sure if making a functor is expensive itself or it is cheap, just generating a bunch of lazy values.

src/lib/verifier/prod.ml Outdated Show resolved Hide resolved
Lazy.force B.Proof.verification_key )

let get_blockchain_verification_key () = Lazy.force vk
let get_blockchain_verification_key () = Deferred.return verification_key
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could just remove this method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yes, forgot to do it

opam.export Outdated
@@ -21,6 +21,7 @@ roots: [
"js_of_ocaml-toplevel.4.0.0"
"lmdb.1.0"
"menhir.20210419"
"memtrace_viewer.0.16.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this dependency used somehow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only locally for memory trace. i will remove it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still unsure if this is purely for tests purposes. Maybe we should also reuse part of this method in prover ? then there will be a single place where we create BlockhainSnark and TransactionSnark modules. Prover bootstrap code is anyhow executed in mina main thread.

@@ -56,6 +56,9 @@ module Worker_state = struct
val toggle_internal_tracing : bool -> unit

val set_itn_logger_data : daemon_port:int -> unit

val get_blockchain_verification_key :
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder also what to do in Check | No_check modes. Where we return dummy keys. Maybe this method should return Pickles.Verification_key.t Deferred.t but then in Mina_grahpql modules we are exposing Verifiaction key from verifier so we would need to either break the interface and also there wrap repsone in optional or handle returning diummy as well

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's update graphql module to use key that is returned from prover. It's okay if it's dummy in case of non-full modes.

It's non-ideal that we create a dummy prover that returns dummy keys. I'd prefer it to be refactored to not initialize prover unless it's full mode. But let's save this idea for one of next PRs, not this one.

@dkijania dkijania force-pushed the dkijania/optimize_memory_consumption_on_bootstrap branch from dd5ef45 to 081b26d Compare October 10, 2024 19:24
@dkijania dkijania force-pushed the dkijania/optimize_memory_consumption_on_bootstrap branch from 081b26d to 5a5884d Compare October 14, 2024 13:56
@dkijania dkijania changed the title [Do Not Merge] optimize memory consumption on bootstrap optimize memory consumption on bootstrap for verifier Oct 14, 2024
@dkijania
Copy link
Member Author

!ci-build-me

@dkijania
Copy link
Member Author

!ci-nightly-me

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still unsure if this is purely for tests purposes. Maybe we should also reuse part of this method in prover ? then there will be a single place where we create BlockhainSnark and TransactionSnark modules. Prover bootstrap code is anyhow executed in mina main thread.

@dkijania
Copy link
Member Author

!ci-build-me

@dkijania
Copy link
Member Author

!ci-nightly-me

@dkijania dkijania marked this pull request as ready for review October 14, 2024 20:41
@dkijania dkijania requested review from a team as code owners October 14, 2024 20:41
@dkijania dkijania changed the title optimize memory consumption on bootstrap for verifier optimize memory consumption on bootstrap for verifier subprocess Oct 14, 2024
@dkijania dkijania changed the title optimize memory consumption on bootstrap for verifier subprocess optimize memory consumption for verifier subprocess Oct 14, 2024
; commit_id : string
; verification_key : Pickles.Verification_key.Stable.Latest.t
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename to blockchain_verification_key here and elsewhere.

match Or_error.try_with (fun () -> T.verify ts) with
match
Or_error.try_with (fun () ->
Transaction_snark.verify ts ~key:verification_key )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect. When we verify a transaction snark proof, we need to use the transaction snark verification key, not the blockchain snark verification key. You'll need to pipe that through.

@dkijania
Copy link
Member Author

!ci-build-me

@dkijania
Copy link
Member Author

!ci-build-me

@dkijania
Copy link
Member Author

!ci-nightly-me

…tion_on_bootstrap' into dkijania/optimize_memory_consumption_on_bootstrap
@@ -0,0 +1,165 @@
import collections
import os
import re
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's introduce this script as a separate PR, it's not directly related to this PR

src/lib/verifier/for_test.mli Outdated Show resolved Hide resolved
src/lib/verifier/for_test.mli Outdated Show resolved Hide resolved
@@ -0,0 +1,12 @@
val get_verification_keys :
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove get_verification_keys from for_test.mli, it's never used

@@ -191,7 +189,7 @@ module Worker_state = struct
result

let get_blockchain_verification_key () =
Lazy.force B.Proof.verification_key
Deferred.return blockchain_verification_key
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove get_blockchain_verification_key from verifier and its interface, it's never used

let get_blockchain_verification_key { verification_key; _ } =
Deferred.Or_error.try_with ~here:[%here] (fun () ->
Lazy.force verification_key )
let get_blockchain_verification_key { blockchain_verification_key; _ } =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this

@@ -56,6 +56,9 @@ module Worker_state = struct
val toggle_internal_tracing : bool -> unit

val set_itn_logger_data : daemon_port:int -> unit

val get_blockchain_verification_key :
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's update graphql module to use key that is returned from prover. It's okay if it's dummy in case of non-full modes.

It's non-ideal that we create a dummy prover that returns dummy keys. I'd prefer it to be refactored to not initialize prover unless it's full mode. But let's save this idea for one of next PRs, not this one.

Base automatically changed from georgeee/correct-snark-verify-functions to compatible October 21, 2024 11:56
@dkijania
Copy link
Member Author

!ci-build-me

@dkijania
Copy link
Member Author

!ci-build-me

@dkijania
Copy link
Member Author

!ci-build-me

@dkijania
Copy link
Member Author

!ci-nightly-me

@coveralls
Copy link

coveralls commented Oct 22, 2024

Coverage Status

coverage: 60.999% (+0.01%) from 60.989%
when pulling cabd22b on dkijania/optimize_memory_consumption_on_bootstrap
into 4f78a48 on compatible.

@dkijania
Copy link
Member Author

!ci-build-me

@dkijania
Copy link
Member Author

@dkijania dkijania requested review from georgeee and mrmr1993 October 23, 2024 12:22
Copy link
Member

@georgeee georgeee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving with the last suggestion to simplify code

-> proof_level:Genesis_constants.Proof_level.t
-> ( [ `Blockchain of Pickles.Verification_key.t ]
* [ `Transaction of Pickles.Verification_key.t ] )
Async.Deferred.t
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for nit-picking last moment, but only now I noticed Async.Deferred.t at the end which is completely unnecessary here (we just return at the end).

Also, let's add a helper function for the following code:

let ( `Blockchain blockchain_verification_key
                 , `Transaction transaction_verification_key ) =
          Verifier.For_test.get_verification_keys_eagerly ~constraint_constants
            ~proof_level
        in
        Verifier.create ~logger ~proof_level ~conf_dir:None
          ~pids:(Child_processes.Termination.create_pid_table ())
          ~commit_id:"not specified for unit tests" ()
          ~blockchain_verification_key ~transaction_verification_key )

The code is repeated many times, and helper would make the code more readable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! will do

let open Async.Deferred.Let_syntax in
let%bind blockchain_vk = Lazy.force B.Proof.verification_key in
let%bind transaction_vk = Lazy.force T.verification_key in
return (`Blockchain blockchain_vk, `Transaction transaction_vk)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return (`Blockchain blockchain_vk, `Transaction transaction_vk)
(`Blockchain blockchain_vk, `Transaction transaction_vk)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, compiler is not happy when i removed return.

@georgeee georgeee dismissed mrmr1993’s stale review October 25, 2024 09:19

Matthew's concern was covered in the follow-up commits

@dkijania dkijania self-assigned this Oct 30, 2024
@dkijania
Copy link
Member Author

!ci-build-me

@dkijania
Copy link
Member Author

!ci-nightly-me

@dkijania dkijania merged commit 4970213 into compatible Nov 7, 2024
65 checks passed
@dkijania dkijania deleted the dkijania/optimize_memory_consumption_on_bootstrap branch November 7, 2024 21:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants