-
Notifications
You must be signed in to change notification settings - Fork 799
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Merged by Bors] - Consensus context with proposer index caching #3604
Conversation
This is complementary to #3263, but would likely conflict slightly syntactically. I'll also post some benchmark numbers soon (TM) |
Benchmarks indicate a 7.5-20% reduction in run time for block processing for a recent block (slot 4769389). The reduction was greatest on my Apple M1 Pro, from a median time of 38.2ms to 30.9ms. On an AMD Ryzen 5950X the reduction was from 40.8ms to 34.7ms. Full details in this spreadsheet: https://docs.google.com/spreadsheets/d/1axT3Tda3wzWXdUzPEt5rAmxWE7PdIkSta7sW5PHfdq0/edit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice, I like the ConsensusContext
! I notice you didn't go with ctx
😉 I have a couple of comments, but they're not strictly necessary!
I had a lil' grep and noticed two places where we're still computing the proposer index, perhaps you're aware of them:
consensus/state_processing/src/common/slash_validator.rs: let proposer_index = state.get_beacon_proposer_index(state.slot(), spec)?;
consensus/state_processing/src/per_block_processing/process_operations.rs: let proposer_index = state.get_beacon_proposer_index(state.slot(), spec)? as u64;
- This is phase0, so there are marginal returns from this. Perhaps useful during sync, for what that's worth in a WSS world. I could take or leave this.
Additionally, I believe this call is totally redundant and could be removed (it could also happen in another PR, potential scope creep):
let proposer_index = state.get_beacon_proposer_index(state.slot(), &self.spec)? as u64; |
## Proposed Changes Add a new Cargo compilation profile called `maxperf` which enables more aggressive compiler optimisations at the expense of compilation time. Some rough initial benchmarks show that this can provide up to a 25% reduction to run time for CPU bound tasks like block processing: https://docs.google.com/spreadsheets/d/15jHuZe7lLHhZq9Nw8kc6EL0Qh_N_YAYqkW2NQ_Afmtk/edit The numbers in that spreadsheet compare the `consensus-context` branch from #3604 to the same branch compiled with the `maxperf` profile using: ``` PROFILE=maxperf make install-lcli ``` ## Additional Info The downsides of the maxperf profile are: - It increases compile times substantially, which will particularly impact low-spec hardware. Compiling `lcli` is about 3x slower. Compiling Lighthouse is about 5x slower on my 5950X: 17m 38s rather than 3m 28s. As a result I think we should not enable this everywhere by default. - **Option 1**: enable by default for our released binaries. This gives the majority of users the fastest version of `lighthouse` possible, at the expense of slowing down our release CI. Source builds will continue to use the default `release` profile unless users opt-in to `maxperf`. - **Option 2**: enable by default for source builds. This gives users building from source an edge, but makes them pay for it with compilation time. I think I would prefer Option 1. I'll try doing some benchmarking to see how long a maxperf build of Lighthouse would take on GitHub actions. Credit to Nicholas Nethercote for documenting these options in the Rust Performance Book: https://nnethercote.github.io/perf-book/build-configuration.html.
## Proposed Changes Add a new Cargo compilation profile called `maxperf` which enables more aggressive compiler optimisations at the expense of compilation time. Some rough initial benchmarks show that this can provide up to a 25% reduction to run time for CPU bound tasks like block processing: https://docs.google.com/spreadsheets/d/15jHuZe7lLHhZq9Nw8kc6EL0Qh_N_YAYqkW2NQ_Afmtk/edit The numbers in that spreadsheet compare the `consensus-context` branch from #3604 to the same branch compiled with the `maxperf` profile using: ``` PROFILE=maxperf make install-lcli ``` ## Additional Info The downsides of the maxperf profile are: - It increases compile times substantially, which will particularly impact low-spec hardware. Compiling `lcli` is about 3x slower. Compiling Lighthouse is about 5x slower on my 5950X: 17m 38s rather than 3m 28s. As a result I think we should not enable this everywhere by default. - **Option 1**: enable by default for our released binaries. This gives the majority of users the fastest version of `lighthouse` possible, at the expense of slowing down our release CI. Source builds will continue to use the default `release` profile unless users opt-in to `maxperf`. - **Option 2**: enable by default for source builds. This gives users building from source an edge, but makes them pay for it with compilation time. I think I would prefer Option 1. I'll try doing some benchmarking to see how long a maxperf build of Lighthouse would take on GitHub actions. Credit to Nicholas Nethercote for documenting these options in the Rust Performance Book: https://nnethercote.github.io/perf-book/build-configuration.html.
## Proposed Changes Add a new Cargo compilation profile called `maxperf` which enables more aggressive compiler optimisations at the expense of compilation time. Some rough initial benchmarks show that this can provide up to a 25% reduction to run time for CPU bound tasks like block processing: https://docs.google.com/spreadsheets/d/15jHuZe7lLHhZq9Nw8kc6EL0Qh_N_YAYqkW2NQ_Afmtk/edit The numbers in that spreadsheet compare the `consensus-context` branch from #3604 to the same branch compiled with the `maxperf` profile using: ``` PROFILE=maxperf make install-lcli ``` ## Additional Info The downsides of the maxperf profile are: - It increases compile times substantially, which will particularly impact low-spec hardware. Compiling `lcli` is about 3x slower. Compiling Lighthouse is about 5x slower on my 5950X: 17m 38s rather than 3m 28s. As a result I think we should not enable this everywhere by default. - **Option 1**: enable by default for our released binaries. This gives the majority of users the fastest version of `lighthouse` possible, at the expense of slowing down our release CI. Source builds will continue to use the default `release` profile unless users opt-in to `maxperf`. - **Option 2**: enable by default for source builds. This gives users building from source an edge, but makes them pay for it with compilation time. I think I would prefer Option 1. I'll try doing some benchmarking to see how long a maxperf build of Lighthouse would take on GitHub actions. Credit to Nicholas Nethercote for documenting these options in the Rust Performance Book: https://nnethercote.github.io/perf-book/build-configuration.html.
## Proposed Changes Add a new Cargo compilation profile called `maxperf` which enables more aggressive compiler optimisations at the expense of compilation time. Some rough initial benchmarks show that this can provide up to a 25% reduction to run time for CPU bound tasks like block processing: https://docs.google.com/spreadsheets/d/15jHuZe7lLHhZq9Nw8kc6EL0Qh_N_YAYqkW2NQ_Afmtk/edit The numbers in that spreadsheet compare the `consensus-context` branch from #3604 to the same branch compiled with the `maxperf` profile using: ``` PROFILE=maxperf make install-lcli ``` ## Additional Info The downsides of the maxperf profile are: - It increases compile times substantially, which will particularly impact low-spec hardware. Compiling `lcli` is about 3x slower. Compiling Lighthouse is about 5x slower on my 5950X: 17m 38s rather than 3m 28s. As a result I think we should not enable this everywhere by default. - **Option 1**: enable by default for our released binaries. This gives the majority of users the fastest version of `lighthouse` possible, at the expense of slowing down our release CI. Source builds will continue to use the default `release` profile unless users opt-in to `maxperf`. - **Option 2**: enable by default for source builds. This gives users building from source an edge, but makes them pay for it with compilation time. I think I would prefer Option 1. I'll try doing some benchmarking to see how long a maxperf build of Lighthouse would take on GitHub actions. Credit to Nicholas Nethercote for documenting these options in the Rust Performance Book: https://nnethercote.github.io/perf-book/build-configuration.html.
## Proposed Changes Add a new Cargo compilation profile called `maxperf` which enables more aggressive compiler optimisations at the expense of compilation time. Some rough initial benchmarks show that this can provide up to a 25% reduction to run time for CPU bound tasks like block processing: https://docs.google.com/spreadsheets/d/15jHuZe7lLHhZq9Nw8kc6EL0Qh_N_YAYqkW2NQ_Afmtk/edit The numbers in that spreadsheet compare the `consensus-context` branch from #3604 to the same branch compiled with the `maxperf` profile using: ``` PROFILE=maxperf make install-lcli ``` ## Additional Info The downsides of the maxperf profile are: - It increases compile times substantially, which will particularly impact low-spec hardware. Compiling `lcli` is about 3x slower. Compiling Lighthouse is about 5x slower on my 5950X: 17m 38s rather than 3m 28s. As a result I think we should not enable this everywhere by default. - **Option 1**: enable by default for our released binaries. This gives the majority of users the fastest version of `lighthouse` possible, at the expense of slowing down our release CI. Source builds will continue to use the default `release` profile unless users opt-in to `maxperf`. - **Option 2**: enable by default for source builds. This gives users building from source an edge, but makes them pay for it with compilation time. I think I would prefer Option 1. I'll try doing some benchmarking to see how long a maxperf build of Lighthouse would take on GitHub actions. Credit to Nicholas Nethercote for documenting these options in the Rust Performance Book: https://nnethercote.github.io/perf-book/build-configuration.html.
Pushed one (1) more sneaky commit with an optimisation for block replay: 26b819f. I was going to put it in a follow-up but figured I'd streamline things by including it here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect, sorry for the review delay!
Thank you for the review! bors r+ |
## Issue Addressed Closes #2371 ## Proposed Changes Backport some changes from `tree-states` that remove duplicated calculations of the `proposer_index`. With this change the proposer index should be calculated only once for each block, and then plumbed through to every place it is required. ## Additional Info In future I hope to add more data to the consensus context that is cached on a per-epoch basis, like the effective balances of validators and the base rewards. There are some other changes to remove indexing in tests that were also useful for `tree-states` (the `tree-states` types don't implement `Index`).
Pull request successfully merged into unstable. Build succeeded:
|
## Issue Addressed Closes sigp#2371 ## Proposed Changes Backport some changes from `tree-states` that remove duplicated calculations of the `proposer_index`. With this change the proposer index should be calculated only once for each block, and then plumbed through to every place it is required. ## Additional Info In future I hope to add more data to the consensus context that is cached on a per-epoch basis, like the effective balances of validators and the base rewards. There are some other changes to remove indexing in tests that were also useful for `tree-states` (the `tree-states` types don't implement `Index`).
## Proposed Changes Add a new Cargo compilation profile called `maxperf` which enables more aggressive compiler optimisations at the expense of compilation time. Some rough initial benchmarks show that this can provide up to a 25% reduction to run time for CPU bound tasks like block processing: https://docs.google.com/spreadsheets/d/15jHuZe7lLHhZq9Nw8kc6EL0Qh_N_YAYqkW2NQ_Afmtk/edit The numbers in that spreadsheet compare the `consensus-context` branch from sigp#3604 to the same branch compiled with the `maxperf` profile using: ``` PROFILE=maxperf make install-lcli ``` ## Additional Info The downsides of the maxperf profile are: - It increases compile times substantially, which will particularly impact low-spec hardware. Compiling `lcli` is about 3x slower. Compiling Lighthouse is about 5x slower on my 5950X: 17m 38s rather than 3m 28s. As a result I think we should not enable this everywhere by default. - **Option 1**: enable by default for our released binaries. This gives the majority of users the fastest version of `lighthouse` possible, at the expense of slowing down our release CI. Source builds will continue to use the default `release` profile unless users opt-in to `maxperf`. - **Option 2**: enable by default for source builds. This gives users building from source an edge, but makes them pay for it with compilation time. I think I would prefer Option 1. I'll try doing some benchmarking to see how long a maxperf build of Lighthouse would take on GitHub actions. Credit to Nicholas Nethercote for documenting these options in the Rust Performance Book: https://nnethercote.github.io/perf-book/build-configuration.html.
Closes sigp#2371 Backport some changes from `tree-states` that remove duplicated calculations of the `proposer_index`. With this change the proposer index should be calculated only once for each block, and then plumbed through to every place it is required. In future I hope to add more data to the consensus context that is cached on a per-epoch basis, like the effective balances of validators and the base rewards. There are some other changes to remove indexing in tests that were also useful for `tree-states` (the `tree-states` types don't implement `Index`).
Issue Addressed
Closes #2371
Proposed Changes
Backport some changes from
tree-states
that remove duplicated calculations of theproposer_index
.With this change the proposer index should be calculated only once for each block, and then plumbed through to every place it is required.
Additional Info
In future I hope to add more data to the consensus context that is cached on a per-epoch basis, like the effective balances of validators and the base rewards.
There are some other changes to remove indexing in tests that were also useful for
tree-states
(thetree-states
types don't implementIndex
).