Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add missed blocks to monitored validators #4731

Merged

Conversation

v4lproik
Copy link
Contributor

Solving #3414

Will add Prom metrics + tests tomorrow

@CLAassistant
Copy link

CLAassistant commented Sep 13, 2023

CLA assistant check
All committers have signed the CLA.

Copy link
Member

@michaelsproul michaelsproul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good! thanks for digging into this!

I've spammed a bunch of comments, hopefully they're helpful

beacon_node/beacon_chain/src/validator_monitor.rs Outdated Show resolved Hide resolved
beacon_node/beacon_chain/src/validator_monitor.rs Outdated Show resolved Hide resolved
beacon_node/beacon_chain/src/validator_monitor.rs Outdated Show resolved Hide resolved
beacon_node/beacon_chain/src/validator_monitor.rs Outdated Show resolved Hide resolved
beacon_node/beacon_chain/src/validator_monitor.rs Outdated Show resolved Hide resolved
beacon_node/beacon_chain/src/validator_monitor.rs Outdated Show resolved Hide resolved
beacon_node/beacon_chain/src/validator_monitor.rs Outdated Show resolved Hide resolved
Copy link
Member

@paulhauner paulhauner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the delayed review, again! There's lots of progress here, thank you.

I have some suggestions that I've implemented over at https://github.com/paulhauner/lighthouse/tree/4731-paul-2. They are:

  • 86727: Add the MissedBlock struct which adds the parent_rootfield which prevents us from ignoring a skipped slot in a forked-chain because we've already seen it in another chain (this is unlikely to be practically useful, but I added it for completeness). I also removed the epoch field since we can determine that from the slot.
  • 98400: a minor fix to comment formatting.
  • 0f278: refactor the loops for detecting non-finalised skipped slots. I don't think the previous version was properly resetting the count for validators to 0 when they didn't have any skip slots detected.
  • 5e216: fixes an off-by-one with pruning (use >= rather than >). I also swapped the math from X > Y - 1 to X + 1 > Y because that works nicer with zero/zero-adjacent X values.
  • 73924: Uses the existing validator.id string rather than allocating a new one. This isn't a big deal, just a nit.
  • c8ce8: removes the "non finalised" metrics. This might be a contentious change and I've kept it as a single commit so you can revert it if you like. I just couldn't think of a scenario where the count of non-finalised missed blocks would be useful when we're already tracking the combined count of finalised and non-finalised blocks in the other metric. I'm totally open to your feedback here.

If you agree to all the changes, feel free to just merge my branch in here.

Let me know what you think! I'm ready to merge if these changes are added 🚀

@paulhauner paulhauner changed the base branch from stable to unstable November 2, 2023 22:02
@v4lproik
Copy link
Contributor Author

v4lproik commented Nov 5, 2023

Hey Paul!

The changes look 🚀 ! Thanks!

c8ce8: removes the "non finalised" metrics. This might be a contentious change and I've kept it as a single commit so you can revert it if you like. I just couldn't think of a scenario where the count of non-finalised missed blocks would be useful when we're already tracking the combined count of finalised and non-finalised blocks in the other metric. I'm totally open to your feedback here.

You are absolutely right, initially we were making the difference between non-finalized and finalized missed blocks which is not the case anymore so best removing it!

@v4lproik
Copy link
Contributor Author

v4lproik commented Nov 5, 2023

I removed a couple of variables that were leftovers from the validator monitor refactor. I just ran the linter locally, should be all good CI side :)

@v4lproik
Copy link
Contributor Author

v4lproik commented Nov 5, 2023

I forgot one variable in the tests, fixing it and re running the beacon_test locally as I am writing 🙏🏼

EDIT: fixed, all beacon_test tests green.

@paulhauner
Copy link
Member

paulhauner commented Nov 6, 2023

Thanks for fixing the merge conflicts!

We have a failing test now:

--- STDOUT:              beacon_chain::beacon_chain_tests validator_monitor::produces_missed_blocks ---

running 1 test
test validator_monitor::produces_missed_blocks ... FAILED

failures:

failures:
    validator_monitor::produces_missed_blocks

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 130 filtered out; finished in 5.15s


--- STDERR:              beacon_chain::beacon_chain_tests validator_monitor::produces_missed_blocks ---
thread 'validator_monitor::produces_missed_blocks' panicked at beacon_node/beacon_chain/tests/validator_monitor.rs:180:9:
assertion `left == right` failed
  left: 0
 right: 1
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

   Canceling due to test failure: 4 tests still running
        PASS [  15.878s] beacon_chain::beacon_chain_tests sync_committee_verification::aggregated_gossip_verification
        PASS [  15.791s] beacon_chain::beacon_chain_tests sync_committee_verification::unaggregated_gossip_verification
        PASS [  14.648s] beacon_chain::beacon_chain_tests tests::produces_and_processes_with_genesis_skip_slots
        SLOW [>120.000s] beacon_chain::beacon_chain_tests block_verification::chain_segment_varying_chunk_size
        PASS [ 138.593s] beacon_chain::beacon_chain_tests block_verification::chain_segment_varying_chunk_size
------------
     Summary [ 153.680s] 230 tests run: 229 passed (1 slow), 1 failed, 0 skipped
        FAIL [   5.166s] beacon_chain::beacon_chain_tests validator_monitor::produces_missed_blocks
error: test run failed
make: *** [Makefile:161: test-beacon-chain-altair] Error 100

@v4lproik
Copy link
Contributor Author

v4lproik commented Nov 6, 2023

After debugging this issue this evening, I figured that the different fork names were making the test fail.

We were having a vec of proposers_per_epoch in fn add_validators_missed_blocks

 proposers_per_epoch = self.get_proposers_by_epoch_from_cache(
    slot_epoch,
    shuffling_decision_block,
);

that was different depending on the fork name. I was expecting the validator to miss the block to always be at the index 0 with the same value (i=7) where the validator index would be equal to validator_indexes[slot_in_epoch.as_usize()].

As I was wondering why the value at the index 0 is different depending on the fork name - eg. 7 (pre-altair) and 2 (post-altair).

I forgot to take into account that the get_beacon_proposer_indices is initiated depening on the fork seed. I simply added the validator_index that should be monitored per fork name and pass that list when initialising the validator_monitor component in the harness.

I just ran fmt, lint and the whole test suite for the BN (including all fork names this time around)
We should be all good! Last time this time! 🙏🏼

Again, thank you for your patience! 🙏🏼

Copy link
Member

@paulhauner paulhauner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Thank you for your patience too!

Let's get this merged! 🚀

@paulhauner paulhauner added ready-for-merge This PR is ready to merge. and removed waiting-on-author The reviewer has suggested changes and awaits thier implementation. labels Nov 7, 2023
@paulhauner
Copy link
Member

bors r+

@paulhauner
Copy link
Member

Hmm.. Bors doesn't seem to have detected that r+. I'll try again 🤞

bors r+

@paulhauner
Copy link
Member

It seems that bors is down. Trying again to see if it's back yet.

bors r+

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-for-merge This PR is ready to merge. v4.6.0 ETA Q1 2024
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants