Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify persisting tower local cluster tests #875

Merged
merged 4 commits into from
Jan 6, 2025

Conversation

carllin
Copy link

@carllin carllin commented Apr 18, 2024

Problem

Optimistic confirmation tests are needlessly convoluted.

The optimistic confirmation is not necessary for the core of the test which is to guarantee tower prevents bad voting behavior across restarts.

Summary of Changes

  1. Remove the need for optimistic confirmation
  2. Simplify the forking logic
  3. Remove the need for validator D

Ran each test 100 times, no failures

Fixes #

@carllin carllin changed the title Simplify persisting tower tests Simplify persisting tower local cluster tests Apr 18, 2024
@carllin carllin force-pushed the FixLocalClusterUtility branch from f7d52b3 to 41ae0a5 Compare April 18, 2024 03:10
@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.8%. Comparing base (dd51e61) to head (41ae0a5).
Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##           master     #875     +/-   ##
=========================================
- Coverage    81.8%    81.8%   -0.1%     
=========================================
  Files         851      851             
  Lines      231471   231471             
=========================================
- Hits       189534   189507     -27     
- Misses      41937    41964     +27     

local-cluster/tests/local_cluster.rs Outdated Show resolved Hide resolved
local-cluster/tests/local_cluster.rs Outdated Show resolved Hide resolved
local-cluster/tests/local_cluster.rs Outdated Show resolved Hide resolved
}

// A bit convoluted test case; but this roughly follows this test theoretical scenario:
// Validator A, B, C have 31, 36, 33 % of stake respectively. Leader schedule is split, first half
// of the test B is always leader, second half C is. Additionally we have a non voting validator D with 0
// stake to propagate gossip info.
// of the test B is always leader, second half C is.
//
// Step 1: Kill C, only A, B and D should be running

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a D now. By the way, why did we need D before but not anymore?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been a while, but I think this comment is a clue:

// also important so `C` doesn't run into NoPropagatedConfirmation errors on making its first forked

I think it comes from a time before we added this change:

// C should not produce any blocks at this time
validator_configs[2].fixed_leader_schedule = Some(FixedSchedule {
leader_schedule: Arc::new(c_leader_schedule),
});
to prevent C from making blocks before the base slot forking point.

and when we relied on waiting for validators A and C to vote on and OC some slot > next_slot_on_a, where that slot could have been produced by validator C

In this scenario, in order for C to reliably restart and make another fork, I think it would have needed to pass a propagated check, which implies seeing the vote from A in gossip, which is no longer necessary

@carllin carllin force-pushed the FixLocalClusterUtility branch 4 times, most recently from 410fe6b to 12a115a Compare December 25, 2024 04:38
@carllin carllin requested a review from wen-coding December 25, 2024 04:39
@carllin carllin force-pushed the FixLocalClusterUtility branch from 12a115a to 2b771c9 Compare December 25, 2024 06:18
// stake to propagate gossip info.
// of the test B is always leader, second half C is.
// We don't give validator A any slots because it's going to be deleting its ledger,
// so it may create versions of slots it's already created on a different fork

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean: it may create different versions of blocks for some blocks already created on a different fork?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, clarified!

//
// Step 1: Kill C, only A, B and D should be running
// Step 1: Kill C, only A, B should be running
//
// S0 -> S1 -> S2 -> S3 (A & B vote, optimistically confirmed)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused, we are now checking:

last_vote >= 1

is it possible we only have S2 and S3 (no S0 and S1)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, the only important slots are S2 and S3 because S2 is the last common ancestor slot slot and S3 is the fork that gets deleted, but still exists/doesn't exist in tower depending on which test we run.

You're right this S0, S1, S2, S3 stuff is quite confusing, I deleted all of this and replaced it with just the variable names base_slot, next_slot_on_a for clarity

@carllin carllin requested a review from wen-coding January 6, 2025 03:19
@carllin carllin merged commit 36e173d into anza-xyz:master Jan 6, 2025
40 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants