Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upstairs state machine refactoring (3/3) #1577

Merged
merged 37 commits into from
Jan 13, 2025
Merged
Show file tree
Hide file tree
Changes from 34 commits
Commits
Show all changes
37 commits
Select commit Hold shift + click to select a range
525a990
The Grand Refactoring of DsState
mkeeter Nov 22, 2024
3bfd514
Remove needs_replay in favor of a returned value
mkeeter Nov 25, 2024
935cf98
All tests passing
mkeeter Nov 25, 2024
5fafe5c
Update OpenAPI stuff
mkeeter Nov 25, 2024
10e80e3
Fix cmon
mkeeter Nov 25, 2024
3fa356a
All tests passing!
mkeeter Nov 25, 2024
a76a39e
Allow reconciliation after replacement
mkeeter Dec 2, 2024
03152ca
Fix valid replacement states for other DS
mkeeter Dec 2, 2024
6f43453
Remove duplicate ClientStopReason in ClientRunResult
mkeeter Dec 2, 2024
f44eb6f
Correctly skip jobs on reinitialize
mkeeter Dec 2, 2024
fcd4bc6
Cleaning up reinitialize
mkeeter Dec 3, 2024
35133e1
Update comments
mkeeter Dec 3, 2024
8718c45
Cleaner ConnectionMode handling during replacement
mkeeter Dec 3, 2024
bbb5c41
Update OpenAPI docs
mkeeter Dec 3, 2024
e9564b0
Fix notify-nexus code
mkeeter Dec 3, 2024
c1e0af1
Don't autopromote when a client is disabled
mkeeter Dec 11, 2024
a44bd4c
Concerns!
mkeeter Dec 11, 2024
8a3eb7c
Reimplement checked_state_transition
mkeeter Dec 12, 2024
106d61f
Add RUST_BACKTRACE=1
mkeeter Dec 12, 2024
4fea9b5
Allow Connecting -> Connecting transition
mkeeter Dec 12, 2024
4e74de7
Allow Replacing -> New with auto_promote = true
mkeeter Dec 12, 2024
beb72cf
Add client IDs to panic strings
mkeeter Jan 6, 2025
4dbd46b
Fix mentions of DsState::LiveRepairReady
mkeeter Jan 6, 2025
fe0af70
Reorganize states
mkeeter Jan 6, 2025
fd3a03a
Reorganize continue_negotiation
mkeeter Jan 6, 2025
5c859f1
Add a pointer to the negotiation state diagram
mkeeter Jan 6, 2025
535de0c
Add start--> to DsState diagram
mkeeter Jan 6, 2025
56418c6
Tweak ClientStopReason::NegotiationFailed handling
mkeeter Jan 6, 2025
30db97e
Use full names in various checks
mkeeter Jan 6, 2025
1698442
Deactivation also goes to ConnectionMode::New
mkeeter Jan 7, 2025
2350caf
Clean up continue_negotiation edge cases
mkeeter Jan 7, 2025
23ab7a8
Retry on version mismatches as well
mkeeter Jan 8, 2025
f45c0cc
Fix rustfmt
mkeeter Jan 8, 2025
19fb4dc
Update OpenAPI docs
mkeeter Jan 8, 2025
708bbe4
Remove outdated comment
mkeeter Jan 13, 2025
ea999fb
We can deactivate from WaitActive
mkeeter Jan 13, 2025
12c38e1
Tweaks to set_active_request
mkeeter Jan 13, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .github/buildomat/jobs/test-ds.sh
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,7 @@ for t in "$input/bins/"*.gz; do
done

export BINDIR=/var/tmp/bins
export RUST_BACKTRACE=1

banner test_ds
ptime -m bash "$input/scripts/test_ds.sh"
Expand Down
1 change: 1 addition & 0 deletions .github/buildomat/jobs/test-live-repair.sh
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,7 @@ for t in "$input/bins/"*.gz; do
done

export BINDIR=/var/tmp/bins
export RUST_BACKTRACE=1

echo "BINDIR is $BINDIR"
echo "bindir contains:"
Expand Down
1 change: 1 addition & 0 deletions .github/buildomat/jobs/test-memory.sh
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,7 @@ for t in "$input/rbins/"*.gz; do
done

export BINDIR=/var/tmp/bins
export RUST_BACKTRACE=1

banner setup
pfexec plimit -n 9123456 $$
Expand Down
1 change: 1 addition & 0 deletions .github/buildomat/jobs/test-region-create.sh
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,7 @@ for t in "$input/rbins/"*.gz; do
done

export BINDIR=/var/tmp/bins
export RUST_BACKTRACE=1

banner region
pfexec plimit -n 9123456 $$
Expand Down
1 change: 1 addition & 0 deletions .github/buildomat/jobs/test-repair.sh
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,7 @@ for t in "$input/bins/"*.gz; do
done

export BINDIR=/var/tmp/bins
export RUST_BACKTRACE=1

echo "Setup self timeout"
# Give this test two hours to finish
Expand Down
1 change: 1 addition & 0 deletions .github/buildomat/jobs/test-replay.sh
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,7 @@ for t in "$input/bins/"*.gz; do
done

export BINDIR=/var/tmp/bins
export RUST_BACKTRACE=1
banner setup

echo "Setup self timeout"
Expand Down
1 change: 1 addition & 0 deletions .github/buildomat/jobs/test-up-encrypted.sh
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@ for t in "$input/bins/"*.gz; do
done

export BINDIR=/var/tmp/bins
export RUST_BACKTRACE=1

# Give this test one hour to finish
jobpid=$$; (sleep $(( 60 * 60 )); banner fail-timeout; ps -ef; zfs list;kill $jobpid) &
Expand Down
1 change: 1 addition & 0 deletions .github/buildomat/jobs/test-up-unencrypted.sh
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@ for t in "$input/bins/"*.gz; do
done

export BINDIR=/var/tmp/bins
export RUST_BACKTRACE=1

# Give this test two hours to finish
jobpid=$$; (sleep $(( 120 * 60 )); banner fail-timeout; ps -ef; zfs list;kill $jobpid) &
Expand Down
52 changes: 38 additions & 14 deletions cmon/src/main.rs
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,9 @@ use strum::IntoEnumIterator;
use strum_macros::EnumIter;
use tokio::time::{sleep, Duration};

use crucible::{Arg, ClientStopReason, DsState};
use crucible::{
Arg, ClientStopReason, ConnectionMode, DsState, NegotiationState,
};

/// Connect to crucible control server
#[derive(Parser, Debug)]
Expand Down Expand Up @@ -87,24 +89,46 @@ enum Action {
// Translate a DsState into a three letter string for printing.
fn short_state(dss: DsState) -> String {
match dss {
DsState::New
| DsState::Stopping(ClientStopReason::NegotiationFailed(..)) => {
"NEW".to_string()
}
DsState::WaitActive => "WAC".to_string(),
DsState::WaitQuorum => "WAQ".to_string(),
DsState::Reconcile => "REC".to_string(),
DsState::Connecting {
mkeeter marked this conversation as resolved.
Show resolved Hide resolved
state: NegotiationState::WaitActive,
..
} => "WAC".to_string(),

DsState::Connecting {
state: NegotiationState::WaitQuorum,
..
} => "WAQ".to_string(),
DsState::Connecting {
state: NegotiationState::Reconcile,
..
} => "REC".to_string(),
DsState::Active => "ACT".to_string(),
DsState::Faulted | DsState::Stopping(ClientStopReason::Fault(..)) => {
"FLT".to_string()
DsState::Connecting {
state: NegotiationState::LiveRepairReady,
..
} => "LRR".to_string(),
DsState::Stopping(ClientStopReason::NegotiationFailed(..))
| DsState::Connecting {
mode: ConnectionMode::New,
..
} => "NEW".to_string(),
DsState::Connecting {
mode: ConnectionMode::Faulted,
..
}
DsState::LiveRepairReady => "LRR".to_string(),
| DsState::Stopping(ClientStopReason::Fault(..)) => "FLT".to_string(),
DsState::LiveRepair => "LR".to_string(),
DsState::Offline => "OFF".to_string(),
DsState::Connecting {
mode: ConnectionMode::Offline,
..
} => "OFF".to_string(),
DsState::Stopping(ClientStopReason::Deactivated) => "DAV".to_string(),
DsState::Stopping(ClientStopReason::Disabled) => "DIS".to_string(),
DsState::Stopping(ClientStopReason::Replacing) => "RPC".to_string(),
DsState::Replaced => "RPD".to_string(),
DsState::Stopping(ClientStopReason::Replacing)
| DsState::Connecting {
mode: ConnectionMode::Replaced,
..
} => "RPD".to_string(),
}
}

Expand Down
65 changes: 60 additions & 5 deletions common/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -133,9 +133,6 @@ pub enum CrucibleError {
#[error("Repair stream error {0}")]
RepairStreamError(String),

#[error("Generation number is too low: {0}")]
GenerationNumberTooLow(String),

#[error("Active with different generation number")]
GenerationNumberInvalid,

Expand Down Expand Up @@ -165,6 +162,9 @@ pub enum CrucibleError {

#[error("Incompatible RegionDefinition {0}")]
RegionIncompatible(String),

#[error("Negotiation error: {0}")]
NegotiationError(NegotiationError),
}

impl From<std::io::Error> for CrucibleError {
Expand Down Expand Up @@ -197,6 +197,61 @@ impl<T> From<std::sync::mpsc::SendError<T>> for CrucibleError {
}
}

impl From<NegotiationError> for CrucibleError {
fn from(value: NegotiationError) -> Self {
CrucibleError::NegotiationError(value)
}
}

#[allow(clippy::derive_partial_eq_without_eq)]
#[derive(
thiserror::Error,
Debug,
PartialEq,
Clone,
Copy,
Serialize,
Deserialize,
JsonSchema,
)]
pub enum NegotiationError {
#[error("Message received out of order")]
OutOfOrder,

#[error("Generation 0 is illegal")]
GenerationZeroIsIllegal,

#[error(
"Generation number is too low: requested {requested}, found {actual}"
)]
GenerationNumberTooLow { requested: u64, actual: u64 },

#[error("Incompatible message version: wanted {expected}, got {actual}")]
IncompatibleVersion { expected: u32, actual: u32 },

#[error(
"Incompatible encryption settings: wanted {expected}, got {actual}"
)]
EncryptionMismatch { expected: bool, actual: bool },

#[error(
"Incompatible read-only settings: wanted {expected}, got {actual}"
)]
ReadOnlyMismatch { expected: bool, actual: bool },

#[error("Incompatible upstairs ID: wanted {expected}, got {actual}")]
UpstairsIdMismatch {
expected: uuid::Uuid,
actual: uuid::Uuid,
},

#[error("Incompatible session ID: wanted {expected}, got {actual}")]
SessionIdMismatch {
expected: uuid::Uuid,
actual: uuid::Uuid,
},
}

#[macro_export]
macro_rules! crucible_bail {
($i:ident) => { return Err(CrucibleError::$i) };
Expand Down Expand Up @@ -394,7 +449,6 @@ impl From<CrucibleError> for dropshot::HttpError {
| CrucibleError::DecryptionError
| CrucibleError::Disconnect
| CrucibleError::EncryptionError(_)
| CrucibleError::GenerationNumberTooLow(_)
| CrucibleError::GenerationNumberInvalid
| CrucibleError::GenericError(_)
| CrucibleError::HashMismatch
Expand All @@ -417,7 +471,8 @@ impl From<CrucibleError> for dropshot::HttpError {
| CrucibleError::MissingContextSlot(..)
| CrucibleError::BadMetadata(..)
| CrucibleError::BadContextSlot(..)
| CrucibleError::MissingBlockContext => {
| CrucibleError::MissingBlockContext
| CrucibleError::NegotiationError(..) => {
dropshot::HttpError::for_internal_error(e.to_string())
}
}
Expand Down
Loading
Loading