Always allocate regions to different sleds (#7382)
When performing region allocation _again_ for a particular volume and
increasing the redundancy, the region allocation query was not excluding
the sleds that have already been used by the existing allocation as
candidates for the new region (note that this only matters when the
allocation strategy mandates distinct sleds!). This resulted in region
replacement and region snapshot replacement having the potential to
allocate the replacement region onto the same sled as an existing
region.
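
In sketch form, the invariant the fixed query enforces when the
allocation strategy requires distinct sleds (a toy illustration only;
the real change is the SQL builder shown in the `region_allocation.rs`
diff below):

    use std::collections::HashSet;

    /// Toy illustration of the invariant: candidate sleds for a new region
    /// must exclude every sled already backing the volume's existing
    /// regions.
    fn candidate_sleds(all_sleds: &[u64], existing: &HashSet<u64>) -> Vec<u64> {
        all_sleds
            .iter()
            .copied()
            .filter(|sled| !existing.contains(sled))
            .collect()
    }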

This commit fixes the region allocation query, and then fixes the
fallout: many tests of region replacement or region snapshot replacement
were not creating enough simulated sled agents, and therefore could not
satisfy the fixed region allocation query. The `extra_sled_agents`
parameter added in #7353 was used, along with changing the
`DiskTestBuilder` to simulate one zpool on many sleds.
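
As an illustration, a replacement test can now ask the harness for
extra simulated sleds up front. Only the `extra_sled_agents` attribute
key below comes from this change (see the `nexus/test-utils-macros`
diff); the test name, body, and context type are placeholders that
follow the usual pattern for these tests:

    // Hypothetical test skeleton: request two extra simulated sled agents
    // so that three regions (and any replacement) can each land on a
    // different sled.
    #[nexus_test(extra_sled_agents = 2)]
    async fn test_region_replacement_uses_distinct_sleds(
        cptestctx: &ControlPlaneTestContext,
    ) {
        // ... create a disk, trigger a region replacement, and assert that
        // no two allocated regions share a sled ...
    }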

Changing these replacement tests revealed the next problem: because all
the simulated sleds share the same IP (::1) and range of ports for the
fake Crucible agents, each region allocation would have the same target,
and the volume construction request would look like:

    "target": [
      "[::1]:1100",
      "[::1]:1100",
      "[::1]:1100"
    ]

This is very confusing for the replacement routines!

We can't change the simulated sleds' IPs: they need to bind to localhost
to listen for requests. Therefore this commit also adds the idea of a
"sled index", which is used to give each simulated sled a unique range
of Crucible agent ports.
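
A minimal sketch of the port-range idea, where the base port and the
per-sled window size are made-up values for illustration (only the
[::1] address and the per-sled index come from this change):

    // Illustrative only: give each simulated sled its own window of ports
    // for its fake Crucible agents, so regions placed on different sleds
    // get distinct [::1]:port targets.
    const CRUCIBLE_AGENT_BASE_PORT: u16 = 1100;
    const PORTS_PER_SLED: u16 = 100;

    fn crucible_agent_port_range(sled_index: u16) -> std::ops::Range<u16> {
        let start = CRUCIBLE_AGENT_BASE_PORT + sled_index * PORTS_PER_SLED;
        start..start + PORTS_PER_SLED
    }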

This led straight to the next problem: the simulated Pantry assumes
that there is really only one simulated sled-agent, and that it has all
the storage! This is no longer true in a post-#7353 world. Snapshots
could not complete because the simulated Pantry would only find one of
the three regions when searching the first simulated sled-agent's
storage.

The real Pantry constructs a Crucible Upstairs and delegates snapshot
requests to it, so this commit also adds a simulated Crucible Upstairs
that is aware of each simulated sled agent's `Storage`,
and can therefore fulfill snapshot requests. Centralizing the place
where all simulated sled agents (and Pantry servers!) are created makes
this easy: the simulated Upstairs can be aware of all of the `Storage`
objects, and the simulated Pantry can use that simulated Upstairs to
take fake snapshots.
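
The shape of that simulated Upstairs, heavily simplified. Everything
below other than the `SimulatedUpstairs` name and the idea of sharing
every sled's `Storage` is a hypothetical sketch, not the actual types
in the simulated sled agent:

    use std::sync::{Arc, Mutex};

    // Stand-in for the real simulated Storage; here it only knows which
    // region targets it holds.
    struct Storage {
        region_targets: Vec<String>,
    }

    // One object shared by every simulated sled agent and Pantry server,
    // populated as each simulated server is started.
    struct SimulatedUpstairs {
        storages: Mutex<Vec<Arc<Storage>>>,
    }

    impl SimulatedUpstairs {
        fn register(&self, storage: Arc<Storage>) {
            self.storages.lock().unwrap().push(storage);
        }

        // A fake snapshot succeeds only if every target region is found in
        // some sled's storage, which is exactly what the old "the first
        // sled has all the storage" assumption broke.
        fn snapshot(&self, targets: &[String]) -> Result<(), String> {
            let storages = self.storages.lock().unwrap();
            for target in targets {
                let found = storages
                    .iter()
                    .any(|s| s.region_targets.contains(target));
                if !found {
                    return Err(format!("region {target} not found"));
                }
            }
            Ok(())
        }
    }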

Fixes oxidecomputer/crucible#1594.
jmpesp authored Jan 23, 2025
1 parent 4d9d269 commit 288eaf4
Showing 16 changed files with 654 additions and 288 deletions.
28 changes: 25 additions & 3 deletions nexus/db-queries/src/db/queries/region_allocation.rs
@@ -162,6 +162,27 @@ pub fn allocation_query(
),")
};

// If `distinct_sleds` is selected, then take note of the sleds used by
// existing allocations, and filter those out later. This step is required
// when taking an existing allocation of regions and increasing the
// redundancy in order to _not_ allocate to sleds already used.

let builder = if distinct_sleds {
builder.sql(
"
existing_sleds AS (
SELECT
zpool.sled_id as id
FROM
zpool
WHERE
zpool.id = ANY(SELECT pool_id FROM existing_zpools)
),",
)
} else {
builder
};

// Identifies zpools with enough space for region allocation, that are not
// currently used by this Volume's existing regions.
//
@@ -194,17 +215,18 @@ pub fn allocation_query(
AND physical_disk.disk_policy = 'in_service'
AND physical_disk.disk_state = 'active'
AND NOT(zpool.id = ANY(SELECT existing_zpools.pool_id FROM existing_zpools))
)"
"
).bind::<sql_types::BigInt, _>(size_delta as i64);

let builder = if distinct_sleds {
builder
.sql("ORDER BY zpool.sled_id, md5((CAST(zpool.id as BYTEA) || ")
.sql("AND NOT(sled.id = ANY(SELECT existing_sleds.id FROM existing_sleds)))
ORDER BY zpool.sled_id, md5((CAST(zpool.id as BYTEA) || ")
.param()
.sql("))")
.bind::<sql_types::Bytea, _>(seed.clone())
} else {
builder
builder.sql(")")
}
.sql("),");

10 changes: 10 additions & 0 deletions nexus/db-queries/tests/output/region_allocate_distinct_sleds.sql
@@ -36,6 +36,15 @@ WITH
FROM
dataset INNER JOIN old_regions ON old_regions.dataset_id = dataset.id
),
existing_sleds
AS (
SELECT
zpool.sled_id AS id
FROM
zpool
WHERE
zpool.id = ANY (SELECT pool_id FROM existing_zpools)
),
candidate_zpools
AS (
SELECT
@@ -64,6 +73,7 @@ WITH
AND physical_disk.disk_policy = 'in_service'
AND physical_disk.disk_state = 'active'
AND NOT (zpool.id = ANY (SELECT existing_zpools.pool_id FROM existing_zpools))
AND NOT (sled.id = ANY (SELECT existing_sleds.id FROM existing_sleds))
ORDER BY
zpool.sled_id, md5(CAST(zpool.id AS BYTES) || $3)
),
@@ -47,6 +47,15 @@ WITH
region_snapshot.snapshot_id = $2
)
),
existing_sleds
AS (
SELECT
zpool.sled_id AS id
FROM
zpool
WHERE
zpool.id = ANY (SELECT pool_id FROM existing_zpools)
),
candidate_zpools
AS (
SELECT
@@ -75,6 +84,7 @@ WITH
AND physical_disk.disk_policy = 'in_service'
AND physical_disk.disk_state = 'active'
AND NOT (zpool.id = ANY (SELECT existing_zpools.pool_id FROM existing_zpools))
AND NOT (sled.id = ANY (SELECT existing_sleds.id FROM existing_sleds))
ORDER BY
zpool.sled_id, md5(CAST(zpool.id AS BYTES) || $4)
),
39 changes: 37 additions & 2 deletions nexus/inventory/src/collector.rs
Expand Up @@ -418,9 +418,11 @@ mod test {
use omicron_uuid_kinds::OmicronZoneUuid;
use omicron_uuid_kinds::SledUuid;
use omicron_uuid_kinds::ZpoolUuid;
use slog::o;
use std::fmt::Write;
use std::net::Ipv6Addr;
use std::net::SocketAddrV6;
use std::sync::Arc;

fn dump_collection(collection: &Collection) -> String {
// Construct a stable, human-readable summary of the Collection
@@ -561,16 +563,21 @@
log: slog::Logger,
sled_id: SledUuid,
zone_id: OmicronZoneUuid,
simulated_upstairs: &Arc<sim::SimulatedUpstairs>,
) -> sim::Server {
// Start a simulated sled agent.
let config = sim::Config::for_testing(
sled_id,
sim::SimMode::Auto,
None,
None,
Some(vec![]),
sim::ZpoolConfig::None,
);
let agent = sim::Server::start(&config, &log, false).await.unwrap();

let agent =
sim::Server::start(&config, &log, false, &simulated_upstairs, 0)
.await
.unwrap();

// Pretend to put some zones onto this sled. We don't need to test this
// exhaustively here because there are builder tests that exercise a
@@ -607,18 +614,28 @@
gateway_test_utils::setup::test_setup("test_basic", SpPort::One)
.await;
let log = &gwtestctx.logctx.log;

let simulated_upstairs =
Arc::new(sim::SimulatedUpstairs::new(log.new(o!(
"component" => "omicron_sled_agent::sim::SimulatedUpstairs",
))));

let sled1 = sim_sled_agent(
log.clone(),
"9cb9b78f-5614-440c-b66d-e8e81fab69b0".parse().unwrap(),
"5125277f-0988-490b-ac01-3bba20cc8f07".parse().unwrap(),
&simulated_upstairs,
)
.await;

let sled2 = sim_sled_agent(
log.clone(),
"03265caf-da7d-46c7-b1c2-39fa90ce5c65".parse().unwrap(),
"8b88a56f-3eb6-4d80-ba42-75d867bc427d".parse().unwrap(),
&simulated_upstairs,
)
.await;

let sled1_url = format!("http://{}/", sled1.http_server.local_addr());
let sled2_url = format!("http://{}/", sled2.http_server.local_addr());
let mgs_url = format!("http://{}/", gwtestctx.client.bind_address);
@@ -664,18 +681,28 @@
)
.await;
let log = &gwtestctx1.logctx.log;

let simulated_upstairs =
Arc::new(sim::SimulatedUpstairs::new(log.new(o!(
"component" => "omicron_sled_agent::sim::SimulatedUpstairs",
))));

let sled1 = sim_sled_agent(
log.clone(),
"9cb9b78f-5614-440c-b66d-e8e81fab69b0".parse().unwrap(),
"5125277f-0988-490b-ac01-3bba20cc8f07".parse().unwrap(),
&simulated_upstairs,
)
.await;

let sled2 = sim_sled_agent(
log.clone(),
"03265caf-da7d-46c7-b1c2-39fa90ce5c65".parse().unwrap(),
"8b88a56f-3eb6-4d80-ba42-75d867bc427d".parse().unwrap(),
&simulated_upstairs,
)
.await;

let sled1_url = format!("http://{}/", sled1.http_server.local_addr());
let sled2_url = format!("http://{}/", sled2.http_server.local_addr());
let mgs_clients = [&gwtestctx1, &gwtestctx2]
@@ -765,12 +792,20 @@
)
.await;
let log = &gwtestctx.logctx.log;

let simulated_upstairs =
Arc::new(sim::SimulatedUpstairs::new(log.new(o!(
"component" => "omicron_sled_agent::sim::SimulatedUpstairs",
))));

let sled1 = sim_sled_agent(
log.clone(),
"9cb9b78f-5614-440c-b66d-e8e81fab69b0".parse().unwrap(),
"5125277f-0988-490b-ac01-3bba20cc8f07".parse().unwrap(),
&simulated_upstairs,
)
.await;

let sled1_url = format!("http://{}/", sled1.http_server.local_addr());
let sledbogus_url = String::from("http://[100::1]:45678");
let mgs_url = format!("http://{}/", gwtestctx.client.bind_address);
6 changes: 3 additions & 3 deletions nexus/test-utils-macros/src/lib.rs
@@ -9,7 +9,7 @@ use syn::{parse_macro_input, ItemFn, Token};
#[derive(Debug, PartialEq, Eq, Hash)]
pub(crate) enum NexusTestArg {
ServerUnderTest(syn::Path),
ExtraSledAgents(usize),
ExtraSledAgents(u16),
}

impl syn::parse::Parse for NexusTestArg {
@@ -24,7 +24,7 @@ impl syn::parse::Parse for NexusTestArg {

if name.is_ident("extra_sled_agents") {
let value: syn::LitInt = input.parse()?;
let value = value.base10_parse::<usize>()?;
let value = value.base10_parse::<u16>()?;
return Ok(Self::ExtraSledAgents(value));
}

@@ -91,7 +91,7 @@ pub fn nexus_test(attrs: TokenStream, input: TokenStream) -> TokenStream {

let mut which_nexus =
syn::parse_str::<syn::Path>("::omicron_nexus::Server").unwrap();
let mut extra_sled_agents: usize = 0;
let mut extra_sled_agents: u16 = 0;

for arg in nexus_test_args.0 {
match arg {