-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[reconfigurator] SledEditor: be more strict about decommissioned sleds #7234
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a net improvement but I share your ambivalence.
I like making set_sled_state()
more specific to decommissioning. If we don't support other transitions today, then it seems like if/when we do, we'll want to think through what other updates need to be made. That might be easy to forget if somebody can just call set_sled_state(SomeOtherState)
.
nexus/reconfigurator/planning/src/blueprint_editor/sled_editor.rs
Outdated
Show resolved
Hide resolved
// We can't take ownership of `editor` from the `&mut self` | ||
// reference we have, and we need ownership to call | ||
// `finalize()`. Steal the contents via `mem::swap()` with an | ||
// empty editor. This isn't panic safe (i.e., if we panic | ||
// between the `mem::swap()` and the reassignment to `self.0` | ||
// below, we'll be left in the active state with an empty sled | ||
// editor), but omicron in general is not panic safe and aborts | ||
// on panic. Plus `finalize()` should never panic. | ||
let mut stolen = ActiveSledEditor::new_empty( | ||
DatasetIdsBackfillFromDb::empty(), | ||
); | ||
mem::swap(editor, &mut stolen); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of all this, what about accepting an owned SledEditor
here and returning an owned one back? I think this means the caller would have to remove this sled editor from sled_editors
and insert the one it got back. But that seems okay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I think this makes failures super awkward? If we take self
and return Result<SledEditor, SledEditError>
, then on failure our caller doesn't get an editor back. I guess we could return either (SledEditor, Result<(), SledEditError>)
or Result<SledEditor, (SledEditor, SledEditError)>
but either of those seems worse to me than the slightly-messy internal details here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough!
) -> impl Iterator<Item = &BlueprintPhysicalDiskConfig> { | ||
match &self.0 { | ||
InnerSledEditor::Active(editor) => { | ||
Either::Left(editor.disks(filter)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've never seen Either
before. Is this just a way to have a function return either of two different iterators that otherwise would have different types? I think I usually use Box<dyn Iterator<...>>
instead. This is neat.
(I think I'm ambivalent between both approaches but it's cool to know this exists.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep! This is from itertools. This one doesn't require boxing but does mean there's a match on Left/Right on every call to Iterator::next()
. I doubt either choice makes any meaningful difference in this case.
// TODO-john The disks and datasets checks below don't pass what the | ||
// planner does currently to decommission sleds: if a sled is expunged, | ||
// we'll omit its disks and datasets from the outgoing blueprint | ||
// entirely without setting them all to the `Expunged` disposition. | ||
// Fixing this will conflict with ongoing disk work, so for now these | ||
// checks are commented out. | ||
/* | ||
// Check that all disks are expunged... | ||
if let Some(disk) = | ||
self.disks(DiskFilter::All).find(|disk| match disk.disposition { | ||
BlueprintPhysicalDiskDisposition::InService => true, | ||
BlueprintPhysicalDiskDisposition::Expunged => false, | ||
}) | ||
{ | ||
return Err(SledEditError::NonDecommissionableDiskInService { | ||
disk_id: disk.id, | ||
zpool_id: disk.pool_id, | ||
}); | ||
} | ||
|
||
// ... and all datasets are expunged ... | ||
if let Some(dataset) = | ||
self.datasets(BlueprintDatasetFilter::All).find(|dataset| { | ||
match dataset.disposition { | ||
BlueprintDatasetDisposition::InService => true, | ||
BlueprintDatasetDisposition::Expunged => false, | ||
} | ||
}) | ||
{ | ||
return Err(SledEditError::NonDecommissionableDatasetInService { | ||
dataset_id: dataset.id, | ||
kind: dataset.kind.clone(), | ||
}); | ||
} | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You knew I was going to suggest putting this into an issue instead. 😄 It could be in a comment for an existing issue that covers the work that would introduce these checks or a standalone issue we track under Reconfigurator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of my comments is a blocker.
…7235) Most of this diff is moving stuff around in preparation for a larger rework, but there are some nontrivial changes worth reviewing that I'll call out in comments below. This is staged on top of #7234. Prior to this PR, `BlueprintBuilder` had 3 fields that allocated various kinds of resources: * `sled_ip_allocators` (for allocating underlay IPs) * `external_networking` (for creating OPTE NICs and assigning external IP addresses for Nexus/boundary NTP/external DNS) * `internal_dns_subnets` (for assigning internal DNS zones, which get IPs distinct from the normal sled subnet) All three were lazily instantiated on first use. This is to allow the planner to first expunge zones then start adding; the first "add" that needs a resource from one of these allocators creates the allocator by snapshotting the state of available resources at that point (_after_ expungement). Eventually I'd like to break the builder up to make these two phases more explicit, but for now this PR settles for shrinking the builder's surface area some: * `sled_ip_allocators` is gone; each `SledEditor` now maintains its own underlay IP allocator. These are no longer instantiated lazily, but for underlay IPs specifically this is fine: we never reuse underlay IPs from expunged zones, so we don't need to wait for expungement to reuse resources like the other two. * `external_networking`/`internal_dns_subnets` were replaced by `resource_allocator`, which internally holds both of those types and has pass-through methods for the builder to use. This is still lazily instantiated via `OnceCell` to keep the existing implicit ordering/behavior.
This is a followup from #7204 (comment) and makes two changes, neither of which should affect behavior:
SledEditor
will now fail if a caller attempts to make changes to a decommissioned sled (internally, this is statically enforced by a type state enum - a sled in the decommissioned state does not have any methods that support editing, so we're forced to return an error)SledEditor::set_state()
is nowSledEditor::decommission()
, and it performs some checks that the sled looks decommissionableThe second bullet is more questionable than I expected it to be:
SledEditor
shouldn't do any checks here; in particular, it doesn't have the full context (e.g., any checks on "should we decommission this sled" that depend on thePlanningInput
can't be performed here, becauseSledEditor
intentionally doesn't have access toPlanningInput
).I don't feel super strongly about the checks in
decommission()
or even this PR as a whole; if this doesn't look like a useful direction, I'd be fine with discarding it. Please review with a pretty critical eye.