
Change dataset table to only hold Crucible datasets #7386

Open

jgallagher wants to merge 12 commits into main
Conversation

jgallagher (Contributor):

The diff of this PR is large but very mechanical. The actual changes here are:

  • The dataset table has been renamed crucible_dataset (and correspondingly, the Dataset model struct is now CrucibleDataset, and Datastore::dataset_* methods are now Datastore::crucible_dataset_*)
  • The zone_name column has been dropped (it was always NULL for Crucible datasets, enforced by a CHECK constraint)
  • The quota, reservation, and compression columns have been dropped (they were always NULL for Crucible datasets, although this was not enforced by a CHECK constraint; this information is kept in the blueprint and is not read from this table)
  • The ip, port, and size_used columns now have NOT NULL constraints (they already had NOT NULL constraints specifically for Crucible datasets via a CHECK constraint; correspondingly, the CrucibleDataset model type has non-optional fields for these columns)

and everything else is followup to account for this. Most of that followup is just renaming things. A few nontrivial exceptions:

  • There were a handful of places that had to deal with the optionality of address by failing if it wasn't set for Crucible datasets; that error handling is now gone since address is no longer optional.
  • Some tests specifically operated on the dataset table using a mix of Crucible and non-Crucible datasets. I tried to update these as reasonably as I could while keeping the spirit of the test intact, but a closer look at these bits would be good.
  • The rack init request in the RSS -> Nexus handoff now only includes Crucible datasets.

I think this gets us most of the way toward crucible_dataset being a "rendezvous table" in the RFD 540 sense. As of this PR, the blueprint executor task is still responsible for adding entries to this table, but I'll move that to the rendezvous background task in a followup.

Fixes #7299. Gets us halfway to #7301 by fixing the dataset side of that issue (bp_omicron_dataset still needs work).
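
For orientation, a minimal sketch of the resulting model shape. This is illustrative only: the field names come from the column list above, while the concrete Rust types are assumptions rather than the actual definition in the codebase.

    use std::net::Ipv6Addr;
    use uuid::Uuid;

    // Sketch only: ip, port, and size_used are plain (non-Option) fields now
    // that the corresponding columns are NOT NULL.
    pub struct CrucibleDataset {
        pub id: Uuid,
        pub pool_id: Uuid,
        pub ip: Ipv6Addr,
        pub port: u16,
        pub size_used: i64,
    }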

@jgallagher requested review from jmpesp and smklein on January 22, 2025 at 20:28

/// Database representation of a Crucible dataset's live information.
///
/// This includes the socket address of the Crucible downstairs that owns this
Contributor:

s/downstairs/agent/

Contributor Author:

fixed in deeec80

}

#[derive(Debug, Default, Serialize, Deserialize)]
pub struct CrucibleResourcesV2 {
-pub datasets_and_regions: Vec<(Dataset, Region)>,
+pub datasets_and_regions: Vec<(CrucibleDataset, Region)>,
Contributor:

We'll need to be able to deserialize these existing blobs: will the JSON format change with the type change here? I don't think so but I've been wrong before!

Contributor Author:

It does change but only in ways that should be compatible, I think:

  • The old JSON blobs will have zone_name, quota, reservation, and compression fields that we'll ignore when deserializing.
  • The old JSON blobs could have had null for the ip, port, or size_used fields. If they did, we will now fail to deserialize them. But those should have always been set for Crucible datasets, so they should never have been null? (See the sketch below.)
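
A minimal sketch of that compatibility argument, assuming serde's default behavior (unknown fields are ignored; null is rejected for non-Option fields) and illustrative field types rather than the real model:

    use serde::Deserialize;

    #[derive(Debug, Deserialize)]
    struct CrucibleDatasetLike {
        ip: std::net::Ipv6Addr,
        port: u16,
        size_used: i64,
    }

    fn main() {
        // Old blob carrying since-dropped fields: extra keys are ignored by
        // default, so this still deserializes.
        let old = r#"{"ip":"fd00::1","port":32345,"size_used":0,"zone_name":null,"quota":null}"#;
        assert!(serde_json::from_str::<CrucibleDatasetLike>(old).is_ok());

        // Old blob where port was null: this now fails to deserialize.
        let bad = r#"{"ip":"fd00::1","port":null,"size_used":0}"#;
        assert!(serde_json::from_str::<CrucibleDatasetLike>(bad).is_err());
    }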

I was planning to take this through a "deploy R12 on a racklette, upgrade to this branch" test before merging. (Although I'd like to wait to do that until I have the next, smaller PR up, to test both at the same time.) What should I do to the racklette before upgrading to ensure I have some of these JSON blobs present?

Contributor Author:

Ooh, maybe a better check: where is this JSON in CRDB? I can check dogfood and see if we have any that would cause problems.

Contributor Author:

Hm well that didn't work. Assuming this is volume.resources_to_clean_up, dogfood only has empty resources for V1 and V2, and V3 looks like it only has IDs in it?

root@[fd00:1122:3344:105::3]:32221/omicron> select resources_to_clean_up from volume where resources_to_clean_up is not null and resources_to_clean_up like '{"V2%';
                     resources_to_clean_up
---------------------------------------------------------------
  {"V2":{"datasets_and_regions":[],"snapshots_to_delete":[]}}
  {"V2":{"datasets_and_regions":[],"snapshots_to_delete":[]}}
(2 rows)


Time: 12ms total (execution 12ms / network 0ms)

root@[fd00:1122:3344:105::3]:32221/omicron> select resources_to_clean_up from volume where resources_to_clean_up is not null and resources_to_clean_up like '{"V1%';
                      resources_to_clean_up
------------------------------------------------------------------
  {"V1":{"datasets_and_regions":[],"datasets_and_snapshots":[]}}
  {"V1":{"datasets_and_regions":[],"datasets_and_snapshots":[]}}
  {"V1":{"datasets_and_regions":[],"datasets_and_snapshots":[]}}
  {"V1":{"datasets_and_regions":[],"datasets_and_snapshots":[]}}
  {"V1":{"datasets_and_regions":[],"datasets_and_snapshots":[]}}
  {"V1":{"datasets_and_regions":[],"datasets_and_snapshots":[]}}
  {"V1":{"datasets_and_regions":[],"datasets_and_snapshots":[]}}
  {"V1":{"datasets_and_regions":[],"datasets_and_snapshots":[]}}
  {"V1":{"datasets_and_regions":[],"datasets_and_snapshots":[]}}
  {"V1":{"datasets_and_regions":[],"datasets_and_snapshots":[]}}
  {"V1":{"datasets_and_regions":[],"datasets_and_snapshots":[]}}
  {"V1":{"datasets_and_regions":[],"datasets_and_snapshots":[]}}
  {"V1":{"datasets_and_regions":[],"datasets_and_snapshots":[]}}
(13 rows)

Contributor:

Yeah, unfortunately in this case volume records are eventually hard deleted.

> What should I do to the racklette before upgrading to ensure I have some of these JSON blobs present?

Actually, I think test_deserialize_old_crucible_resources covers this case!

let dataset = dataset?;
let id = dataset.id();

// If this dataset already exists, do nothing.
Contributor:

This looks like a behaviour change from what was here before?

Contributor Author:

I'll make the claim it is not, even though it looks like it is. 😅

Before, we had this:

        // If this dataset already exists, only update it if it appears different from what exists
        // in the database already.
        let action = if let Some(db_dataset) = existing_datasets.remove(&id) {
            let db_config: DatasetConfig = db_dataset.try_into()?;
            let bp_config: DatasetConfig = bp_dataset.clone().try_into()?;

            if db_config == bp_config {
                num_unchanged += 1;
                continue;
            }

When flattening a db_dataset down to a DatasetConfig to see if there are any relevant changes, we were throwing away ip, port, and size_used, only leaving us with information derived from pool_id, zone_name, quota, reservation, and compression:

impl TryFrom<Dataset> for omicron_common::disk::DatasetConfig {
    type Error = Error;

    fn try_from(dataset: Dataset) -> Result<Self, Self::Error> {
        let compression = if let Some(c) = dataset.compression {
            c.parse().map_err(|e: anyhow::Error| {
                Error::internal_error(&e.to_string())
            })?
        } else {
            omicron_common::disk::CompressionAlgorithm::Off
        };

        Ok(Self {
            id: dataset.identity.id.into(),
            name: omicron_common::disk::DatasetName::new(
                omicron_common::zpool_name::ZpoolName::new_external(
                    ZpoolUuid::from_untyped_uuid(dataset.pool_id),
                ),
                dataset.kind.try_into_api(dataset.zone_name)?,
            ),
            inner: omicron_common::disk::SharedDatasetConfig {
                quota: dataset.quota.map(|q| q.into()),
                reservation: dataset.reservation.map(|r| r.into()),
                compression,
            },
        })
    }
}

We dropped all those fields we were comparing except pool_id. So the equivalent code here would be something like:

if existing_dataset.pool_id == dataset.pool_id {
    num_unchanged += 1;
    continue;
}

but I don't believe it's possible for a dataset's pool ID to change, right? So I dropped what would look like a weird / spurious check and replaced it with just the "have we inserted this already" check that's here now.

If it is possible for a dataset's pool ID to change, the new code is a behavior change and is wrong.

A separate question is: is it possible for a dataset's IP or port to change? If so, this code is wrong, but so was the old code.

Contributor:

> We dropped all those fields we were comparing except pool_id.

I don't follow this part.

> I don't believe it's possible for a dataset's pool ID to change, right?

I believe the same thing haha

Contributor Author:

> We dropped all those fields we were comparing except pool_id.
>
> I don't follow this part.

The old code for deciding whether to try to upsert was this:

        // If this dataset already exists, only update it if it appears different from what exists
        // in the database already.
        let action = if let Some(db_dataset) = existing_datasets.remove(&id) {
            let db_config: DatasetConfig = db_dataset.try_into()?;
            let bp_config: DatasetConfig = bp_dataset.clone().try_into()?;

            if db_config == bp_config {
                num_unchanged += 1;
                continue;
            }

Those DatasetConfig structs are only comparing a subset of fields: pool_id, zone_name, quota, reservation, and compression. Of all those fields, the new CrucibleDataset only has pool_id. So the behaviorally-equivalent code would be to just compare "does the blueprint dataset pool_id match the database dataset pool_id", but that's a silly check to make when we know it can't change.
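
(A toy illustration of that reduction, using a hypothetical stand-in type rather than the real DatasetConfig: a derived PartialEq compares exactly the fields the struct still carries, so once everything but pool_id is gone, the old equality check can only ever amount to a pool_id comparison.)

    #[derive(Debug, PartialEq)]
    struct ConfigProjection {
        pool_id: u64, // stand-in for the real zpool UUID
    }

    fn main() {
        let db = ConfigProjection { pool_id: 7 };
        let bp = ConfigProjection { pool_id: 7 };
        // With pool IDs assumed immutable, these always compare equal for an
        // existing dataset, which is why the check was dropped in favor of a
        // plain "already inserted?" test.
        assert_eq!(db, bp);
    }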

if !matches!(zone.zone_type, BlueprintZoneType::Crucible(_)) {
    return None;
}
let dataset = zone
Contributor:

is the type here CrucibleDataset?

Contributor Author:

No, it's DurableDataset, which contains a reference to an OmicronZoneDataset:

pub struct DurableDataset<'a> {
    pub dataset: &'a OmicronZoneDataset,
    pub kind: DatasetKind,
    pub address: SocketAddrV6,
}

@@ -400,16 +376,17 @@ mod tests {
.expect("failed to upsert zpool");
}

-// Call `ensure_dataset_records_exist` again, adding new datasets.
+// Call `ensure_crucible_dataset_records_exist` again, adding new
+// datasets.
//
// It should only insert these new zones.
let new_zones = [
Contributor:

This seems wrong, though I haven't looked deeply into this test: it's inserting two of the same zone that are different types than before this diff.

Contributor Author:

I think this is okay, but please double check me. Before, the execution code would insert dataset records for any DatasetKind since dataset held all dataset kinds; now, it only inserts records for Crucible datasets. The test is trying to ensure that new datasets added after the last execution are added, which is now restricted to "new Crucible datasets added after the last execution are added", so the new zones we use to test have to be Crucible zones.

Contributor:

> Before, the execution code would insert dataset records for any DatasetKind since dataset held all dataset kinds; now, it only inserts records for Crucible datasets.

I think this is why I was suspicious: based on that understanding this vec should have had four members, mixing together the crucible and non-crucible datasets.

There should be two versions of this test: one for "durable" datasets and one for crucible datasets.

Contributor Author:

I might be missing something - does it help if all_datasets already included Crucible datasets? (Guaranteed by https://github.com/oxidecomputer/omicron/pull/7386/files/26ccf5f2e0f22721cd4d1d26bac4d9c4dc659e6b#diff-d4165f273b361528feca0c934574df6365cd86f5688ab81665f35975620183a1R312-R316, and then we pass in all_datasets.iter().chain(new_zones); that used to provide a mix of crucible and non-crucible, and now just passes in "more crucible".)

Contributor Author:

We chatted about what made this test confusing; I took a stab at addressing it in 1201f9a

kind omicron.public.dataset_kind NOT NULL,
/*
* Contact information for the dataset: socket address of the Crucible
* downstairs service that owns this dataset
Contributor:

s/downstairs/agent/

Contributor Author:

fixed in deeec80

Comment on lines +1 to +3
set local disallow_full_table_scans = off;

DELETE FROM omicron.public.crucible_dataset WHERE kind != 'crucible';
Collaborator:

I was going to ask if we wanted to add a constraint that the kind must be 'crucible' before dropping, to ensure we've cleared other dataset types, but I hesitated to recommend that because "blocking the update" also seems bad.

To confirm, if there happen to be non-crucible entries for any reason, we're okay dropping them quietly?

Contributor Author:

There definitely are non-crucible entries! But dropping them is intentional: we're converting this into "the Crucible dataset rendezvous table", and after #7341 and #7350, there are no consumers of this table left that query for anything other than kind = 'crucible'.

(kind != 'zone') OR
(kind = 'zone' AND zone_name IS NOT NULL)
)
size_used INT NOT NULL
Collaborator:

Not your fault that this isn't labelled, but could we add: "this field is controlled by Nexus, and is updated as region allocations are performed from this dataset"

Contributor Author:

Added in deeec80

@@ -557,9 +557,10 @@ CREATE TYPE IF NOT EXISTS omicron.public.dataset_kind AS ENUM (
);

/*
- * A dataset of allocated space within a zpool.
+ * Table tracking the contact information and size used by datasets associated
Collaborator:

Nit: Is it worth documenting that this is a rendezvous table?

Contributor Author:

Sure - added in deeec80

Comment on lines +1047 to +1048
ip -> Inet,
port -> Int4,
Collaborator:

nice, thanks for making these non-nullable, makes this a lot easier to use

}

-pub async fn dataset_upsert_if_blueprint_is_current_target(
+pub async fn crucible_dataset_upsert_if_blueprint_is_current_target(
Collaborator:

(no action needed) since this is invoked by the blueprint executor to ensure dataset records exist, I assume this can also go away when we move more fully to the rendezvous table model.

(all to say, thanks for this PR, this is a great direction)

Contributor Author:

Yep, that's the plan for the next PR 👍

}

datastore
.dataset_delete_if_blueprint_is_current_target(
Collaborator:

Rad to see this unified, it felt gross as it was written (deletion of a table row happening in multiple possible spots). Thanks again for this change.

@jgallagher (Contributor Author):

Notes from testing this PR along with its successor #7397 on london are at #7397 (comment).

Successfully merging this pull request may close these issues:

  • some datasets in RSS blueprint erroneously include addresses