Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jsonnet: add experimental ingest storage support #8028

Merged
merged 4 commits into from
May 3, 2024

Conversation

pracucci
Copy link
Collaborator

@pracucci pracucci commented May 2, 2024

What this PR does

In this PR I'm adding the jsonnet support for the experimental ingest storage. This PR is upstreaming the jsonnet we're using at Grafana Labs to test the ingest storage, and it migration from the Mimir classic architecture.

I've codified the migration steps in the jsonnet "tests". The diff of each step matched with the actual diff applied at Grafana Labs when we tested the migration process:

diff --color=always operations/mimir-tests/test-ingest-storage-migration-step-0-generated.yaml operations/mimir-tests/test-ingest-storage-migration-step-1-generated.yaml
diff --color=always operations/mimir-tests/test-ingest-storage-migration-step-1-generated.yaml operations/mimir-tests/test-ingest-storage-migration-step-2-generated.yaml
diff --color=always operations/mimir-tests/test-ingest-storage-migration-step-2-generated.yaml operations/mimir-tests/test-ingest-storage-migration-step-3-generated.yaml
diff --color=always operations/mimir-tests/test-ingest-storage-migration-step-3-generated.yaml operations/mimir-tests/test-ingest-storage-migration-step-4-generated.yaml
diff --color=always operations/mimir-tests/test-ingest-storage-migration-step-4-generated.yaml operations/mimir-tests/test-ingest-storage-migration-step-5-generated.yaml
diff --color=always operations/mimir-tests/test-ingest-storage-migration-step-5-generated.yaml operations/mimir-tests/test-ingest-storage-migration-step-6-generated.yaml
diff --color=always operations/mimir-tests/test-ingest-storage-migration-step-6-generated.yaml operations/mimir-tests/test-ingest-storage-migration-step-7-generated.yaml
diff --color=always operations/mimir-tests/test-ingest-storage-migration-step-7-generated.yaml operations/mimir-tests/test-ingest-storage-migration-step-8-generated.yaml
diff --color=always operations/mimir-tests/test-ingest-storage-migration-step-8-generated.yaml operations/mimir-tests/test-ingest-storage-migration-step-9-generated.yaml
diff --color=always operations/mimir-tests/test-ingest-storage-migration-step-9-generated.yaml operations/mimir-tests/test-ingest-storage-migration-step-10-generated.yaml
diff --color=always operations/mimir-tests/test-ingest-storage-migration-step-10-generated.yaml operations/mimir-tests/test-ingest-storage-migration-step-11-generated.yaml
diff --color=always operations/mimir-tests/test-ingest-storage-migration-step-11-generated.yaml operations/mimir-tests/test-ingest-storage-migration-step-final-generated.yaml

Which issue(s) this PR fixes or relates to

N/A

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@@ -336,72 +332,4 @@
if !$._config.is_microservices_deployment_mode || ($._config.multi_zone_store_gateway_enabled && !$._config.multi_zone_store_gateway_migration_enabled)
then null
else super.store_gateway_pdb,

//
// Rollout operator.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewers: I've simply moved it to a dedicated file.

$._config.multi_zone_ingester_enabled ||
$._config.multi_zone_store_gateway_enabled ||
$._config.cortex_compactor_concurrent_rollout_enabled ||
$._config.ingest_storage_ingester_autoscaling_enabled,
Copy link
Collaborator Author

@pracucci pracucci May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewers: added $._config.ingest_storage_ingester_autoscaling_enabled. This is the only logic change compared to the version in operations/mimir/multi-zone.libsonnet.

@pracucci pracucci force-pushed the jsonnet-add-ingest-storage-support branch from 0cf0618 to 1b24186 Compare May 2, 2024 16:19
@@ -8,7 +8,7 @@ should_be_namespaced(contents) {
}

should_be_namespaced(contents) {
not contents.kind in ["PodSecurityPolicy", "Namespace"]
not contents.kind in ["PodSecurityPolicy", "Namespace", "CustomResourceDefinition"]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewers: CustomResourceDefinition doesn't have namespace because it's a global resource.

@pracucci pracucci marked this pull request as ready for review May 2, 2024 17:17
@pracucci pracucci requested a review from a team as a code owner May 2, 2024 17:17
Copy link
Member

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job codifying the migration.

namespace: $._config.namespace,
},
spec: {
replicas:: null, // We don't want this field to be managed by Flux.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
replicas:: null, // We don't want this field to be managed by Flux.
replicas:: null, // Hide replicas field.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in b02e79d

ingest_storage_migration_classic_ingesters_scale_down: false,
ingest_storage_migration_classic_ingesters_zone_a_decommission: true,
ingest_storage_migration_classic_ingesters_zone_b_decommission: true,
ingest_storage_migration_classic_ingesters_zone_c_decommission: true,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is only one field for scaling down all zones, but 3 fields for decommission. If these statefulset sets are scaled down already, why not decomission all of them at once?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see that this variable is also used in later steps.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll decommission them all together, but then restore 1 zone at a time when we migrate ingester-zone-[abc]-partition back to ingester-zone-[abc].

$.ingest_storage_ingester_args +
$.ingest_storage_ingester_ring_client_args + {
// Run partition ingesters on a dedicated hash ring, so that they don't clash with classic ingesters.
'ingester.ring.prefix': 'partition-ingesters/',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Would be nice to use common local var for partition-ingesters/.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in b02e79d. I've used a config option instead of a local var, because it's scattered around 2 files.

@@ -0,0 +1,12 @@
// Migration step 8:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at diff for generated YAML files between step 7 and 8, I think we're missing setup of ingester-zone-[abc], before we scale down ingester-zone-[abc]-partition. Ingesters without the suffix were removed in step 6, but they still don't exist at this point. Is that OK? If we scale down -partition ingesters now, we have no ingesters running and read-path is broken. WDYT?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're missing setup of ingester-zone-[abc], before we scale down ingester-zone-[abc]-partition

We don't setup ingester-zone-[abc] before scaling down ingester-zone-[abc]-partition but after that because we'll reuse PVCs (since migrating PVCs is a migration step not done through jsonnet it's not visibile in the codified steps). ingester-zone-[abc] is setup in step 9. To keep the steps easier here, I've codified all zones at once, but in practice the migration will be 1 zone at a time to have no downtime.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we scale down -partition ingesters now, we have no ingesters running and read-path is broken

From step 8:

// Migration step 8:
// - Begin migrating ingester-zone-[abc]-partition StatefulSets to ingester-zone-[abc]. This should be done zone-by-zone,
//   but to keep the jsonnet test smaller we're going to do all zones at once. In this step we're going to scale down
//   ingester-zone-[abc]-partition replicas to 0.

In practice will be done 1 zone at a time.

pracucci added 4 commits May 3, 2024 12:04
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
@pracucci pracucci force-pushed the jsonnet-add-ingest-storage-support branch from 85ec995 to b02e79d Compare May 3, 2024 10:20
@pracucci pracucci merged commit 9f95485 into main May 3, 2024
29 checks passed
@pracucci pracucci deleted the jsonnet-add-ingest-storage-support branch May 3, 2024 10:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants