Reduce sync concurrency in store-gateway by default to reduce disk contention (#7136)

* Reduce sync concurrency in store-gateway by default to reduce disk contention

* Update CHANGELOG.md
andyasp authored Jan 16, 2024
1 parent c39af79 commit f299ba7
Showing 7 changed files with 9 additions and 13 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -19,6 +19,7 @@
* `prometheus_sd_refresh_failures_total` renamed to `cortex_prometheus_sd_refresh_failures_total`
* `prometheus_sd_refresh_duration_seconds` renamed to `cortex_prometheus_sd_refresh_duration_seconds`
* [CHANGE] Query-frontend: the default value for `-query-frontend.not-running-timeout` has been changed from 0 (disabled) to 2s. The configuration option has also been moved from "experimental" to "advanced". #7126
+ * [CHANGE] Store-gateway: to reduce disk contention on HDDs the default value for `blocks-storage.bucket-store.tenant-sync-concurrency` has been changed from `10` to `1` and the default value for `blocks-storage.bucket-store.block-sync-concurrency` has been changed from `20` to `4`. #7136
* [FEATURE] Introduce `-tenant-federation.max-tenants` option to limit the max number of tenants allowed for requests when federation is enabled. #6959
* [FEATURE] Cardinality API: added a new `count_method` parameter which enables counting active label values. #7085
* [FEATURE] Querier / query-frontend: added `-querier.promql-experimental-functions-enabled` CLI flag (and respective YAML config option) to enable experimental PromQL functions. The experimental functions introduced are: `mad_over_time()`, `sort_by_label()` and `sort_by_label_desc()`. #7057
4 changes: 2 additions & 2 deletions cmd/mimir/config-descriptor.json
@@ -6366,7 +6366,7 @@
"required": false,
"desc": "Maximum number of concurrent tenants synching blocks.",
"fieldValue": null,
- "fieldDefaultValue": 10,
+ "fieldDefaultValue": 1,
"fieldFlag": "blocks-storage.bucket-store.tenant-sync-concurrency",
"fieldType": "int",
"fieldCategory": "advanced"
@@ -6377,7 +6377,7 @@
"required": false,
"desc": "Maximum number of concurrent blocks synching per tenant.",
"fieldValue": null,
- "fieldDefaultValue": 20,
+ "fieldDefaultValue": 4,
"fieldFlag": "blocks-storage.bucket-store.block-sync-concurrency",
"fieldType": "int",
"fieldCategory": "advanced"
4 changes: 2 additions & 2 deletions cmd/mimir/help-all.txt.tmpl
@@ -298,7 +298,7 @@ Usage of ./cmd/mimir/mimir:
-blocks-storage.bucket-store.batch-series-size int
This option controls how many series to fetch per batch. The batch size must be greater than 0. (default 5000)
-blocks-storage.bucket-store.block-sync-concurrency int
- Maximum number of concurrent blocks synching per tenant. (default 20)
+ Maximum number of concurrent blocks synching per tenant. (default 4)
-blocks-storage.bucket-store.bucket-index.idle-timeout duration
How long a unused bucket index should be cached. Once this timeout expires, the unused bucket index is removed from the in-memory cache. This option is used only by querier. (default 1h0m0s)
-blocks-storage.bucket-store.bucket-index.max-stale-period duration
@@ -660,7 +660,7 @@ Usage of ./cmd/mimir/mimir:
-blocks-storage.bucket-store.sync-interval duration
How frequently to scan the bucket, or to refresh the bucket index (if enabled), in order to look for changes (new blocks shipped by ingesters and blocks deleted by retention or compaction). (default 15m0s)
-blocks-storage.bucket-store.tenant-sync-concurrency int
- Maximum number of concurrent tenants synching blocks. (default 10)
+ Maximum number of concurrent tenants synching blocks. (default 1)
-blocks-storage.filesystem.dir string
Local filesystem storage directory. (default "blocks")
-blocks-storage.gcs.bucket-name string
@@ -3474,11 +3474,11 @@ bucket_store:
# (advanced) Maximum number of concurrent tenants synching blocks.
# CLI flag: -blocks-storage.bucket-store.tenant-sync-concurrency
- [tenant_sync_concurrency: <int> | default = 10]
+ [tenant_sync_concurrency: <int> | default = 1]
# (advanced) Maximum number of concurrent blocks synching per tenant.
# CLI flag: -blocks-storage.bucket-store.block-sync-concurrency
- [block_sync_concurrency: <int> | default = 20]
+ [block_sync_concurrency: <int> | default = 4]
# (advanced) Number of Go routines to use when syncing block meta files from
# object storage per tenant.
2 changes: 0 additions & 2 deletions operations/mimir/config.libsonnet
@@ -98,8 +98,6 @@

// When store_gateway_lazy_loading_enabled: true, block index-headers are pre-downloaded but lazy loaded at query time.
// Enabling lazy loading results in faster startup times at the cost of some latency during query time.
- // store_gateway_lazy_loading_enabled: false will also reduce the concurrency of blocks syncing;
- // this improves startup times when running on HDDs instead of SSDs as it reduces random reads.
store_gateway_lazy_loading_enabled: true,

// Number of memcached replicas for each memcached statefulset
3 changes: 0 additions & 3 deletions operations/mimir/store-gateway.libsonnet
@@ -37,9 +37,6 @@
'blocks-storage.bucket-store.index-header.lazy-loading-idle-timeout': '60m',
} else {
'blocks-storage.bucket-store.index-header.lazy-loading-enabled': 'false',
- // Force fewer random disk reads; this increases throughoput and reduces i/o wait on HDDs.
- 'blocks-storage.bucket-store.block-sync-concurrency': 4,
- 'blocks-storage.bucket-store.tenant-sync-concurrency': 1,
}) +
$.blocks_chunks_concurrency_connection_config +
$.blocks_chunks_caching_config +
4 changes: 2 additions & 2 deletions pkg/storage/tsdb/config.go
@@ -444,8 +444,8 @@ func (cfg *BucketStoreConfig) RegisterFlags(f *flag.FlagSet) {
f.DurationVar(&cfg.SyncInterval, "blocks-storage.bucket-store.sync-interval", 15*time.Minute, "How frequently to scan the bucket, or to refresh the bucket index (if enabled), in order to look for changes (new blocks shipped by ingesters and blocks deleted by retention or compaction).")
f.Uint64Var(&cfg.SeriesHashCacheMaxBytes, "blocks-storage.bucket-store.series-hash-cache-max-size-bytes", uint64(1*units.Gibibyte), "Max size - in bytes - of the in-memory series hash cache. The cache is shared across all tenants and it's used only when query sharding is enabled.")
f.IntVar(&cfg.MaxConcurrent, "blocks-storage.bucket-store.max-concurrent", 100, "Max number of concurrent queries to execute against the long-term storage. The limit is shared across all tenants.")
- f.IntVar(&cfg.TenantSyncConcurrency, "blocks-storage.bucket-store.tenant-sync-concurrency", 10, "Maximum number of concurrent tenants synching blocks.")
- f.IntVar(&cfg.BlockSyncConcurrency, "blocks-storage.bucket-store.block-sync-concurrency", 20, "Maximum number of concurrent blocks synching per tenant.")
+ f.IntVar(&cfg.TenantSyncConcurrency, "blocks-storage.bucket-store.tenant-sync-concurrency", 1, "Maximum number of concurrent tenants synching blocks.")
+ f.IntVar(&cfg.BlockSyncConcurrency, "blocks-storage.bucket-store.block-sync-concurrency", 4, "Maximum number of concurrent blocks synching per tenant.")
f.IntVar(&cfg.MetaSyncConcurrency, "blocks-storage.bucket-store.meta-sync-concurrency", 20, "Number of Go routines to use when syncing block meta files from object storage per tenant.")
f.DurationVar(&cfg.IgnoreDeletionMarksDelay, "blocks-storage.bucket-store.ignore-deletion-marks-delay", time.Hour*1, "Duration after which the blocks marked for deletion will be filtered out while fetching blocks. "+
"The idea of ignore-deletion-marks-delay is to ignore blocks that are marked for deletion with some delay. This ensures store can still serve blocks that are meant to be deleted but do not have a replacement yet.")
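
To make the effect of the new defaults concrete, here is a minimal, standalone sketch (not Mimir's actual code). It registers the two flags on a standard-library `flag.FlagSet` with the defaults from this commit and shows how an operator could pass the previous values explicitly. The `syncConfig` type, `parseSyncConfig`, and `maxBlockSyncs` are hypothetical names introduced for illustration. The product of the two limits is only an assumed upper bound on simultaneous block syncs, inferred from the flag descriptions ("concurrent tenants" and "concurrent blocks ... per tenant").

```go
package main

import (
	"flag"
	"fmt"
)

// syncConfig is a hypothetical stand-in for the two BucketStoreConfig
// fields whose defaults this commit changes.
type syncConfig struct {
	TenantSyncConcurrency int
	BlockSyncConcurrency  int
}

// registerFlags registers both flags with the new defaults (1 and 4).
func (c *syncConfig) registerFlags(f *flag.FlagSet) {
	f.IntVar(&c.TenantSyncConcurrency, "blocks-storage.bucket-store.tenant-sync-concurrency", 1, "Maximum number of concurrent tenants synching blocks.")
	f.IntVar(&c.BlockSyncConcurrency, "blocks-storage.bucket-store.block-sync-concurrency", 4, "Maximum number of concurrent blocks synching per tenant.")
}

// parseSyncConfig parses CLI-style arguments into a syncConfig.
// Flags that are not passed keep their registered defaults.
func parseSyncConfig(args []string) (syncConfig, error) {
	var cfg syncConfig
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	cfg.registerFlags(fs)
	err := fs.Parse(args)
	return cfg, err
}

// maxBlockSyncs returns the assumed upper bound on blocks syncing at the
// same time: tenants in flight multiplied by blocks in flight per tenant.
func (c syncConfig) maxBlockSyncs() int {
	return c.TenantSyncConcurrency * c.BlockSyncConcurrency
}

func main() {
	// No arguments: the new defaults apply.
	defaults, err := parseSyncConfig(nil)
	if err != nil {
		panic(err)
	}
	fmt.Println("new defaults:", defaults.maxBlockSyncs()) // prints "new defaults: 4"

	// Passing the previous default values explicitly (for example on SSDs).
	previous, err := parseSyncConfig([]string{
		"-blocks-storage.bucket-store.tenant-sync-concurrency=10",
		"-blocks-storage.bucket-store.block-sync-concurrency=20",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("previous defaults:", previous.maxBlockSyncs()) // prints "previous defaults: 200"
}
```

Under that assumption, the default ceiling drops from 200 to 4 blocks syncing at once, which is consistent with the stated goal of reducing random reads on HDDs. Deployments on faster disks can still raise either flag.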