
store: Improve main pain points for using store gateway against big bucket. #814

Closed
3 of 5 tasks
bwplotka opened this issue Feb 6, 2019 · 17 comments

@bwplotka
Member

bwplotka commented Feb 6, 2019

Acceptance criteria:

  1. Improve store startup time: Store: initialisation time #532
  2. Improve store startup S3 traffic & disk size needed: How to reduce the network traffic generated by the s3 store initialization process #664

--> #942

  3. Reduce store "baseline" memory per block: store: Store gateway consuming lots of memory / OOMing #448
  4. Sync and client-side filtering of all objects for each group can be slow for a large number of objects
  5. Fix nil panic on lazypostings: store (s3, gcs): invalid memory address or nil pointer dereference #335

Initial ideas:

Extra option mentioned below: add --max-time and --min-time to store & compactor to "shard" blocks by time.

CC @claytono @tdabasinskas @xjewer

@claytono
Contributor

claytono commented Feb 6, 2019

Another option @antonio and I have discussed is adding --mintime and --maxtime flags to thanos store and compactor. If the flags were given, each component would ignore blocks outside the given time range, allowing you to run multiple thanos store and compactor components against a single bucket and to repartition easily by just selecting different time ranges.
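
For illustration, here is a minimal sketch of the filtering this would imply (the types and flag handling are invented for the example, not actual Thanos code): each instance loads only the blocks whose meta.json time range overlaps its configured window.

```go
package main

import (
	"fmt"
	"time"
)

// blockMeta mirrors the part of a block's meta.json that matters here:
// the (millisecond) time range the block covers.
type blockMeta struct {
	ID      string
	MinTime int64
	MaxTime int64
}

// inWindow reports whether a block overlaps the window given via the
// hypothetical --mintime/--maxtime flags. Blocks entirely outside the
// window would simply be ignored by this store/compactor instance.
func inWindow(b blockMeta, minT, maxT int64) bool {
	return b.MaxTime >= minT && b.MinTime <= maxT
}

func main() {
	// Example window: only serve the last 30 days.
	maxT := time.Now().UnixNano() / int64(time.Millisecond)
	minT := maxT - 30*24*time.Hour.Milliseconds()

	blocks := []blockMeta{
		{ID: "block-recent", MinTime: maxT - 1000, MaxTime: maxT}, // overlaps window: kept
		{ID: "block-ancient", MinTime: 0, MaxTime: minT - 1},      // entirely too old: skipped
	}
	for _, b := range blocks {
		fmt.Println(b.ID, "keep:", inWindow(b, minT, maxT))
	}
}
```

Running several store instances with adjacent, non-overlapping windows against the same bucket then partitions the load, and repartitioning is just a flag change and a restart.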

@GiedriusS
Member

I think we should add #335 to the list.

@earthdiaosi

Is there any update?

@claytono
Contributor

claytono commented Mar 1, 2019

I've started work on a patch for the --min-time and --max-time functionality. I've got it working for the store code, and I hope to start on the compactor piece soon.

@bwplotka
Member Author

bwplotka commented Mar 1, 2019

Help wanted for the other items.

We also likely fixed #335 on master, but tests from @GiedriusS are still pending (:

@claytono cool 👍

@earthdiaosi

@claytono cool, can you submit the store code first? That's what we need...

@SuperQ
Contributor

SuperQ commented Mar 18, 2019

I have a lot of large buckets; many of the index.cache.json files are ~100MB.

One idea that came to mind was to use FlatBuffers.
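
To illustrate why a flat binary layout could help here, a minimal, self-contained sketch (the record layout below is invented for the example and is not the real index-cache.json schema): decoding a big JSON document allocates every entry up front, while fixed-width binary records can be read in place, which is the zero-copy style of access FlatBuffers provides.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/json"
	"fmt"
)

// Hypothetical cache entry: a series reference and its byte offset in the index.
type posting struct {
	Ref    uint64 `json:"ref"`
	Offset uint64 `json:"offset"`
}

func main() {
	entries := make([]posting, 100000)
	for i := range entries {
		entries[i] = posting{Ref: uint64(i), Offset: uint64(i * 16)}
	}

	// JSON: the whole file must be parsed and every entry allocated before any lookup.
	j, _ := json.Marshal(entries)
	var decoded []posting
	_ = json.Unmarshal(j, &decoded)

	// Fixed-width binary: entry i can be read in place without decoding the rest,
	// which is the access pattern FlatBuffers (or a custom binary format) enables.
	buf := new(bytes.Buffer)
	for _, e := range entries {
		binary.Write(buf, binary.LittleEndian, e.Ref)
		binary.Write(buf, binary.LittleEndian, e.Offset)
	}
	raw := buf.Bytes()
	i := 42
	ref := binary.LittleEndian.Uint64(raw[i*16:])
	off := binary.LittleEndian.Uint64(raw[i*16+8:])

	fmt.Printf("JSON size=%d bytes, binary size=%d bytes, entry %d = (%d, %d)\n",
		len(j), len(raw), i, ref, off)
}
```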

@GiedriusS
Member

@claytono is there any update? It would be nice to solve this in a general way as we discussed here.

@claytono
Contributor

I'm hoping to get a PR up for this within the week, if time allows. For now, my PR only addresses partitioning on the thanos store side. It's not clear to me whether similar limiting is really needed on the compactor side. We're planning to do an initial deployment without compactor support for time ranges.

@bwplotka
Member Author

Another option @antonio and I have discussed is adding a --mintime and --maxtime

We talked about this as well in @povilasv's PR: #930

@midnightconman
Contributor

I just tried 0.3.2 on Tuesday and it didn't work for my large buckets in s3. I currently have 37 prometheus clusters, 9TB of data total, and the largest bucket is around 700GB. I reverted to 0.2.1 and things are back to normal. High latency and query timeouts were the issues I was seeing. I am running prometheus 2.4.3; not sure if that might have been contributing to the issue.

Do you guys think this work will help towards that end? Thanks for the great work 😄

@GiedriusS
Member

@midnightconman have you read the change log? Most likely you need to increase your index cache size (:

@midnightconman
Contributor

@midnightconman have you read the change log? Most likely you need to increase your index cache size (:

I did 😄

I tried --index-cache-size=20GB and --chunk-pool-size=200GB, with no change. Strangely, the disk usage in /data is the same for 0.2.1 and 0.3.2?

I am not talking about merely slower queries (like 200ms on 0.2.1 vs 1000ms on 0.3.2)... on 0.3.2, queries against the larger buckets never return at all.

@baelish

baelish commented Jul 29, 2019

Could we have multiple store gateways divide the load between themselves? Ideally I picture 3 store gateway nodes pointing to a single bucket, each handling a third of the chunks spread over the whole time period (e.g. all of them have some newer and some older chunks). If another gateway is added, they would work out a new way to divide the load, and the same should one disappear. I think this would be nicer than having the user work out time ranges that match the chunks, and it could also prevent the store gateway with the newest chunks doing most of the work while the ones with older chunks do little.
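
A minimal sketch of the kind of automatic split described above, assuming each gateway knows the full peer list and simply hash-partitions block IDs (this is not existing Thanos behaviour, just an illustration of the idea):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// ownsBlock decides whether this gateway (index self out of n peers) should
// load a given block. Every peer runs the same function over the same block
// list, so the bucket is covered exactly once without coordination, and
// changing n (adding or removing a gateway) automatically re-divides the blocks.
func ownsBlock(blockID string, self, n int) bool {
	h := fnv.New32a()
	h.Write([]byte(blockID))
	return int(h.Sum32())%n == self
}

func main() {
	blocks := []string{"01DXA...", "01DXB...", "01DXC...", "01DXD...", "01DXE..."}
	n := 3
	for self := 0; self < n; self++ {
		var mine []string
		for _, b := range blocks {
			if ownsBlock(b, self, n) {
				mine = append(mine, b)
			}
		}
		fmt.Printf("gateway %d loads %v\n", self, mine)
	}
}
```

Hash-mod gives an even spread across old and new blocks (addressing the "newest blocks do most of the work" concern), but a peer change reshuffles most assignments; consistent or rendezvous hashing would limit that churn.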

@claytono
Contributor

@baelish That seems ideal. The manual time range partitioning was mostly proposed as something that would be fairly simple to implement and start using quickly. I would guess the issues with the automatic approach would be coordination between the gateways and the need to publish consistent time ranges. On the latter: currently stores publish just a min time and a max time, so if you want queries routed only to a store that definitely has the blocks, you either have to make sure each store holds a contiguous range of blocks, or change the way ranges are published so that a store can advertise multiple time ranges.
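
To make the advertisement problem concrete, here is a small sketch with made-up block ranges of why a store holding non-contiguous blocks cannot be described accurately by the single (min time, max time) pair it publishes today:

```go
package main

import "fmt"

type timeRange struct{ min, max int64 }

func main() {
	// A store that was assigned two disjoint slices of the bucket.
	blocks := []timeRange{{0, 100}, {300, 400}}

	// Today a store advertises only one overall range...
	adv := timeRange{blocks[0].min, blocks[len(blocks)-1].max}
	fmt.Printf("advertised: %+v\n", adv) // {min:0 max:400}

	// ...so a querier will also route queries for t in (100, 300) here,
	// even though this store has no data for that gap. Publishing the
	// list of ranges (or keeping each store's blocks contiguous) avoids
	// the wasted fan-out.
	query := timeRange{150, 250}
	covered := false
	for _, b := range blocks {
		if query.min <= b.max && query.max >= b.min {
			covered = true
		}
	}
	fmt.Println("query overlaps advertised range but is covered by blocks:", covered) // false
}
```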

@baelish

baelish commented Jul 29, 2019

@claytono makes sense; sometimes you need to get things out there quickly. Perhaps it could be considered a long-term goal.

@bwplotka
Member Author

bwplotka commented Nov 1, 2019

Thanks, everyone involved! ❤️

We now have time partitioning and block sharding by external labels, as requested in this ticket, so we can close this!

For the tracking issue covering further improvements and ideas, please see #1705.

Happy Halloween!

@bwplotka bwplotka closed this as completed Nov 1, 2019