
store: Improve main pain points for using store gateway against big bucket. #814

Closed
3 of 5 tasks
bwplotka opened this issue Feb 6, 2019 · 17 comments

@bwplotka
Member

bwplotka commented Feb 6, 2019

Acceptance criteria:

  1. Improve store startup time: Store: initialisation time #532
  2. Improve store startup S3 traffic & disk size needed: How to reduce the network traffic generated by the s3 store initialization process #664

--> #942

  3. Reduce store "baseline" memory per block: store: Store gateway consuming lots of memory / OOMing #448
  4. Sync and client-side filtering of all objects for each group can be slow for a large number of objects
  5. Fix nil panic on lazypostings: store (s3, gcs): invalid memory address or nil pointer dereference #335

Initial ideas:

Extra option mentioned below: add --max-time and --min-time to store & compactor to "shard" blocks by time.

CC @claytono @tdabasinskas @xjewer

@claytono
Contributor

claytono commented Feb 6, 2019

Another option @antonio and I have discussed is adding --mintime and --maxtime flags to thanos store and compactor. If the flags were given, each component would ignore blocks outside the given time range, allowing you to run multiple thanos store and compactor components against a single bucket and to repartition easily by just selecting different time ranges.
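
For illustration, here is a minimal sketch of the filtering this would imply (the types and flag handling are invented for the example, not actual Thanos code): each instance loads only the blocks whose meta.json time range overlaps its configured window.

```go
package main

import (
	"fmt"
	"time"
)

// blockMeta mirrors the part of a block's meta.json that matters here:
// the (millisecond) time range the block covers.
type blockMeta struct {
	ID      string
	MinTime int64
	MaxTime int64
}

// inWindow reports whether a block overlaps the window given via the
// hypothetical --mintime/--maxtime flags. Blocks entirely outside the
// window would simply be ignored by this store/compactor instance.
func inWindow(b blockMeta, minT, maxT int64) bool {
	return b.MaxTime >= minT && b.MinTime <= maxT
}

func main() {
	// Example window: only serve the last 30 days.
	maxT := time.Now().UnixNano() / int64(time.Millisecond)
	minT := maxT - 30*24*time.Hour.Milliseconds()

	blocks := []blockMeta{
		{ID: "block-recent", MinTime: maxT - 1000, MaxTime: maxT}, // overlaps window: kept
		{ID: "block-ancient", MinTime: 0, MaxTime: minT - 1},      // entirely too old: skipped
	}
	for _, b := range blocks {
		fmt.Println(b.ID, "keep:", inWindow(b, minT, maxT))
	}
}
```

Running several store instances with adjacent, non-overlapping windows against the same bucket then partitions the load, and repartitioning is just a flag change and a restart.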

@GiedriusS
Member

I think we should add #335 to the list.

@earthdiaosi

Is there any update?

@claytono
Contributor

claytono commented Mar 1, 2019

I've started work on a patch for the --min-time and --max-time functionality. I've got it working for the store code, and I hope to start on the compactor piece soon.

@bwplotka
Member Author

bwplotka commented Mar 1, 2019

Help wanted for the other items.

We also likely fixed #335 on master, but tests from @GiedriusS are still pending (:

@claytono cool 👍

@earthdiaosi

@claytono cool, can you submit the store code first? That's what we need...

@SuperQ
Contributor

SuperQ commented Mar 18, 2019

I have a lot of large buckets; many of the index.cache.json files are ~100MB.

One idea that came to mind was to use FlatBuffers.
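
To illustrate why a flat binary layout could help here, a minimal, self-contained sketch (the record layout below is invented for the example and is not the real index-cache.json schema): decoding a big JSON document allocates every entry up front, while fixed-width binary records can be read in place, which is the zero-copy style of access FlatBuffers provides.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/json"
	"fmt"
)

// Hypothetical cache entry: a series reference and its byte offset in the index.
type posting struct {
	Ref    uint64 `json:"ref"`
	Offset uint64 `json:"offset"`
}

func main() {
	entries := make([]posting, 100000)
	for i := range entries {
		entries[i] = posting{Ref: uint64(i), Offset: uint64(i * 16)}
	}

	// JSON: the whole file must be parsed and every entry allocated before any lookup.
	j, _ := json.Marshal(entries)
	var decoded []posting
	_ = json.Unmarshal(j, &decoded)

	// Fixed-width binary: entry i can be read in place without decoding the rest,
	// which is the access pattern FlatBuffers (or a custom binary format) enables.
	buf := new(bytes.Buffer)
	for _, e := range entries {
		binary.Write(buf, binary.LittleEndian, e.Ref)
		binary.Write(buf, binary.LittleEndian, e.Offset)
	}
	raw := buf.Bytes()
	i := 42
	ref := binary.LittleEndian.Uint64(raw[i*16:])
	off := binary.LittleEndian.Uint64(raw[i*16+8:])

	fmt.Printf("JSON size=%d bytes, binary size=%d bytes, entry %d = (%d, %d)\n",
		len(j), len(raw), i, ref, off)
}
```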

@GiedriusS
Member

@claytono is there any update? It would be nice to solve this in a general way as we discussed here.

@claytono
Contributor

I'm hoping to get a PR up for this within the week, if time allows. For now, my PR only addresses partitioning on the thanos store side. It's not clear to me whether similar limiting is really needed on the compactor side. We're planning to do an initial deployment without compactor support for time ranges.

@bwplotka
Member Author

Another option @antonio and I have discussed is adding a --mintime and --maxtime

We talked about this as well in @povilasv's PR: #930

@midnightconman
Contributor

I just tried 0.3.2 on Tuesday and it didn't work for my large buckets in s3. I currently have 37 prometheus clusters, 9TB of data total, and the largest bucket is around 700GB. I reverted to 0.2.1 and things are back to normal. High latency and query timeouts were the issues I was seeing. I am running prometheus 2.4.3; not sure if that might have been contributing to the issue.

Do you guys think this work will help towards that end? Thanks for the great work 😄

@GiedriusS
Member

@midnightconman have you read the change log? Most likely you need to increase your index cache size (:

@midnightconman
Contributor

@midnightconman have you read the change log? Most likely you need to increase your index cache size (:

I did 😄

I tried --index-cache-size=20GB and --chunk-pool-size=200GB, with no change. Strangely, the disk usage in /data is the same for 0.2.1 and 0.3.2?

I am not talking about merely slower queries (like 200ms on 0.2.1 vs 1000ms on 0.3.2)... on 0.3.2, queries against the larger buckets never return at all.

@baelish

baelish commented Jul 29, 2019

Could we have multiple store gateways divide the load between themselves? Ideally I picture 3 store gateway nodes pointing to a single bucket, each handling a third of the chunks spread over the whole time period (e.g. all of them have some newer and some older chunks). If another gateway is added, they would work out a new way to divide the load, and the same should one disappear. I think this would be nicer than having the user work out time ranges that match the chunks, and it could also prevent the store gateway with the newest chunks doing most of the work while the ones with older chunks do little.
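
A minimal sketch of the kind of automatic split described above, assuming each gateway knows the full peer list and simply hash-partitions block IDs (this is not existing Thanos behaviour, just an illustration of the idea):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// ownsBlock decides whether this gateway (index self out of n peers) should
// load a given block. Every peer runs the same function over the same block
// list, so the bucket is covered exactly once without coordination, and
// changing n (adding or removing a gateway) automatically re-divides the blocks.
func ownsBlock(blockID string, self, n int) bool {
	h := fnv.New32a()
	h.Write([]byte(blockID))
	return int(h.Sum32())%n == self
}

func main() {
	blocks := []string{"01DXA...", "01DXB...", "01DXC...", "01DXD...", "01DXE..."}
	n := 3
	for self := 0; self < n; self++ {
		var mine []string
		for _, b := range blocks {
			if ownsBlock(b, self, n) {
				mine = append(mine, b)
			}
		}
		fmt.Printf("gateway %d loads %v\n", self, mine)
	}
}
```

Hash-mod gives an even spread across old and new blocks (addressing the "newest blocks do most of the work" concern), but a peer change reshuffles most assignments; consistent or rendezvous hashing would limit that churn.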

@claytono
Contributor

@baelish That seems ideal. The manual time range partitioning was mostly proposed as something that would be fairly simple to implement and start using quickly. I would guess the issues with the automatic approach would be coordination between the gateways and the need to publish consistent time ranges. On the latter: currently stores publish just a min time and a max time, so if you want queries routed only to a store that definitely has the blocks, you either have to make sure each store holds a contiguous range of blocks, or change the way ranges are published so that a store can advertise multiple time ranges.
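
To make the advertisement problem concrete, here is a small sketch with made-up block ranges of why a store holding non-contiguous blocks cannot be described accurately by the single (min time, max time) pair it publishes today:

```go
package main

import "fmt"

type timeRange struct{ min, max int64 }

func main() {
	// A store that was assigned two disjoint slices of the bucket.
	blocks := []timeRange{{0, 100}, {300, 400}}

	// Today a store advertises only one overall range...
	adv := timeRange{blocks[0].min, blocks[len(blocks)-1].max}
	fmt.Printf("advertised: %+v\n", adv) // {min:0 max:400}

	// ...so a querier will also route queries for t in (100, 300) here,
	// even though this store has no data for that gap. Publishing the
	// list of ranges (or keeping each store's blocks contiguous) avoids
	// the wasted fan-out.
	query := timeRange{150, 250}
	covered := false
	for _, b := range blocks {
		if query.min <= b.max && query.max >= b.min {
			covered = true
		}
	}
	fmt.Println("query overlaps advertised range but is covered by blocks:", covered) // false
}
```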

@baelish

baelish commented Jul 29, 2019

@claytono makes sense; sometimes you need to get things out there quickly. Perhaps it could be considered a long-term goal.

@bwplotka
Member Author

bwplotka commented Nov 1, 2019

Thanks, everyone involved! ❤️

We now have time partitioning and block sharding by external labels, as requested in this ticket, so we can close this!

For the tracking issue covering further improvements and ideas, please see #1705.

Happy Halloween!

@bwplotka bwplotka closed this as completed Nov 1, 2019