Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP store: Add --skip-window functionality #930

Closed
wants to merge 4 commits into from

Conversation

povilasv
Copy link
Member

@povilasv povilasv commented Mar 15, 2019

Changes

Adds --skip-window to Thanos Store, this allows users to configure time duration, which won't be reported to Thanos Query.

Let's say we do --skip-window=2d and have Prometheus of retention of 60h,

Thanos Query will only query Prometheus via Thanos Sidecar for first 2 days.
For more than 2d, it will hit both Thanos Sidecar + Thanos Store.

Verification

Tested locally in our dev environment (both turned on and turned off)
Also unit tests

@povilasv povilasv force-pushed the store-skip-window branch from 7d0de0b to 417a129 Compare March 15, 2019 12:57
@GiedriusS
Copy link
Member

GiedriusS commented Mar 15, 2019

As talked on Slack, I don't think this should be merged as-is because:

  • We should opt to have --min-time and --max-time options as outlined here store: Improve main pain points for using store gateway against big bucket. #814. They should optionally support relative time so that we could support this use-case;
  • Would love to make this a bit more dynamic because as we know sometimes it takes Thanos Store a long time to be restarted which we would have to do in case we'd want to change the options so perhaps another version of these options should be in Thanos Query - in the --store option and DNS/file SD; plus, Sidecar might become unavailable for any reason - what would you do then? Quickly restart Thanos Store? Doesn't sound like a very viable and user-friendly option;
  • Perhaps add some kind --economical mode to Thanos Query which would only ask Thanos Store node iff the data hadn't been retrieved from Sidecars already.

@GiedriusS
Copy link
Member

In other words, --economic would do #716 something like this but automatic.

@povilasv
Copy link
Member Author

In other words, --economic would do #716 something like this but automatic.

I can't see how would I achieve the same thing with that PR? Could you elaborate?

@povilasv povilasv force-pushed the store-skip-window branch from 0300303 to fbd4241 Compare March 18, 2019 15:11
@GiedriusS
Copy link
Member

GiedriusS commented Mar 19, 2019

Sure, I talked about that PR only as an inspiration. My point is that if we are doing this for the same of efficiency don't want to ask Thanos Stores for data which is in Thanos Sidecars then this functionality belongs in Thanos Query. If we are doing this for the sake of sharding then Thanos Store should instead support only syncing a part of the data with --{min,max}-time together with the support of relative times, not only fixed points in the past. My 2 cents.

@povilasv povilasv force-pushed the store-skip-window branch from 1143071 to d6194e7 Compare March 19, 2019 14:19
@povilasv
Copy link
Member Author

povilasv commented Mar 19, 2019

@GiedriusS I'm not saying this is a long term solution, but it is as simple as possible and mitigates issue #919 completely.

Maybe we can agree on a hidden flag or something like that?

But I think at this point we have changed flags in this project, so maybe it's no a big deal to have it and remove it once proper solution comes in?

FYI after applying this change our Thanos Query 99 histogram_quantile went down drastically:

2019-03-19-095014

@povilasv povilasv force-pushed the store-skip-window branch from d6194e7 to ce62cb5 Compare March 19, 2019 14:40
@povilasv povilasv force-pushed the store-skip-window branch from ce62cb5 to 5721587 Compare March 19, 2019 15:09
@bwplotka
Copy link
Member

Ok, had a long offline discussion with @povilasv

The conclusion so far is that @povilasv have some real slowdown on store-gateway so his focus now is to "mask" slowdowns for some queries by excluding it from query either by various timeouts on Querier or by removing store from query path directly. This still does not fix underlying issue, but good solid timeouts and nice degradation model is still nice to have.

Having this said I agree with @GiedriusS that skip-window is super hacky and it's hard to justify. Also it can be misused but not familiar user. However in my opinion we should add something else which should help @povilasv AND other Thanos use cases.

I mean 2 hidden flags:

  • --store.max-time
  • --store.min-time

Those would define the time range storeAPI (as store gw) will get data for. The most important thing is that those flags should control also what store gw sync. The main use case (apart from @povilasv one - to prioritize the sidecars artificially) is to:

  • Allow user to limit querable retention for some users/applications by deploying separate store gw with this flags
  • Store more (year +) but allow querying only 1y data.. This is to implement something like cold/hot storage differentiator.
  • Mitigate some issues with store or bucket that someone might have temporarily.

I think this can be really useful overall.

Thoughts @povilasv @GiedriusS ?

@bwplotka
Copy link
Member

BTW: #814 (comment)

@povilasv
Copy link
Member Author

closing in favor of #957

@povilasv povilasv closed this Mar 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants