Thanos stores are unable to share the same redis database (bucket_cache) #6939
Comments
Same for us.
I think the memcached mention is only cosmetic: the remote index cache implementation happens to live in a file called memcached.go.
The Redis cache has a config option "max_async_buffer_size", which does feel related. Why blocks fail to load because of it is a mystery, though.
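For reference, a Redis cache config with an enlarged async write buffer might look like the following. This is an illustrative sketch, not the reporter's config; field names follow the Thanos Redis client config, but verify them against the docs for your Thanos version:

```yaml
type: REDIS
config:
  addr: "redis:6379"          # placeholder address
  db: 0
  # Queue for asynchronous cache writes; once it is full, further SET
  # operations are dropped and logged rather than blocking queries.
  max_async_buffer_size: 1000000
  # Number of workers draining the async buffer.
  max_async_concurrency: 50
```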
Thanks for your answer @MichaHoffmann.
Yes that was my first feeling but wasn't 100% sure, thank you.
I'm not sure the logs and the fact that the blocks are not loaded are actually linked after all, as I only ran the queries on the stores that had started successfully with all blocks.
Exactly, I played with it quickly and didn't find a good setup ATM. I can try to tweak the conf, but I won't go deeper until the initial issue is fixed; we currently have ~40 different stores to set up and I cannot afford to lose them randomly 😄 I'd be glad to run any tests you need to help figure out this issue 👍
We also face the same issue (though sporadically); I wonder if we just hit the Redis cache too hard and, because of network latency, the buffer fills too quickly.
Hi, I did another set of tests increasing the log level (debug) and focused on a specific block.
Could it be more global than just Redis? I've read this issue and we can observe the same kind of logs using memcached 🤔
Thanks to the issue mentioned above ☝️ I realized that the failing blocks come from other stores. @MichaHoffmann Can you tell me if you are in the same situation, with more than one store using the same cluster? Is it known/intended behavior not to be able to share the cache system across different stores, or can it be marked as a bug?
The cache should be shareable across different deployments; this is for sure a bug, I think.
Ah, so we are using Redis only for the index cache; that's why I don't see the bucket meta sync issues, I think! For the bucket cache we use groupcache, which works fine as far as I can tell!
Yes, it looks like only the bucket_cache is concerned; I didn't notice any trouble using only the index cache, but my tests didn't last long because of the issue mentioned.

A potential solution could be to add some kind of storeID that would prefix the keys in the cache, so each store would only retrieve the keys that belong to it 🤔 And this would work for either memcached or Redis, no? On our side, right now, we will go back to deploying simple Redis instances and use a dedicated database for each store.

About groupcache, if I understand correctly, a group should only contain stores that point to the same object storage; on our side that would mean at least doubling or tripling the number of stores (from ~40 to 80 or 120). I'm not sure it's worth it.
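The interim workaround described above (a dedicated Redis database per store) can be expressed as one bucket-cache config file per store gateway. Note this assumes a standalone, non-cluster Redis, since Redis Cluster only supports database 0; addresses and db numbers below are illustrative:

```yaml
# Bucket cache config for store gateway A (illustrative)
type: REDIS
config:
  addr: "redis:6379"
  db: 1
---
# Bucket cache config for store gateway B: same instance, different logical
# database, so the two stores never see each other's cached keys.
type: REDIS
config:
  addr: "redis:6379"
  db: 2
```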
Added the path to the hash here in #7158, so this should help, given that you have a separate file for specifying the bucket cache configuration. A long-term fix would be to add a field like …
I ran into a similar issue with memcached: two or more stores would eventually make long-term metrics unusable. Reverted to the in-memory cache for the bucket cache for now. Would definitely appreciate prioritizing this fix.
I'm having the same issue on a single-instance Elasticache Redis with the index cache on db 0 and the bucket cache on db 1, so the problem is not just that they're sharing the same database. Changing the bucket cache to in-memory, as mentioned by @gjshao44, solved it.
Hello guys,
I’m currently working on the redis cache & bucket_cache implementation on some of our store gateways and I’m facing some weird behaviors when enabling redis bucket_cache.
The Redis setup is a 3-node cluster running version 7:
Please find below the cache configs that I use:
Index cache config:
Bucket_cache config:
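The actual config blocks did not survive this extract. For context, a Redis index-cache / bucket-cache pair pointed at the same backend typically looks like the following; this is illustrative only, not the reporter's exact config, and the address format should be checked against the Thanos Redis client docs:

```yaml
# index_cache.yaml (illustrative)
type: REDIS
config:
  addr: "redis-node-1:6379"   # placeholder; the real setup is a 3-node cluster
  db: 0
---
# bucket_cache.yaml (illustrative): same Redis backend as the index cache,
# which is the sharing scenario this issue describes.
type: REDIS
config:
  addr: "redis-node-1:6379"
  db: 0
```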
Here's the initial status of the loaded blocks before enabling the cache & bucket_cache:
Here's what I see after enabling it:
... and getting the following logs:
Out of curiosity, I wanted to see if I could reproduce it exactly, so I did the same operation one more time:
This time, when the store restarted, everything looked just fine 🤷
Thanos version:
What you expected to happen:
I expect all the blocks to be loaded without issue when enabling the bucket_cache.
Second point, and I don't know if it's linked or not (if not, my apologies, I can create another issue): I then tried to load some metrics (last 3d), and in the logs:
I'm not using any memcached in the configs, so I tried to find where this could come from, and I discovered that it might be related to the NewRemoteIndexCache function called from factory.go, which is the one from memcached.go. Is it really trying to call memcached even when using Redis? Or is it just a matter of log wording, and it's the async buffer of the Redis implementation that needs to be increased? In both scenarios (if I'm not mistaken), a small explanation/clarification is needed here, because it's confusing 🤔
Environment:
Thanks a lot ! 🙏