Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LRU cache ensureFits() tweaks WRT deadlocking #796

Merged
merged 5 commits into from
Feb 1, 2019

Conversation

GiedriusS
Copy link
Member

Related to #651. An initial attempt to fix this because I am always nervous that some person could unintentionally make our Thanos Store instance hang even if the index cache is quite big :(

Changes

thanos_store_index_cache_items_overflowed_total metric added which shows how many times we have tried to add an item to the cache which was simply too big according to the --index-cache-size parameter. ensureFits() now returns false when we cannot possibly ensure that the passed slice could fit into the cache. The original problem still exists in some form that setPostings or setSeries could block for a longer time if there are a lot of small items in the cache however at least it shouldn't dead-lock.

Verification

Wrote unit tests for it and they pass. It is kind of hard to reproduce this unfortunately in a live environment with the changes but I will see what I can do.

Giedrius Statkevičius added 3 commits January 30, 2019 13:33
This could potentially spin forever if the size of that item is bigger
than the max size so add this special case to handle it instead of
spinning forever. In this case, the oldest one will not be removed in
this function but later an item might be evicted when we will add an
item to the LRU with c.lru.Add() so it will be OK either way -- just
that we will not spin here forever.
* ensureFits() now returns true if the passed items will fit in the
cache
* Wire up the results to a new counter metric
thanos_store_index_cache_items_overflowed_total
* Add cache tests to check the edge cases
@bwplotka bwplotka requested review from bwplotka and domgreen February 1, 2019 12:45
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from the first glance, thanks!

Can we make remoteOlder bit smarter as in suggesiton?

@@ -127,7 +139,11 @@ func (c *indexCache) setPostings(b ulid.ULID, l labels.Label, v []byte) {
c.mtx.Lock()
defer c.mtx.Unlock()

c.ensureFits(v)
fits := c.ensureFits(v)
if !fits {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can inline this.

func (c *indexCache) ensureFits(b []byte) bool {
if uint64(len(b)) > c.maxSize {
return false
}
for c.curSize+uint64(len(b)) > c.maxSize {
c.lru.RemoveOldest()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could do more aggressive removal if the slice is big enough.

Copy link
Member Author

@GiedriusS GiedriusS Feb 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but how the implementation would be like? I have two ideas:

  • If the slice size exceeds a certain threshold then we could remove 2 or more items at a time
  • Get all keys, calculate their sizes, and remove the top X items to make sure there's enough size for the item b?

Either way, the scope would definitely blow up since I do not think it is worth making these kinds of changes without scrupulous benchmarking.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way, maybe lets leave it for another day? I am mainly interested now in making sure that the loop doesn't go forever

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree

Giedrius Statkevičius added 2 commits February 1, 2019 15:42
No need to create an extra variable to hold a boolean.
Inc() is specifically designed for this case so use it instead of
Add(1).
@GiedriusS GiedriusS changed the title LRU cache ensureFits() tweaks LRU cache ensureFits() tweaks WRT deadlocking Feb 1, 2019
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Well done and tested. But would be nice to follow up on more aggressive removal. (:

@bwplotka bwplotka merged commit d1ed0ee into thanos-io:master Feb 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants