Add separate expirable LRU implementation, with generics #116
Conversation
Is this code the same as the changed code in #113?
It's precisely the same code as in #113, but in a separate file, as making these changes in the existing simplelru decreases its performance.
Hey @jefferai, do you have any additional feedback or review comments for the PR?
No, I'm just busy with personal stuff (moving) on top of work and haven't been able to get to this yet, sorry.
I'd completely forgotten about this one; I'm addressing your comments now.
Is this PR getting merged anytime soon?
I have to add a few minor changes prior to that; I'll try to get them in before the New Year.
I was hoping to look at combining the two, but realistically I don't have time right now. That said, there isn't really an issue with having it be separate and trying to merge them later if I see a good way. (Mostly I am just worried that fixes applied to one will be forgotten for the other.) Please address the comments when you get a chance, and I will then do another look through and hopefully get it merged.
paskal force-pushed from c8eb9cd to f7ffb3b
Fixed benchmarks everywhere to calculate the hit ratio correctly. Affects only logging. Benchmarks:
paskal force-pushed from f7ffb3b to ab4d622
@paskal Can you please stop force pushing once reviews have started? It breaks links in emails, it makes it harder to tell what has changed since a previous review, and it often ends up dropping comments from GitHub if it can't figure out where they should apply.
Curious about the choice of using a list and iterating through it on a schedule vs. using a heap (e.g. adapting https://golang.google.cn/pkg/container/heap/) and having a timer keyed off of the next expiration (reset on any push if needed). That seems like it would purge closer to the actual timeouts.
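For illustration, a minimal sketch of the heap-based alternative being suggested, assuming a min-heap keyed by expiry time with a timer re-armed to the soonest deadline; the names (`expHeap`, `entry`, `purgeExpired`) are hypothetical, not from this PR:

```go
package sketch

import (
	"container/heap"
	"time"
)

type entry struct {
	key       string
	expiresAt time.Time
}

// expHeap implements heap.Interface, ordered by soonest expiry.
type expHeap []*entry

func (h expHeap) Len() int           { return len(h) }
func (h expHeap) Less(i, j int) bool { return h[i].expiresAt.Before(h[j].expiresAt) }
func (h expHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *expHeap) Push(x any)        { *h = append(*h, x.(*entry)) }
func (h *expHeap) Pop() any {
	old := *h
	n := len(old)
	e := old[n-1]
	*h = old[:n-1]
	return e
}

// purgeExpired pops every entry whose deadline has passed and returns
// the duration until the next expiry, so a timer can be reset to it.
func purgeExpired(h *expHeap, now time.Time) time.Duration {
	for h.Len() > 0 && !(*h)[0].expiresAt.After(now) {
		heap.Pop(h) // O(log N) per removed entry
	}
	if h.Len() == 0 {
		return 0 // nothing pending; the timer stays stopped
	}
	return (*h)[0].expiresAt.Sub(now)
}
```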
If I recall correctly, the idea is to break out of checking the list as soon as the first non-expired item is found, so that the complexity of cleanup is O(M), where M is the number of cleaned entries. With a heap ordered by expiry date, the complexity of cleanup increases to O(M*log N), where N is the total number of entries, as we are forced to rebalance the heap each time we drop the oldest item from it.

However, I see that breaking out of the loop is not implemented there, and I have a useless continue: I'll fix that. While we are speaking of the expiry mechanism, I also see that I don't remove outdated items once I hit them on lookup.

And last, in another cache implementation I saw PurgeEvery set to half of the entry TTL: it might be a better solution than the currently hardcoded 5-minute default. What do you think?
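For reference, a hedged sketch of the break-early scan described here, assuming entries sit in a container/list ordered oldest-first by insertion with a uniform TTL (an assumption the next comment revisits); the names are illustrative:

```go
package sketch

import (
	"container/list"
	"time"
)

type entry struct {
	key       string
	expiresAt time.Time
}

// removeExpired walks from the oldest end and stops at the first entry
// that is still alive: with insertion order preserved and a uniform TTL,
// everything newer expires even later, so the cost is O(M) for M removals.
func removeExpired(l *list.List, now time.Time) (removed int) {
	for e := l.Back(); e != nil; {
		ent := e.Value.(*entry)
		if ent.expiresAt.After(now) {
			break // first live entry: all newer entries expire later
		}
		prev := e.Prev()
		l.Remove(e)
		e = prev
		removed++
	}
	return removed
}
```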
I forgot we're dealing with an LRU here, and the expiry check can't stop at the first non-expired element: when items are moved to the front after access, their TTL is left intact, so the list is no longer ordered by expiry. The cleanup complexity is therefore plain O(N); with an additional heap it would be O(M*log N), and with an additional list it would be O(M).

Currently, the cache is at its maximal size right before the cleanup, but with an additional list or heap we could trigger the cleanup call in the background as soon as we find an expired entry, so that the size would be more balanced, at the cost of more frequent cleanups holding locks. Basically, we're delving into the garbage-collection area of problems :)

Another simple improvement I can add to the current setup to decrease the complexity to O(M) is checking for expired entries on access. Please let me know if you want me to alter the solution, and how.
So I don't have super great answers here, because anything that simplifies the mechanism necessarily trades off performance. As you noted, we're into the GC side of things.

One possibility is to actually load a timer for every single expiring entry. This has the benefit that expiration is essentially instant and you don't have to track anything, but it trades off goroutines. Vault does this, but Vault is not necessarily tracking as many objects as you might want to track here.

If we knew for sure that the key is something we could e.g. hash -- or required such a value for expiring entries -- you could use something like a heap but shard entries across some set of these. E.g. hash it, take the first byte as an index into 256 different heaps, and have a background process check each heap periodically.

Another approach is time bucketing -- keep a list or heap or something for all items expiring within some period of time. The challenge here is figuring out the bucket size -- maybe you have 1000 entries expiring every minute, or maybe 1 entry every 1000 minutes -- but it means you only ever need to check one bucket of items when checking for expiration, and you can just run through the list.
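A rough sketch of the time-bucketing idea, assuming a fixed bucket width and a map from window index to bucket; the layout and all names here are assumptions for illustration, not code from this PR:

```go
package sketch

import (
	"sync"
	"time"
)

type bucket struct {
	entries map[string]time.Time // key -> expiry time
}

type bucketedTTL struct {
	mu         sync.Mutex
	bucketSize time.Duration
	buckets    map[int64]*bucket // window index -> bucket
}

// add places a key in the bucket covering its expiry window.
func (c *bucketedTTL) add(key string, expiresAt time.Time) {
	c.mu.Lock()
	defer c.mu.Unlock()
	idx := expiresAt.UnixNano() / int64(c.bucketSize)
	b, ok := c.buckets[idx]
	if !ok {
		b = &bucket{entries: make(map[string]time.Time)}
		c.buckets[idx] = b
	}
	b.entries[key] = expiresAt
}

// expirePass drops every bucket whose whole window is in the past;
// only the keys in those buckets are ever visited.
func (c *bucketedTTL) expirePass(now time.Time) []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	cutoff := now.UnixNano() / int64(c.bucketSize)
	var expired []string
	for idx, b := range c.buckets {
		if idx < cutoff {
			for k := range b.entries {
				expired = append(expired, k)
			}
			delete(c.buckets, idx)
		}
	}
	return expired
}
```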
paskal force-pushed from d1a52e0 to 8bfa58f
I've added expiration per item. The force push is due to rebasing on top of master. Performance:
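A minimal sketch of what per-item expiration with one goroutine per entry could look like (the approach the following comment pushes back on); the types and the replacement check are illustrative, not the PR's actual code:

```go
package sketch

import (
	"sync"
	"time"
)

type item struct {
	value     string
	expiresAt time.Time
}

type perItemCache struct {
	mu   sync.Mutex
	data map[string]item
}

// Add stores the value and parks one goroutine per entry until the
// TTL elapses, then deletes the key.
func (c *perItemCache) Add(key, value string, ttl time.Duration) {
	expiresAt := time.Now().Add(ttl)
	c.mu.Lock()
	c.data[key] = item{value: value, expiresAt: expiresAt}
	c.mu.Unlock()

	go func() { // one goroutine blocked per live entry
		<-time.After(ttl)
		c.mu.Lock()
		defer c.mu.Unlock()
		// Delete only if the entry wasn't replaced by a newer Add.
		if it, ok := c.data[key]; ok && !it.expiresAt.After(time.Now()) {
			delete(c.data, key)
		}
	}()
}
```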
paskal force-pushed from dd3daea to 067032a
I am worried about a goroutine per item. Nothing stops you from having a million items with a million goroutines. Even partitioning it, as some libs do, can leave you with a very lengthy stack trace if you SIGQUIT. go-cache, for instance, uses a pool of goroutines to do expiration, and even then any stack trace requires filtering out goroutine after goroutine.
Any other approach I can think of is more complicated to implement, read, comprehend, and troubleshoot. I know a production project with 200-250k goroutines, so the number itself is not a big problem. Time buckets mean memory overhead, and with more complicated code we'd still have a large number of goroutines. For example, with buckets of size TTL/100: for a 10ms TTL the bucket would be 100us; for 100ms, 1ms; for 5m, 3s; for 1h, 36s; for 4h, 2m24s; for 12h, 7m12s; for 24h, 14m24s. The overhead of approximately 1% of entries not expiring on time looks reasonable to me. What do you think?
I agree the number isn't a problem -- the troubleshooting is. A stack trace with 250k goroutines that have to be filtered out in order to get to the important bits makes any project using this much harder to troubleshoot.

Time bucketing could bring you down to a small number of goroutines -- in fact, you could use just a single one, with a timer that is reset to the expire time of the next bucket. So you could easily do 100 or 1000 buckets.

I think the question then is one of philosophy for whoever wants to use this: do you care about items expiring exactly at their expiration time or not? I'm not sure there is a clear answer, although I lean toward the idea that you don't -- it's an LRU, so items can be evicted at any time anyway. So probably the contract should be that you can be sure the callback is run when an item is purged, but the docs should make it clear that you cannot count on it being purged at a specific time. Thoughts?
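A sketch of the single-goroutine variant suggested here, with one timer re-armed to the next bucket's deadline; `nextDeadline` and `purgeDue` are hypothetical hooks into the cache, and a real implementation would also wake this loop when a new bucket is created:

```go
package sketch

import "time"

// runExpiry is the lone expiration goroutine: it sleeps until the
// soonest bucket deadline, purges that bucket, and re-arms the timer.
func runExpiry(nextDeadline func() (time.Time, bool), purgeDue func(time.Time), stop <-chan struct{}) {
	timer := time.NewTimer(time.Hour)
	if !timer.Stop() {
		<-timer.C // drain so Reset below is safe
	}
	for {
		if deadline, ok := nextDeadline(); ok {
			timer.Reset(time.Until(deadline))
		}
		select {
		case now := <-timer.C:
			purgeDue(now) // evict everything in the due bucket
		case <-stop:
			timer.Stop()
			return
		}
	}
}
```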
paskal force-pushed from 067032a to 8fcc5ee
I've reverted the changes introducing per-item cleanup and created 100 buckets cleaned up every 1/100th of the TTL, clarifying code and comments in the process. I've outlined the tradeoffs in the code itself: we have ~1% memory overhead, and entries are cleaned up to 1% earlier than their expiration date. This cleanup interval and overhead make sense for TTLs ranging from 100ms to 10 days.
I'm satisfied with the current code and the tradeoffs I've made regarding overhead. Please let me know what you think.
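A condensed sketch of the design described above, assuming 100 buckets and a cleanup tick of TTL/100; the names are illustrative rather than the PR's actual identifiers:

```go
package sketch

import (
	"sync"
	"time"
)

const numBuckets = 100

type bucketedCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	buckets [numBuckets]map[string]struct{} // keys grouped by expiry window
	next    int                             // index of the next bucket to purge
}

// cleanupLoop fires every TTL/100 and clears exactly one bucket per tick,
// so each pass touches only entries that are due (up to ~1% early).
func (c *bucketedCache) cleanupLoop(deleteKey func(string), done <-chan struct{}) {
	ticker := time.NewTicker(c.ttl / numBuckets)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			c.mu.Lock()
			for k := range c.buckets[c.next] {
				deleteKey(k)
			}
			c.buckets[c.next] = make(map[string]struct{})
			c.next = (c.next + 1) % numBuckets
			c.mu.Unlock()
		case <-done:
			return
		}
	}
}
```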
Nope, just need time to get back to a review. @mgaffney can look as well.
@paskal, can we move the expirable LRU code to its own package?
Thread-safe. It could be used in place of simplelru.LRU, but shouldn't be, as it has built-in locks whereas simplelru.LRU doesn't; that allows more effective locking on top of simplelru.LRU in the top-level package's cache implementations.
That sets the memory overhead at approximately the 1% mark, at the cost of expiring cache entries up to 1% earlier than their TTL. Previously, all entries were scanned for expiration every purgeEvery interval, which created computational overhead; after this commit, we delete only the entries we want to delete, without checking extra ones.
paskal force-pushed from 8b6aca2 to d23ed83
@mgaffney, it is done; please take a look at the current version. I've had to duplicate the list code in the process.
It could be moved to internal, so it can be shared without being exported. |
paskal force-pushed from d23ed83 to 1154eab
@paskal thank you for all your hard work on this!
Thread-safe. It could be used in place of simplelru.LRU, but shouldn't be, as it has built-in locks whereas simplelru.LRU doesn't; that allows more effective locking on top of simplelru.LRU in the top-level package's cache implementations.
Similar to what was done in #41. I've tried to make the new cache as close as possible to the existing one in terms of code style and behaviour; however, the existing simplelru.LRU is not thread-safe, as pointed out above, so it's not a drop-in replacement.
Another difference I left in place is that size=0 on this type of cache means unlimited, which makes sense for an expirable cache.

Original work was done in lcw, but it turned out that I didn't need it there, and I thought it would be nice to contribute my changes to upstream (this repo).
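For context, hypothetical usage of the resulting cache; the constructor shape and the import path here are assumptions matching what the expirable package eventually shipped with, not necessarily the state of this PR:

```go
package main

import (
	"fmt"
	"time"

	"github.com/hashicorp/golang-lru/v2/expirable"
)

func main() {
	// size=0 means no size bound: entries are dropped only by TTL.
	cache := expirable.NewLRU[string, int](0, nil, 10*time.Millisecond)

	cache.Add("a", 1)
	if v, ok := cache.Get("a"); ok {
		fmt.Println("got:", v)
	}

	time.Sleep(20 * time.Millisecond)
	_, ok := cache.Get("a")
	fmt.Println("still present after TTL:", ok) // false
}
```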
This PR is a recreation of #68, which was not merged. cc @jefferai @Dentrax for review, as an alternative to #113 (which I'm unlikely to be able to finish while preserving the current package's speed).