Limit the cluster cache by memory instead of number of clusters #956
base: main
Conversation
(force-pushed c2ce797 to d8c8678, then 75e78c6 to 5429143)
This looks like a serious change. I skimmed through the PR very quickly, without trying to comment on it, with the only purpose of getting a rough idea of where it will eventually take us. I will come back to it next week.
(force-pushed 55616af to f3ea99f, f3ea99f to cc02757, then 5429143 to 10134b0)
This is still a high-level critique of the PR. We need to address those concerns about the design of the solution before paying attention to the implementation details.
```diff
-zim::lru_cache<int, int> cache_lru(1);
+zim::lru_cache<int, int, zim::UnitCostEstimation> cache_lru(1);
```
This comment is about the full change of `test/lrucache.cpp` rather than this particular line. Enhancement of the `lru_cache` (handling of items with item-dependent cost) is not tested.
Tests that definitely need to be added:
- Adding a large item to an already full cache drops the right amount of small items.
- What happens when a single item with a cost above the max limit of the cache is added? BTW, should we allow the cache to exceed the limit in such a case (another option is to keep the cache intact and not register the offending item in the cache)? If yes, why don't we allow a few small items to be present in the cache too? This is discussed in more detail in another comment.
Tests that would be nice to have:
- Handling of `CostEstimation::cost()` whose output depends on the internal state of its argument.
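The first of these tests could be sketched against a simplified stand-in for the cache. The class below is a toy illustration with hypothetical names, not the actual `zim::lru_cache` interface: it just demonstrates the expected behaviour that adding one large item evicts exactly enough small ones.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>

// Toy cost-aware LRU cache, a simplified stand-in for zim::lru_cache
// (names and interface are assumptions for illustration only).
class CostLruCache {
public:
  explicit CostLruCache(size_t maxCost) : maxCost_(maxCost), currentCost_(0) {}

  // Insert a value with an item-dependent cost; evict least-recently-used
  // items until the total cost fits under the limit again.
  void put(int key, int value, size_t cost) {
    auto it = map_.find(key);
    if (it != map_.end()) {
      currentCost_ -= it->second->cost;
      list_.erase(it->second);
      map_.erase(it);
    }
    list_.push_front({key, value, cost});
    map_[key] = list_.begin();
    currentCost_ += cost;
    // Evict from the LRU end, but never drop the item we just added.
    while (currentCost_ > maxCost_ && list_.size() > 1) {
      const auto& victim = list_.back();
      currentCost_ -= victim.cost;
      map_.erase(victim.key);
      list_.pop_back();
    }
  }

  bool exists(int key) const { return map_.count(key) != 0; }
  size_t size() const { return list_.size(); }
  size_t currentCost() const { return currentCost_; }

private:
  struct Entry { int key; int value; size_t cost; };
  std::list<Entry> list_;
  std::unordered_map<int, std::list<Entry>::iterator> map_;
  size_t maxCost_;
  size_t currentCost_;
};
```

A test along the lines of the first bullet would then fill the cache with small items and assert that a single large insertion drops exactly the oldest ones needed to get back under the limit.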
src/rawstreamreader.h (outdated):
```cpp
size_t getSize() const override {
  return m_reader->size().v;
}
```
This is going to be used with non-compressed clusters. On platforms supporting `mmap()`, those should be `mmap()`-ed (unless the cluster spans two or more segments of a multi-part ZIM file), in which case, for the purposes of this PR, it makes more sense to report the cluster cache cost of those clusters as 0 (or to find another way of avoiding double-counting the same memory in cache accounting).
Not sure about the best answer here.
mmapped content is still kind of an allocation:
- It reserves a memory address (and so consumes available addresses)
- Virtual memory is increased by the size of the mmap
- It is composed of pages that have to be handled by the kernel like "classic" memory allocation (which may be, in turn, backed by mmap)

However, it provides some specific features:
- No I/O calls (once created)
- Resident memory is increased by the page size only when we actually read in the page.
- Swapping the page to the fs is a no-op as the content is already in the fs.

We should probably count the resident memory as real memory usage. But we can only (easily) know the virtual memory.
Resident memory depends more on the blobs returned to the user than on the cluster itself (for uncompressed clusters).
For a first version, I tend to think that sticking to the virtual memory is "good enough" and at least fulfills the purpose: not letting the cache consume too much.
@benoit74 Do you know what your measurement was about? Virtual or resident memory?
After reflection I think we should count 0 all the time.
For uncompressed clusters, we know the size of the data, but we mmap it only when we read a blob from it (and only the region corresponding to the blob, not the whole cluster).
For compressed clusters, we don't know the size of the compressed stream. So the stream reader has a "virtual" size of (whole ZIM file size − cluster offset). This size is obviously wrong. We actually read from this stream (and get buffers from it), but they are only temporary buffers to feed the decompression stream.
So we should not use the stream reader size to compute the cache size.
src/cluster.cpp (outdated):
```cpp
// This function must return a constant size for a given cluster.
// This is important as we want to remove the same size as what we added when we
// remove the cluster from the cache.
// However, because of partial decompression, this size can change:
// - As we advance in the decompression, we can create new blob readers in `m_blobReaders`
// - The stream itself may allocate memory.
// To solve this, we take the average and say half of a cluster's blob readers will be
// created, so we assume a readers size of half the full uncompressed cluster data size.
// It also appears that when we get the size of the stream, we reach a state where no
// further allocation will be done by it. Probably because:
// - We already started to decompress the stream to read the offsets
// - Cluster data size is smaller than the window size associated to the compression level (?)
// We check that anyway and print a warning if this is not the case, hoping that the user
// will create an issue allowing us to analyse further.
size_t zim::ClusterMemorySize::get_cluster_size(const Cluster& cluster) {
  if (!cluster.m_memorySize) {
    auto base_struct = sizeof(Cluster);
    auto offsets_size = sizeof(offset_t) * cluster.m_blobOffsets.size();
    auto readers_size = cluster.m_blobOffsets.back().v / 2;
    cluster.m_streamSize = cluster.m_reader->getSize();
    cluster.m_memorySize = base_struct + offsets_size + readers_size + cluster.m_streamSize;
  }
  auto streamSize = cluster.m_reader->getSize();
  if (streamSize != cluster.m_streamSize) {
    std::cerr << "WARNING: stream size has changed from " << cluster.m_streamSize << " to " << streamSize << std::endl;
    std::cerr << "Please open an issue on https://github.com/openzim/libzim/issues with this message and the zim file you use" << std::endl;
  }
  return cluster.m_memorySize;
}
```
I think that for our next revision of the ZIM file format we should consider including the uncompressed size of a compressed cluster in the cluster header.
Also, though this is not the best place for discussing such an idea, I propose to think about possible benefits of introducing an item data cache. In some scenarios, clusters may be too large a unit of caching. Consider a usage pattern when multiple relatively small items all from different clusters are constantly used. The total memory consumed by those items can be orders of magnitude smaller than the size of their clusters. And if the latter exceeds the cluster cache limit, libzim may keep thrashing for no good reason.
> I think that for our next revision of the ZIM file format we should consider including the uncompressed size of a compressed cluster in the cluster header.
This is what I have done in Jubako.
I would like to consider it as the base for the next major revision of ZIM file format but it is a discussion far beyond the current one.
src/cluster.h (outdated):
```diff
@@ -91,6 +95,15 @@ namespace zim
   Blob getBlob(blob_index_t n, offset_t offset, zsize_t size) const;

   static std::shared_ptr<Cluster> read(const Reader& zimReader, offset_t clusterOffset);
+  friend struct ClusterMemorySize;
```
Isn't it better to have a public `Cluster::getMemorySize()`?
src/cluster.cpp (outdated):
```cpp
auto offsets_size = sizeof(offset_t) * cluster.m_blobOffsets.size();
auto readers_size = cluster.m_blobOffsets.back().v / 2;
cluster.m_streamSize = cluster.m_reader->getSize();
cluster.m_memorySize = base_struct + offsets_size + readers_size + cluster.m_streamSize;
```
If you want to be that precise in your approximate estimation of the memory usage by the cluster, shouldn't you multiply `readers_size` by the additional memory usage of an individual blob reader?
src/lrucache.h (outdated):
```cpp
void increaseSize(size_t extra_size) {
  // increaseSize is called after we have added a value to the cache to update
  // the size of the current cache.
  // We must ensure that we don't drop the value we just added.
  // While it is technically ok to keep no value if max cache size is 0 (or memory size < of the size of one cluster)
  // it will make recreate the value all the time.
  // This is especially problematic with clusters.
  // Let's be nice with our user and be tolerent to miss configuration.
```
- "miss configuration" sounds rather like a beauty contest. I guess you meant "misconfiguration" :)
- Clusters should not be mentioned in this file. However, the raised issue deserves some discussion. The proposal to be tolerant and allow a single item in the cache even if its size exceeds the cache size limit can be helpful in some scenarios but harmful in others, like the following one: suppose that the working set for some usage pattern of a ZIM file involves two or more clusters, one large cluster above the limit and the rest cumulatively below the limit. Then you will have to rotate them all through the cache all the time, while you could keep the small ones in the cache and only re-read the big cluster. Alternatively, if you want to make the cache smarter, you may keep the large cluster always in the cache and never evict it just because you want to replace it with a tiny one. But that means keeping track of some statistics and making decisions based on the history of items entering and leaving the cache. The API of `lru_cache` will need to be enhanced to take `CostEstimation` not as a type but as an object, and the semantics of `CostEstimation::cost()` will need to include not only the cost of maintaining the given object in the cache but also the cost of recreating it once it is dropped from the cache.
"miss configuration" made it to the next stage of the contest but I believe that the jury (being technical and faithful to its mission) will not be tricked and will disqualify her
```cpp
template<typename key_t, typename value_t, typename CostEstimation>
class lru_cache {
```
This is a serious enhancement that should be difficult to do right. LRU is about heuristics of maintaining items in the cache based on the order they have been used. Introducing cost adds an important dimension to consider - when facing the choice of dropping from the cache an expensive old item or a cheap not-so-old one, which one is more justified? Or the other way around - it may be preferable to get rid of a heavy item that is not the first candidate for removal according to the LRU heuristics since that will make more room in the cache for multiple lightweight items.
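For reference, one known heuristic from the web-caching literature that combines recency with per-item cost and size is GreedyDual-Size. It is not what this PR implements; the sketch below, with all names hypothetical, only illustrates how such a tradeoff can be encoded: each item carries a priority, and an aging "clock" advances on every eviction so that cheap, long-untouched items go first.

```cpp
#include <cassert>
#include <map>
#include <string>

// Tiny sketch of the GreedyDual-Size eviction policy (a known cost-aware
// heuristic, shown for illustration; NOT what this PR implements).
// priority = clock + fetchCost/size; on eviction the clock advances to the
// evicted priority, so recently touched items age gracefully.
class GreedyDualCache {
 public:
  explicit GreedyDualCache(double maxSize) : maxSize_(maxSize) {}

  void access(const std::string& key, double size, double fetchCost) {
    auto it = items_.find(key);
    if (it != items_.end()) {            // hit: refresh the item's priority
      it->second.priority = clock_ + fetchCost / size;
      return;
    }
    while (used_ + size > maxSize_ && !items_.empty())
      evictMin();
    items_[key] = Item{size, clock_ + fetchCost / size};
    used_ += size;
  }

  bool contains(const std::string& key) const { return items_.count(key) != 0; }

 private:
  struct Item { double size; double priority; };

  void evictMin() {
    auto victim = items_.begin();
    for (auto it = items_.begin(); it != items_.end(); ++it)
      if (it->second.priority < victim->second.priority) victim = it;
    clock_ = victim->second.priority;    // aging step
    used_ -= victim->second.size;
    items_.erase(victim);
  }

  std::map<std::string, Item> items_;
  double maxSize_;
  double used_ = 0;
  double clock_ = 0;
};
```

With equal sizes, an item that was expensive to fetch survives eviction longer than a cheap one touched at the same time, which is exactly the tradeoff discussed above.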
Well, I kind of agree with all the high-level criticism. However, lots of your concerns (and mine) are about particular conditions. So I follow Kelson's lead here (and agree with it): let's do something now and tweak the algorithm later, if needed, based on specific use cases. Although this cache system is far from perfect, I think it is compliant with Kelson's request.
[*] It seems that I forgot to upgrade this default value, but it is what is expected from #947 (comment)
Yes, for now the goal is not to allow running on as little memory as possible but to ensure we do not use excessive memory. library.kiwix.org often uses 10s of GBs of memory currently. dev.library.kiwix.org with only 80 ZIMs already uses a bit less than 500M, but it keeps growing. In addition to the operational issues this causes on our servers, it also raises serious questions about the ability to run kiwix-serve on a Raspberry Pi / desktop with any "serious" traffic / amount of ZIMs, hence this effort.
@mgautierfr If the assumption is that the cache will contain significantly more than a few items, then let's take it for granted, state it explicitly in the code and pursue an implementation that takes advantage of it (instead of considering edge cases that don't fit that assumption).
Here are several comments about the implementation. The summary of this and the previous review is that this enhancement can be simplified a little and must be tested better (both with new unit tests for the cache classes and a new unit test in the archive test suite ensuring that `mmap()`-ed clusters are accounted in the cache appropriately).
src/concurrent_cache.h (outdated):
```cpp
template<typename CostEstimation>
struct FutureToValueCostEstimation {
  template<typename T>
  static size_t cost(const std::shared_future<T>& future) {
    // The future is the value in the cache.
    // When calling getOrPut, if the key is not in the cache,
    // we add a future and then we compute the value and set the future.
    // But lru_cache calls us when we add the future, i.e. before we have
    // computed the value. If we wait here (or use future.get), we will deadlock
    // as we need to exit before setting the value.
    // So in this case, we return 0. `ConcurrentCache::getOrPut` will correctly
    // increase the current cache size when it has an actual value.
    // We still need to compute the size of the value if the future has one, as it
    // is also used to decrease the cache size when the value is dropped.
    using namespace std::chrono_literals;
    std::future_status status = future.wait_for(0s);
    if (status == std::future_status::ready) {
      return CostEstimation::cost(future.get());
    } else {
      return 0;
    }
  }
};
```
I think that such a tweak is not needed. Even if it addresses some scenarios, we better avoid it. You can use a dummy cost estimation config that returns 0 unconditionally. The change in `ConcurrentCache::getOrPut()` should be enough.
The `cost` method is also called when we remove the value from the cache. If we use a dummy cost of 0 all the time, `ConcurrentCache` would have to decrease the cache size itself.
As far as I understand the current assumption is that the cost of a cached item doesn't change, otherwise the total cost maintained by the cache will be wrong (in particular it may underflow and make the cache stop functioning). Then the cost value returned during addition to the cache can be recorded with the item. This will protect against the said potential problem if the cost of an item happens to depend on its internal state, at the much more acceptable cost (pun intended) of the cache not correctly respecting the cost limit. Another effect will be a little gain in performance if cost computation is costly.
Indeed. But if we store the size of the item in the lru_cache, we cannot simply use a dummy value of 0 here. Otherwise we would have to update the stored item's size on top of simply increasing the total cache size.
To do so, we would have to extend the lru_cache API to provide access to the stored individual size (or store a `pair<size_t, T>` instead of `T` and change all the call sites).
Not sure it is worth it in this context, where we have only one use case of a non-constant (per item) size and the lru/concurrent cache is not public.
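The `pair<size_t, T>` idea discussed above, i.e. recording the cost computed at insertion time next to the value so that eviction subtracts exactly what was added, could be sketched as follows (all names are hypothetical illustrations, not the libzim API):

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Example cost policy used below: the cost of a vector is its element count.
struct VectorCost {
  static size_t cost(const std::vector<int>& v) { return v.size(); }
};

// Sketch: the cost is evaluated once on insertion and stored with the value,
// so removal subtracts exactly what was added, even if the item's "live"
// cost would now be different. Interface names are hypothetical.
template <typename Key, typename Value, typename CostEstimation>
class RecordingCache {
 public:
  void put(const Key& key, const Value& value) {
    const size_t recorded = CostEstimation::cost(value);  // evaluated once
    drop(key);
    items_[key] = Entry{value, recorded};
    totalCost_ += recorded;
  }

  void drop(const Key& key) {
    auto it = items_.find(key);
    if (it == items_.end()) return;
    // Subtract the recorded cost, not cost(value) recomputed now.
    totalCost_ -= it->second.recordedCost;
    items_.erase(it);
  }

  size_t totalCost() const { return totalCost_; }

 private:
  struct Entry { Value value; size_t recordedCost; };
  std::unordered_map<Key, Entry> items_;
  size_t totalCost_ = 0;
};
```

This guards against the underflow scenario mentioned above (an item whose cost depends on its internal state), at the price of the limit tracking the recorded rather than the current cost.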
(force-pushed cc02757 to c853517, faa9871 to 71bdb79, c853517 to 8dd1508, then 0bcbf3d to e99f9c0)
The last PR version is rebased. On top of small changes based on your review, the important change is how we compute the size of a cluster (this is subject to discussion):

[*] The decompression stream will allocate memory for the configured window (depending on the compression options). This allocation is always accepted by the kernel; however, the actual reservation of the memory pages is made when the process accesses the memory. So if the compression options specify a window size of 8MB and the cluster size is 1MB, we will allocate 8MB (virtual memory size) but only 1MB will be used (resident memory size). We may also want to clamp the input size to half of the uncompressed data size, as the decompression stream will not store more than what we decompress. But I don't know the size of the other decompression structures, so I prefer to be conservative here.
This is a partial review focusing exclusively on `lru_cache`.
src/lrucache.h (outdated):
```cpp
if (!extra_size) {
  // Don't try to remove an item if we have new size == 0.
  // This is the case when the concurrent cache adds a future without a value.
  // We will handle the real size increase when the concurrent cache calls us directly.
  return;
}
```
This explanation and the `if` statement itself seem to be superfluous; assuming that the cache correctly maintains the size, the rest of this function will have no effect anyway.
There is a subtle side effect to removing this test.
Without it, if we have a cache already populated with only one item and a `_current_size` greater than max: when we add the "empty" future, we have two items and now have to drop the previous item from the cache. When we get the real cost of the cluster in the concurrent cache, we correctly increase the size of the cache on the second call to `increaseSize`.
With the test, the first call does nothing and we drop the first item only when we have created the cluster and computed its size.
In the end we get the same result, but as we are in a concurrent context I prefer the first call to be a no-op and to drop the first item on the second call, instead of having an empty cache for a transient time.
```diff
   cache_lru.put(7, 777);
   EXPECT_TRUE(cache_lru.exists(7));
   EXPECT_EQ(777, cache_lru.get(7));
   EXPECT_EQ(1u, cache_lru.size());
 }

 TEST(CacheTest, OverwritingPut) {
-  zim::lru_cache<int, int> cache_lru(1);
+  zim::lru_cache<int, int, zim::UnitCostEstimation> cache_lru(1);
```
Doesn't it make sense to default to `UnitCostEstimation` for the cost estimation template parameter?
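The suggestion amounts to a defaulted template parameter, roughly like this (an illustrative stand-in, not the actual libzim declarations):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Illustrative stand-ins for the libzim types discussed here.
// A unit-cost policy: every item counts as 1 (cache limited by item count).
struct UnitCostEstimation {
  template <typename value_t>
  static size_t cost(const value_t&) { return 1; }
};

// Defaulting the policy keeps existing two-parameter call sites compiling
// unchanged, while tests can still pass an explicit policy.
template <typename key_t, typename value_t,
          typename CostEstimation = UnitCostEstimation>
class lru_cache_sketch {
 public:
  static size_t itemCost(const value_t& v) { return CostEstimation::cost(v); }
};

// An alternative policy tracking memory instead of item count.
struct StringMemoryCost {
  static size_t cost(const std::string& s) { return s.size(); }
};
```

The counter-argument in the reply below is that being explicit at the few internal call sites is preferable, which is a fair design choice given the cache is not public API.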
The more I go, the more I prefer explicit over implicit.
Needing to specify `UnitCostEstimation` is only problematic for testing. In `src` it is specified only once, and it is the same as for `FutureToValueCostEstimation`.
src/lrucache.h (outdated):
```diff
@@ -103,6 +111,8 @@ class lru_cache {
   auto it = _cache_items_map.find(key);
   if (it != _cache_items_map.end()) {
     _cache_items_list.splice(_cache_items_list.begin(), _cache_items_list, it->second);
+    _current_size -= CostEstimation::cost(it->second->second);
```
I understand that this enhancement is exclusively about caching of clusters (i.e. objects with relatively high cost). However, when considering `lru_cache` as a general-purpose utility, we have the following problems:
1. Without any documentation present, cost is an abstract concept and it can represent something different from memory usage. For example:
   a. the time needed to recreate the object if it's dropped from the cache,
   b. consumption of some other type of storage (disk space, video memory, next-level cache),
   c. usage of some limited set of resources (file descriptors, network connections, etc).
2. If cost is defined to stand for RAM usage only, then memory consumption by the cache data structures should be included in the accounting.

The above concerns should at least be addressed in documentation.
> Without any documentation present, cost is an abstract concept and it can represent something different from memory usage.

I agree with you. This is why there is no documentation present 🙂 Cost may be whatever the user of the lru_cache wants to track and limit by.
And this lru_cache is not part of the public API. It is used only in two specific and well-identified places (the dirent cache and the cluster cache). As such, I don't consider `lru_cache` (this implementation at least) a general-purpose utility. I have added a few comments anyway.
src/lrucache.h
Outdated
void increaseSize(size_t extra_size) { | ||
// increaseSize is called after we have added a value to the cache to update | ||
// the size of the current cache. | ||
// We must ensure that we don't drop the value we just added. | ||
// While it is technically ok to keep no value if max cache size is 0 (or memory size < of the size of one cluster) | ||
// it will make recreate the value all the time. | ||
// This is especially problematic with clusters. | ||
// Let's be nice with our user and be tolerent to miss configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"miss configuration" made it to the next stage of the contest but I believe that the jury (being technical and faithful to its mission) will not be tricked and will disqualify her
Please rebase&fixup. I will review the clean version of the PR.
A ConcurrentCache is a specific lru_cache.
At cache level, "size" is renamed to "cost", representing how much a cache can store (and how much an item costs). "size" is kept to count the number of items. At higher level, we keep the "size" semantics, as we are speaking about the size of the cache, whatever that is. This is the first step toward a cache limited by memory usage.
We need to know the size of our input reader, especially for compression streams, where we need to know the size of the compression state.
This is useless now, as we destroy the cache itself just after, but it will be useful when the cache becomes global.
The cluster cache will be global, so we cannot test content against a ref_archive, as getting the data from it would also be impacted by the cache configuration. So we "preload" all the content first and then do the test.
Done
I started another thorough review but as soon as the feeling started mounting that in the context of this PR the implementation seems to be overengineered (unless further enhancement is planned) I stopped.
@mgautierfr I'd like to learn your unbiased opinion on my proposal below (as if you hadn't invested all the effort into this implementation) before I resume the review.
src/concurrent_cache.h (outdated):
```diff
@@ -40,15 +40,15 @@ namespace zim
    available.
  */
  template <typename Key, typename Value>
-class ConcurrentCache
+class ConcurrentCache: private lru_cache<Key, std::shared_future<Value>>
```
The explanation in the commit description is not a good one. Private inheritance is not an "is a" relation. To the best of my understanding, this change is made so that `ConcurrentCache` may have access to the `protected` API of `lru_cache`.
src/concurrent_cache.h (outdated):
```cpp
// There is a small window when the valuePromise may be dropped from the lru cache after
// we set the value but before we increase the size of the cache.
// In this case, decrease the size by `cost` before increasing it.
// First of all, it should be pretty rare, as we have just put the future in the cache,
// so it should not be the least used item.
// If it happens, this should not be a problem as long as current_size is bigger
// than `cost` (most of the time).
// For the really rare specific case of the current cache size being lower than `cost`
// (if possible at all), `decreaseCost` will clamp the new size to 0.
```
I don't like such subtle pitfalls here and there (they may turn into time bombs). This makes me come up with the following question and proposal:
Is memory usage accounting going to expand beyond clusters? If not, then why don't we implement item cost tracking at the `ConcurrentCache` level, leaving `lru_cache` absolutely intact?
> Is memory usage accounting going to expand beyond clusters?

I don't know. The main purpose of all this is to avoid exaggerated memory consumption for a long-running kiwix-serve with a "lot" (not defined, relative to the system) of ZIMs to serve.
We have assumed it was the cluster cache taking all this memory, mainly because we currently have a limit per ZIM file (and so a lot of ZIM files make the cache mostly unlimited) and this cache should be more memory consuming than the dirent cache.
So there is no current plan to extend this limitation to anything else for now, but we may if we find our assumption was wrong.

> If not, then why don't we implement item cost tracking at ConcurrentCache level, leaving lru_cache absolutely intact?

Your idea seems good at first glance. However, thinking twice, here are a few limitations (not necessarily blockers):
- We would have to make the inner lru_cache used by ConcurrentCache unlimited (which can easily be done with a huge size_t limit, but still).
- ConcurrentCache would have to re-implement all the `put`/`putMissing` and `drop` methods to drop the clusters from the cache itself and track the allocated memory.
- Everything about what the (constant) memory consumption of a cluster is would still stand.

In the end, it would simply move everything about memory consumption a level up (into ConcurrentCache) and make it call lru_cache to drop an item even if its internal limit is not reached.
It would indeed leave lru_cache almost unmodified, but I'm not sure it would be worth it.
(And I don't see a reason why we would want to keep lru_cache clean of memory limitation but accept the modifications in ConcurrentCache.)
> (And I don't see a reason why we would want to keep lru_cache clean of memory limitation but accept the modifications in ConcurrentCache.)

The main benefit would be avoiding design quirks.

> ConcurrentCache would have to re-implement all the `put`/`putMissing` and `drop` methods to drop the clusters from the cache itself and track the allocated memory.

`ConcurrentCache` is a relatively lightweight wrapper around `lru_cache` (without any is-a relation between them). Its public API related to memory usage management consists of just three member functions:
- `getOrPut()`
- `drop()`
- `setMaxSize()`

Maintaining memory limits entirely within that small set of functions can be achieved with fewer changes. It also simplifies the implementation, in which the cost of an item is evaluated only once (on addition) and is stored with the item.
> The main benefit would be avoiding design quirks.

But ConcurrentCache is a thin wrapper making lru_cache thread safe.
Adding a memory limitation to it seems to me a quirk we should avoid.

> Maintaining memory limits entirely within that small set of functions can be achieved with fewer changes. It also simplifies the implementation, in which the cost of an item is evaluated only once (on addition) and is stored with the item.

I've started implementing your idea to see if I was wrong. Here are a few things I have faced:
- We have to decrease the size when we drop an item, so we need to make `lru_cache::dropLast` return the dropped item. However, the current drop implementation returns false if the key is not found. As we want to return the value and can't use `std::optional` (we limit ourselves to C++14), we would have to throw `std::out_of_range` instead of returning false, and so all users of drop would have to catch it.
- We still have to drop all clusters of an archive at archive destruction, so we need a dropAll. Either it is implemented in lru_cache (and it has to return all of them so we can decrease the size) or it is implemented in ConcurrentCache (and we break encapsulation).
- lru_cache may drop values by itself. This should not happen in practice, as ConcurrentCache will configure it to be unlimited. But it creates a somewhat inconsistent API, as some code can drop values from lru_cache without ConcurrentCache catching it.

Anyway, the last commit implements your idea. We can discuss the real implementation.
Indeed, it was more complicated than I thought, but it helped to surface a potential problem that was present with the old implementation too. I felt that something was wrong, but I thought that it was just an uncomfortable feeling caused by the arcane interplay between `lru_cache` and `ConcurrentCache`. Now, with the cost accounting fully done in a single class, the problem was easier to spot.
Consider working with two ZIM files - one on (very) slow drive and the other on a fast drive. A cluster from the slow ZIM file is requested. A placeholder for it is created in the cache, and while the cluster is loading, a burst of activity on the fast ZIM file happens that fills the cache with entries from the fast ZIM file and drops the placeholder entry of the slow ZIM file. Since the cost of that placeholder entry is not known, a fake value of 0 is used instead (the result being that the removal of the entry has no effect and leaves no trace). Then loading of the slow cluster completes and the cache cost is increased by its actual memory usage value, but that item is no longer in the cache! Repetition of that scenario can lead to the current "fake" cost of the cache being above the limit, even though the cache is empty!
The same scenario can occur more naturally as follows:
- slow access: cluster not in the cache (cache miss)
- fast access: cluster already in the cache (cache hit, the cluster is moved to the beginning of the LRU list)
Thus, a cache miss, followed by a large enough number of cache hits (so that the cache miss is pushed to the end of the LRU queue), followed by another cache miss (occurring before the first cache miss is fully resolved), will result in the first cache miss being evicted from the cache before it was even properly recorded and the cache memory usage balance off by the cost of that ill-fated cluster.
Last commit should fix this point.
(I have removed the "WIP" commit)
If creation of the value takes a lot of time, we may have moved a lot of values to the front of the lru cache and dropped the shared_future before we set it. So before blindly increasing the cost, we check that the promise is still there, and re-add it otherwise.
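The re-check described above might look roughly like this simplified sketch (all names are hypothetical; the real code deals with `shared_future` values inside the cache, while this toy ledger only tracks keys and costs):

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <unordered_map>

// Sketch of the re-check described above: after computing an item's real
// cost, verify under the lock that the entry is still present before
// charging the cache for it; if it was evicted meanwhile, re-add it.
// All names are hypothetical simplifications, not the libzim code.
class CostLedger {
 public:
  void addPlaceholder(int key) {
    std::lock_guard<std::mutex> g(mutex_);
    entries_[key] = 0;  // real cost not known yet
  }

  void evict(int key) {
    std::lock_guard<std::mutex> g(mutex_);
    auto it = entries_.find(key);
    if (it != entries_.end()) {
      totalCost_ -= it->second;  // subtracting 0 for a pending placeholder
      entries_.erase(it);
    }
  }

  // Called once the real cost of the value is known.
  void finalize(int key, size_t cost) {
    std::lock_guard<std::mutex> g(mutex_);
    auto it = entries_.find(key);
    if (it == entries_.end()) {
      // Evicted while the value was being computed: re-add it instead of
      // blindly increasing the total (which would leak phantom cost).
      entries_[key] = cost;
    } else {
      it->second = cost;
    }
    totalCost_ += cost;
  }

  size_t totalCost() const {
    std::lock_guard<std::mutex> g(mutex_);
    return totalCost_;
  }

  bool contains(int key) const {
    std::lock_guard<std::mutex> g(mutex_);
    return entries_.count(key) != 0;
  }

 private:
  mutable std::mutex mutex_;
  std::unordered_map<int, size_t> entries_;
  size_t totalCost_ = 0;
};
```

The key property is that every cost added to the total stays attached to a live entry, so eviction always subtracts exactly what was added and the total can never drift above the real cache contents.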
I think I have a somewhat cleaner design for a memory-limited concurrent cache in mind. I will try to implement it in a branch so that I can address any hidden issues. Also I believe we need a unit test for
Fix #947