Limit the cluster cache by memory instead of number of clusters #956

Open
wants to merge 10 commits into main from memory_cache
Conversation

mgautierfr (Collaborator):

  • Allow the cache to be limited by memory usage instead of number of items
  • Make the cluster cache global.

Fix #947

@mgautierfr force-pushed the memory_cache branch 3 times, most recently from 75e78c6 to 5429143 on February 14, 2025 at 16:33
@veloman-yunkan (Collaborator) left a comment:

This looks like a serious change. I skimmed through the PR very quickly, without even trying to comment on it, with the only purpose of getting a rough idea of where it will eventually take us. I will come back to it next week.

@veloman-yunkan (Collaborator) left a comment:

This is still a high-level critique of the PR. We need to address these concerns about the design of the solution before paying attention to implementation details.

zim::lru_cache<int, int> cache_lru(1);
zim::lru_cache<int, int, zim::UnitCostEstimation> cache_lru(1);
Collaborator:

This comment is about the full change of test/lrucache.cpp rather than this particular line.

Enhancement of the lru_cache (handling of items with item-dependent cost) is not tested.

Tests that definitely need to be added:

  1. Adding a large item to an already full cache drops the right amount of small items (see the sketch after this list).
  2. What happens when a single item with a cost above the max limit of the cache is added? BTW, should we allow the cache to exceed the limit in such a case (another option is to keep the cache intact and not register the offending item in the cache)? If yes, why don't we allow a few small items to be present in the cache too? This is discussed in more detail in another comment.

Tests that would be nice to have:

  1. Handling of CostEstimation::cost() whose output depends on the internal state of its argument.
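
For the first of the must-have tests, a rough GTest sketch could look as follows (assuming the constructor argument of lru_cache is the maximum total cost, as the test snippets in this PR suggest; ValueCostEstimation, the include path and the exact eviction semantics are assumptions):

#include <cstddef>

#include "gtest/gtest.h"

#include "lrucache.h"

namespace {

// Hypothetical cost policy: the cost of a cached int is its own value.
struct ValueCostEstimation {
  static size_t cost(const int& value) { return static_cast<size_t>(value); }
};

TEST(CacheTest, LargeItemEvictsSeveralSmallItems) {
  // Cache limited to a total cost of 10.
  zim::lru_cache<int, int, ValueCostEstimation> cache_lru(10);

  cache_lru.put(1, 3);  // total cost 3
  cache_lru.put(2, 3);  // total cost 6
  cache_lru.put(3, 3);  // total cost 9

  // Adding an item of cost 7 should evict the two least recently used
  // items (keys 1 and 2) to get back under the limit.
  cache_lru.put(4, 7);

  EXPECT_FALSE(cache_lru.exists(1));
  EXPECT_FALSE(cache_lru.exists(2));
  EXPECT_TRUE(cache_lru.exists(3));
  EXPECT_TRUE(cache_lru.exists(4));
  EXPECT_EQ(2u, cache_lru.size());
}

} // namespace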

Comment on lines 38 to 40
size_t getSize() const override {
return m_reader->size().v;
}
Collaborator:

This is going to be used with non-compressed clusters. On platforms supporting mmap(), those should be mmap()ed (unless the cluster spans two or more segments of a multi-part ZIM file), in which case, for the purposes of this PR, it makes more sense to report the cluster cache cost of those clusters as 0 (or to find a different way of avoiding double-reporting the same memory in cache accounting).

Collaborator Author:

Not sure about the best answer here.

mmapped content is still kind of an allocation:

  • It reserves memory addresses (and so consumes available address space)
  • Virtual memory is increased by the size of the mmap
  • It is composed of pages that have to be handled by the kernel like a "classic" memory allocation (which may, in turn, be backed by mmap)

However, it provides some specific features:

  • No I/O calls (once created)
  • Resident memory is increased by the page size only when we actually read the page.
  • Swapping the page out to the fs is a no-op, as the content is already in the fs.

We should probably count the resident memory as real memory usage. But we can only (easily) know the virtual memory.

Resident memory depends more on the blobs returned to the user than on the cluster itself (for uncompressed clusters).
For a first version, I tend to think that sticking to the virtual memory is "good enough" and at least fulfills the purpose: not letting the cache consume too much.


@benoit74 Do you know what your measurement was about? Virtual or resident memory?

Collaborator Author:

After reflection, I think we should count 0 all the time.

For uncompressed clusters, we know the size of the data but we mmap it only when we read a blob from it (and only the region corresponding to the blob, not the whole cluster).

For compressed clusters, we don't know the size of the compressed stream. So the stream reader has a "virtual" size of (whole ZIM file size - cluster offset). This size is obviously wrong. We actually read from this stream (and get buffers from it), but these are only temporary buffers used to feed the decompression stream.

So we should not use the stream reader size to compute the cache size.

src/cluster.cpp Outdated
Comment on lines 185 to 221
// This function must return a constant size for a given cluster.
// This is important as we want to remove the same size as what we added when we remove
// the cluster from the cache.
// However, because of partial decompression, this size can change:
// - As we advance in the decompression, we can create new blob readers in `m_blobReaders`
// - The stream itself may allocate memory.
// To solve this, we take the average and say that half of a cluster's blob readers will be created,
// so we assume a readers size of half the full uncompressed cluster data size.
// It also appears that when we get the size of the stream, we reach a state where no
// further allocation will be done by it. Probably because:
// - We already started to decompress the stream to read the offsets
// - Cluster data size is smaller than the window size associated with the compression level (?)
// We check that anyway and print a warning if this is not the case, hoping that the user will create
// an issue allowing us to do further analysis.
size_t zim::ClusterMemorySize::get_cluster_size(const Cluster& cluster) {
  if (!cluster.m_memorySize) {
    auto base_struct = sizeof(Cluster);
    auto offsets_size = sizeof(offset_t) * cluster.m_blobOffsets.size();
    auto readers_size = cluster.m_blobOffsets.back().v / 2;
    cluster.m_streamSize = cluster.m_reader->getSize();
    cluster.m_memorySize = base_struct + offsets_size + readers_size + cluster.m_streamSize;
  }
  auto streamSize = cluster.m_reader->getSize();
  if (streamSize != cluster.m_streamSize) {
    std::cerr << "WARNING: stream size has changed from " << cluster.m_streamSize << " to " << streamSize << std::endl;
    std::cerr << "Please open an issue on https://github.com/openzim/libzim/issues with this message and the zim file you use" << std::endl;
  }
  return cluster.m_memorySize;
}
Collaborator:

I think that for our next revision of the ZIM file format we should consider including the uncompressed size of a compressed cluster in the cluster header.

Also, though this is not the best place for discussing such an idea, I propose to think about possible benefits of introducing an item data cache. In some scenarios, clusters may be too large a unit of caching. Consider a usage pattern when multiple relatively small items all from different clusters are constantly used. The total memory consumed by those items can be orders of magnitude smaller than the size of their clusters. And if the latter exceeds the cluster cache limit, libzim may keep thrashing for no good reason.

Collaborator Author:

I think that for our next revision of the ZIM file format we should consider including the uncompressed size of a compressed cluster in the cluster header.

This is what I have done in Jubako.
I would like to consider it as the base for the next major revision of ZIM file format but it is a discussion far beyond the current one.

src/cluster.h Outdated
@@ -91,6 +95,15 @@ namespace zim
Blob getBlob(blob_index_t n, offset_t offset, zsize_t size) const;

static std::shared_ptr<Cluster> read(const Reader& zimReader, offset_t clusterOffset);
friend struct ClusterMemorySize;
Collaborator:

Isn't it better to have a public Cluster::getMemorySize()?

src/cluster.cpp Outdated
auto offsets_size = sizeof(offset_t) * cluster.m_blobOffsets.size();
auto readers_size = cluster.m_blobOffsets.back().v / 2;
cluster.m_streamSize = cluster.m_reader->getSize();
cluster.m_memorySize = base_struct + offsets_size + readers_size + cluster.m_streamSize;
Collaborator:

If you want to be that precise in your approximate estimation of the memory usage by the cluster, shouldn't you multiply readers_size by the additional memory usage of an individual blob reader?

src/lrucache.h Outdated
Comment on lines 179 to 186
void increaseSize(size_t extra_size) {
// increaseSize is called after we have added a value to the cache to update
// the size of the current cache.
// We must ensure that we don't drop the value we just added.
// While it is technically ok to keep no value if max cache size is 0 (or memory size < of the size of one cluster)
// it will make recreate the value all the time.
// This is especially problematic with clusters.
// Let's be nice with our user and be tolerent to miss configuration.
Collaborator:

  1. "miss configuration" sounds rather like a beauty contest. I guess you meant "misconfiguration" :)
  2. Clusters should not be mentioned in this file. However the raised issue deserves some discussion. The proposal to be tolerant and allow a single item in a cache even if its size exceeds the cache size limit can be helpful in some scenarios but harmful in others like the following one: suppose that the working set for some usage pattern of a ZIM file involves two or more clusters - one large cluster above the limit and the rest cumulatively below the limit. Then you will have to rotate them all through the cache all the time, while you could keep the small ones in the cache and only re-read the big cluster. Alternatively, if you want to make the cache smarter, you may keep the large cluster always in the cache and never evict it from the cache just because you want to replace it with a tiny one. But that means keeping track of some statistics and making decisions based on the history of items entering and leaving the cache. The API of lru_cache will need to be enhanced to take CostEstimation not as a type but an object and the semantics of CostEstimation::cost() will need to include not only the cost of maintaining the given object in the cache but also the cost of recreating it once it is dropped from the cache.
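
For illustration, the proposed API change could take roughly this shape (a hypothetical sketch of the suggestion, not anything present in this PR; all names are made up):

#include <cstddef>
#include <utility>

// Hypothetical cost-estimation object: it reports both the cost of keeping a
// value in the cache and the cost of recreating it after eviction.
template<typename Value>
struct WeightedCostEstimation {
  size_t keepCost(const Value& v) const { return v.memoryUsage(); }        // assumed member
  size_t recreateCost(const Value& v) const { return v.recreationCost(); } // assumed member
};

// The cache would take the estimator as an object instead of a purely static
// policy type, and eviction could weigh LRU order against
// keepCost()/recreateCost() rather than relying on recency alone.
template<typename key_t, typename value_t, typename CostEstimation>
class weighted_lru_cache {
public:
  weighted_lru_cache(size_t maxCost, CostEstimation estimator)
    : _max_cost(maxCost), _estimator(std::move(estimator)) {}

private:
  size_t _max_cost;
  CostEstimation _estimator;
};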

Collaborator:

"miss configuration" made it to the next stage of the contest but I believe that the jury (being technical and faithful to its mission) will not be tricked and will disqualify her

Comment on lines +57 to 79
template<typename key_t, typename value_t, typename CostEstimation>
class lru_cache {
Collaborator:

This is a serious enhancement that will be difficult to do right. LRU is about heuristics for maintaining items in the cache based on the order in which they have been used. Introducing cost adds an important dimension to consider - when facing the choice of dropping from the cache an expensive old item or a cheap not-so-old one, which one is more justified? Or the other way around - it may be preferable to get rid of a heavy item that is not the first candidate for removal according to the LRU heuristics, since that will make more room in the cache for multiple lightweight items.
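
For context, the straightforward way of combining LRU order with a cost budget is to evict from the least-recently-used end until the new item fits, ignoring the trade-offs raised above. A minimal sketch of that approach (container layout and names are assumptions, not the PR's exact code):

#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>

// Minimal cost-aware LRU sketch: evict least recently used entries until the
// cost budget can accommodate the new item. Updating existing keys and
// lookups are omitted for brevity.
template<typename Key, typename Value, typename CostEstimation>
class cost_limited_lru {
public:
  explicit cost_limited_lru(size_t max_cost) : _max_cost(max_cost) {}

  void put(const Key& key, Value value) {
    const size_t cost = CostEstimation::cost(value);
    // Drop old entries from the LRU end until the new item fits.
    // Note: if cost > _max_cost, the cache ends up holding just this item
    // and exceeds its budget - the very edge case debated in this review.
    while (!_items.empty() && _current_cost + cost > _max_cost) {
      _current_cost -= CostEstimation::cost(_items.back().second);
      _map.erase(_items.back().first);
      _items.pop_back();
    }
    _items.emplace_front(key, std::move(value));
    _map[key] = _items.begin();
    _current_cost += cost;
  }

private:
  using Item = std::pair<Key, Value>;
  std::list<Item> _items;                                            // MRU at the front
  std::unordered_map<Key, typename std::list<Item>::iterator> _map;
  size_t _max_cost;
  size_t _current_cost = 0;
};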

@mgautierfr (Collaborator Author):

Well, I kind of agree with all the high-level critiques.
However, we spent time with @kelson42, @rgaudin and @benoit74 discussing how to do this. Please see the discussions in #947 and #946.
@kelson42 was really about doing something "simple"; I was not sure it was possible, as I had the same concerns as you.

However, a lot of your concerns (and mine) are about one particular condition:
when the cache limit is small and below the size of one or two items (clusters in our case).
But cluster sizes are mostly limited to 2MB (uncompressed). From traces, we should be around 10MB for the vast majority of clusters. As we limit the size of the cache to 512MB[*] by default, we should not hit this case. If a user sets a limit that is too small (< 10MB?), I think we can easily argue that libzim needs more to be efficient.

So I follow kelson's lead here (and agree with it): let's do something now and tweak the algorithm later, if needed, based on specific use cases.
A real analysis of the behavior with a lot of ZIM files can probably only be done with @benoit74 and @rgaudin on https://library.kiwix.org, and we need a first version of the global cache to do it.

Although this cache system is far from perfect, I think it complies with kelson's request:

Remember, I'm not asking to build the smartest cache system. I'm just asking for a caching system which does not run out of memory and works in a reasonable manner.

[*] It seems that I forgot to upgrade this default value, but it is what is expected per #947 (comment)

@benoit74:

Yes, for now the goal is not to allow running on as little memory as possible, but to ensure that we do not use excessive memory.

library.kiwix.org currently often uses tens of GBs of memory. dev.library.kiwix.org, with only 80 ZIMs, already uses a bit less than 500M and it keeps growing. In addition to the operational issues this causes on our servers, it also raises serious questions about the ability to run kiwix-serve on a Raspberry Pi / desktop with any "serious" traffic / number of ZIMs, hence this effort.

@veloman-yunkan (Collaborator):

However, a lot of your concerns (and mine) are about one particular condition:
when the cache limit is small and below the size of one or two items (clusters in our case).
But cluster sizes are mostly limited to 2MB (uncompressed). From traces, we should be around 10MB for the vast majority of clusters. As we limit the size of the cache to 512MB[*] by default, we should not hit this case. If a user sets a limit that is too small (< 10MB?), I think we can easily argue that libzim needs more to be efficient.

@mgautierfr If the assumption is that the cache will contain significantly more than a few items, then let's take it for granted, state it explicitly in the code and pursue an implementation that takes advantage of it (instead of considering edge cases that don't fit that assumption).

@veloman-yunkan (Collaborator) left a comment:

Here are several comments about the implementation. The summary of this and the previous review is that this enhancement can be simplified a little and must be tested better (both with new unit tests for the cache classes and a new unit test in the archive test suite ensuring that mmap()-ed clusters are accounted for in the cache appropriately).

Comment on lines 33 to 56
template<typename CostEstimation>
struct FutureToValueCostEstimation {
  template<typename T>
  static size_t cost(const std::shared_future<T>& future) {
    // The future is the value in the cache.
    // When calling getOrPut, if the key is not in the cache,
    // we add a future and then we compute the value and set the future.
    // But lru_cache calls us when we add the future, i.e. before we have
    // computed the value. If we wait here (or use future.get), we will deadlock
    // as we need to exit before setting the value.
    // So in this case, we return 0. `ConcurrentCache::getOrPut` will correctly increase
    // the current cache size when it has an actual value.
    // We still need to compute the size of the value if the future has one, as it
    // is also used to decrease the cache size when the value is dropped.
    using namespace std::chrono_literals;
    std::future_status status = future.wait_for(0s);
    if (status == std::future_status::ready) {
      return CostEstimation::cost(future.get());
    } else {
      return 0;
    }
  }
};
Collaborator:

I think that such a tweak is not needed. Even if it addresses some scenarios, we better avoid it. You can use a dummy cost estimation config that returns 0 unconditionally. The change in ConcurrentCache::getOrPut() should be enough.

Collaborator Author:

The cost method is also called when we remove the value from the cache. If we use a dummy cost of 0 all the time, ConcurrentCache would have to decrease the cache size itself.

Collaborator:

As far as I understand the current assumption is that the cost of a cached item doesn't change, otherwise the total cost maintained by the cache will be wrong (in particular it may underflow and make the cache stop functioning). Then the cost value returned during addition to the cache can be recorded with the item. This will protect against the said potential problem if the cost of an item happens to depend on its internal state, at the much more acceptable cost (pun intended) of the cache not correctly respecting the cost limit. Another effect will be a little gain in performance if cost computation is costly.

@mgautierfr (Collaborator Author) commented on Feb 24, 2025:

Indeed. But if we store the size of the item in the lru_cache, we cannot simply use a dummy value of 0 here; otherwise we would have to update the stored item's size on top of simply increasing the total cache size.

To do so, we would have to extend the lru_cache API to provide access to the stored individual size (or store a pair<size_t, T> instead of T and change all the call sites).

Not sure it is worth it in this context, where we have only one use case of a non-constant (per-item) size and the lru/concurrent cache is not public.
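
For illustration, the alternative discussed here (storing the cost computed at insertion time next to the value) would look roughly like this sketch (names are made up, not the PR's):

#include <cstddef>

// Sketch of the "store the cost with the item" idea: the cache's internal
// list would keep the cost computed at insertion time next to the value, so
// eviction never recomputes it and a state-dependent cost() cannot unbalance
// the total.
template<typename value_t>
struct cached_entry {
  value_t value;
  size_t cost;  // recorded once, when the entry was added
};

// Bookkeeping then becomes symmetric:
//   on insertion: _current_cost += entry.cost;
//   on eviction:  _current_cost -= entry.cost;  // exactly what was added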

Base automatically changed from cache_config to main February 27, 2025 12:56
@mgautierfr force-pushed the memory_cache branch 4 times, most recently from 0bcbf3d to e99f9c0 on February 27, 2025 at 14:37
@mgautierfr (Collaborator Author):

The latest PR version is rebased on the new main and takes your input into consideration.
Please review it as a new PR. Keeping things in fixup commits was too complex with the rebase on the new version of #950, which changed method casing. Sorry about that.

On top of small changes based on your review, the important change is how we compute the size of a cluster (this is subject to discussion):

  • For uncompressed clusters, we only count the size needed to store the offsets.
  • For compressed clusters, we count:
    . the offsets size
    . the size of the uncompressed data / 2 (we assume that, on average, we decompress half of the cluster)
    . the input size clamped to the size of the uncompressed data, to count only resident memory [*]

[*] The decompression stream will allocate memory for the configured window (depending on the compression options). This allocation is always accepted by the kernel. However, the actual reservation of the memory pages is made only when the process accesses the memory. So if the compression options specify a window size of 8MB and the cluster size is 1MB, we will allocate 8MB (virtual memory) but only 1MB will be used (resident memory).

We may also want to clamp the input size to half of the uncompressed data size, as the decompression stream will not store more than what we decompress. But I don't know the size of the other decompression structures, so I prefer to be conservative here.
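
In code, the described scheme amounts to something like the following (an illustrative sketch with placeholder names, not the actual Cluster/ClusterMemorySize code):

#include <algorithm>
#include <cstddef>

// Illustrative sketch of the sizing scheme described above.
size_t estimateClusterCost(size_t blobCount,
                           size_t uncompressedDataSize,
                           size_t inputSize,
                           bool isCompressed)
{
  // Both variants pay for the blob offsets table.
  const size_t offsetsSize = sizeof(size_t) * blobCount;

  if (!isCompressed) {
    // Uncompressed clusters: blob data is mmap()ed on demand,
    // so only the offsets are counted.
    return offsetsSize;
  }

  // Compressed clusters: assume on average half of the cluster gets
  // decompressed, and clamp the input side to the uncompressed size to
  // approximate resident (rather than virtual) memory of the
  // decompression window.
  const size_t readersSize = uncompressedDataSize / 2;
  const size_t clampedInput = std::min(inputSize, uncompressedDataSize);
  return offsetsSize + readersSize + clampedInput;
}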

@veloman-yunkan (Collaborator) left a comment:

This is a partial review focusing exclusively on lru_cache

src/lrucache.h Outdated
Comment on lines 171 to 215
if (!extra_size) {
  // Don't try to remove an item if we have new size == 0.
  // This is the case when the concurrent cache adds a future without a value.
  // We will handle the real size increase when the concurrent cache calls us directly.
  return;
}
Collaborator:

This explanation and the if statement itself seem to be superfluous - assuming that the cache correctly maintains the size, the rest of this function will have no effect anyway.

Collaborator Author:

There is a subtle side effect to removing this test.

Without it, suppose we have a cache already populated with only one item and a _current_size greater than the max. When we add the "empty" future, we have two items and we now have to drop the previous item from the cache. When we get the real cost of the cluster in the concurrent cache, we correctly increase the size of the cache on the second call to increaseSize.
With the test, the first call does nothing and we drop the first item only when we have created the cluster and computed its size.

In the end we get the same thing, but as we are in a concurrent context I prefer the first call to be a no-op and to drop the first item on the second call, rather than having an empty cache for a transient time.

cache_lru.put(7, 777);
EXPECT_TRUE(cache_lru.exists(7));
EXPECT_EQ(777, cache_lru.get(7));
EXPECT_EQ(1u, cache_lru.size());
}

TEST(CacheTest, OverwritingPut) {
zim::lru_cache<int, int> cache_lru(1);
zim::lru_cache<int, int, zim::UnitCostEstimation> cache_lru(1);
Collaborator:

Doesn't it make sense to default to UnitCostEstimation for the cost estimation template parameter?

Collaborator Author:

The more I go, the more I prefer explicit over implicit.
Needing to specify UnitCostEstimation is only problematic for testing. In src it is specified only once, and it is the same as for FutureToValueCostEstimation.
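
For reference, the reviewer's suggestion boils down to defaulting the third template parameter (a sketch, not the actual declaration in src/lrucache.h):

struct UnitCostEstimation;  // every item costs 1 (defined in src/lrucache.h)

// With a defaulted policy, existing call sites such as
//   zim::lru_cache<int, int> cache_lru(1);
// would keep compiling and behave as a classic item-count-limited LRU cache.
template<typename key_t, typename value_t,
         typename CostEstimation = UnitCostEstimation>
class lru_cache;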

src/lrucache.h Outdated
@@ -103,6 +111,8 @@ class lru_cache {
auto it = _cache_items_map.find(key);
if (it != _cache_items_map.end()) {
_cache_items_list.splice(_cache_items_list.begin(), _cache_items_list, it->second);
_current_size -= CostEstimation::cost(it->second->second);
Collaborator:

I understand that this enhancement is exclusively about caching of clusters (i.e. objects with relatively high cost). However when considering lru_cache as a general purpose utility, we have the following problems:

  1. Without any documentation present, cost is an abstract concept and it can represent something different from memory usage. For example,
    a. the time needed to recreate the object if it's dropped from the cache,
    b. consumption of some other type of storage (disk-space, video memory, next level cache)
    c. usage of some limited set of resources (file descriptors, network connections, etc)
  2. If cost is defined to stand for RAM usage only then memory consumption by the cache data structures should be included in the accounting.

The above concerns should at least be addressed in documentation.

Collaborator Author:

Without any documentation present, cost is an abstract concept and it can represent something different from memory usage.

I agree with you. This is why there is no documentation present 🙂 Cost may be whatever the user of the lru_cache wants to track and limit by.
And this lru_cache is not part of the public API. It is used only in two specific and well-identified places (dirent cache and cluster cache). As such, I don't consider lru_cache (this implementation at least) a general-purpose utility. I have added a few comments anyway.


@veloman-yunkan (Collaborator) left a comment:

Please rebase&fixup. I will review the clean version of the PR.

A ConcurrentCache is a specific lru_cache.
At the cache level, "size" is renamed to "cost", representing how much a
cache can store (and how much an item costs). "size" is kept to count
the number of items.

At a higher level, we keep the "size" semantics, as we are speaking about
the size of the cache, whatever that is.

This is the first step towards a cache limited by memory usage.
We need to know the size of our input reader, especially for compression
streams, where we need to know the size of the compression state.
This is useless now, as we destroy the cache itself just after, but it will
be useful when the cache becomes global.
The cluster cache will be global, so we cannot test content against a
ref_archive, as getting the data from it will also be impacted by the cache
configuration.

So we "preload" all the content first and then do the test.
@mgautierfr (Collaborator Author):

Done

@veloman-yunkan (Collaborator) left a comment:

I started another thorough review, but I stopped as soon as the feeling started mounting that, in the context of this PR, the implementation seems to be overengineered (unless further enhancement is planned).

@mgautierfr I'd like to learn your unbiased opinion on my proposal below (as if you hadn't invested all the effort into this implementation) before I resume the review.

@@ -40,15 +40,15 @@ namespace zim
available.
*/
template <typename Key, typename Value>
class ConcurrentCache
class ConcurrentCache: private lru_cache<Key, std::shared_future<Value>>
Collaborator:

The explanation in the commit description is not a good one. Private inheritance is not an "is a" relation. To the best of my understanding, this change is made so that ConcurrentCache may have access to the protected API of lru_cache.

Comment on lines 99 to 106
// There is a small window when the valuePromise may be dropped from the lru cache after
// we set the value but before we increase the size of the cache.
// In this case, decrease the size by `cost` before increasing it.
// First of all, this should be pretty rare, as we have just put the future in the cache so it
// should not be the least recently used item.
// If it happens, this should not be a problem as long as current_size is bigger than `cost` (most of the time).
// For the really rare specific case of the current cache size being lower than `cost` (if possible),
// `decreaseCost` will clamp the new size to 0.
Collaborator:

I don't like such subtle pitfalls here and there (that may turn into time-bombs). This makes me come up with the following question and proposal:

Is memory usage accounting going to expand beyond clusters? If not, then why don't we implement item cost tracking at ConcurrentCache level, leaving lru_cache absolutely intact?

Collaborator Author:

Is memory usage accounting going to expand beyond clusters?

I don't know. The main purpose of all this is to avoid exaggerated memory consumption for a long-running kiwix-serve with a "lot" (not defined, relative to the system) of ZIMs to serve.

We have assumed it is the cluster cache taking all this memory, mainly because we currently have a limit per ZIM file (and so, a lot of ZIM files make the cache mostly unlimited) and this cache should be more memory-consuming than the dirent cache.

So there is no current plan to extend this limitation to anything else for now, but we may if we find our assumption was wrong.

If not, then why don't we implement item cost tracking at ConcurrentCache level, leaving lru_cache absolutely intact?

Your idea seems good at first glance. However, thinking twice, here are a few limitations (not necessarily blockers):

  • We would have to make the inner lru_cache used by ConcurrentCache unlimited (easily done with a huge size_t limit, but still)
  • ConcurrentCache would have to re-implement all the put/putMissing and drop methods to drop the clusters from the cache itself and track the allocated memory.
  • Everything about what the (constant) memory consumption of a cluster is would still stand.

In the end, it would simply move everything about memory consumption a level up (into ConcurrentCache) and make it call lru_cache to drop an item even if its internal limit is not reached.
It would indeed leave lru_cache almost unmodified, but I'm not sure it would be worth it.
(And I don't see a reason why we would want to keep lru_cache clean of memory limitation but accept the modifications in ConcurrentCache.)

Collaborator:

(And I don't see a reason why we would want to keep lru_cache clean of memory limitation but accept the modifications in ConcurrentCache.)

The main benefit would be avoiding design quirks.

  • ConcurrentCache would have to re-implement all the put/putMissing and drop methods to drop the clusters from the cache itself and track the allocated memory.

ConcurrentCache is a relatively lightweight wrapper around lru_cache (without any is-a relation between them). Its public API related to memory usage management consists of just three member functions:

  • getOrPut()
  • drop()
  • setMaxSize()

Maintaining memory limits entirely within that small set of functions can be achieved with fewer changes. It also simplifies the implementation, in which the cost of an item is evaluated only once (on addition) and is stored with the item.
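
For illustration, cost tracking confined to those three member functions could look roughly like this (a sketch under the assumption that the wrapped lru_cache stays cost-agnostic and is configured as effectively unlimited; names, locking and the eviction-order query are simplified and are not the PR's code):

#include <cstddef>
#include <future>
#include <map>
#include <mutex>
#include <utility>

// Simplified sketch of cost accounting done entirely in the concurrent
// wrapper; the wrapped LRU cache and its eviction order are elided.
template<typename Key, typename Value, typename CostEstimation>
class ConcurrentCostCache {
public:
  explicit ConcurrentCostCache(size_t maxCost) : _maxCost(maxCost) {}

  template<typename F>
  std::shared_future<Value> getOrPut(const Key& key, F&& creator) {
    std::unique_lock<std::mutex> lock(_mutex);
    auto it = _entries.find(key);
    if (it != _entries.end())
      return it->second.future;            // cache hit

    std::promise<Value> promise;
    auto future = promise.get_future().share();
    _entries[key] = Entry{future, 0};      // placeholder, cost unknown yet
    lock.unlock();

    Value value = creator(key);            // may take long; no lock held
    const size_t cost = CostEstimation::cost(value);
    promise.set_value(std::move(value));

    lock.lock();
    auto entry = _entries.find(key);
    if (entry != _entries.end()) {         // may have been dropped meanwhile
      entry->second.cost = cost;           // record the cost with the entry
      _currentCost += cost;
      evictIfNeeded();
    }
    return future;
  }

  void drop(const Key& key) {
    std::lock_guard<std::mutex> lock(_mutex);
    auto it = _entries.find(key);
    if (it == _entries.end())
      return;
    _currentCost -= it->second.cost;       // exactly what was added
    _entries.erase(it);
  }

  void setMaxCost(size_t maxCost) {
    std::lock_guard<std::mutex> lock(_mutex);
    _maxCost = maxCost;
    evictIfNeeded();
  }

private:
  struct Entry {
    std::shared_future<Value> future;
    size_t cost;
  };

  void evictIfNeeded() {
    // A real implementation would ask the wrapped lru_cache for the least
    // recently used entry; here we simply erase in map order.
    while (_currentCost > _maxCost && !_entries.empty()) {
      auto victim = _entries.begin();
      _currentCost -= victim->second.cost;
      _entries.erase(victim);
    }
  }

  std::map<Key, Entry> _entries;
  std::mutex _mutex;
  size_t _maxCost;
  size_t _currentCost = 0;
};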

Collaborator Author:

The main benefit would be avoiding design quirks.

But ConcurrentCache is a thin wrapper to make the lru_cache thread-safe.
Adding a memory limitation to it seems to me to be a quirk we should avoid.

Maintaining memory limits entirely within that small set of functions can be achieved with fewer changes. It also simplifies the implementation, in which the cost of an item is evaluated only once (on addition) and is stored with the item.

I've started implementing your idea to see if I was wrong. Here are a few things I have faced:

  • We have to decrease the size when we drop an item, so we need to make lru_cache::dropLast return the dropped item. However, the current drop implementation returns false if the key is not found. As we want to return the value and can't use std::optional (we limit ourselves to C++14), we would have to throw std::out_of_range instead of returning false, and so all users of drop would have to catch it.
  • We still have to drop all clusters from an archive at archive destruction, so we need a dropAll. Either it is implemented in lru_cache (and we have to return all of them to be able to decrease the size) or it is implemented in ConcurrentCache (and we break encapsulation).
  • lru_cache may drop values by itself. This should not happen in practice, as ConcurrentCache will configure it to be unlimited, but it creates a somewhat inconsistent API, as we have code which can drop values from lru_cache without ConcurrentCache catching it.

Anyway, the last commit implements your idea. We can discuss the real implementation.

Collaborator:

Indeed, it was more complicated than I thought, but it helped to surface a potential problem that was present with the old implementation too. I felt that something was wrong, but I thought that it was just an uncomfortable feeling caused by the arcane interplay between lru_cache and ConcurrentCache. Now with the cost accounting fully done in a single class the problem was easier to spot.

Consider working with two ZIM files - one on (very) slow drive and the other on a fast drive. A cluster from the slow ZIM file is requested. A placeholder for it is created in the cache, and while the cluster is loading, a burst of activity on the fast ZIM file happens that fills the cache with entries from the fast ZIM file and drops the placeholder entry of the slow ZIM file. Since the cost of that placeholder entry is not known, a fake value of 0 is used instead (the result being that the removal of the entry has no effect and leaves no trace). Then loading of the slow cluster completes and the cache cost is increased by its actual memory usage value, but that item is no longer in the cache! Repetition of that scenario can lead to the current "fake" cost of the cache being above the limit, even though the cache is empty!

Collaborator:

The same scenario can occur more naturally as follows:

  • slow access: cluster not in the cache (cache miss)
  • fast access: cluster already in the cache (cache hit, the cluster is moved to the beginning of the LRU list)

Thus, a cache miss, followed by a large enough number of cache hits (so that the cache miss is pushed to the end of the LRU queue), followed by another cache miss (occurring before the first cache miss is fully resolved), will result in the first cache miss being evicted from the cache before it was even properly recorded, and the cache memory usage balance being off by the cost of that ill-fated cluster.

Collaborator Author:

Last commit should fix this point.
(I have removed the "WIP" commit)

If creation of the value takes a lot of time, we may have moved a lot of
values to the front of the lru cache and dropped the shared_future before we
set it.

So before blindly increasing the cost, we check that the promise is still there
and re-add it otherwise.
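
A condensed sketch of that check (hypothetical helper names; exists/put mirror the cache API visible elsewhere in this PR, while increaseCost is an assumption):

#include <cstddef>

// Sketch of the check described in this commit message; the real code lives
// in ConcurrentCache::getOrPut().
template<typename Cache, typename Key, typename Future>
void accountForComputedValue(Cache& cache, const Key& key,
                             const Future& future, size_t cost) {
  if (!cache.exists(key)) {
    // The placeholder future was evicted while the value was being created:
    // re-add it so that the cost we are about to add refers to an entry that
    // is actually in the cache.
    cache.put(key, future);
  }
  cache.increaseCost(cost);
}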
@veloman-yunkan (Collaborator):

I think I have a somewhat cleaner design for memory-limited concurrent cache in mind. I will try to implement it in a branch so that I can address any hidden issues. Also I believe we need a unit test for ConcurrentCache. I will add it too.
