-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify evictable cache implementation #10949
Conversation
eccab78
to
2299a21
Compare
lib/trino-collect/src/test/java/io/trino/collect/cache/TestEvictableCache.java
Outdated
Show resolved
Hide resolved
lib/trino-collect/src/main/java/io/trino/collect/cache/EvictableCache.java
Show resolved
Hide resolved
{ | ||
// Includes entries being computed. Approximate, as allowed per method contract. | ||
return delegate.size(); | ||
dataCache.invalidateAll(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did you choose to invalidate first the dataCache
and not the tokens
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't matter really
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's consider 2 threads: T1, T2
//invalidation call T1
Token token = tokens.remove(key); // 1
if (token != null) {
dataCache.invalidate(token); // 3
// get call T2
Token token = tokens.computeIfAbsent(key, ignored -> newToken); // 2
return dataCache.get(token); // 4
tokens
map will contain an outdated useless token.
Per com.google.common.cache.Cache
javadoc:
@param the type of the cache's values, which are not permitted to be null
In case that the value returned by dataCache.get(token)
is null
we should maybe remove remove the token from tokens
. I'm not sure however if there's any gain in this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tokens
map will contain an outdated useless token.
that would be a problem, but i don't see this happening in your example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not look correct. It feels you can get leaked entry in dataCache.
Consider following flow:
* (T1) invalidateAll
* (T1) dataCache.invalidateAll();
* entries from cache are gone
* (T2) put(K, V)
* (T2) load new value into `dataCache` for token which is present in the `tokens` map
* (T1) `tokens.clear()`
Now we have an entry in dataCache
for token which is no longer present in tokens
map.
Note: maybe that is not possible if we removalListeners
are executed synchronously as part of dataCache.invalidateAll();
. But I am. not sure if that is true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not look correct. It feels you can get leaked entry in dataCache.
correct, there is a small window where a value is loaded and subsequently made inaccessible.
this isn't a leak though, assuming the cache is bounded -- which was my assumption, but was not guaranteed.
I am adding a check in EvictableCacheBuilder mandating the cache to be bounded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel safer would be to get a copy of
tokens
;
that would require switching away from ConcurrentHashMap
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(T1) invalidateAll call
dataCache.invalidateAll(); //1
tokens.clear(); //4
T2 get(key, valueLoader) call
Token<K> token = tokens.computeIfAbsent(key, ignored -> newToken); //2
return dataCache.get(token, valueLoader); //3
After //4 the dataCache
still contains an "lost" entry for which there is no corresponding tokens
entry.
The same problem may occur if we flip the statements in the invalidateAll call to have the tokens cleared first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reproduction scenario which showcases that the cache size is sometimes greater than 0 after calling invalidateAll
:
@Test
public void testInvalidateAllAndLoadConcurrently()
throws Exception
{
AtomicLong remoteState = new AtomicLong(1);
Cache<Integer, Long> cache = EvictableCacheBuilder.newBuilder().maximumSize(10_000).build();
Integer key = 42;
int threads = 10_000;
ExecutorService executor = newFixedThreadPool(threads);
try {
List<Future<Void>> futures = IntStream.range(0, threads)
.mapToObj(threadNumber -> executor.submit(() -> {
if (threadNumber%2 == 0) {
cache.get(key, remoteState::get);
}else{
cache.invalidateAll();
long cacheSize = cache.size();
if (cacheSize > 0){
System.out.println(cacheSize);
}
}
return (Void) null;
}))
.collect(toImmutableList());
futures.forEach(MoreFutures::getFutureValue);
}
finally {
executor.shutdownNow();
executor.awaitTermination(10, SECONDS);
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After //4 the
dataCache
still contains an "lost" entry for which there is no correspondingtokens
entry.
absolutely.
lib/trino-collect/src/main/java/io/trino/collect/cache/EvictableCache.java
Show resolved
Hide resolved
lib/trino-collect/src/main/java/io/trino/collect/cache/EvictableCache.java
Show resolved
Hide resolved
lib/trino-collect/src/main/java/io/trino/collect/cache/EvictableCache.java
Show resolved
Hide resolved
2299a21
to
2501486
Compare
lib/trino-collect/src/main/java/io/trino/collect/cache/EvictableCache.java
Show resolved
Hide resolved
2501486
to
ad35166
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
} | ||
|
||
@Test(timeOut = TEST_TIMEOUT_MILLIS) | ||
public void testDisabledCache() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why this is disabled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because it has maximumSize(0)
lib/trino-collect/src/main/java/io/trino/collect/cache/EvictableCache.java
Show resolved
Hide resolved
eaf6eeb
to
6bd36cd
Compare
lib/trino-collect/src/test/java/io/trino/collect/cache/TestEvictableCache.java
Show resolved
Hide resolved
6bd36cd
to
42f50c2
Compare
rebased (conflicts) |
@Override | ||
public String toString() | ||
{ | ||
return format("CacheToken(%s; %s)", Integer.toHexString(hashCode()), key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From hashCode
javadoc:
As much as is reasonably practical, the hashCode method defined
by class {@code Object} does return distinct integers for
distinct objects
IIUC this says same hashcode can be returned for different objects present in VM at the same time (maybe I am reading it incorrectly).
Given that does it make sense to put hashcode as part of the toString
result?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i found it helpful and default Object.toString
does exactly that, but i see it could be misleading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (I read through produciton code but just skimmed test code)
3061254
to
a2ee24c
Compare
@@ -719,6 +718,15 @@ public int hashCode() | |||
{ | |||
return Objects.hash(tableHandle, tupleDomain); | |||
} | |||
|
|||
@Override | |||
public String toString() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this is not really related to the current commit.
Reusing the token-based approach of `EvictableLoadingCache` allows implementing both evictable `Cache` (previously known as `EvictableCache`) and evictable `LoadingCache` (`EvictableLoadingCache`) in a concise manner. This also introducing a builder for the caches (`EvictableCacheBuilder`), which unlocks flexibility of `CacheBuilder` (like weighted cache entries), while still preventing unsupported usage patterns. For example, registering `removalListener` is not allowed, as removal listener is internally used by the cache implementation.
CI #10932, some PT timeouts |
978e914
to
a48b9af
Compare
CI #10932 |
Reusing the token-based approach of
EvictableLoadingCache
allowsimplementing both evictable
Cache
(previously known asEvictableCache
) and evictableLoadingCache
(EvictableLoadingCache
)in a concise manner. This also introducing a builder for the caches
(
EvictableCacheBuilder
), which unlocks flexibility ofCacheBuilder
(like weighted cache entries), while still preventingunsupported usage patterns. For example, registering
removalListener
is not allowed, as removal listener is internally used by the cache
implementation.