
Simplify evictable cache implementation #10949

Merged: 1 commit merged into trinodb:master from findepi/flexi-cache on Feb 9, 2022

Conversation

Member

@findepi commented Feb 4, 2022

Reusing the token-based approach of EvictableLoadingCache allows
implementing both an evictable Cache (previously known as
EvictableCache) and an evictable LoadingCache (EvictableLoadingCache)
in a concise manner. This also introduces a builder for the caches
(EvictableCacheBuilder), which unlocks the flexibility of
CacheBuilder (such as weighted cache entries) while still preventing
unsupported usage patterns. For example, registering a removalListener
is not allowed, because the removal listener is used internally by the
cache implementation.
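
For context, a minimal sketch of the token indirection described above. Class, field, and method names here are illustrative; the actual EvictableCache/EvictableCacheBuilder code differs in detail.

    import com.google.common.cache.Cache;
    import com.google.common.cache.CacheBuilder;

    import java.util.concurrent.Callable;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.ExecutionException;

    // Illustrative sketch only, not the actual Trino classes.
    class TokenCacheSketch<K, V>
    {
        // Each key maps to a freshly created token; the Guava cache is keyed by token.
        private final ConcurrentMap<K, Token<K>> tokens = new ConcurrentHashMap<>();
        private final Cache<Token<K>, V> dataCache = CacheBuilder.newBuilder()
                .maximumSize(10_000) // some eviction bound, as discussed below
                .build();

        public V get(K key, Callable<? extends V> loader)
                throws ExecutionException
        {
            Token<K> token = tokens.computeIfAbsent(key, Token::new);
            return dataCache.get(token, loader);
        }

        public void invalidate(K key)
        {
            // Dropping the token makes the old entry unreachable; the next get() creates
            // a new token and therefore loads a fresh value.
            Token<K> token = tokens.remove(key);
            if (token != null) {
                dataCache.invalidate(token);
            }
        }

        private static class Token<K>
        {
            private final K key; // retained for diagnostics (e.g. toString)

            Token(K key)
            {
                this.key = key;
            }
            // Identity-based equals/hashCode: every Token instance is distinct.
        }
    }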

    {
        // Includes entries being computed. Approximate, as allowed per method contract.
        return delegate.size();
    ...
        dataCache.invalidateAll();
Contributor

Why did you choose to invalidate the dataCache first, and not the tokens?

Member Author

doesn't matter really

Contributor

Let's consider two threads, T1 and T2:

    // invalidation call (T1)
    Token token = tokens.remove(key); // 1
    if (token != null) {
        dataCache.invalidate(token); // 3
    }

    // get call (T2)
    Token token = tokens.computeIfAbsent(key, ignored -> newToken); // 2
    return dataCache.get(token); // 4

With this interleaving, the tokens map will contain an outdated, useless token.

Per the com.google.common.cache.Cache javadoc:

    @param <V> the type of the cache's values, which are not permitted to be null

If the value returned by dataCache.get(token) is null, we should perhaps remove the token from tokens. I'm not sure, however, whether there is any gain in this.
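
A sketch of that idea, purely illustrative and assuming the tokens/dataCache fields from the discussion above:

    // If the data cache has no value for the token we just observed, drop that exact token again.
    // The two-argument remove() only succeeds if the mapping still points at this token,
    // so a token installed by a concurrent get() is left alone.
    V value = dataCache.getIfPresent(token);
    if (value == null) {
        tokens.remove(key, token);
    }
    return value;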

Member Author

the tokens map will contain an outdated, useless token.

That would be a problem, but I don't see it happening in your example.

Member

@losipiuk Feb 8, 2022

It does not look correct; it seems you can get a leaked entry in dataCache.
Consider the following flow:

 * (T1) invalidateAll
 * (T1) dataCache.invalidateAll();
   * entries from the cache are gone
 * (T2) put(K, V)
   * (T2) loads a new value into `dataCache` for a token which is still present in the `tokens` map
 * (T1) `tokens.clear()`

Now we have an entry in dataCache for a token which is no longer present in the tokens map.

Note: maybe that is not possible if removalListeners are executed synchronously as part of dataCache.invalidateAll(), but I am not sure whether that is true.

Member Author

It does not look correct; it seems you can get a leaked entry in dataCache.

Correct, there is a small window where a value is loaded and subsequently made inaccessible.
This isn't a leak though, assuming the cache is bounded -- which was my assumption, but was not guaranteed.

I am adding a check in EvictableCacheBuilder mandating that the cache be bounded.
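
One possible shape of such a guard, shown only as an illustration; the field names and the exact condition in the real EvictableCacheBuilder may differ:

    import com.google.common.cache.Cache;
    import com.google.common.cache.CacheBuilder;

    import java.util.OptionalLong;

    import static com.google.common.base.Preconditions.checkState;

    // Hypothetical builder fragment, not the actual EvictableCacheBuilder code.
    class BoundedCacheBuilderSketch<K, V>
    {
        private OptionalLong maximumSize = OptionalLong.empty();

        public BoundedCacheBuilderSketch<K, V> maximumSize(long size)
        {
            this.maximumSize = OptionalLong.of(size);
            return this;
        }

        public Cache<K, V> build()
        {
            // The guard: refuse an unbounded cache, so entries orphaned by the
            // invalidateAll/load race are eventually evicted rather than retained forever.
            checkState(maximumSize.isPresent(), "Unbounded cache is not supported; set a maximum size or another eviction bound");
            CacheBuilder<Object, Object> builder = CacheBuilder.newBuilder();
            maximumSize.ifPresent(builder::maximumSize);
            return builder.build();
        }
    }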

Member Author

It would feel safer to get a copy of tokens.

That would require switching away from ConcurrentHashMap, right?
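
For what it's worth, a copy-based invalidateAll is sketched below, reusing the illustrative tokens/dataCache fields from earlier. ConcurrentHashMap's views can be iterated concurrently, but only with weakly consistent semantics, so this is a best-effort copy rather than an atomic snapshot:

    // Illustrative only; List is java.util.List.
    public void invalidateAll()
    {
        List<Token<K>> copy = List.copyOf(tokens.values());
        tokens.values().removeAll(copy);  // drop exactly the tokens we are about to invalidate
        dataCache.invalidateAll(copy);    // Guava Cache#invalidateAll(Iterable<?>)
    }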

Contributor

@findinpath Feb 9, 2022

(T1) invalidateAll call

    dataCache.invalidateAll(); //1
    tokens.clear(); //4

(T2) get(key, valueLoader) call

    Token<K> token = tokens.computeIfAbsent(key, ignored -> newToken); //2
    return dataCache.get(token, valueLoader); //3

After //4 the dataCache still contains a "lost" entry for which there is no corresponding tokens entry.
The same problem may occur if we flip the statements in the invalidateAll call to have the tokens cleared first.

Contributor

@findinpath Feb 9, 2022

Reproduction scenario which showcases that the cache size is sometimes greater than 0 after calling invalidateAll:


    @Test
    public void testInvalidateAllAndLoadConcurrently()
            throws Exception
    {
        AtomicLong remoteState = new AtomicLong(1);

        Cache<Integer, Long> cache = EvictableCacheBuilder.newBuilder().maximumSize(10_000).build();
        Integer key = 42;
        int threads = 10_000;

        ExecutorService executor = newFixedThreadPool(threads);
        try {
            List<Future<Void>> futures = IntStream.range(0, threads)
                    .mapToObj(threadNumber -> executor.submit(() -> {
                        if (threadNumber % 2 == 0) {
                            cache.get(key, remoteState::get);
                        }
                        else {
                            cache.invalidateAll();
                            long cacheSize = cache.size();
                            if (cacheSize > 0) {
                                System.out.println(cacheSize);
                            }
                        }

                        return (Void) null;
                    }))
                    .collect(toImmutableList());
            futures.forEach(MoreFutures::getFutureValue);

        }
        finally {
            executor.shutdownNow();
            executor.awaitTermination(10, SECONDS);
        }
    }

Member Author

After //4 the dataCache still contains a "lost" entry for which there is no corresponding tokens entry.

Absolutely.

@findepi force-pushed the findepi/flexi-cache branch from 2299a21 to 2501486 on February 4, 2022 20:53
@findepi requested a review from kokosing on February 4, 2022 20:53
@findepi force-pushed the findepi/flexi-cache branch from 2501486 to ad35166 on February 5, 2022 06:03
Member

@homar left a comment

LGTM

}

@Test(timeOut = TEST_TIMEOUT_MILLIS)
public void testDisabledCache()

Member

I don't understand why this is disabled

Member Author

because it has maximumSize(0)
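
For illustration, with maximumSize(0) a Guava cache evicts each entry immediately after it is loaded, so every get() invokes the loader. A hypothetical standalone test (not the one in this PR), assuming the usual Guava, TestNG, and java.util.concurrent imports:

    @Test
    public void testMaximumSizeZeroDisablesCaching()
            throws Exception
    {
        // maximumSize(0) means entries are evicted immediately after being loaded,
        // so the cache never serves a hit and every get() calls the loader.
        Cache<String, Integer> disabled = CacheBuilder.newBuilder()
                .maximumSize(0)
                .build();
        AtomicInteger loads = new AtomicInteger();
        disabled.get("key", loads::incrementAndGet);
        disabled.get("key", loads::incrementAndGet);
        assertEquals(loads.get(), 2); // no caching took place
    }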

@findepi force-pushed the findepi/flexi-cache branch from eaf6eeb to 6bd36cd on February 5, 2022 18:19
@findepi force-pushed the findepi/flexi-cache branch from 6bd36cd to 42f50c2 on February 7, 2022 12:02
Member Author

@findepi commented Feb 7, 2022

rebased (conflicts)

    @Override
    public String toString()
    {
        return format("CacheToken(%s; %s)", Integer.toHexString(hashCode()), key);
    }
Member

@losipiuk Feb 7, 2022

From the hashCode javadoc:

    As much as is reasonably practical, the hashCode method defined
    by class {@code Object} does return distinct integers for
    distinct objects

IIUC this says the same hash code can be returned for different objects present in the VM at the same time (maybe I am reading it incorrectly).
Given that, does it make sense to put the hash code as part of the toString result?

Member Author

I found it helpful, and the default Object.toString does exactly that, but I see it could be misleading.
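
For reference, the default Object#toString produces the class name plus the hex identity hash, which is the same hex value CacheToken embeds. A small illustrative fragment (format is java.lang.String.format; the key and hex values shown are made up):

    Object plain = new Object();
    // Default Object#toString: getClass().getName() + "@" + Integer.toHexString(hashCode())
    System.out.println(plain); // e.g. "java.lang.Object@1b6d3586" (hex value varies per instance)
    // The CacheToken override keeps the same hex identity hash but adds the cached key:
    System.out.println(format("CacheToken(%s; %s)", Integer.toHexString(plain.hashCode()), "someKey")); // e.g. "CacheToken(1b6d3586; someKey)"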

Member

@losipiuk left a comment

LGTM (I read through the production code but just skimmed the test code)

@@ -719,6 +718,15 @@ public int hashCode()
{
return Objects.hash(tableHandle, tupleDomain);
}

@Override
public String toString()
Contributor

nit: this is not really related to the current commit.

@findinpath self-requested a review on February 9, 2022 06:17
Member Author

@findepi commented Feb 9, 2022

CI #10932, some PT timeouts
also some related test failures

@findepi force-pushed the findepi/flexi-cache branch from 978e914 to a48b9af on February 9, 2022 11:46
Member Author

@findepi commented Feb 9, 2022

CI #10932

@findepi merged commit 3904620 into trinodb:master on Feb 9, 2022
@findepi deleted the findepi/flexi-cache branch on February 9, 2022 16:19
@github-actions bot added this to the 371 milestone on Feb 9, 2022