-
-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix cache lock provider allowing multiple locks to be acquired. #299
Fix cache lock provider allowing multiple locks to be acquired. #299
Conversation
…lock tests and changes
Co-authored-by: scharde-encora <[email protected]>
|
||
lock (_lock) | ||
using (await _lock.LockAsync().AnyContext()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use async locks: https://stackoverflow.com/a/20085260
|
||
lock (_lock) | ||
using (await _lock.LockAsync().AnyContext()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use async locks: https://stackoverflow.com/a/20085260
@@ -83,16 +83,15 @@ public async Task<ILock> AcquireAsync(string resource, TimeSpan? timeUntilExpire | |||
bool isTraceLogLevelEnabled = _logger.IsEnabled(LogLevel.Trace); | |||
bool isDebugLogLevelEnabled = _logger.IsEnabled(LogLevel.Debug); | |||
bool shouldWait = !cancellationToken.IsCancellationRequested; | |||
if (isDebugLogLevelEnabled) | |||
_logger.LogDebug("Attempting to acquire lock: {Resource}", resource); | |||
string lockId = GenerateNewLockId(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved this up so all lock messages can have LockId.
@@ -907,7 +940,7 @@ public void Dispose() | |||
ItemExpired?.Dispose(); | |||
} | |||
|
|||
private class CacheEntry | |||
private record CacheEntry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made this into a record, I had a hunch this could be causing an issue with ConcurrentDictionary.TryUpdate() as it uses a comparison value with IEquatable... https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.concurrentdictionary-2.tryupdate?view=net-8.0
@@ -884,13 +911,19 @@ private async Task DoMaintenanceAsync() | |||
|
|||
var utcNow = SystemClock.UtcNow.AddMilliseconds(50); | |||
|
|||
// Remove expired items and items that are infrequently accessed as they may be updated by add. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For cache entries that are very recently modified, there might be a high churn rate and the cache value might get overridden via Add/Set. So don't touch it in the background clean up job for a little bit. This made the tests much more stable.
|
||
(string Key, long LastAccessTicks, long InstanceNumber) oldest = (null, Int64.MaxValue, 0); | ||
foreach (var kvp in _memory) | ||
{ | ||
if (kvp.Value.LastAccessTicks < oldest.LastAccessTicks | ||
|| (kvp.Value.LastAccessTicks == oldest.LastAccessTicks && kvp.Value.InstanceNumber < oldest.InstanceNumber)) | ||
bool isExpired = kvp.Value.ExpiresAt < SystemClock.UtcNow; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optimized this to grab the first expired one (and break), this doesn't take into account the frequency period like maintenance loop.
|
||
string expiredKey = null; | ||
lock (_lock) | ||
using (await _lock.LockAsync().AnyContext()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use async locks: https://stackoverflow.com/a/20085260
@@ -825,9 +849,9 @@ public async Task SetExpirationAsync(string key, TimeSpan expiresIn) | |||
} | |||
|
|||
Interlocked.Increment(ref _writes); | |||
if (_memory.TryGetValue(key, out var value)) | |||
if (_memory.TryGetValue(key, out var existingEntry)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed them as there was like 3 or 4 different names for existing cache entries and it was driving me crazy while debugging.
if (cacheEntry.ExpiresAt < SystemClock.UtcNow) | ||
if (existingEntry.ExpiresAt < SystemClock.UtcNow) | ||
{ | ||
Interlocked.Increment(ref _misses); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a miss, not a hit.
if (addOnly) | ||
{ | ||
if (!_memory.TryAdd(key, entry)) | ||
_memory.AddOrUpdate(key, entry, (_, existingEntry) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Converted this to an AddOrUpdate as it can better handle concurrent attempts. Learned a few things from https://medium.com/gft-engineering/correctly-using-concurrentdictionarys-addorupdate-method-94b7b41719d6
Add call back can be hit multiple times, so just pass it in and update in the update value callback to reset wasUpdated.
return entry; | ||
} | ||
if (isTraceLogLevelEnabled) | ||
_logger.LogTrace("Replacing expired cache key: {Key}", key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added logging to detect this, which was very helpful
if (_memory.TryGetValue(key, out var existingEntry) && existingEntry.ExpiresAt < SystemClock.UtcNow) | ||
{ | ||
if (_memory.TryRemove(key, out var removedEntry)) | ||
{ | ||
if (removedEntry.ExpiresAt >= SystemClock.UtcNow) | ||
throw new Exception("Removed item was not expired"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was actively removing expired items because this method wouldn't get inlined / executed synchronously all the time when this method was called. So concurrent Add/Set calls could update the expired value causing issues. I think this could be better reworked to have a single point for all removes.
{ | ||
RemoveExpiredKey(key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just let maintenance remove the item to cut down on the number concurrent updates.
if (value.ExpiresAt >= SystemClock.UtcNow) | ||
return Task.FromResult<TimeSpan?>(value.ExpiresAt.Subtract(SystemClock.UtcNow)); | ||
|
||
RemoveExpiredKey(key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just let maintenance remove the item to cut down on the number concurrent updates.
@@ -189,26 +195,33 @@ public Task<int> RemoveByPrefixAsync(string prefix) | |||
|
|||
internal void RemoveExpiredKey(string key, bool sendNotification = true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to only call this if we call a method with a past expiration and we return immediately. Otherwise just let maintenance remove the item to cut down on the number concurrent updates.
@ejsmith Here is the minimum changes required to fix this. |
…d consume all the time and maintenance to auto abandon never happens.
…ime as the key was set to expire
…drift, see if this fixes flakey ci test.
… next run time and it scheduling was cancelled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just kind of skimming through and not seeing any issues. Will take your word and the tests passing as proof.
I noticed that multiple locks could be acquired. I've added more test coverage to try and expose the issue.
Here are some logs that lead me down this path:
After some more fixes to how we update AddAsync
Investigated further and saw that it was way more stable, but it failed when maintenance is run
This led me to rework how maintenance removes expired items.