Fix data race in Get/Set after Close is called #60
Conversation
```diff
@@ -51,6 +51,7 @@ type Shard[K comparable, V any] struct {
 	vgroup  *Group[K, V] // used in secondary cache
 	counter uint
 	mu      *RBMutex
+	closed  bool
```
Shouldn't this be an `atomic.Bool`, since you can have multiple goroutines trying to read and write to it? The downside is that it will make all operations slightly slower. It could also be a plain `bool` guarded by an RWMutex.
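For illustration, a minimal sketch of the `atomic.Bool` variant (the helper methods here are hypothetical, not this repo's actual API):

```go
package cache

import "sync/atomic"

type Shard[K comparable, V any] struct {
	// closed may be read and written by many goroutines
	// concurrently without holding any mutex.
	closed atomic.Bool
}

// markClosed and isClosed are illustrative helpers.
func (s *Shard[K, V]) markClosed()    { s.closed.Store(true) }
func (s *Shard[K, V]) isClosed() bool { return s.closed.Load() }
```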
The `closed` field is protected by the shard mutex, and as you can see, I check it immediately before or after accessing the shard hashmap, so the mutex is guaranteed to already be held at that point. Meanwhile, the `Close` method also acquires the shard mutex before updating the `closed` field, making it safe to use a plain `bool` here.
I think it'd be good if you documented which fields are guarded by `mu`. For example: https://github.com/openfga/openfga/blob/94dd78f03dc99e51f46e7f71deaaea878e8340c9/pkg/storage/memory/memory.go#L125-L128. This doesn't change anything at runtime, but it's useful documentation.
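Something along these lines, as a sketch of the convention (exactly which fields belong under the comment is for the author to decide):

```go
type Shard[K comparable, V any] struct {
	vgroup  *Group[K, V] // used in secondary cache
	counter uint

	mu *RBMutex
	// The fields below are guarded by mu.
	closed bool
}
```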
internal/store.go (Outdated)

```go
func (s *Store[K, V]) Close() {
	for _, s := range s.shards {
```
Suggested change:

```diff
-	for _, s := range s.shards {
+	for _, shard := range s.shards {
```
Otherwise it gets confusing reading this, as `s` is already the Store itself.
```go
// Close waits for all current read and write operations to complete,
// then clears the hashmap and shuts down the maintenance goroutine.
```
Suggested change:

```diff
-// Close waits for all current read and write operations to complete,
-// then clears the hashmap and shuts down the maintenance goroutine.
+// Close clears the hashmap and shuts down the maintenance goroutine,
+// then waits for all current read and write operations to complete.
```
I prefer my original version here. For each shard, `shard.mu.Lock()` is called first, which ensures that the current read/write operations holding the mutex have completed before closing.
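In other words, roughly this per-shard ordering (a simplified sketch of the behavior being described; `hashmap` is a stand-in name for the shard's actual map field):

```go
func (s *Store[K, V]) Close() {
	for _, shard := range s.shards {
		shard.mu.Lock() // waits for in-flight readers/writers to release the mutex
		shard.closed = true
		shard.hashmap = nil // clear the shard's data
		shard.mu.Unlock()
	}
	// ... then shut down the maintenance goroutine ...
}
```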
```diff
@@ -1081,6 +1104,10 @@ func (s *LoadingStore[K, V]) Get(ctx context.Context, key K) (V, error) {
 	loaded, err, _ := shard.group.Do(key, func() (Loaded[V], error) {
 		// load and store should be atomic
 		shard.mu.Lock()
+		if shard.closed {
+			shard.mu.Unlock()
```
I checked out the code locally, and I see that the locking & unlocking code gets weird, in that sometimes a function locks but doesn't unlock (hopefully because it calls a function that does the unlock). You risk introducing deadlock scenarios with this.

My recommendation (not for this PR) is that you at least document, for each function:

- whether it unlocks a mutex
- whether it assumes the mutex has already been locked by the caller.

I also recommend that you use this pattern:

```go
mutex.Lock()
defer mutex.Unlock()
```

to prevent the issue completely (at the expense of holding the mutex for a longer time).
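For example, applied to the early `closed` check, a hypothetical `defer`-based version (using a plain `sync.Mutex` and invented field names for brevity) would look like:

```go
import "sync"

type shard[K comparable, V any] struct {
	mu     sync.Mutex
	closed bool
	data   map[K]V
}

func (s *shard[K, V]) get(key K) (V, bool) {
	s.mu.Lock()
	defer s.mu.Unlock() // runs on every return path, including the early one

	if s.closed {
		var zero V
		return zero, false // no explicit Unlock needed before returning
	}
	v, ok := s.data[key]
	return v, ok
}
```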
```diff
@@ -201,3 +203,92 @@ func TestStore_SinkWritePolicyWeight(t *testing.T) {
 	require.Equal(t, 8, int(store.policy.weightedSize))
 }
+
+func TestStore_CloseRace(t *testing.T) {
+	store := NewStore[int, int](1000, false, true, nil, nil, nil, 0, 0, nil)
```
Not for this PR, but holy cow, that is a lot of parameters 🤣 Are all of those mandatory?! If they aren't, I suggest using the "functional options pattern" described here [scroll to option 3]. My team has been using it with quite a lot of success.
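For reference, a minimal sketch of that pattern (the option names and defaults here are invented for illustration, not a concrete API proposal):

```go
import "time"

type config struct {
	capacity int
	ttl      time.Duration
}

// Option mutates the store configuration.
type Option func(*config)

func WithCapacity(n int) Option      { return func(c *config) { c.capacity = n } }
func WithTTL(d time.Duration) Option { return func(c *config) { c.ttl = d } }

type Store[K comparable, V any] struct {
	cfg config
}

func NewStore[K comparable, V any](opts ...Option) *Store[K, V] {
	cfg := config{capacity: 1000} // defaults live in one place
	for _, opt := range opts {
		opt(&cfg)
	}
	return &Store[K, V]{cfg: cfg}
}

// Callers then spell out only what they need:
//   store := NewStore[int, int](WithCapacity(500), WithTTL(time.Minute))
```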
@miparnisari I think the suggestions are very reasonable, and some refactoring is definitely needed. I'll address those in a separate PR and keep this one focused on fixing the close race issue.
Fix #59