-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid race b/w ServiceDiscoverer events after cancel and re-subscribe #3005
Avoid race b/w ServiceDiscoverer events after cancel and re-subscribe #3005
Conversation
// (https://github.com/reactive-streams/reactive-streams-jvm?tab=readme-ov-file#1.8) new events will | ||
// stop eventually but not guaranteed to stop immediately after cancellation or could race with cancel. | ||
// Therefore, we should check that this is the current Subscriber before processing new events. | ||
if (healthCheckConfig != null && currentSubscriber != this) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that for the RoundRobinLoadBalancer
there is still possibility for a race if ServiceDiscoverer
implementation doesn't follow the "use a single thread" recommendation: https://github.com/apple/servicetalk/pull/3002/files#diff-994a09bcaa716664549357bca8bfa8288ee0dd98b0280b139fd30b15cd69bd8eR30-R32
However, taking into account that:
- we are close to switching to
DefaultLoadBalancer
; - the default DNS
ServiceDiscoverer
is single-threaded, and for other implementations it's a trivial fix; - possibility for that race is significantly reduced by the current check;
I don't see a reason to overcomplicate this implementation unless someone has a simple suggestion.
Example of a race:
- SD Thread#1 delivers events for old
EventSubscriber
, crosses this check and OS doesn't execute it further. - Caller Thread#2 triggers cancellation, resubscribes and flips the current subscriber.
- SD Thread#3 delivers "state of the world" for a new
EventSubscriber
. - OS continues execution of Thread#1 and it corrupts the
usedHosts
state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to agree with your assessment that we need not address it: this race has been here for a long time and we've survived until now so a little longer wont hurt. However, if you felt like fixing it I think it could be done with a lock protecting all EventSubscriber.onNext
calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one concern.
...loadbalancer-experimental/src/main/java/io/servicetalk/loadbalancer/DefaultLoadBalancer.java
Show resolved
Hide resolved
...loadbalancer-experimental/src/main/java/io/servicetalk/loadbalancer/DefaultLoadBalancer.java
Outdated
Show resolved
Hide resolved
// (https://github.com/reactive-streams/reactive-streams-jvm?tab=readme-ov-file#1.8) new events will | ||
// stop eventually but not guaranteed to stop immediately after cancellation or could race with cancel. | ||
// Therefore, we should check that this is the current Subscriber before processing new events. | ||
if (healthCheckConfig != null && currentSubscriber != this) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to agree with your assessment that we need not address it: this race has been here for a long time and we've survived until now so a little longer wont hurt. However, if you felt like fixing it I think it could be done with a lock protecting all EventSubscriber.onNext
calls.
Motivation: apple#2514 added an ability for `LoadBalancer` to cancel discovery stream and re-subscribe in attempt to trigger a new resolution when it's internal state became `UNHEALTHY`. However, it did not account for a possibility of new events racing with cancellation or a Publisher not stopping updates immediately. As a result, it's possible to end up in a corrupted state if old `EventSubscriber` receives new events after a new `EventSubscriber` received a new "state of the world". Modifications: - Track the current `EventSubscriber` and process events only if it's current. - Enhance `resubscribeToEventsWhenAllHostsAreUnhealthy()` to simulate described behavior. Result: New "state of the world" can not be corrupted by events emitted for the old `EventSubscriber`.
3cf255f
to
0ac49f6
Compare
Motivation:
#2514 added an ability for
LoadBalancer
to cancel discovery stream andre-subscribe in attempt to trigger a new resolution when its internal
state became
UNHEALTHY
. However, it did not account for a possibilityof new events racing with cancellation or a Publisher not stopping
updates immediately. As a result, it's possible to end up in a corrupted
state if the old
EventSubscriber
receives new events after a newEventSubscriber
received a new "state of the world".Modifications:
EventSubscriber
and process events only if it'scurrent.
resubscribeToEventsWhenAllHostsAreUnhealthy()
to simulatethe described behavior.
Result:
New "state of the world" can not be corrupted by events emitted for the
old
EventSubscriber
.Depends on #3004, review only the 2nd commit: d48d8f5