Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a race condition that may make OperatorMaterialize emit too many terminal notifications #5850

Merged
merged 3 commits into from
Feb 15, 2018

Conversation

pkolaczk
Copy link

@pkolaczk pkolaczk commented Feb 14, 2018

We're using RxJava 1.3.5 and one of our tests occasionally fails (very hard to reproduce). The test asserts that anObservable.toBlocking.toList (RxScala) expression throws a particular exception, because we expect the observable to be in error state by emitting a single onError. However, in rare cases, we observe that there is no exception thrown, and an empty list is returned instead. We verified that the problem is not the observable we use. The observable properly fires the doOnError handler prior to the test assertion failure. Therefore we suspected something wrong happening in the toBlocking.toList conversion. I debugged that in fact this conversion uses OperatorMaterialize as a part of BlockingOperatorToIterator.

I took a look at the OperatorMaterialize and found an obvious bug. The busy variable is never set to true anywhere, despite the comments showing the author of the code intended it to be true when the drain loop was running.

The PR fixes this problem by setting busy to true at the entry of the drain method.

However I still have some doubts about the original code:

  • Why does drain have to be called from requestMore? Why isn't it enough to call it from onCompleted and onError? It looks like the drain logic does anything only if the terminalNotification is set, and it can be set only by onCompleted and onError.
  • What is this drain logic supposed to do? Looks rather complex and the intent of it is not obvious. If we don't call it from requestMore then maybe we don't need the drain logic at all? Why not emit the terminal notifications directly from onError and onCompleted ?

Thanks,
Piotr Kołaczkowski

@codecov
Copy link

codecov bot commented Feb 14, 2018

Codecov Report

Merging #5850 into 1.x will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff              @@
##                1.x    #5850      +/-   ##
============================================
- Coverage     84.28%   84.27%   -0.01%     
- Complexity     2889     2891       +2     
============================================
  Files           290      290              
  Lines         18264    18265       +1     
  Branches       2495     2495              
============================================
  Hits          15393    15393              
- Misses         1989     1990       +1     
  Partials        882      882
Impacted Files Coverage Δ Complexity Δ
...ava/rx/internal/operators/OperatorMaterialize.java 98.38% <100%> (+13.14%) 3 <0> (ø) ⬇️
...ernal/operators/OnSubscribeToObservableFuture.java 92.59% <0%> (-7.41%) 3% <0%> (ø)
...ain/java/rx/internal/operators/OnSubscribeAmb.java 79.13% <0%> (-5.04%) 13% <0%> (ø)
.../rx/internal/schedulers/CachedThreadScheduler.java 86.4% <0%> (-3.89%) 6% <0%> (ø)
src/main/java/rx/observers/SerializedObserver.java 96.73% <0%> (-3.27%) 18% <0%> (-2%)
.../java/rx/internal/operators/BackpressureUtils.java 70.45% <0%> (-2.28%) 28% <0%> (-1%)
src/main/java/rx/subjects/UnicastSubject.java 82.71% <0%> (-1.86%) 9% <0%> (ø)
...x/internal/operators/DeferredScalarSubscriber.java 98.27% <0%> (-1.73%) 24% <0%> (-1%)
...in/java/rx/internal/operators/OnSubscribeRedo.java 80.8% <0%> (-1.61%) 16% <0%> (ø)
src/main/java/rx/observables/SyncOnSubscribe.java 91.79% <0%> (-1.5%) 8% <0%> (ø)
... and 14 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2ba8bb2...c1b7a07. Read the comment docs.

@akarnokd akarnokd added this to the 1.4 milestone Feb 14, 2018
@akarnokd
Copy link
Member

Why does drain have to be called from requestMore?
What is this drain logic supposed to do?

Let's assume the source generates N item, completes and the consumer requests N notifications. Since the source completion is another Notification, it can't be emitted unless the consumer request yet another notification. Since this can happen after the original onCompleted, the request itself has to trigger the emission of that completion notification. Because onXXX methods can race with request calls and trigger emission, there has to be a serialization of signals towards the consumer. The busy/missed drain loop ensures this.

Further reading.

Copy link
Member

@akarnokd akarnokd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if you'd adapted this v2 test to verify this change. The TestUtil.race() is there for v1.

@pkolaczk
Copy link
Author

Thanks! I get it. You're right, somehow I missed the fact that onCompleted may be called earlier than the request (still thinking in terms of cold observables we have in our system, that never emit anything until requested).

BTW: This is not the real cause of the bug I'm hunting for, because I've just found another occurrence that was not invoking iterators / lists at all. So I dug more into it, and I found another race in OperatorMerge. I submitted a PR.

Copy link
Collaborator

@davidmoten davidmoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great find @pkolaczk, thanks!

@pkolaczk pkolaczk changed the title Fix a race condition that may make OperatorMaterialize miss an item Fix a race condition that may make OperatorMaterialize emit too many terminal notifictions Feb 15, 2018
@pkolaczk pkolaczk changed the title Fix a race condition that may make OperatorMaterialize emit too many terminal notifictions Fix a race condition that may make OperatorMaterialize emit too many terminal notifications Feb 15, 2018
@pkolaczk
Copy link
Author

Unit test confirms that without the patch OperatorMaterialize can emit two onComplete notifications.

@pkolaczk
Copy link
Author

May I rebase and fix the commit message of the fix? It says "miss the item" while in fact it may emit too many items.

terminal notification more than once

The guards in `OperatorMaterialize.ParentSubscriber#drain` were never
working, because `busy` was actually never set to true.
Therefore it was possible that the `drain` loop was executed by
more than one thread concurrently, which could led to undefined
behavior.

This fix sets `busy` to true at the entry of `drain`.
@akarnokd
Copy link
Member

If you really want to.

@pkolaczk pkolaczk force-pushed the materialize-busy-fix branch from 3e754d9 to ad5b16d Compare February 15, 2018 11:15
@pkolaczk
Copy link
Author

Misleading commit messages are bad.

@akarnokd
Copy link
Member

This PR keeps failing due to overcrowded Travis CI. In v2, we solved this with sudo: required in .travis.yml. Could you please update the PR to make this happen on 1.x? https://github.com/ReactiveX/RxJava/blob/1.x/.travis.yml#L4

@akarnokd akarnokd merged commit c40a06f into ReactiveX:1.x Feb 15, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants