Cleanup & Fix data races in ApolloWebSocket #307

js · 2018-07-03T06:12:11Z

This PR commits the ultimate sin of mixing several changes in the same PR, sorry not sorry.

The ApolloWebSocket had a vastly different coding style than the rest of the project, and diverged a bit from standard Swift style as well, so since I'm a total snob I had to fix that before I could fix the actual problem:

Starscream delivers its callbacks on the main queue by default (it can be configured to use another DispatchQueue, but we'd still have the same problem), while the NetworkTransport's send(....) runs on whatever queue the calling AsynchronousOperation gets assigned to. This can lead to race conditions on some of the state variables, as detected by the Thread Sanitizer. So this PR adds a lock around some of the state variables (mainly the acking of messages) and processes the Starscream callbacks on a serial queue.
I'm not too happy about the latter there as I feel there should be another design around coordinating thread access with the NetworkTransport implementations, what do you think?
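A minimal sketch of the lock-based part, for anyone skimming the thread. This is not the code in the PR: `AckState`, `enqueueOrPass` and `markAcked` are hypothetical names, and the only assumption is that the "acked" flag and the pending messages are touched from two different queues.

```swift
import Foundation

/// Hypothetical stand-in for the transport's shared state (not the PR's actual type).
final class AckState {
  private let lock = NSLock()
  private var acked = false
  private var pending: [String] = []

  /// Called from whatever queue the sending operation runs on.
  /// Returns true if the message may be written right away, false if it was queued.
  func enqueueOrPass(_ message: String) -> Bool {
    lock.lock()
    defer { lock.unlock() }
    if acked { return true }
    pending.append(message)
    return false
  }

  /// Called from the socket's callback queue once the connection is acked.
  /// Returns the messages that were queued while waiting for the ack.
  func markAcked() -> [String] {
    lock.lock()
    defer { lock.unlock() }
    acked = true
    let flushed = pending
    pending.removeAll()
    return flushed
  }
}
```

Both methods take the same lock, so the flag and the pending list are always read and written together, whichever queue the caller happens to be on.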

- lowercase enum cases
- switch on actual enum cases, not its raw values
- avoid using rawValue for things not RawRepresentable
- avoid using abbreviations
- Slightly more consistent use of spaces

martijnwalraven · 2018-07-03T06:23:07Z

Thanks! I'm not too familiar with the code, so I think @knutaa is the best person to give detailed feedback on the changes.

As for the concurrency issues, would it be possible to have all processing happen on a single serial queue? If we configure Starscream to use that as its callbackQueue, and async dispatch to that queue from the WebSocketNetworkTransport's send(), wouldn't that mean we could get rid of explicit locks?
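Roughly, the single-queue idea could look like the sketch below. It is only an illustration under assumptions: `SerialTransport`, `handleAck` and `sentMessages` are hypothetical names, and `handleAck` is assumed to be invoked on `processingQueue` (which is what setting the socket's callback queue to that queue would give us).

```swift
import Foundation
import Dispatch

/// Hypothetical transport that confines all state to one serial queue.
final class SerialTransport: @unchecked Sendable {
  let processingQueue = DispatchQueue(label: "example.websocket.processing")
  private var acked = false
  private var messageQueue: [String] = []
  private var sent: [String] = []

  /// Callable from any queue; the state is only touched on processingQueue.
  func send(_ message: String) {
    processingQueue.async {
      if self.acked {
        self.sent.append(message)
      } else {
        self.messageQueue.append(message)
      }
    }
  }

  /// Assumed to be invoked on processingQueue already (e.g. as a socket callback),
  /// so it mutates state directly without any lock.
  func handleAck() {
    acked = true
    sent.append(contentsOf: messageQueue)
    messageQueue.removeAll()
  }

  /// Inspection helper; must not be called from processingQueue, or sync would deadlock.
  func sentMessages() -> [String] {
    return processingQueue.sync { self.sent }
  }
}
```

No explicit lock is needed here because the serial queue orders every access, at the cost of making send() asynchronous.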

js · 2018-07-03T07:02:12Z

Maybe, though there are a few other critical paths outside the callbacks, such as (un)subscribing and write() (which mutates the messageQueue).

I gave it a quick shot locally, but I get test failures just from setting the callbackQueue on the Starscream websocket object (with no other changes). Haven't dug into why yet…

martijnwalraven · 2018-07-03T07:21:14Z

Regardless of whether we can get rid of the lock, avoiding the main queue when WebSocket messages are received would make me feel a lot better. Thanks for looking into this, I didn't realize that was what Starscream was doing!

psi-gh · 2018-07-04T11:29:31Z

I'm really waiting for this because it's a fix for #308

knutaa · 2018-07-04T13:26:20Z

The websocket in WebSocketTransport declared as

internal let websocket: WebSocketClient

is the protocol and not the full Starscream WebSocket. This is done to support mocking in the tests.

It should be possible to set the callback processingQueue with something like

// websocket is typed as the WebSocketClient protocol, not the Starscream WebSocket;
// this is done to support mocking the protocol in tests
if let websocket = self.websocket as? WebSocket {
  websocket.callbackQueue = processingQueue
}
self.websocket.connect()

Would it then be possible to wrap the content of write() and replace the locks in sendHelper and unsubscribe? Other locks should be redundant.

The current use of processingQueue for the delegate methods should (of course?) be redundant.

js · 2018-07-05T06:34:25Z

Yeah I didn't discover the callbackQueue Starscream API until after I made this PR. Also, if callbackQueue is made part of the WebSocketClient then we don't have the awkwardness of asking "hey so are you really a WebSocket?".
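To illustrate what I mean, here is a sketch with a trimmed-down, hypothetical protocol. `SocketClient`, `MockSocket` and `start(_:on:)` are made-up names for this example; the real WebSocketClient has more requirements, and whether it exposes callbackQueue is exactly the open question.

```swift
import Foundation
import Dispatch

/// Hypothetical, reduced protocol that carries the callback queue as a requirement.
protocol SocketClient: AnyObject {
  var callbackQueue: DispatchQueue { get set }
  func connect()
}

/// A test mock can satisfy the same requirement as the real socket would.
final class MockSocket: SocketClient {
  var callbackQueue: DispatchQueue = .main
  private(set) var connected = false

  func connect() {
    connected = true
  }
}

/// The transport can configure any conforming socket without downcasting.
func start(_ socket: SocketClient, on queue: DispatchQueue) {
  socket.callbackQueue = queue  // no `as? WebSocket` cast required
  socket.connect()
}
```

With the property on the protocol, the mock and the real socket are configured through the same code path.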

I quickly tried setting the callbackQueue and wrapping write(..) in the same queue, but there's a test failure in testSubscribeMultipleReview() iirc where the last change isn't always pushed to the subscription. Didn't have time to dig more into it though.

Honestly I feel a bit bad about dumping this PR in your lap. It contains some opinionated stylistic and future proofing changes mixed in with some actual fixes.

js · 2018-07-05T07:02:08Z

In case it isn't obvious: you can run the tests with TSAN by going to Edit Scheme (opt-click the run icon) for the ApolloWebSockets scheme, selecting the Test configuration in the sidebar, then the Diagnostics tab > ThreadSanitizer, in case anyone else wants to have a look at the data race it finds.

js · 2018-07-05T12:05:34Z

Based on #310 here is a compare with just the concurrency fixes

It still uses an NSLock, because, well, dispatch queues are a pretty heavy hammer for a small locking problem, and they tend to complicate things by turning a synchronous problem into an asynchronous one.
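A small contrast to show what I mean by synchronous versus asynchronous. Both types are hypothetical and unrelated to the actual transport code: the locked version can simply return its result, while the queue-confined version has to hand it back through a completion handler.

```swift
import Foundation
import Dispatch

/// Lock-based: the caller gets its answer synchronously.
final class LockedSequence {
  private let lock = NSLock()
  private var last = 0

  func next() -> Int {
    lock.lock()
    defer { lock.unlock() }
    last += 1
    return last
  }
}

/// Queue-based: the same operation now needs a completion handler
/// (or a blocking `sync`, which deadlocks if called from the queue itself).
final class QueuedSequence: @unchecked Sendable {
  private let queue = DispatchQueue(label: "example.sequence")
  private var last = 0

  func next(_ completion: @escaping @Sendable (Int) -> Void) {
    queue.async {
      self.last += 1
      completion(self.last)
    }
  }
}
```

Every caller of the queued version has to be restructured around the callback, which is the complication I'd rather avoid for a small piece of state.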

TSAN still triggers on the state mutation in FetchQueryOperation.fetchFromNetwork() in the main Apollo project though.

designatednerd · 2019-07-02T09:40:44Z

Tying this to #600 for when I look into that issue - thanks for doing this @js!

designatednerd · 2019-07-23T11:00:16Z

@js ping me when you get back from vacation, let's try to get this good to merge 😃

designatednerd · 2019-08-28T16:46:00Z

@js You still on vacation or do you think you might have a chance to take a look at this?

js · 2019-08-30T06:48:33Z

@designatednerd I'll take a look over the weekend

aivcec · 2019-09-24T11:17:16Z

Any updates on this? I am experiencing race condition issues on disconnecting from multiple subscriptions and am interested in seeing this merged.

js · 2019-09-26T08:53:54Z

Sorry, I simply haven't found the time to look at this. I took a brief look last night, but I've completely lost all the context I had a year ago :) and my project still doesn't use GraphQL subscriptions, so I haven't found an excuse to weave it in there.

When running the tests (the main Apollo test bundle), TSAN seems to be triggered by something with the ReadWriteLock inside the barrier blocks, but it doesn't give much of the usual backtrace/hints as to what it thinks the problem is… 🤷‍♂

I'm closing this, but if anyone else wants to pick it up, the interesting bits seem to be in ec42e1d

js added 8 commits July 2, 2018 09:21

Match WebSocketTransport code style with rest of project

da078bc

- lowercase enum cases
- switch on actual enum cases, not its raw values
- avoid using rawValue for things not RawRepresentable
- avoid using abbreviations
- Slightly more consistent use of spaces

rename params to requestHeaders, only pass them to the URLRequest

27c0806

Rename connectingParams to connectingPayload, and use GraphQLMap type

34809a6

Break protocol conformance into extensions

d72e41b

Don't expose the underlying websocket, make it non-optional

be1a76c

Leave websocket URLRequest header config up to the caller

38f7259

Make reconnect public

53b8973

Fix TSAN detected data races

ec42e1d

knutaa mentioned this pull request Jul 4, 2018

fix #308 #309

Closed

2 tasks

js mentioned this pull request Jul 5, 2018

ApolloWebSockets code style, #310

Merged

js closed this Sep 26, 2019

aivcec mentioned this pull request Oct 22, 2019

Add documentation for Subscriptions + ApolloWebSocket #853

Merged

aivcec mentioned this pull request Nov 3, 2019

Fixing data races in subscriptions #880

Merged
