
Data race in Emitter and unfortunate handling of threads #774

Closed
hakonk opened this issue May 5, 2023 · 4 comments


hakonk commented May 5, 2023

Describe the bug
While running an app that makes use of SnowplowTracker (v5.0.0) with Thread Sanitizer enabled in Xcode, I have on some occasions encountered data races in the Emitter class of the SnowplowTracker library in a production app. More specifically, the data races involve concurrent access to Emitter.isSending.

To Reproduce
Here is a minimal example that demonstrates the issue.

  1. Create an iOS app with a single view:

TestSnowplowApp.swift:

import SwiftUI

@main
struct TestSnowplowApp: App {
    var body: some Scene {
        WindowGroup {
            ContentView()
        }
    }
}
  2. Make use of SnowplowTracker in the ContentView struct:

ContentView.swift:

import SwiftUI
import SnowplowTracker

class SnowplowSetup {
    static let tracker: TrackerController = Snowplow.createTracker(namespace: "testapp", network: NetworkConfiguration(endpoint: "").customPostPath("")) {
        TrackerConfiguration()
            .screenViewAutotracking(false)
    }!
}

struct ContentView: View {
    var body: some View {
        Button {
            for _ in 0..<100 {
                // These events will fail to be dispatched as the network configuration is invalid.
                // It is however sufficient to trigger a TSan breakpoint
                _ = SnowplowSetup.tracker.track(ScreenView(name: "MyScreen"))
            }
        } label: {
            Text("Send a bunch of requests")
        }
    }
}
  3. Enable Thread Sanitizer in the Diagnostics tab for the app's scheme, and enable a Runtime Issue breakpoint with "Thread Sanitizer" opted in.

  4. Run the app in the simulator in debug mode and press the button "Send a bunch of requests" several times until the breakpoint hits.

  5. Observe that the breakpoint is reached.

Expected behavior
Concurrent access internally in the library should not trigger a TSan breakpoint.

Device information (please complete the following information):

  • Simulator: iPhone 14 Pro
  • Xcode: 14.3
  • OS: macOS 13.3.1

Additional context
Here is an excerpt from Emitter.swift:360-382 that I believe sheds some light on the issue:

func sendGuard() {
    if isSending || pausedEmit { // <- isSending is accessed while the lock is not obtained yet
        return
    }
    objc_sync_enter(self)
    if !isSending && !pausedEmit {
        isSending = true
        attemptEmit()
    }
    objc_sync_exit(self)
}
// [...]
func flush() {
    if Thread.isMainThread {
        DispatchQueue.global(qos: .default).async { [self] in
            sendGuard() // <- can potentially be called on several threads of the concurrent dispatch queue
        }
    } else {
        sendGuard()
    }
}

This code is not thread safe because isSending is read in sendGuard before the lock is obtained. It is also read in many other places. One may therefore still get a data race when isSending is written, even though that mutation takes place after the lock has been obtained. Also, since sendGuard is dispatched on the global dispatch queue, which is concurrent, one can be almost certain the code will execute on several threads.
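As an illustration, a variant of sendGuard that never touches the flags outside the lock could look roughly like this (just a sketch, assuming isSending and pausedEmit are plain stored properties; not the library's actual fix):

// Illustration only: the early-exit check happens after objc_sync_enter,
// so isSending and pausedEmit are never read outside the lock.
func sendGuard() {
    objc_sync_enter(self)
    defer { objc_sync_exit(self) }
    if isSending || pausedEmit {
        return
    }
    isSending = true
    attemptEmit()
}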

Another issue, which isn't directly tied to the data race we're seeing, is that on line 437 in Emitter.swift the thread is put to sleep for 5 seconds. This is unfortunate because, as I understand it, several threads belonging to the concurrent dispatch queue's pool can end up sleeping at the same time. While I'm not sure how this affects Swift Concurrency's forward-progress guarantee, I can't imagine it contributes in a positive manner.
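As a sketch of an alternative (hypothetical names, not code from the library), the retry could be scheduled instead of sleeping, so the worker thread is handed back to the pool:

import Foundation

// Hypothetical sketch: schedule a retry instead of calling
// Thread.sleep(forTimeInterval: 5) on a global-queue thread.
final class RetryScheduler {
    private let queue = DispatchQueue(label: "emitter.retry") // assumed label

    func scheduleRetry(after delay: TimeInterval = 5, _ attemptEmit: @escaping () -> Void) {
        queue.asyncAfter(deadline: .now() + delay) {
            attemptEmit() // re-attempt emission without ever parking a thread
        }
    }
}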

I hope this issue sufficiently describes the problem. Otherwise I'm happy to provide more context.

hakonk added the type:defect label on May 5, 2023
@matus-tomlein
Contributor

Thank you for reporting this @hakonk! I think we might be able to get around the issue by having a second lock to access isSending and pausedEmit.
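A rough sketch of that idea (hypothetical names, not the actual change) could be a small wrapper where a dedicated lock guards every read and write of the two flags:

import Foundation

// Hypothetical sketch of the "second lock" idea: all reads and writes of the
// two flags go through a dedicated lock, separate from the emit lock.
final class EmitterState {
    private let lock = NSLock()
    private var _isSending = false
    private var _pausedEmit = false

    var isSending: Bool {
        get {
            lock.lock()
            defer { lock.unlock() }
            return _isSending
        }
        set {
            lock.lock()
            defer { lock.unlock() }
            _isSending = newValue
        }
    }

    var pausedEmit: Bool {
        get {
            lock.lock()
            defer { lock.unlock() }
            return _pausedEmit
        }
        set {
            lock.lock()
            defer { lock.unlock() }
            _pausedEmit = newValue
        }
    }
}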

If you have any other ideas or are open to contributing a solution for this, it's very welcome!


hakonk commented May 8, 2023

We had a look at the particular data race, and it is not very problematic considering it's a boolean value. After looking some more at the code, it seems the main issue is that a global concurrent dispatch queue is used to serialize and send events. This is generally discouraged because it can create too many threads; see, for example: https://forums.swift.org/t/how-to-currently-offload-stateless-blocking-work-hidden-executor-api/59128/2.

In Emitter we've uncovered the following:

addPayload is called frequently, and for each call to addPayload, flush is called as well. In flush, sendGuard is called, which in turn calls attemptEmit; in that method a network request is sent and waited for synchronously with a semaphore. If no events are successfully emitted, the thread is put to sleep for five seconds; if some events are successful, attemptEmit is called recursively to exhaust the event store. Since we either wait synchronously for the network request to complete or put the thread to sleep, the thread is potentially blocked for a relatively long time.
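Roughly, the blocking pattern described above looks like this (a simplified illustration, not the library's exact code):

import Foundation

// Simplified illustration: the calling thread blocks on a semaphore
// until URLSession's completion handler fires.
func sendSynchronously(_ request: URLRequest, session: URLSession = .shared) -> Data? {
    let semaphore = DispatchSemaphore(value: 0)
    var result: Data?
    let task = session.dataTask(with: request) { data, _, _ in
        result = data
        semaphore.signal()
    }
    task.resume()
    semaphore.wait() // the global-queue thread is parked for the whole round trip
    return result
}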

[Screenshot: Time Profiler trace in Instruments, captured 2023-05-08, showing Snowplow work spread across several threads]

The screenshot from the Time Profiler in Instruments shows that several threads are employed to perform work in the Snowplow library. In this example we sent 10 events instead of 100 as in the example app above. If I understand the literature correctly, several threads will be created even if they don't block for long, so the issue is not really the blocking calls, even though those are unfortunate.

One way to improve this is simply to use private serial queues. That way you get mutual exclusion and offloading of the main queue. Also, instead of waiting synchronously for URLSession to complete with a semaphore, you could just use the async API as is; that way there is no need to block the thread.
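As a rough sketch of that suggestion (hypothetical type and names, just to illustrate; not a proposed final implementation):

import Foundation

// Hypothetical sketch: a private serial queue provides mutual exclusion,
// and URLSession's completion handler replaces the semaphore wait.
final class SerialEmitter {
    private let queue = DispatchQueue(label: "emitter.serial") // serial by default
    private let session = URLSession.shared
    private var isSending = false // only touched on `queue`

    func flush(_ request: URLRequest) {
        queue.async {
            guard !self.isSending else { return }
            self.isSending = true
            self.session.dataTask(with: request) { _, _, _ in
                // hop back onto the serial queue before mutating state
                self.queue.async { self.isSending = false }
            }.resume()
        }
    }
}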


hakonk commented May 8, 2023

While the data race isn't that problematic in terms of concurrent read/write, I guess it could create race conditions that can result in bugs, so that may be worth checking into.

@matus-tomlein
Contributor

Hi @hakonk, thanks for the additional investigation and suggestions!

We have put together a PR that addresses the issue with unprotected access to the isSending property – #780. We'd be happy to get your feedback on the PR if you are interested!

I understand that your suggestions go a step further to improve the thread handling. We'll keep that in mind in future refactoring of the tracker internals.
