
Data race in Emitter and unfortunate handling of threads #774

Closed
hakonk opened this issue May 5, 2023 · 4 comments


hakonk commented May 5, 2023

Describe the bug
While running an app that makes use of SnowplowTracker (v5.0.0) with Thread Sanitizer enabled in Xcode, I have on some occasions encountered data races in the Emitter class of the SnowplowTracker library in a production app. More specifically, the data races involve concurrent access to Emitter.isSending.

To Reproduce
Here is a minimal example that demonstrates the issue.

  1. Create an iOS app with a single view:

TestSnowplowApp.swift:

import SwiftUI

@main
struct TestSnowplowApp: App {
    var body: some Scene {
        WindowGroup {
            ContentView()
        }
    }
}
  2. Make use of SnowplowTracker in the ContentView struct:

ContentView.swift:

import SwiftUI
import SnowplowTracker

class SnowplowSetup {
    static let tracker: TrackerController = Snowplow.createTracker(namespace: "testapp", network: NetworkConfiguration(endpoint: "").customPostPath("")) {
        TrackerConfiguration()
            .screenViewAutotracking(false)
    }!
}

struct ContentView: View {
    var body: some View {
        Button {
            for _ in 0..<100 {
                // These events will fail to be dispatched as the network configuration is invalid.
                // It is however sufficient to trigger a TSan breakpoint
                _ = SnowplowSetup.tracker.track(ScreenView(name: "MyScreen"))
            }
        } label: {
            Text("Send a bunch of requests")
        }
    }
}
  3. Enable Thread Sanitizer in the Diagnostics tab for the app's scheme, and enable a Runtime Issue breakpoint with "Thread Sanitizer" opted in.

  4. Run the app in the simulator in debug mode and press the button "Send a bunch of requests" several times until the breakpoint hits.

  5. Observe that the breakpoint is reached.

Expected behavior
Concurrent access internally in the library should not trigger a TSan breakpoint.

Device information (please complete the following information):

  • Simulator: iPhone 14 Pro
  • Xcode: 14.3
  • OS: macOS 13.3.1

Additional context
Here is an excerpt from Emitter.swift:360-382 that I believe sheds some light on the issue:

func sendGuard() {
    if isSending || pausedEmit { // <- isSending is accessed while the lock is not obtained yet
        return
    }
    objc_sync_enter(self)
    if !isSending && !pausedEmit {
        isSending = true
        attemptEmit()
    }
    objc_sync_exit(self)
}
// [...]
func flush() {
    if Thread.isMainThread {
        DispatchQueue.global(qos: .default).async { [self] in
            sendGuard() // <- can potentially be called on several threads of the concurrent dispatch queue
        }
    } else {
        sendGuard()
    }
}

This code is not thread safe because isSending is read in sendGuard before the lock is obtained. It is also read in many other places. One may therefore still get a data race when isSending is written, even though that mutation takes place after the lock has been obtained. Also, since sendGuard is dispatched on the global dispatch queue, which is concurrent, one can be almost certain the code will execute on several threads.
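As an illustration, a variant of sendGuard that never touches the flags outside the lock could look roughly like this (just a sketch, assuming isSending and pausedEmit are plain stored properties; not the library's actual fix):

// Illustration only: the early-exit check happens after objc_sync_enter,
// so isSending and pausedEmit are never read outside the lock.
func sendGuard() {
    objc_sync_enter(self)
    defer { objc_sync_exit(self) }
    if isSending || pausedEmit {
        return
    }
    isSending = true
    attemptEmit()
}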

Another issue, which isn't directly tied to the data race we're seeing, is that on line 437 in Emitter.swift the thread is put to sleep for 5 seconds. This is unfortunate because, as I understand it, several threads belonging to the concurrent dispatch queue's pool can end up sleeping at the same time. While I'm not sure how this affects Swift Concurrency's forward-progress guarantee, I can't imagine it contributes in a positive manner.
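As a sketch of an alternative (hypothetical names, not code from the library), the retry could be scheduled instead of sleeping, so the worker thread is handed back to the pool:

import Foundation

// Hypothetical sketch: schedule a retry instead of calling
// Thread.sleep(forTimeInterval: 5) on a global-queue thread.
final class RetryScheduler {
    private let queue = DispatchQueue(label: "emitter.retry") // assumed label

    func scheduleRetry(after delay: TimeInterval = 5, _ attemptEmit: @escaping () -> Void) {
        queue.asyncAfter(deadline: .now() + delay) {
            attemptEmit() // re-attempt emission without ever parking a thread
        }
    }
}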

I hope this issue sufficiently describes the problem. Otherwise I'm happy to provide more context.

hakonk added the type:defect label on May 5, 2023
@matus-tomlein
Contributor

Thank you for reporting this @hakonk! I think we might be able to get around the issue by having a second lock to access isSending and pausedEmit.
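A rough sketch of that idea (hypothetical names, not the actual change) could be a small wrapper where a dedicated lock guards every read and write of the two flags:

import Foundation

// Hypothetical sketch of the "second lock" idea: all reads and writes of the
// two flags go through a dedicated lock, separate from the emit lock.
final class EmitterState {
    private let lock = NSLock()
    private var _isSending = false
    private var _pausedEmit = false

    var isSending: Bool {
        get {
            lock.lock()
            defer { lock.unlock() }
            return _isSending
        }
        set {
            lock.lock()
            defer { lock.unlock() }
            _isSending = newValue
        }
    }

    var pausedEmit: Bool {
        get {
            lock.lock()
            defer { lock.unlock() }
            return _pausedEmit
        }
        set {
            lock.lock()
            defer { lock.unlock() }
            _pausedEmit = newValue
        }
    }
}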

If you have any other ideas or are open to contributing a solution for this, it's very welcome!


hakonk commented May 8, 2023

We had a look at the particular data race, and it is not very problematic considering it's a boolean value. After looking some more at the code, it seems the main issue is that a global concurrent dispatch queue is used to serialize and send events. This is generally discouraged because it can create too many threads; see, for example: https://forums.swift.org/t/how-to-currently-offload-stateless-blocking-work-hidden-executor-api/59128/2.

In Emitter we've uncovered the following:

addPayload is called frequently, and for each call to addPayload, flush is called as well. In flush, sendGuard is called, which in turn calls attemptEmit; in that method a network request is sent and waited for synchronously with a semaphore. If no events are successfully emitted, the thread is put to sleep for five seconds; if some events are successful, attemptEmit is called recursively to exhaust the event store. Since we either wait synchronously for the network request to complete or put the thread to sleep, the thread is potentially blocked for a relatively long time.
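Roughly, the blocking pattern described above looks like this (a simplified illustration, not the library's exact code):

import Foundation

// Simplified illustration: the calling thread blocks on a semaphore
// until URLSession's completion handler fires.
func sendSynchronously(_ request: URLRequest, session: URLSession = .shared) -> Data? {
    let semaphore = DispatchSemaphore(value: 0)
    var result: Data?
    let task = session.dataTask(with: request) { data, _, _ in
        result = data
        semaphore.signal()
    }
    task.resume()
    semaphore.wait() // the global-queue thread is parked for the whole round trip
    return result
}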

[Screenshot: Time Profiler trace in Instruments, captured 2023-05-08, showing Snowplow work spread across several threads]

The screenshot from the Time Profiler in Instruments shows that several threads are employed to perform work in the Snowplow library. In this example we sent 10 events instead of 100 as in the example app above. If I understand the literature correctly, several threads will be created even if they don't block for long, so the issue is not really the blocking calls, even though those are unfortunate.

One way to improve this is simply to use private serial queues. That way you get mutual exclusion and offloading of the main queue. Also, instead of waiting synchronously for URLSession to complete with a semaphore, you could just use the async API as is; that way there is no need to block the thread.
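As a rough sketch of that suggestion (hypothetical type and names, just to illustrate; not a proposed final implementation):

import Foundation

// Hypothetical sketch: a private serial queue provides mutual exclusion,
// and URLSession's completion handler replaces the semaphore wait.
final class SerialEmitter {
    private let queue = DispatchQueue(label: "emitter.serial") // serial by default
    private let session = URLSession.shared
    private var isSending = false // only touched on `queue`

    func flush(_ request: URLRequest) {
        queue.async {
            guard !self.isSending else { return }
            self.isSending = true
            self.session.dataTask(with: request) { _, _, _ in
                // hop back onto the serial queue before mutating state
                self.queue.async { self.isSending = false }
            }.resume()
        }
    }
}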


hakonk commented May 8, 2023

While the data race isn't that problematic in terms of concurrent read/write, I guess it could create race conditions that can result in bugs, so that may be worth checking into.

@matus-tomlein
Contributor

Hi @hakonk, thanks for the additional investigation and suggestions!

We have put together a PR that addresses the issue with unprotected access to the isSending property – #780. We'd be happy to get your feedback on the PR if you are interested!

I understand that your suggestions go a step further to improve the thread handling. We'll keep that in mind in future refactoring of the tracker internals.
