-
Notifications
You must be signed in to change notification settings - Fork 419
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Crashing on iOS after receiving an error from Network stack while idle #1132
Comments
Yeah, this crash seems like it shouldn't be possible, so that means there's some misunderstanding in the logic around the management of the GRPCIdleHandler and the ConnectionManager. |
Yeah, that precondition just seems excessively aggressive. There is no particular reason that an idle channel shouldn't be able to error. It's certainly surprising, but insufficiently defensive. In this case I'm prepared to believe that the Either way, grpc is probably being a bit aggressive here. |
Motivation: When a connection is in the idle state it is unlikely, but possible, to see channel errors. This is most likely to happen when a channel has just gone idle, and will shortly become entirely unavailable, but some other I/O operation sees an error in the meantime. However, it could arguably happen before any activity has been seen at all. Right now we handle this by crashing. That's not great, mostly because it's entirely possible to see weird reordering of events or delayed error delivery from other handlers. It would be better to treat these similarly to the way we treat errors in transientFailure or shutdown: as fundamentally not adding drastically new information. After all, consider the two options: either this error is channel fatal, or it is not. In either case, being idle is immaterial. If the error is fatal to the channel, we'll see the channel go inactive shortly and be happy. Alternatively, the channel may be just fine, in which case we can safely attempt to re-use it later. If the channel is in a weird broken state we'll likely hit an immediate error on reconnection and move on with our lives. The one minor difference I've proposed here is that it's probably worth us notifying about errors of this kind. While they aren't likely to cause a bug in grpc, they could be a source of all kinds of weird issues, and at the very least they probably represent a bug somewhere else in the code. We'd like to know about them if we can. Modifications: - Remove the hard-crash on error in idle, replace it with a log. Result: Fewer crashes. Fixes grpc#1132.
Motivation: When a connection is in the idle state it is unlikely, but possible, to see channel errors. This is most likely to happen when a channel has just gone idle, and will shortly become entirely unavailable, but some other I/O operation sees an error in the meantime. However, it could arguably happen before any activity has been seen at all. Right now we handle this by crashing. That's not great, mostly because it's entirely possible to see weird reordering of events or delayed error delivery from other handlers. It would be better to treat these similarly to the way we treat errors in transientFailure or shutdown: as fundamentally not adding drastically new information. After all, consider the two options: either this error is channel fatal, or it is not. In either case, being idle is immaterial. If the error is fatal to the channel, we'll see the channel go inactive shortly and be happy. Alternatively, the channel may be just fine, in which case we can safely attempt to re-use it later. If the channel is in a weird broken state we'll likely hit an immediate error on reconnection and move on with our lives. The one minor difference I've proposed here is that it's probably worth us notifying about errors of this kind. While they aren't likely to cause a bug in grpc, they could be a source of all kinds of weird issues, and at the very least they probably represent a bug somewhere else in the code. We'd like to know about them if we can. Modifications: - Remove the hard-crash on error in idle, replace it with a log. Result: Fewer crashes. Fixes grpc#1132.
Motivation: When a connection is in the idle state it is unlikely, but possible, to see channel errors. This is most likely to happen when a channel has just gone idle, and will shortly become entirely unavailable, but some other I/O operation sees an error in the meantime. However, it could arguably happen before any activity has been seen at all. Right now we handle this by crashing. That's not great, mostly because it's entirely possible to see weird reordering of events or delayed error delivery from other handlers. It would be better to treat these similarly to the way we treat errors in transientFailure or shutdown: as fundamentally not adding drastically new information. After all, consider the two options: either this error is channel fatal, or it is not. In either case, being idle is immaterial. If the error is fatal to the channel, we'll see the channel go inactive shortly and be happy. Alternatively, the channel may be just fine, in which case we can safely attempt to re-use it later. If the channel is in a weird broken state we'll likely hit an immediate error on reconnection and move on with our lives. The one minor difference I've proposed here is that it's probably worth us notifying about errors of this kind. While they aren't likely to cause a bug in grpc, they could be a source of all kinds of weird issues, and at the very least they probably represent a bug somewhere else in the code. We'd like to know about them if we can. Modifications: - Remove the hard-crash on error in idle, replace it with a log. Result: Fewer crashes. Fixes #1132.
Describe the bug
Versions:
grpc-swift 1.0.0-alpha.22
protoc-gen-grpc-swift
Platforms:
iOS 13.3.1, 13.4.1, 13.6.1, 14.2, 14.3, 14.4
In our application we're using swift-grpc 1.0.0-alpha-22 to stream audio data from a mobile client up to our server.
We're seeing in our production crash logs the following stack trace where immediately before the crash the state of our grpc connection transitions to idle and then we receive this error.
We appear to be hitting the precondition on ConnectionManager:470
To reproduce
I haven't been able to reproduce the issue so far locally but the only way I've actually gotten a similar stack trace is via a connection termination from the server due to a 120s timeout.
Steps to reproduce the stack trace (note current state hasn't been idle for me so haven't hit the crash)
Expected behaviour
We either don't receive errors from the network framework while idle or we don't crash if we do.
Additional information
I've tried debugging it a number of times and it seems to be very intermittent but I can't seem to create the timing where we transition to idle before we receive a network error nor can I figure out how to avoid said timing.
The text was updated successfully, but these errors were encountered: