Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore errors in idle state. #1133

Merged
merged 1 commit into from
Feb 11, 2021
Merged

Conversation

Lukasa
Copy link
Collaborator

@Lukasa Lukasa commented Feb 10, 2021

Motivation:

When a connection is in the idle state it is unlikely, but possible, to
see channel errors. This is most likely to happen when a channel has
just gone idle, and will shortly become entirely unavailable, but some
other I/O operation sees an error in the meantime. However, it could
arguably happen before any activity has been seen at all.

Right now we handle this by crashing. That's not great, mostly because
it's entirely possible to see weird reordering of events or delayed
error delivery from other handlers. It would be better to treat these
similarly to the way we treat errors in transientFailure or shutdown: as
fundamentally not adding drastically new information.

After all, consider the two options: either this error is channel fatal,
or it is not. In either case, being idle is immaterial. If the error is
fatal to the channel, we'll see the channel go inactive shortly and be
happy. Alternatively, the channel may be just fine, in which case we can
safely attempt to re-use it later. If the channel is in a weird broken
state we'll likely hit an immediate error on reconnection and move on
with our lives.

The one minor difference I've proposed here is that it's probably worth
us notifying about errors of this kind. While they aren't likely to
cause a bug in grpc, they could be a source of all kinds of weird
issues, and at the very least they probably represent a bug somewhere
else in the code. We'd like to know about them if we can.

Modifications:

  • Remove the hard-crash on error in idle, replace it with a log.

Result:

Fewer crashes.

Fixes #1132.

@Lukasa Lukasa added the 🔨 semver/patch No public API change. label Feb 10, 2021
@Lukasa Lukasa requested a review from glbrntt February 10, 2021 18:46
Motivation:

When a connection is in the idle state it is unlikely, but possible, to
see channel errors. This is most likely to happen when a channel has
just gone idle, and will shortly become entirely unavailable, but some
other I/O operation sees an error in the meantime. However, it could
arguably happen before any activity has been seen at all.

Right now we handle this by crashing. That's not great, mostly because
it's entirely possible to see weird reordering of events or delayed
error delivery from other handlers. It would be better to treat these
similarly to the way we treat errors in transientFailure or shutdown: as
fundamentally not adding drastically new information.

After all, consider the two options: either this error is channel fatal,
or it is not. In either case, being idle is immaterial. If the error is
fatal to the channel, we'll see the channel go inactive shortly and be
happy. Alternatively, the channel may be just fine, in which case we can
safely attempt to re-use it later. If the channel is in a weird broken
state we'll likely hit an immediate error on reconnection and move on
with our lives.

The one minor difference I've proposed here is that it's probably worth
us notifying about errors of this kind. While they aren't likely to
cause a bug in grpc, they could be a source of all kinds of weird
issues, and at the very least they probably represent a bug somewhere
else in the code. We'd like to know about them if we can.

Modifications:

- Remove the hard-crash on error in idle, replace it with a log.

Result:

Fewer crashes.

Fixes grpc#1132.
@Lukasa Lukasa force-pushed the cb-tolerate-error-in-idle branch from 89ca45d to bcb314f Compare February 11, 2021 07:36
Copy link
Collaborator

@glbrntt glbrntt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks @Lukasa!

@Lukasa Lukasa merged commit 897a5bd into grpc:main Feb 11, 2021
@Lukasa Lukasa deleted the cb-tolerate-error-in-idle branch February 11, 2021 13:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🔨 semver/patch No public API change.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crashing on iOS after receiving an error from Network stack while idle
2 participants