Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/http2: errors don't implement Temporary interface #35286

Closed
kirillx opened this issue Oct 31, 2019 · 10 comments
Closed

x/net/http2: errors don't implement Temporary interface #35286

kirillx opened this issue Oct 31, 2019 · 10 comments
Labels
FrozenDueToAge help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@kirillx
Copy link
Contributor

kirillx commented Oct 31, 2019

What version of Go are you using (go version)?

1.13

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

Linux

What did you do?

HTTP client application issuing std POST/GET requests.

What is wrong?

Sometimes http client returns the below errors and there is no good way to verify that they are temporary and might be retried:
a) http.http2GoAwayError (e.g. 'http2: server sent GOAWAY and closed the connection; LastStreamID=8329, ErrCode=NO_ERROR, debug="max_age"'
b) http.http2StreamError (e.g. 'stream error: stream ID 309; INTERNAL_ERROR')

After googling one can even find that some detect such errors and retry on them like in cloud.google.com/go/storage/reader.go:
if strings.HasSuffix(err.Error(), "INTERNAL_ERROR") && strings.Contains(reflect.TypeOf(err).String(), "http2") {

So if there is no objections I would make these 2 errors implement Temporary() interface returning true.

@bradfitz bradfitz changed the title http2 errors don't implement Temporary interface x/net/http2: errors don't implement Temporary interface Oct 31, 2019
@bcmills bcmills added help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Nov 5, 2019
@bcmills bcmills added this to the Unplanned milestone Nov 5, 2019
@kirillx
Copy link
Contributor Author

kirillx commented Jan 22, 2020

@bcmills what kind of help wanted?

@bcmills
Copy link
Contributor

bcmills commented Jan 22, 2020

@kirillx, figure out which of the errors should be considered Temporary and why, and implement the interface accordingly for the ones that are.

(The hard part of the CL to fix it is likely to be the documentation: it will need comments explaining why the errors are, in fact, temporary.)

@kirillx
Copy link
Contributor Author

kirillx commented Jan 27, 2020

@bcmills

  1. with http.http2GoAwayError I think justification is trivial, this error happens typically if server decides enough requests per connection serviced or after some timeout or graceful shutdown (https://http2.github.io/http2-spec/#GOAWAY).

  2. with INTERNAL_ERROR it is more complicated to justify as it requires analysis of all 3rd party servers and HTTP2 spec is not 100% clear when it should be used:
    INTERNAL_ERROR (0x2): The endpoint encountered an unexpected internal error.
    e.g. the very same package server may issue INTERNAL_ERROR in clientStream.writeRequestBody() if encodeTrailers() fails, which I believe is not a normal situation.
    I believe spec was meant this error to be used similar to HTTP 500 and real-life experience clearly demonstrates it is retryable (like HTTP 500) and as I mentioned even google SDK does so on the error.

You can also see lots of such reports in github:
#22235
#26338
googleapis/google-cloud-go#987

BTW, why these errors are not exported? They are in x/net/http2 package, but seems like they are made somehow private during bundling to http/h2_bundle.go ... Exporting error types would make it possible checking for it's type at least and codes. Any ideas?

kirillx added a commit to kirillx/net that referenced this issue Jan 27, 2020
... returning true.

Justification from golang/go#35286 (#35286):

1. http.http2GoAwayError (e.g. 'http2: server sent GOAWAY and closed the
connection; LastStreamID=8329, ErrCode=NO_ERROR, debug="max_age"')

This error happens typically if server decides enough requests per connection
serviced or after some timeout or graceful shutdown
(https://http2.github.io/http2-spec/#GOAWAY).

2. http.http2StreamError with INTERNAL_ERROR code (e.g. 'stream error: stream ID 309; INTERNAL_ERROR')

According to HTTP2 SPEC:
INTERNAL_ERROR (0x2): The endpoint encountered an unexpected internal error.

i.e. this error to be used similar to HTTP 500 and real-life experience clearly
demonstrates it is retryable (like HTTP 500).

Google Cloud SDK also implements retries on this error at cloud.google.com/go/storage/reader.go:
if strings.HasSuffix(err.Error(), "INTERNAL_ERROR") && strings.Contains(reflect.TypeOf(err).String(), "http2")

Signed-off-by: Kirill Korotaev <[email protected]>
@bcmills
Copy link
Contributor

bcmills commented Jan 27, 2020

Treating INTERNAL_ERROR as “temporary” seems like too much of a stretch.

With human intervention, nearly any error can be “temporary” — the relevant question from an API perspective, I think, is whether the problem is likely to resolve on its own without intervention, and in the case of a 500 status that seems much less likely.

Retrying a 500 without knowing its cause seems dubious at best — sometimes it will help, and sometimes it will only exacerbate a Query of Death. Given that it is easier for callers to add a retry than to remove one, that doesn't seem like a good place for the library to bias in favor of retrying.

@kirillx
Copy link
Contributor Author

kirillx commented Jan 27, 2020

  1. Retrying 500 is an official way of handling it in almost any cloud SDK and REST API, including google and AWS ones:

https://cloud.google.com/storage/docs/exponential-backoff

"Clients should use truncated exponential backoff for all requests to Cloud Storage that return HTTP 5xx and 429 response codes..."

https://docs.aws.amazon.com/general/latest/gr/api-retries.html

If you're not using an AWS SDK, you should retry original requests that receive server (5xx) or throttling errors.

  1. To avoid Query of Death one need to use exponential backoff.

  2. Unfortunately it is not easier for callers to add a retry - error is not exported, so it is not possible to check its error type and .Code field.

@bcmills
Copy link
Contributor

bcmills commented Jan 27, 2020

Retrying 500 is an official way of handling it in almost any cloud SDK and REST API

“almost any cloud SDK and REST API” is not the same thing as “the HTTP protocol”. The net/http package implements the latter, not the former.

@bcmills
Copy link
Contributor

bcmills commented Jan 27, 2020

Unfortunately it is not easier for callers to add a retry - error is not exported, so it is not possible to check its error type and .Code field.

It is exported from golang.org/x/net/http2, just not from net/http.

However, even that could potentially be remedied by exporting the type (or by transforming it to something exported) from within the net/http package, or by adding a new (backward-compatible) field to the http.Response struct for HTTP2 error codes.

@kirillx
Copy link
Contributor Author

kirillx commented Jan 28, 2020

I don't really want to argue about retriable errors. Protocol spec doesn't define it well, besides some basics like Retry-After headers and that streams after GoAway are retriable. Yet, none of the POSIX specs define that EMFILE and ETIMEDOUT are temporary errors, which are treated by Go so.
All the timeouts (BTW including temporary tlsHandshakeTimeoutError) - maybe a sign of server being overloaded, yet they are retriable.

From my POV per HTTP2 SPEC INTERNAL_ERROR is not smth which happens normally if all the parties obey the protocol. Thus this error is kind of temporary.

It is exported from golang.org/x/net/http2, just not from net/http.

Does it still help when all the SDKs are using net/http? Am I missing smth? Would be thankful for details on how it helps with 3rd party packages.

However, even that could potentially be remedied by exporting the type (or by transforming it to something exported) from within the net/http package, or by adding a new (backward-compatible) field to the http.Response struct for HTTP2 error codes.

Should I add

type Http2StreamError = http2StreamError

to export it? I can change the patch this way if you feel it is better suited.

@jeanbza
Copy link
Contributor

jeanbza commented Aug 28, 2020

GOAWAY

GOAWAY seems clearly not something that is designed to be retried. From the spec:

The GOAWAY frame (type=0x7) is used to initiate shutdown of a connection or to signal serious error conditions.

Nothing in that seems to indicate "you should retry this". =)

There are cases where a server might accept a retry, but we probably shouldn't design around "it happens sometimes".


500 Internal Server Error

I spent a good deal of time researching this subject over the last two years. Myself and several engineers collaborate to create http://aip.dev/194, which lays out which RPC codes should retried and under which circumstances.

Although http://aip.dev/194 is geared towards RPCs, some codes have easy HTTP corollaries: gRPC's INTERNAL is equivalent to HTTP's 500 Internal Server Error.

INTERNAL - and therefore 500 Internal Server Error - mean different things in different contexts. For some APIs, or some methods, 500 might be temporary (or retryable). For others, there might not be such a guarantee.

@kirillx , you mention cloud.google.com/go/storage/reader.go (retry logic lives here now) - this API is explicitly designed to only emit 500s when they are retryable.

But, I don't think the Go stdlib can guarantee that all 500 errors are temporary.

I think this is WAI. If you know your API returns retryable 500, it's fairly easy to identify them.

@seankhliao
Copy link
Member

given Temporary's deprecation in #45729, I don't think this is going to happen

@golang golang locked and limited conversation to collaborators Jun 7, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

5 participants