-
Notifications
You must be signed in to change notification settings - Fork 40.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API server panics when writing response #29001
Comments
@roberthbailey @smarterclayton so it begins :) |
There is a log'd issue upstream, in process on godep update. |
You mean finding issues that were previously masked. Early in the 1.4 dev cycle seems like a great time to find those. |
That'd be a great tag line:
Kubernetes: Destruct testing your go libraries since 2014
|
Automatic merge from submit-queue Update golang.org/x/net godep for http2 issues being seen during scale This is to address #29001 , "Header called after Handler finished". There are a number of race fixes.
I didn't have time to run tests today, as I was fixing NodeController. I'm OOO for the rest of the week, but maybe @wojtek-t will run the test? |
I will try to confirm that the fix works later this week. (Though the single run may not be enough, as this seemed to be a race). Anyway - I will keep this updated after running some tests later this week. |
I have run an experiment over night, and this happened again (with the fix). So this is still a problem. |
k, given that you're still seeing a server race condition I'm inclined to default off again and perhaps try again in the 1.5 w/ golang 1.7 timeframe. |
closed via #29283 |
If you can crash apiserver with an HTTP 2 connection, then we should: b) is because otherwise it's super easy to dos apiserver. On Wed, Jul 20, 2016 at 8:32 AM, Timothy St. Clair <[email protected]
|
@lavalamp - I agree that we should prevent http 2 connections server-side, but can you clarify why it makes easier to dos apiserver? Is it because it will be crashlooping or something else? |
Yeah, a malicious client can just keep making http2 requests... On Wed, Jul 20, 2016 at 11:00 AM, Wojciech Tyczynski <
|
Hi, I own Go's net/http and http2 stuff. I saw golang/go#16451 but it appears that the bug is actually in Kubernetes. When you get the error message: "panic: Header called after Handler finished" That means the ResponseWriter.Header() method was called after the ServeHTTP completed. So, I went looking up your panic's call stack looking for suspects. My first guess was
... because anything named fancy is just asking for trouble. But it seems fine, despite having no docs on why it's fancy. But working up the stack more, I see things with "timeout" in the name, which makes me think there are goroutines involved running ServeHTTP async. And sure enough: k8s.io/kubernetes/pkg/apiserver/handlers.go: func (t *timeoutHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
after, msg := t.timeout(r)
if after == nil {
t.handler.ServeHTTP(w, r)
return
}
done := make(chan struct{}, 1)
tw := newTimeoutWriter(w)
go func() {
t.handler.ServeHTTP(tw, r)
done <- struct{}{}
}()
select {
case <-done:
return
case <-after:
tw.timeout(msg)
}
} ... good, you're passing a custom ResponseWriter to the new ServeHTTP started in the goroutine. So I'd expect But: type timeoutWriter interface {
http.ResponseWriter
timeout(string)
}
func newTimeoutWriter(w http.ResponseWriter) timeoutWriter {
base := &baseTimeoutWriter{w: w}
...
return base
}
type baseTimeoutWriter struct {
w http.ResponseWriter
mu sync.Mutex
timedOut bool
wroteHeader bool
hijacked bool
}
func (tw *baseTimeoutWriter) Header() http.Header {
return tw.w.Header()
} The baseTimeOutWriter's Header method doesn't have any synchronization. It's always accessing tw.w (which is the original ServeHTTP's goroutine's ResponseWriter), even the timeoutHandler.ServeHTTP has returned. But https://golang.org/pkg/net/http/#ResponseWriter says:
So, a data race (as is the case in HTTP/1 I imagine) or a panic (with HTTP/2) are both possible. Arguably the HTTP/2 behavior is better because it told you there's a problem rather than having a silent data race which you might not notice unless your tests run under the race detector were particularly thorough concurrency/timeout-wise. You'll need to need to make your timeoutHandler and its timeoutWriter a bit more aware of each other and synchronize any method calls on the original ServerHTTP's ResponseWriter with the original ServeHTTP's completion. |
Thank you for that write up, I knew the timeout handler was incredibly On Thu, Jul 21, 2016 at 2:36 PM, Brad Fitzpatrick [email protected]
|
We only using the timeout handler because we didn't have a real
timeout method on http. If we're ready to use context timeouts we
should start that here and remove the goroutine.
|
Thanks for looking into it @bradfitz @smarterclayton - can we use context timeouts without http2? If so, that might be the best option. |
We should start plumbing context for 1.4. That's an easy win, and a lot of On Mon, Jul 25, 2016 at 1:37 PM, Daniel Smith [email protected]
|
Timeout handler and context are complementary for each other. They do not provide exact same functionality. It highly depends on what we are trying to achieve in long term. Context on request can propagate deadline, cancellation to inner processors. It can help with shutting down go routines and fail fast. A lot of stuff cannot respect context, for example large decoding loop can take seconds. Timeout handler, on the other hand, is used to write HTTP timeout header as long as the inner handler does not finish in the given timeout. It does not do things like cancellation. Do we want both functionality or just what context provides? |
I guess we probably need both. Timeout handler needs to produce a response On Mon, Jul 25, 2016 at 10:51 AM, Xiang Li [email protected] wrote:
|
@lavalamp hijacked is easier to "achieve" than "flush". They are all achievable with some limitation. Once we ever write anything to a HTTP response over the wire, it is impossible to timeout it. So there are good reasons for not supporting hijack and flush to not mess up HTTP responses. There is another discussion happening on the original change here: #10656 (comment). Seems like we do not really rely either hijack and flush for timeout handler. I would suggest to go back to use standard lib. And then write code to respect context properly. And then add flush, hijack functionality if we really need to. |
I am concerned about buffering all responses to memory just to support timeout (as the standard handler does) |
@liggitt Can you provide a better way to do it? If you ever write anything to the actual writer, there is no chance to get it back in my opinion. Also I do not think it really matters. @bradfitz I think this (#29001 (comment)) is the main concern. Opinion? |
I would maybe argue (as I started to above) that we need to handle all the On Mon, Jul 25, 2016 at 2:03 PM, Daniel Smith [email protected]
|
@smarterclayton That is OK. Once we wrote anything to the connection, we just cannot expect the timeout handler to still work by sending a timeout header. It should just shutdown the connection instead. Agree? |
Agree. On Mon, Jul 25, 2016 at 2:50 PM, Xiang Li [email protected] wrote:
|
Agree. On Mon, Jul 25, 2016 at 12:00 PM, Clayton Coleman [email protected]
|
Btw, if the ServeHTTP goroutine panics, that will do the best possible thing for both HTTP/1 and HTTP/2. In HTTP/1, assuming you're replying with at least HTTP/1.1 and you've already flushed the headers so it's using HTTP chunking, it'll kill the TCP connection immediately without a proper 0-byte EOF chunk, so the peer will recognize the response as bogus. In HTTP/2 the server will just RST_STREAM the stream, leaving the TCP connection open, but resetting the stream to the peer so it'll have an error, like the HTTP/1 case. You can use this to have your timeout writer send writes over a channel to the ServeHTTP goroutine in a What are you wanting to use Hijack for? Unless you're explicitly doing WebSockets over HTTP/1.1, I would ignore Hijack. It basically has no other valid purpose. Hijack doesn't even work for HTTP/2, intentionally, since it doesn't mean anything in that context. |
We do websockets over HTTP/1.1, as it happens. We also do SPDY to support On Mon, Jul 25, 2016 at 2:02 PM, Brad Fitzpatrick [email protected]
|
Okay, so use Hijacker for that only. I would avoid it everywhere else.
Why SPDY? SPDY's supposed to be dying and dead soon, and Go's HTTP/2 lets |
SPDY is there because we wrote that before HTTP/2 was a thing, and we have On Mon, Jul 25, 2016 at 2:16 PM, Brad Fitzpatrick [email protected]
|
All that being said, our biggest blocker moving off SPDY is not being
able to reuse enough of the go client to avoid having to reimplement
the protocol ourselves. @ncdc is there an issue somewhere describing
that?
|
@smarterclayton, @ncdc, I'm super curious, because the HTTP/2 in Go 1.6+ should be sufficient. Let me know how I can help. |
@bradfitz we previously chatted about this in golang/go#13444. We have interfaces in Kubernetes that look like this: // Connection represents an upgraded HTTP connection.
type Connection interface {
// CreateStream creates a new Stream with the supplied headers.
CreateStream(headers http.Header) (Stream, error)
// Close resets all streams and closes the connection.
Close() error
// CloseChan returns a channel that is closed when the underlying connection is closed.
CloseChan() <-chan bool
// SetIdleTimeout sets the amount of time the connection may remain idle before
// it is automatically closed.
SetIdleTimeout(timeout time.Duration)
}
// Stream represents a bidirectional communications channel that is part of an
// upgraded connection.
type Stream interface {
io.ReadWriteCloser
// Reset closes both directions of the stream, indicating that neither client
// or server can use it any more.
Reset() error
// Headers returns the headers used to create the stream.
Headers() http.Header
// Identifier returns the stream's ID.
Identifier() uint32
} We hijack the HTTP/1.1 connection and then use spdystream to create streams. We then I assume we can use what's in Go 1.6 for HTTP/2 to use the request and response bodies for this purpose. It would require that we write our own multiplexing/framing code for our streams (in addition to the 3 I mentioned above, we also have an "error" stream and another for sending terminal resizing dimensions). I'm not sure it would be super easy or even possible to shoehorn an HTTP/2 implementation into the above interfaces, though. @smarterclayton I don't believe there's currently an issue |
@bradfitz oh, one other thing: because our data flow is client executable -> kube-apiserver -> kubelet -> container, we end up using a proxy to transmit the data between kube-apiserver and kubelet. This is how we're currently doing it: kubernetes/pkg/registry/generic/rest/proxy.go Lines 128 to 193 in ef0c9f0
The last time I looked at this, |
What is the status on the original issue, which is a race in plumbing problem. |
Automatic merge from submit-queue apiserver: fix timeout handler Protect access of the original writer. Panics if anything has wrote into the original writer or the writer is hijacked when times out. Fix #29001 /cc @smarterclayton @lavalamp The next step would be respect the request context once 1.7 is out. <!-- Reviewable:start --> --- This change is [<img src="https://reviewable.kubernetes.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.kubernetes.io/reviews/kubernetes/kubernetes/29594) <!-- Reviewable:end -->
re-opening, this is not closed. |
@timothysc This should be fixed by #29594. Did you see this happen again? |
k, we should open new other items discussed in this issue then too. |
When I was running large scale kubemark test (2000 Nodes with 95 pods each) my API server crashed with the error:
It happened around 110k-120k Pods in the cluster.
It looks like a bug in http2 handling. cc @timothysc @smarterclayton @wojtek-t @lavalamp @kubernetes/sig-api-machinery
The text was updated successfully, but these errors were encountered: