-
Notifications
You must be signed in to change notification settings - Fork 512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updating GEP-1742 to clarify timeout formatting #2155
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
|
@@ -206,7 +206,7 @@ sequenceDiagram | |||||||||
participant C as Client | ||||||||||
participant P as Proxy | ||||||||||
participant U as Upstream | ||||||||||
|
||||||||||
C->>P: Connection Started | ||||||||||
activate U | ||||||||||
activate C | ||||||||||
|
@@ -224,7 +224,7 @@ sequenceDiagram | |||||||||
activate U | ||||||||||
note left of U: timeout queue<br/>(wait for available server) | ||||||||||
deactivate U | ||||||||||
|
||||||||||
P->>U: Connection Started | ||||||||||
activate U | ||||||||||
P->>U: Starts sending Request | ||||||||||
|
@@ -370,7 +370,7 @@ sequenceDiagram | |||||||||
|
||||||||||
Both timeout fields are string duration values as specified by | ||||||||||
[Golang time.ParseDuration](https://pkg.go.dev/time#ParseDuration) and MUST be >= 1ms | ||||||||||
or 0 to disable (no timeout). | ||||||||||
and <= 100y. | ||||||||||
|
||||||||||
### GO | ||||||||||
|
||||||||||
|
@@ -388,8 +388,6 @@ type HTTPRouteRule struct { | |||||||||
} | ||||||||||
|
||||||||||
// HTTPRouteTimeouts defines timeouts that can be configured for an HTTPRoute. | ||||||||||
// Timeout values are formatted like 1h/1m/1s/1ms as parsed by Golang time.ParseDuration | ||||||||||
// and MUST BE >= 1ms or 0 to disable (no timeout). | ||||||||||
type HTTPRouteTimeouts struct { | ||||||||||
// Request specifies the duration for processing an HTTP client request after which the | ||||||||||
// gateway will time out if unable to send a response. | ||||||||||
|
@@ -403,13 +401,22 @@ type HTTPRouteTimeouts struct { | |||||||||
// request stream has been received instead of immediately after the transaction is | ||||||||||
// initiated by the client. | ||||||||||
// | ||||||||||
// This value follows a subset of Golang and XML Duration formatting, for example 1m, 1s, | ||||||||||
// or 1ms. This value MUST BE >= 0 and <= 9999h. When "0" is specified, it indicates that | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hour is not an accepted value in your kubebuilder validation pattern. 9999h is 599940m, so if you want to allow that amount, you would need to change the pattern to allow 6 digits. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good catch, thanks! This is an artifact of when I originally had hours as an allowed unit, will update pending the conversation above about whether or not we should add that back. |
||||||||||
// no timeout should be implemented. If an implementation cannot support the granularity | ||||||||||
// specified, it MUST round up to the closest value it supports. For example, if 1ms is | ||||||||||
// specified, and an implementation can only configure timeouts in seconds, it MUST configure | ||||||||||
// 1 second as the timeout. In the special case of a "0" timeout, the implementation SHOULD | ||||||||||
// NOT implement a timeout. If that is not possible it MUST configure the maximum timeout | ||||||||||
// value it supports. | ||||||||||
// | ||||||||||
// When this field is unspecified, request timeout behavior is implementation-dependent. | ||||||||||
// | ||||||||||
// Support: Extended | ||||||||||
// | ||||||||||
// +optional | ||||||||||
// +kubebuilder:validation:Format=duration | ||||||||||
Request *metav1.Duration `json:"request,omitempty"` | ||||||||||
// +kubebuilder:validation:Pattern=^(0|[1-9][0-9]{0,3}(m|s|ms))$ | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 9999m is not long enough if you were talking about a max of 1 year. Also, I suppose there could be websockets that require longer than 9999m. If not now, then in the future. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If you are using websockets I would imagine you want infinite? I cannot imagine a scenario where you want a timeout over 9999m but not infinite? Is there a use case I am not aware of? I suppose if you don't want to allow super-long streams.. |
||||||||||
Request *metav1.Duration `json:"request,omitempty"` | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||
|
||||||||||
// BackendRequest specifies a timeout for an individual request from the gateway | ||||||||||
// to a backend service. This covers the time from when the request first starts being | ||||||||||
|
@@ -419,14 +426,22 @@ type HTTPRouteTimeouts struct { | |||||||||
// may result in more than one call from the gateway to the destination backend service, | ||||||||||
// for example, if automatic retries are supported. | ||||||||||
// | ||||||||||
// Because the Request timeout encompasses the BackendRequest timeout, | ||||||||||
// the value of BackendRequest defaults to and must be <= the value of Request timeout. | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think defaulting BankendRequest to the value or Request is both easily implementable and a sensible approach. See my related comment here: https://github.com/kubernetes-sigs/gateway-api/pull/2013/files#r1245517080 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Commented in that thread first, copying over here for additional visibility: Unfortunately we don't have a means to implement variable defaulting unless we want to create and maintain a defaulting webhook, and I'd strongly prefer to avoid that maintenance cost. We could suggest that if a value is not set here, implementations should consider this the same value as the overall request timeout, but I can't think of any way to validate that, and I'm also not sure that's what we want. Let's say that an implementation has a default of 2 retries with a 10s per-try timeout. If a user configures an overall request timeout of 20s, I think they'd probably prefer the implementation default behavior here instead of that 1 value actually affecting the per-try timeout value as well. In general I'd like to suggest working towards the principle of least surprise, and I think that means avoiding a variable default like this. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can't you set static defaults through CRD, like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, static would be easier here, but what had been suggested here in the original GEP was that if this field was not specified but the overall request timeout was specified, this field should default to the value of the overall request timeout. I don't think we can/should do that. Also since this is a new field and many implementations already have default timeouts, I don't think we can add any static timeout in a backwards compatible way. |
||||||||||
// This value follows a subset of Golang and XML Duration formatting, for example 1m, 1s, | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This applies to text above this line. Can you add an explanation about whether the Request time encompasses/includes the BackendRequest time? Is Request = BackendRequest + other processing time, or is total time = Request + BackendRequest, implying that Request does NOT include the BackendRequest time? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree with @candita. This change should only remove the value <= part. The doc still needs some words to be clear that the the Request timeout encoumpasses the BackendRequest timeout. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Another thought. I understand that we can't enforce that
Suggested change
|
||||||||||
// or 1ms. This value MUST BE >= 0 and <= 9999h. When "0" is specified, it indicates that | ||||||||||
// no timeout should be implemented. If an implementation cannot support the granularity | ||||||||||
// specified, it MUST round up to the closest value it supports. For example, if 1ms is | ||||||||||
// specified, and an implementation can only configure timeouts in seconds, it MUST configure | ||||||||||
// 1 second as the timeout. In the special case of a "0" timeout, the implementation SHOULD | ||||||||||
// NOT implement a timeout. If that is not possible it MUST configure the maximum timeout | ||||||||||
// value it supports. | ||||||||||
// | ||||||||||
// When this field is unspecified, request timeout behavior is implementation-dependent. | ||||||||||
// | ||||||||||
// Support: Extended | ||||||||||
// | ||||||||||
// +optional | ||||||||||
// +kubebuilder:validation:Format=duration | ||||||||||
BackendRequest *metav1.Duration `json:"backendRequest,omitempty"` | ||||||||||
// +kubebuilder:validation:Pattern=^(0|[1-9][0-9]{0,3}(m|s|ms))$ | ||||||||||
BackendRequest *string `json:"backendRequest,omitempty"` | ||||||||||
} | ||||||||||
``` | ||||||||||
|
||||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some comments related to this in the API PR (https://github.com/kubernetes-sigs/gateway-api/pull/2013/files#r1245498559), but thinking about this more, I really don't see the value of specifying an upper limit.
The more I think about this, it seems that any value greater than a day or so, basically means no timeout. Certainly nobody would want to set a timeout to something like 5 years and really expect that the client and service will both be up and running in a paused state until it returns a value before the 5y deadline.
Unless I'm missing something, a user will want to configure the timeout to something relatively short, and any larger value really just mean "don't timeout", i.e., just hang until the client connection is terminated. So that's why I think the current design is right. User can set the timeout to some small value or 0 to disable timeouts. A user can also chose to set a pretty huge value which the implementation will likely support (up to the implementation limit of probably an int64 max value ns of ~300 years), but any specific number they chose that is large will work the same as 0 for all practical purposes. So documenting some arbitrary large max value is really meaningless IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more related thought. If we want to say anything about max value it could be something like
the maximum value is implementation dependent, but MUST BE at least 1y
, or something like that.That way the user can set it to any sensible value they'd like (>= 1ms) or 0 for infinite (don't timeout). Note that an implementation can implement this infinite (0) timeout by setting the timeout to it's internal max value which will have the same effect given that it's 1y or more.
@robscott btw, you said:
How would an implementation test that a 10 year timeout works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you're right, it's not really possible to write a test that waits this long, and any value that we can feasibly test is likely lower than at least some users would want. I think the real win here is that implementations can know that they support the full range of possible values allowed by the API just by understanding their dataplane.
I think that's reasonable, but I'm struggling to think of any use case that would benefit from > 1y, so distinguishing beyond that is likely not worthwhile (again I could be wrong here). My strong bias is to start with tighter limits and only loosen them if a clearly defined need is presented.
Yeah I think that's probably accurate, I'm not sure what the right upper limit is here, but lower is probably better as a starting point.
There's a very strong precedent in Kubernetes API review to prevent unbounded values. Even if we were to push forward without an upper limit here, I can't imagine this passing the final level of API review before v0.8.0. Both @thockin and @khenidak have strongly pushed to avoid any unbounded values in this API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two things:
String-encoded durations is not the norm in kube APIs, and I think we should question whether that's the right way to represent durations. The norm is
int32
with the units encoded in the name - e.g.timeoutSeconds int32
, and the unit is almost always seconds.It's not really unbounded if it is a float32 - there is a natural bound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point @thockin. I had thought of something like
timeoutSeconds int32
as well, it would certainly simplify the validation initially. A concern I'd have is that some users will want to be able to express more precise timeouts, for example GCP and Envoy allow you to express timeouts in terms of nanoseconds, and most implementations support at least milliseconds as an input. That leaves us with a few options:timeout.seconds: 1
+timeout.nanoseconds: 5000000
==1.005s
). This roughly matches Envoy and GCP URLMaps which both appear to use proto duration.metav1.Duration
as proposed by this GEP. This matches Istio, Linkerd, Contour, and many of the underlying dataplanes like NGINX and HAProxySo with all of that said, we could start with 1 and leave room for 2. For example, that future could look like this:
GEP Today:
GEP Tomorrow:
GEP Next Year:
I'm not entirely sure that I prefer that over the more open-ended duration format, but it would be simpler to validate and more in line with other Kubernetes APIs. Does this make sense to you @thockin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind allowing 0, but in YAML it will have to be
"0"
because the field will be transported as a string.0s
does not suffer that problem.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm talking about the difference between "Invalid value: timeout: must match regex ...." and "Invalid value: timeout: must be greater than or equal to 0ms"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also not that comparing
0s
to0ms
to0h
must succeed, as must "1000ms" and1s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately I don't know of any way to do that with CRD validation. We could do that with webhook validation, but that is far less reliable so I'd rather cover as much as possible directly with validation defined on the CRD.
Agreed, we should likely have some conformance tests to confirm that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Catching up here, this is quite the rock to turn over!
I'm supportive of using a string based definition that's a subset of
metav1.Duration
, however we do it, and also agree with ensuring that we have conformance tests that validate as much behavior as possible.