Updating GEP-1742 to clarify timeout formatting #2155

robscott · 2023-06-28T17:47:27Z

What type of PR is this?
/kind cleanup
/kind gep

What this PR does / why we need it:
While reviewing the implementation of this GEP, I realized that the upper and lower bounds of these timeout values were not what I expected. This makes the following changes:

1. Introduces an upper bound of 10 years
In Kubernetes APIs we always try to have a clear upper and lower bound for fields. This enables implementations to write tests that ensure they can support the min and max values for any field in the API. As far as I can tell, 10 years is lower than all implementations support, but still likely sufficiently high to not be a limiting factor for anyone. Note that our API versioning guidelines allow us to loosen validation in future releases, but we try to avoid tightening validation wherever possible. This means that we should aim to start with the tightest validation that is reasonable for a feature.

2. Removes the possibility of setting 0 to mean "no timeout"
Although both HAProxy and Envoy support 0 as "no timeout", I'm not sure how broadly supportable this concept of an "infinite" timeout would be beyond those two dataplanes. If 0 is really just intended to represent "max timeout", why not define a max timeout supported by the API and suggest users set that if that's really what they want? That feels both more specific and less likely to surprise users. If we ever determine that we really do need a "0" value here, we can loosen validation in the future to support it, but once we've allowed that value it's nearly impossible to undo that.

3. Clarifies that values should be rounded up to the closest value an implementation can support
There are likely some implementations that are not going to be able to support the level of granularity supported by Golang's concept of duration. This provides guidance to those implementations for how they should handle this. We likely also will want to add something to status for this in the future, but there's not an obvious condition to attach that to yet.

Does this PR introduce a user-facing change?:

NONE

/cc @frankbu @SRodi @candita @shaneutt @howardjohn

k8s-ci-robot · 2023-06-28T17:47:31Z

@robscott: GitHub didn't allow me to request PR reviews from the following users: frankbu, srodi, candita.

Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

What type of PR is this?
/kind cleanup
/kind gep

What this PR does / why we need it:
While reviewing the implementation of this GEP, I realized that the upper and lower bounds of these timeout values were not what I expected. This makes the following changes:

1. Introduces an upper bound of 10 years
In Kubernetes APIs we always try to have a clear upper and lower bound for fields. This enables implementations to write tests that ensure they can support the min and max values for any field in the API. As far as I can tell, 10 years is lower than all implementations support, but still likely sufficiently high to not be a limiting factor for anyone. Note that our API versioning guidelines allow us to loosen validation in future releases, but we try to avoid tightening validation wherever possible. This means that we should aim to start with the tightest validation that is reasonable for a feature.

2. Removes the possibility of setting 0 to mean "no timeout"
Although both HAProxy and Envoy support 0 as "no timeout", I'm not sure how broadly supportable this concept of an "infinite" timeout would be beyond those two dataplanes. If 0 is really just intended to represent "max timeout", why not define a max timeout supported by the API and suggest users set that if that's really what they want? That feels both more specific and less likely to surprise users. If we ever determine that we really do need a "0" value here, we can loosen validation in the future to support it, but once we've allowed that value it's nearly impossible to undo that.

3. Clarifies that values should be rounded up to the closest value an implementation can support
There are likely some implementations that are not going to be able to support the level of granularity supported by Golang's concept of duration. This provides guidance to those implementations for how they should handle this. We likely also will want to add something to status for this in the future, but there's not an obvious condition to attach that to yet.

Does this PR introduce a user-facing change?:
NONE
/cc @frankbu @SRodi @candita @shaneutt @howardjohn

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot · 2023-06-28T17:47:33Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: robscott

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~geps/OWNERS~~ [robscott]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

robscott · 2023-06-28T17:47:41Z

Adding a hold for consensus.

/hold

robscott · 2023-06-28T18:51:53Z

/retest

frankbu · 2023-06-28T19:31:07Z

geps/gep-1742.md

@@ -364,7 +364,7 @@ sequenceDiagram

 Both timeout fields are string duration values as specified by
 [Golang time.ParseDuration](https://pkg.go.dev/time#ParseDuration) and MUST be >= 1ms
-or 0 to disable (no timeout).
+and <= 100y.


I added some comments related to this in the API PR (https://github.com/kubernetes-sigs/gateway-api/pull/2013/files#r1245498559), but thinking about this more, I really don't see the value of specifying an upper limit.

The more I think about this, it seems that any value greater than a day or so, basically means no timeout. Certainly nobody would want to set a timeout to something like 5 years and really expect that the client and service will both be up and running in a paused state until it returns a value before the 5y deadline.

Unless I'm missing something, a user will want to configure the timeout to something relatively short, and any larger value really just mean "don't timeout", i.e., just hang until the client connection is terminated. So that's why I think the current design is right. User can set the timeout to some small value or 0 to disable timeouts. A user can also chose to set a pretty huge value which the implementation will likely support (up to the implementation limit of probably an int64 max value ns of ~300 years), but any specific number they chose that is large will work the same as 0 for all practical purposes. So documenting some arbitrary large max value is really meaningless IMO.

One more related thought. If we want to say anything about max value it could be something like the maximum value is implementation dependent, but MUST BE at least 1y, or something like that.

That way the user can set it to any sensible value they'd like (>= 1ms) or 0 for infinite (don't timeout). Note that an implementation can implement this infinite (0) timeout by setting the timeout to it's internal max value which will have the same effect given that it's 1y or more.

@robscott btw, you said:

In Kubernetes APIs we always try to have a clear upper and lower bound for fields. This enables implementations to write tests that ensure they can support the min and max values for any field in the API.

How would an implementation test that a 10 year timeout works?

How would an implementation test that a 10 year timeout works?

Yeah, you're right, it's not really possible to write a test that waits this long, and any value that we can feasibly test is likely lower than at least some users would want. I think the real win here is that implementations can know that they support the full range of possible values allowed by the API just by understanding their dataplane.

If we want to say anything about max value it could be something like the maximum value is implementation dependent, but MUST BE at least 1y, or something like that.

I think that's reasonable, but I'm struggling to think of any use case that would benefit from > 1y, so distinguishing beyond that is likely not worthwhile (again I could be wrong here). My strong bias is to start with tighter limits and only loosen them if a clearly defined need is presented.

The more I think about this, it seems that any value greater than a day or so, basically means no timeout. Certainly nobody would want to set a timeout to something like 5 years and really expect that the client and service will both be up and running in a paused state until it returns a value before the 5y deadline.

Yeah I think that's probably accurate, I'm not sure what the right upper limit is here, but lower is probably better as a starting point.

So documenting some arbitrary large max value is really meaningless IMO.

There's a very strong precedent in Kubernetes API review to prevent unbounded values. Even if we were to push forward without an upper limit here, I can't imagine this passing the final level of API review before v0.8.0. Both @thockin and @khenidak have strongly pushed to avoid any unbounded values in this API.

Two things:

String-encoded durations is not the norm in kube APIs, and I think we should question whether that's the right way to represent durations. The norm is int32 with the units encoded in the name - e.g. timeoutSeconds int32, and the unit is almost always seconds.

It's not really unbounded if it is a float32 - there is a natural bound.

That's a good point @thockin. I had thought of something like timeoutSeconds int32 as well, it would certainly simplify the validation initially. A concern I'd have is that some users will want to be able to express more precise timeouts, for example GCP and Envoy allow you to express timeouts in terms of nanoseconds, and most implementations support at least milliseconds as an input. That leaves us with a few options:

Only support one unit (probably seconds or milliseconds) - this matches the approach of ingress-nginx (seconds), GCP BackendServices (seconds), Kong Ingress (ms), and HAProxy Ingress (ms)

Support multiple units, combining the values (timeout.seconds: 1 + timeout.nanoseconds: 5000000 == 1.005s). This roughly matches Envoy and GCP URLMaps which both appear to use proto duration.

Support something like metav1.Duration as proposed by this GEP. This matches Istio, Linkerd, Contour, and many of the underlying dataplanes like NGINX and HAProxy

So with all of that said, we could start with 1 and leave room for 2. For example, that future could look like this:

GEP Today:

timeouts: request: 10s

GEP Tomorrow:

timeouts: request: seconds: 10

GEP Next Year:

timeouts: request: seconds: 10 nanoseconds: 500000

I'm not entirely sure that I prefer that over the more open-ended duration format, but it would be simpler to validate and more in line with other Kubernetes APIs. Does this make sense to you @thockin?

I don't mind allowing 0, but in YAML it will have to be "0" because the field will be transported as a string. 0s does not suffer that problem.

I think we can leave out unary + but keep unary -.

I would strongly prefer to leave that out from valid Gateway API values. Although we can define a broader type that allows negative values,

I'm talking about the difference between "Invalid value: timeout: must match regex ...." and "Invalid value: timeout: must be greater than or equal to 0ms"

I don't mind allowing 0, but in YAML it will have to be "0" because the field will be transported as a string. 0s does not suffer that problem.

Also not that comparing 0s to 0ms to 0h must succeed, as must "1000ms" and 1s

I'm talking about the difference between "Invalid value: timeout: must match regex ...." and "Invalid value: timeout: must be greater than or equal to 0ms"

Unfortunately I don't know of any way to do that with CRD validation. We could do that with webhook validation, but that is far less reliable so I'd rather cover as much as possible directly with validation defined on the CRD.

Also not that comparing 0s to 0ms to 0h must succeed, as must "1000ms" and 1s

Agreed, we should likely have some conformance tests to confirm that.

Catching up here, this is quite the rock to turn over!

I'm supportive of using a string based definition that's a subset of metav1.Duration, however we do it, and also agree with ensuring that we have conformance tests that validate as much behavior as possible.

frankbu · 2023-06-28T19:34:17Z

geps/gep-1742.md

@@ -413,8 +417,14 @@ type HTTPRouteTimeouts struct {
 	// may result in more than one call from the gateway to the destination backend service,
 	// for example, if automatic retries are supported.
 	//
-	// Because the Request timeout encompasses the BackendRequest timeout,
-	// the value of BackendRequest defaults to and must be <= the value of Request timeout.


I think defaulting BankendRequest to the value or Request is both easily implementable and a sensible approach. See my related comment here: https://github.com/kubernetes-sigs/gateway-api/pull/2013/files#r1245517080

Commented in that thread first, copying over here for additional visibility:

Unfortunately we don't have a means to implement variable defaulting unless we want to create and maintain a defaulting webhook, and I'd strongly prefer to avoid that maintenance cost.

We could suggest that if a value is not set here, implementations should consider this the same value as the overall request timeout, but I can't think of any way to validate that, and I'm also not sure that's what we want. Let's say that an implementation has a default of 2 retries with a 10s per-try timeout. If a user configures an overall request timeout of 20s, I think they'd probably prefer the implementation default behavior here instead of that 1 value actually affecting the per-try timeout value as well. In general I'd like to suggest working towards the principle of least surprise, and I think that means avoiding a variable default like this.

Can't you set static defaults through CRD, like +default= ?

Yep, static would be easier here, but what had been suggested here in the original GEP was that if this field was not specified but the overall request timeout was specified, this field should default to the value of the overall request timeout. I don't think we can/should do that. Also since this is a new field and many implementations already have default timeouts, I don't think we can add any static timeout in a backwards compatible way.

howardjohn · 2023-07-03T18:02:49Z

I am sure you can make a regexp that matches the Go duration syntax? We just happen to use a webhook, there are likely better options.

…

On Mon, Jul 3, 2023 at 6:58 AM Frank Budinsky ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In geps/gep-1742.md <#2155 (comment)> : > @@ -364,7 +364,7 @@ sequenceDiagram Both timeout fields are string duration values as specified by [Golang time.ParseDuration](https://pkg.go.dev/time#ParseDuration) and MUST be >= 1ms -or 0 to disable (no timeout). +and <= 100y. Istio uses protobuf to define all its APIs. Here's a duration field <https://github.com/istio/api/blob/f930ebfda37a76446f229bf7e465bbbf305501af/networking/v1beta1/virtual_service.proto#L618> example. In the generated GO binding that translates to durationpb.Duration <https://pkg.go.dev/google.golang.org/protobuf/types/known/durationpb>, similar API to metav1.Duration. — Reply to this email directly, view it on GitHub <#2155 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAEYGXKXPJ2SBBWVOTUJEGLXOLFZJANCNFSM6AAAAAAZXOR5BE> . You are receiving this because you were mentioned.Message ID: ***@***.***>

k8s-ci-robot · 2023-07-10T20:02:31Z

@robscott: GitHub didn't allow me to request PR reviews from the following users: srodi.

Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

I've updated this PR to use the approach I've described in the comment above, PTAL.

/cc @frankbu @SRodi @shaneutt @thockin @howardjohn

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

candita · 2023-07-11T02:24:28Z

geps/gep-1742.md

@@ -403,13 +401,22 @@ type HTTPRouteTimeouts struct {
 	// request stream has been received instead of immediately after the transaction is
 	// initiated by the client.
 	//
+	// This value follows a subset of Golang and XML Duration formatting, for example 1m, 1s,
+	// or 1ms. This value MUST BE >= 0 and <= 9999h. When "0" is specified, it indicates that


Hour is not an accepted value in your kubebuilder validation pattern. 9999h is 599940m, so if you want to allow that amount, you would need to change the pattern to allow 6 digits.

Good catch, thanks! This is an artifact of when I originally had hours as an allowed unit, will update pending the conversation above about whether or not we should add that back.

candita · 2023-07-11T02:24:51Z

geps/gep-1742.md

 	// When this field is unspecified, request timeout behavior is implementation-dependent.
 	//
 	// Support: Extended
 	//
 	// +optional
-	// +kubebuilder:validation:Format=duration
-	Request *metav1.Duration `json:"request,omitempty"`
+	// +kubebuilder:validation:Pattern=^(0|[1-9][0-9]{0,3}(m|s|ms))$


9999m is not long enough if you were talking about a max of 1 year. Also, I suppose there could be websockets that require longer than 9999m. If not now, then in the future.

If you are using websockets I would imagine you want infinite?

I cannot imagine a scenario where you want a timeout over 9999m but not infinite? Is there a use case I am not aware of?

I suppose if you don't want to allow super-long streams..

candita · 2023-07-11T02:25:18Z

geps/gep-1742.md

-	// +kubebuilder:validation:Format=duration
-	Request *metav1.Duration `json:"request,omitempty"`
+	// +kubebuilder:validation:Pattern=^(0|[1-9][0-9]{0,3}(m|s|ms))$
+  Request *metav1.Duration `json:"request,omitempty"`


Suggested change

Request *metav1.Duration `json:"request,omitempty"`

Request *metav1.Duration `json:"request,omitempty"`

candita · 2023-07-11T02:38:29Z

geps/gep-1742.md

@@ -419,14 +426,22 @@ type HTTPRouteTimeouts struct {
 	// may result in more than one call from the gateway to the destination backend service,
 	// for example, if automatic retries are supported.
 	//
-	// Because the Request timeout encompasses the BackendRequest timeout,
-	// the value of BackendRequest defaults to and must be <= the value of Request timeout.
+	// This value follows a subset of Golang and XML Duration formatting, for example 1m, 1s,


This applies to text above this line. Can you add an explanation about whether the Request time encompasses/includes the BackendRequest time? Is Request = BackendRequest + other processing time, or is total time = Request + BackendRequest, implying that Request does NOT include the BackendRequest time?

I agree with @candita. This change should only remove the value <= part. The doc still needs some words to be clear that the the Request timeout encoumpasses the BackendRequest timeout.

Another thought. I understand that we can't enforce that BackendRequest timeout must be <= Request timeout, but maybe we could/should still say something like this?

Suggested change

// This value follows a subset of Golang and XML Duration formatting, for example 1m, 1s,

// Because the Request timeout encompasses the BackendRequest timeout,

// the value of BackendRequest should be <= the value of Request timeout to have any effect.

// This value follows a subset of Golang Duration formatting, for example 1m, 1s,

thockin · 2023-07-18T19:18:06Z

If we open this can of worms we will end up with a "standard" type, which can eventually use all the declarative validation. Before we lock this down, we should think about how this duration-as-string can be used in CEL.

…

On Tue, Jul 18, 2023, 12:08 PM Rob Scott ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In geps/gep-1742.md <#2155 (comment)> : > @@ -364,7 +364,7 @@ sequenceDiagram Both timeout fields are string duration values as specified by [Golang time.ParseDuration](https://pkg.go.dev/time#ParseDuration) and MUST be >= 1ms -or 0 to disable (no timeout). +and <= 100y. I'm talking about the difference between "Invalid value: timeout: must match regex ...." and "Invalid value: timeout: must be greater than or equal to 0ms" Unfortunately I don't know of any way to do that with CRD validation. We could do that with webhook validation, but that is far less reliable so I'd rather cover as much as possible directly with validation defined on the CRD. Also not that comparing 0s to 0ms to 0h must succeed, as must "1000ms" and 1s Agreed, we should likely have some conformance tests to confirm that. — Reply to this email directly, view it on GitHub <#2155 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABKWAVD3GI6A6UIQBY4AKYTXQ3NKTANCNFSM6AAAAAAZXOR5BE> . You are receiving this because you were mentioned.Message ID: ***@***.***>

jpbetz · 2023-07-18T19:36:56Z

If we open this can of worms we will end up with a "standard" type, which can eventually use all the declarative validation. Before we lock this down, we should think about how this duration-as-string can be used in CEL.

Any field in the openAPI schema of a CRD can type: string, format:duration today. In any x-kubernetes-validation strings, the duration is understood by CEL as a duration type making it trivial to validate against min/max limits, and can be used in conjunction with datetime fields to do time math.

xref: https://kubernetes.io/docs/reference/using-api/cel/

jpbetz · 2023-07-18T20:06:36Z

You can also cast a string to a duration in CEL: duration(self) < duration("1h") for type: string (but without format: duration set) if needed.

thockin · 2023-07-18T22:33:43Z

Is there a spec for strings that can be casted to duration?

jpbetz · 2023-07-18T22:47:30Z

Is there a spec for strings that can be casted to duration?

Soo, I'm not thrilled with the current situation, but here it is:

From the CEL spec: "Duration strings should support the following suffixes: "h" (hour), "m" (minute), "s" (second), "ms" (millisecond), "us" (microsecond), and "ns" (nanosecond). Duration strings may be zero, negative, fractional, and/or compound. Examples: "0", "-1.5h", "1m6s"" (I believe "µs" supported by the implementation since I believe it uses the golang impl under the hood)

For comparison:

kube-openapi: https://github.com/kubernetes/kube-openapi/blob/master/pkg/validation/strfmt/duration.go#L33-L42 (supports "d" (day) and "w" or "wk" (week) in addition to what is supported by golang impl-- it calls down to the golang impl except for these cases)

Golang: https://pkg.go.dev/time#ParseDuration (supports "µs" as an alternative to "us")

I'm not aware of a recognized standard for any of this..

thockin · 2023-07-18T23:20:57Z

This issue is proposing a kube standard for fields which represent duration. We're trying to keep it small and easy, since it will require client-side parsing to use (meaning we can't ever change it, really). We described a subset of CEL and Go - no fractional or compound, and noting smaller than ms or larger than h. No literal 0, because YAML is too pervasive and will coerce to int. Fractional would mean allowing things like "0.001us" and defining floating-point precision. Compound is just harder to spec, parse, and test. All of those are potentially debatable. As far as I can see, we don't have any actual duration-string fields in GA APIs, yet. Tell me if I am wrong there? I don't care if a CEL expression allows a broader set for casting, as long as none of our APIs allow that to be saved. We would need to be careful that CEL emits values in our format (e.g. always 90s and never 1m30s), which could be able argument for adopting CEL's spec directly. I think we need to either own our own fate or else ban these strings.

…

On Tue, Jul 18, 2023, 3:47 PM Joe Betz ***@***.***> wrote: Is there a spec for strings that can be casted to duration? Soo, I'm not thrilled with the current situation, but here it is: From the CEL spec <https://github.com/google/cel-spec/blob/master/doc/langdef.md>: "Duration strings should support the following suffixes: "h" (hour), "m" (minute), "s" (second), "ms" (millisecond), "us" (microsecond), and "ns" (nanosecond). Duration strings may be zero, negative, fractional, and/or compound. Examples: "0", "-1.5h", "1m6s"" For comparison: openapi: https://github.com/kubernetes/kube-openapi/blob/master/pkg/validation/strfmt/duration.go#L33-L42 (also supports "d" (day), "w" or "wk" (week) Golang: https://pkg.go.dev/time#ParseDuration (almost identical but also supports "µs" as an alternative to "us") I'm not aware of a recognized standard for any of this.. — Reply to this email directly, view it on GitHub <#2155 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABKWAVGUPBQUBZYQDK5PPRLXQ4HA3ANCNFSM6AAAAAAZXOR5BE> . You are receiving this because you were mentioned.Message ID: ***@***.***>

frankbu · 2023-07-31T21:08:05Z

@robscott What's the latest status on this duration format decision? I've added conformance tests for the new timeout features assuming we're going with the restricted string format: #2254

If this is going to continue to block the timeout features, maybe we should just switch the fields to uint32 millisecond values and live with the less nice API?

robscott · 2023-07-31T22:28:06Z

@frankbu sorry this has taken so long! I think there's a rough consensus that we should aim for a format that is a subset of both OpenAPI/CEL and Go duration formats. That would include the following limitations:

No literal 0 ("0s" would be supported)
No fractional values
No compound values
Limited units (avoid day, week, and µs)

The remaining problem is if we can separate validation of formatting from min/max values. CEL should enable us to do that, but first we need to add CEL (x-ref #2253 and others)

frankbu · 2023-08-01T15:03:55Z

No literal 0 ("0s" would be supported)

This one doesn't really sound necessary to me. I understand that 0 in yaml will be assumed int (not string), but that just means you need to quote the value: "0", which I'd rather do than add the s: 0s. We could also do what Contour does and allow the value of "infinity" for no timeout, but personally, I prefer 0.

thockin · 2023-08-01T23:09:59Z

I think a good first step would be to try to convince the openapi layer to validate fields of a specific type against the format we want. I don't know where to do that, but I think we should shoot for a primitive field type. E.g. Timeout kubeapi.DurationString such that it doesn't require any explicit validation in CEL (assuming any parseable value is OK). Then you could use it with CEL validation to check the bounds, but not the format.

And if we think CEL is the path forward, maybe just adopting CEL's format is best?

kflynn · 2023-08-03T02:32:24Z

@thockin, your suggestion of maybe starting with CEL's format sounded pretty sane to me. I spent some time yesterday looking over code for CEL and other extant formats, then spent some time ~~argu~~discussing things with @robscott (😇), and ultimately ended up wanting to recommend a slightly different answer for v0.8.0.

tl;dr: Let's use the CEL duration type, restricted by a regex of ^([0-9]{1,5}(h|m|s|ms)){1,4}$. That'll let us put some bounds on the input, while being compatible with a wide range of extant code and allowing us to use CEL for min/max validation.

Keep reading for all the details.

I found four different extant formats that seem obviously worth considering. (This starts out very Go-centric because Go is the implementation language for Kubernetes itself. Bear with me.)

`metav1.Duration` (aka `time.ParseDuration`)

k8s.io/apimachinery/pkg/apis/meta/v1 defines a Duration type. Its Unmarshal method calls time.ParseDuration. Per its docs:

A duration string is a possibly signed sequence of decimal numbers, each with
optional fraction and a unit suffix, such as "300ms", "-1.5h" or "2h45m". Valid
time units are "ns", "us" (or "µs"), "ms", "s", "m", "h".

So: negatives, multiple components, and floating-point are OK, but no units larger than hours are supported. Also note that where the doc says “µs”, both U+00B5 (Micro Sign) and U+03BC (Greek Small Letter Mu) are accepted, beautifully demonstrating the complexities of Unicode-enabled source code.

(Source: https://github.com/golang/go/blob/master/src/time/format.go)

CEL (aka, again, `time.ParseDuration`)

CEL in the form of google/cel-go defines a Duration type that can be converted from a string using the ConvertToType function in common/types/string.go. That function simply calls time.ParseDuration and uses the result.

(Source: https://github.com/google/cel-go/blob/master/common/types/string.go)

APIServer “OpenAPI” code (aka `strfmt.ParseDuration`)

Though the README for k8s.io/apimachinery says that the APIServer is one of its primary clients, k8s.io/apiserver/pkg/cel/common/values.go defines its own UnstructuredToVal that uses strfmt.ParseDuration to parse string values marked with the duration format.

strfmt.ParseDuration starts by calling time.ParseDuration on its input, and if that works, it uses the result – so everything supported by time.ParseDuration is also supported by strfmt.ParseDuration. If time.ParseDuration doesn’t work, strfmt.ParseDuration will separately parse multiple components, but it does not support floating-point, and it does support “d” and “w” for days and weeks (neither of which take leap seconds or leap years into account). It also supports more spelled-out units (e.g. “milli” for “ms” and “hour” for “h”).

(Why, if the file has cel in its name, do I call this “OpenAPI” code? Because this split between the string type and the duration format is an OpenAPI thing, not a CEL thing, and because the APIServer is described as using OpenAPI for CRD validation.)

(Source: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/cel/common/values.go)

ISO8601/RFC3339

We would be remiss not to include ISO8601 here. It’s much less user-friendly than the Go duration types above, but it is the standard representation of time for many use cases. An example ISO8601 duration is “P1DT1H1M1.1S” for 1 day, 1 hour, 1 minute, and 1.1 seconds: the leading “P” is required, no units smaller than seconds are supported (although fractional seconds are supported), and all the units are case-sensitive.

RFC3339 is mostly equivalent to ISO8601, but is case-insensitive. The leading “P” or “p” is still required.

Recommendation

Given that everything above except ISO8601 has a common denominator of Go's time.ParseDuration, I think we should start with that format. It's already used in apimachinery, CEL, and the APIServer. It’s (obviously) supported in Go. It’s also supported in Rust in the kube-rs crate (as of kube-rs pull 1224, merged on 5 June 2023), and in Python in the durationpy package. These three languages should encompass the vast majority of the development in the Kubernetes ecosystem.

That being said, it definitely has downsides. Summarizing a lot of earlier discussion:

Supporting floating-point values opens its own can of worms around precision, corner cases, normalization, etc., and time.ParseDuration doesn’t discuss these. Looking at the source, it uses float64 (for everything, in fact, even if the number is actually written without a fractional part), and Go seems to just assume that saying “IEEE754” will cover the rest.
Negative durations can easily come up when you’re doing calculations with time values, but it’s tough to think of a use case for specifying a negative duration. time.ParseDuration’s ability to handle a negative duration may never be relevant for us.
time.ParseDuration doesn't support anything larger than hours. While timeouts are very unlikely to need days or weeks, the type is a Duration, not a Timeout, and days and weeks can be quite convenient when describing e.g. the lifetime of a TLS certificate (even though the disclaimer about leap seconds and days is critical).
time.ParseDuration is extremely open-ended in what it accepts. For example, it doesn't limit the maximum length of the input or the number of individual components. It does have a maximum numeric value of 1<<63 for numeric values (both for individual components and for the overall sum), but that's not really a meaningful limit.

Given those downsides, some further restriction of time.ParseDuration seems in order. @robscott and I think that it would work well to only allow durations that match

^([0-9]{1,5}(h|m|s|ms)){1,4}$

which is to say:

Negative values and floating-point numbers are not allowed.
Each component must have a numeric value with a maximum of five digits.
At most four components are allowed.
Hours, minutes, seconds, and milliseconds are supported.
Microseconds, days, and weeks are considered out of scope for, but could be added later.

That will allow us to put some bounds on the inputs while also just using the native CEL duration type (since anything that the regex matches can be parsed by time.ParseDuration, and that's what the CEL duration type uses). It should play nicely with using CEL for min/max validation, and it still allows "0s" for "no timeout".

Critically, it restricts the scope for v0.8.0 to something manageable. We can leave discussion of whether we want to support "d" and "w", for example, for v1.0.0; the only change we're talking about for v0.8.0 is adding a regex.

(For the curious, the maximum duration of 99999h99999m99999s99999ms is a rather unwieldy way to express about 11.6 years.)

frankbu · 2023-08-04T13:38:07Z

@kflynn your suggestion SGTM! Just wondering why you think we need compound units (i.e., something like 1s500ms instead of just 1500ms)? Seems like extra complication for no real value add.

kflynn · 2023-08-04T15:22:15Z

@frankbu There are two reasons:

People are already accustomed to it in multiple other contexts, which means they'll try to use it (possibly just by copying and pasting without thinking about it from e.g. some other CRD that allows compound durations). When it doesn't work, they'll be confused and upset and they'll blame us. Since the capability to parse compound durations is already in the code, I'd prefer to just avoid making them confused and upset. 🙂 (Dropping floating-point support already concerns me on this front, FTR.)
Being forced to drop to the smaller unit instead of support compound durations raises some usability concerns. Going from 1s500ms to 1500ms isn't really that bad, but going from 3h45m to 225m means I had to actually think about the math (🙂) and it also means that a casual reader is likely to be thinking "oh, that's a minutes-long duration" before realizing that it's actually hours. (I realize that it's very unlikely that we'll need to set timeouts of nearly four hours; this is mostly a concern for other uses of durations.)

frankbu · 2023-08-04T15:56:33Z

(Dropping floating-point support already concerns me on this front, FTR.)

Personally, I think not supporting floating point is the much bigger chance of people getting burned with cut-and-paste. I think values like 1.5s or 3.5h are more common than 1s500ms or 3h30m.

but going from 3h45m to 225m means I had to actually think about the math (🙂)

True, but once your talking more than a couple of hours (easy math), what's the chance that you still need finer granularity than an hour, e.g., 48h2m 😃

Anyway, I'd be inclined to go with the more restrictive format, but it's not that big a deal either way IMO.

frankbu · 2023-08-04T16:06:43Z

Personally, I think not supporting floating point is the much bigger chance of people getting burned with cut-and-paste.

@kflynn Maybe allowing 1 digit of decimal precision would be a good compromise, i.e. 0.5sbut not 0.333s?

kflynn · 2023-08-04T18:54:08Z

@frankbu Yeah, that is definitely my worry about floats. 😐 I'm more prepared to defend that, because floats are pretty awful under the hood, so I'm more invested in ditching it. (Even 0.1 can't actually be represented exactly in an IEEE754 float.)

So: the proposal in GEP-2257 doesn't support floating-point at all, and I'd be delighted to land that and update GEP-1742 to use it... and if the screams about floating-point being gone are load, well, it's Experimental, we can figure out how to handle it. 🙂

frankbu · 2023-08-14T15:49:11Z

Hey @robscott, now that #2258 has merged, are you planning to update this PR to align with the new duration format? If you're too busy and would like to close this one, I can take a pass at a new PR to do it and fix the other GEP-1742 issues. As soon as we get this done, we will also update #2013 so that it can hopefully get merged.

robscott · 2023-08-18T19:09:29Z

Thanks to everyone for the feedback here and to @kflynn for finalizing this with GEP #2257 and then replacing this with #2316. Going to close this one out.

/close

k8s-ci-robot · 2023-08-18T19:09:34Z

@robscott: Closed this PR.

In response to this:

Thanks to everyone for the feedback here and to @kflynn for finalizing this with GEP #2257 and then replacing this with #2316. Going to close this one out.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. labels Jun 28, 2023

k8s-ci-robot requested a review from shaneutt June 28, 2023 17:47

k8s-ci-robot added the kind/gep PRs related to Gateway Enhancement Proposal(GEP) label Jun 28, 2023

k8s-ci-robot requested a review from howardjohn June 28, 2023 17:47

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 28, 2023

k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 28, 2023

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 28, 2023

robscott mentioned this pull request Jun 28, 2023

HTTPRoute timeout #2013

Merged

robscott mentioned this pull request Jun 28, 2023

Add IncompatibleFilters reason for Accepted condition #2150

Merged

frankbu reviewed Jun 28, 2023

View reviewed changes

Updating GEP-1742 to clarify timeout formatting

ce1f531

robscott force-pushed the timeout-limits branch from 1187bc2 to ce1f531 Compare July 10, 2023 20:00

robscott changed the title ~~Updating GEP-1742 to clarify min and max values~~ Updating GEP-1742 to clarify timeout formatting Jul 10, 2023

k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 10, 2023

k8s-ci-robot requested review from thockin and frankbu July 10, 2023 20:02

candita reviewed Jul 11, 2023

View reviewed changes

frankbu mentioned this pull request Aug 4, 2023

GEP-2257: Gateway API Duration Format #2258

Merged

frankbu mentioned this pull request Aug 18, 2023

Update GEP-1742 to refer to GEP-2257 #2316

Closed

k8s-ci-robot closed this Aug 18, 2023

SRodi mentioned this pull request Oct 6, 2023

REQUEST: New membership for @SRodi kubernetes/org#4505

Closed

9 tasks

	Request *metav1.Duration `json:"request,omitempty"`
	Request *metav1.Duration `json:"request,omitempty"`

Updating GEP-1742 to clarify timeout formatting #2155

Updating GEP-1742 to clarify timeout formatting #2155

Conversation

robscott commented Jun 28, 2023

k8s-ci-robot commented Jun 28, 2023

k8s-ci-robot commented Jun 28, 2023

robscott commented Jun 28, 2023

robscott commented Jun 28, 2023

frankbu Jun 28, 2023 • edited Loading

Choose a reason for hiding this comment

frankbu Jun 28, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

robscott Jun 29, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

robscott Jun 29, 2023 • edited Loading

Choose a reason for hiding this comment

howardjohn commented Jul 3, 2023 via email

k8s-ci-robot commented Jul 10, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

candita Jul 11, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

thockin commented Jul 18, 2023 via email

jpbetz commented Jul 18, 2023 • edited Loading

jpbetz commented Jul 18, 2023 • edited Loading

thockin commented Jul 18, 2023

jpbetz commented Jul 18, 2023 • edited Loading

thockin commented Jul 18, 2023 via email

frankbu commented Jul 31, 2023

robscott commented Jul 31, 2023

frankbu commented Aug 1, 2023 • edited Loading

thockin commented Aug 1, 2023

kflynn commented Aug 3, 2023

metav1.Duration (aka time.ParseDuration)

CEL (aka, again, time.ParseDuration)

APIServer “OpenAPI” code (aka strfmt.ParseDuration)

ISO8601/RFC3339

Recommendation

frankbu commented Aug 4, 2023

kflynn commented Aug 4, 2023

frankbu commented Aug 4, 2023

frankbu commented Aug 4, 2023

kflynn commented Aug 4, 2023

frankbu commented Aug 14, 2023

robscott commented Aug 18, 2023

k8s-ci-robot commented Aug 18, 2023

frankbu Jun 28, 2023 •

edited

Loading

frankbu Jun 28, 2023 •

edited

Loading

robscott Jun 29, 2023 •

edited

Loading

robscott Jun 29, 2023 •

edited

Loading

candita Jul 11, 2023 •

edited

Loading

jpbetz commented Jul 18, 2023 •

edited

Loading

jpbetz commented Jul 18, 2023 •

edited

Loading

jpbetz commented Jul 18, 2023 •

edited

Loading

frankbu commented Aug 1, 2023 •

edited

Loading

`metav1.Duration` (aka `time.ParseDuration`)

CEL (aka, again, `time.ParseDuration`)

APIServer “OpenAPI” code (aka `strfmt.ParseDuration`)