Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define how to reflect partial validity of routing rules in the HTTPRoute status #1696

Closed
pleshakov opened this issue Feb 4, 2023 · 11 comments · Fixed by #2429
Closed

Define how to reflect partial validity of routing rules in the HTTPRoute status #1696

pleshakov opened this issue Feb 4, 2023 · 11 comments · Fixed by #2429
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-blocker MUST be completed to complete the milestone
Milestone

Comments

@pleshakov
Copy link
Contributor

What would you like to be added:

If we have an HTTPRoute resource with some if its routing rules valid, some -- not valid (for example, PathMatchType in HTTPPathMatch is not one of Exact;PathPrefix;RegularExpression), define how to report that in the HTTPRoute status.

Why this is needed:

Current documentation is a bit contradictory:

The HTTPRoute docs say:

	// Unknown values here must result in the implementation setting the
	// Accepted Condition for the Route to `status: False`, with a
	// Reason of `UnsupportedValue`.

However, in GEP-1364, it is mentioned:

HTTPRoute with two rules, one valid and one which specifies a HTTPRequestRedirect filter _and a HTTPURLRewrite filter. Accepted is true, because the valid rule can produce some config in the data plane. We'll need to raise the more specific error condition for an incompatible filter combination as well to make the partial validity clear.

This comes from #1674 and a discussion on Kubernetes Gateway API Slack sig-network-gateway-api https://kubernetes.slack.com/archives/CR0H13KGA/p1675122817360569

@pleshakov pleshakov added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 4, 2023
@sunjayBhatia
Copy link
Member

sunjayBhatia commented Feb 4, 2023

This was an attempt at addressing part of this, specifically the fact that there are incompatible filters present, still WIP: #1540

also xref: #1521

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 21, 2023
@shaneutt shaneutt added this to the v1.0.0 milestone Jul 3, 2023
@shaneutt
Copy link
Member

shaneutt commented Jul 24, 2023

I have a strong inclination towards a fail-fast mentality here, don't allow any "partially invalid states". We should discuss this one at the next sync. Also, @robscott and I want to consider this a release blocker for GA for now, as it would be quite ideal to resolve this prior to v1.

@shaneutt shaneutt added the release-blocker MUST be completed to complete the milestone label Jul 24, 2023
@kubernetes-sigs kubernetes-sigs deleted a comment from k8s-ci-robot Jul 24, 2023
@rainest
Copy link
Contributor

rainest commented Jul 24, 2023

Copying my comments from meeting chat with some edits here:

Insofar as we need to choose some level of granularity to start discarding invalid configuration, object-level (discard every rule in the HTTPRoute if one is invalid) does at least feel like the most natural option in Kubernetes. Discrete objects applies universally across all kinds of resources, so it's easier to form a consistent mental model around that than having to think about the internals of each individual kind of object.

What @robscott said in today's weekly meeting around "don't collocate all your rules in one HTTPRoute" is reasonable guidance to make that less painful, though it has some other concerns around how you arrange configuration that has like needs across configuration that must across all rules in the object (e.g. Kong's plugins apply to every rule in Ingress because they're set via annotation). Templated config could help manage that buuuuuut nobody really likes templating their config: it's powerful, but not that fun to manage.

Object-level is also consistent with the behavior of a validating admission webhook, and ideally most rejections would happen there, since rejecting configuration has a lower blast radius (none, really) than accepting it and maybe deleting nearby configuration depending on the exact implementation behavior.

@pleshakov
Copy link
Contributor Author

if the name field is added to a routing rule (#995 ), perhaps a way to report partial validity could be extend status to support per rule validation?

@sunjayBhatia
Copy link
Member

Going to unassign myself from this one, the addition of a PartiallyInvalid condition was nixed and part of that was included in #2150

@sunjayBhatia sunjayBhatia removed their assignment Aug 14, 2023
@shaneutt shaneutt moved this from In Progress to Next in Gateway API: The Road to GA Aug 14, 2023
@robscott
Copy link
Member

Talked about this with @shaneutt and @youngnick a bit today as we were working through the GA project board. Here's a summary:

We're uncomfortable with leaning too much into supporting large Routes. While concepts like per-Rule status conditions could be helpful long term, we're not convinced that they're necessary quite yet and a bit worried that they would further encourage the usage of large Route resources. With that said, even relatively small Routes with 2 separate Rules could run into this state, so we need at least some rudimentary way to communicate this.

We also have to recognize that stating that controllers can support partially invalid Routes has some risks. For example, if a rule with a /foo prefix match is present but invalid, all the traffic that would normally be routed to that rule may get routed to other rules like a / prefix match and overload those endpoints.

There are also legitimate arguments in favor of other approaches:

1. Fail fast
In this mode we fail to program the entire Route if any portion of it is invalid. This has the advantage of providing predictable behavior, clear status, and is most likely to surface problems as soon as they happen. Of course the large blast radius that comes along with this approach is unfortunate, and that's particularly true in cases where it takes a significant time to program the underlying dataplane and recover from a problem. In status, this state would be represented by marking the Route as Accepted=False with corresponding reasons and/or additional conditions to represent the problem state.

2. Revert to last known good state
Some implementations may find it safer to simply revert to the last known good state. This could make sense when the controller is stateful and/or it takes a long to program updates to the underlying dataplane. In status, this would be represented with Accepted=True with the generation of the last known good state and corresponding reasons and/or additional conditions to represent the problem state set to the current generation of the resource.

With all of that said, there are likely many cases where it makes sense for a controller to support a partially valid Route (where a subset of Route rules are invalid). We think the most straightforward way to represent that state would be with a new InvalidRules condition. This condition would reuse the same reasons that are already paired with other conditions, such as RefNotPermitted, InvalidKind, UnsupportedValue, etc. If any part of the Route had been accepted, the Accepted condition would be set to True. That means it would be possible to have Accepted=True and InvalidRules=True.

Given the discussion above, I don't think we need/want to be prescriptive about whether implementations should support partially valid Routes, since each implementation will have different considerations here. Instead, it may be sufficient just to provide a standard way for implementations to represent this state when it occurs.

@youngnick
Copy link
Contributor

I agree here, except for one part:

Given the discussion above, I don't think we need/want to be prescriptive about whether implementations should support partially valid Routes, since each implementation will have different considerations here. Instead, it may be sufficient just to provide a standard way for implementations to represent this state when it occurs.

If we allow anyone to support partially valid Routes, then everyone has to support them, otherwise we're going to end up with very subtle and difficult to test implementation differences. If I move a set of resources from a GatewayClass that supports partial validity to one that doesn't, what happens?

To explain this better, let's walk through the case that we do allow implementations to differ on whether they support partial validity or not, that makes partial validity an Extended feature, that we will need to write conformance tests for. Personally, the thought of having to write a suite of conformance tests that would test both cases makes me want to curl up in a ball and hide.

But it is a thing we could do - as long as we are cognizant of the costs.

Mandating one option at least means that we only have one set of conformance tests to write, rather than two. But we'll need conformance tests for whatever we choose anyway, sigh.

@shaneutt
Copy link
Member

I still think we could come up with some (possibly a bit contrived) examples of when allowing partial would go bad, particularly if the different rules relate to each other. One scenario floating around in my head, is what if the HTTPRoute creator has a fallback rule of some kind, but all the rules that are supposed to be in front of that end up being invalid and we only turn on the fallback rule, will there be situations where that would DOS the fallback backends?

One thing seems entirely certain to me: HTTPRoutes with large number of rules are destined to be at least a bit of a mess. I think @youngnick is probably right that we can't have ambiguity here as that will just increase the mess. We need one prescriptive way these are handled, and given the constraints upon us I think we have to compromise down to a good solution because I don't think a perfect one will be attainable. The suggestion that we simply provide a condition for partial validity with human readable messages about the problem doesn't sound great to me because that basically becomes a part of our API we become beholden to, but again I think we're going to have to settle for a good solution and something along this vein might be that.

@shaneutt shaneutt added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Aug 22, 2023
@robscott
Copy link
Member

/assign

@shaneutt shaneutt moved this from Next to In Progress in Gateway API: The Road to GA Sep 11, 2023
@mikemorris
Copy link
Contributor

If we allow anyone to support partially valid Routes, then everyone has to support them, otherwise we're going to end up with very subtle and difficult to test implementation differences. If I move a set of resources from a GatewayClass that supports partial validity to one that doesn't, what happens?

I think this may not be that problematic in practice - the main use case I see for allowing partially invalid states is reducing the blast radius of applying bad config in stateless controllers. Checking to ensure no resources are in a partially invalid state before migrating feels reasonable, as does completely rejecting partially invalid routes when migrating to a new controller if partial validity is not supported, given that feels like an action unlikely to be made on a whim in a production environment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-blocker MUST be completed to complete the milestone
Projects
No open projects
9 participants