Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add GEP-1364 proposal #1383
Add GEP-1364 proposal #1383
Changes from 5 commits
03ed313
83b5cb9
c6cffd4
3d6385f
6c4888a
717e228
3b01d97
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually hadn't realized the uniqueness on
type
constraint - this should probably be documented more clearly within the Gateway API spec, as it has some bearing on how "multiple error" cases may be expressed, and explains the structure ofRouteParentStatus
, which will be increasingly relevant if we start expecting resources likeHTTPRoute
to bind to multiple parentRefs (for GAMMA and N/S concurrent usage or route delegation).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'd had some conversations about making this required for conformant Gateway implementations. Probably out of scope for this GEP, just wanted to make sure it was intentionally excluded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would
Attaching
ever be valid as a condition? Maybe just verb or adjective is sufficient here?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sort of usage is specifically called out as not desirable in the API conventions, and I agree. Conditions are supposed to be level-triggered, not edge-triggered, so having a condition that says that something is happening is a bit weird. (Should
Attaching
transition tofalse
once the attaching process is complete?)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For posterity, can we cite some sources for that assertion? It would probably actually be good as a part of this status effort to document somewhere in the code that we're avoiding this due to conventions, and then point to those conventions, as this will help future contributors to more quickly avoid pursuing such things (or give them a path to who they need to convince in the greater kubernetes community).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is almost a direct quote from the API conventions as it is, should I put a link in here to the exact line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exact quote is as follows, I can just inline if it's better, but I was trying to avoid just pasting in the entire section:
Condition type names should describe the current observed state of the resource, rather than describing the current state transitions. This typically means that the name should be an adjective ("Ready", "OutOfDisk") or a past-tense verb ("Succeeded", "Failed") rather than a present-tense verb ("Deploying"). Intermediate states may be indicated by setting the status of the condition to Unknown.
For state transitions which take a long period of time (e.g. more than 1 minute), it is reasonable to treat the transition itself as an observed state. In these cases, the Condition (such as "Resizing") itself should not be transient, and should instead be signalled using the True/False/Unknown pattern. This allows other observers to determine the last update from the controller, whether successful or failed. In cases where the state transition is unable to complete and continued reconciliation is not feasible, the Reason and Message should be used to indicate that the transition failed.
I don't believe that we generally have any process that we want to handle in the way called out in the exception bullet point. Happy to be guided here on what you'd like to see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a direct link to the language would be nice, but not a blocker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conditions which appear and disappear from a resource can make status reporting UIs harder, so I'd recommend keeping the Condition set consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we should have a set of permanent and reliable conditions that are always present, but I personally don't think it makes sense for error conditions to always be present, especially since that tends to lead to confusing double negative states.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is reasonable for the controller to do this - as an indication the resource is in use by the controller.
It becomes problematic if a resource may be processed by multiple controllers - for example the discussion
of HttpRoute applying to both mesh and Gateway can only work if the type has a prefix (MeshReady).
In general it would be good to indicate how to handle resources where multiple controllers operate on. While
it's good to know which controller mess with the resource - maybe the controller should skip this step if the
resource is not ignored ( a mesh controller will ignore any route that don't target both mesh and gateway as example)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In most cases, the Conditions slice is in a separate struct that's part of a slice that's namespaced by controller name - so that controllers should be able to update only their own Conditions. I'd expect that mesh controllers should have a distinct controller name string, so this method should continue working.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thockin has said that getting multiple controllers to cooperate on node health status was very subtle, so I'd be careful and maybe distill his experience if you need multiple controllers writing one status.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think our practice of including
controllerName
for each distinct set of conditions should largely allow multiple controllers to cooperate on status for the same resource, but open to other potential improvements.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do
controllerName
conditions roll up toAccepted
?Also, controllers probably need to be sure to use PATCH on the
status
sub-resource or do conditional PUTs and avoid extra writes if the conditions haven't changed.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some resources, all conditions are namespaced by
controllerName
. I agree we need to ensure that the rollup behavior is specified. I'm going to add a section about partial validity that should help.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For
HTTPRoute
as a specific example, the structure ofRouteParentStatus
appears well-suited to handle a separate list of conditions for eachparentRef
or controller, so a mesh orService
relationship for a route would have a separate set of conditions andAccepted
status than those used for relaying the status of a concurrent parent relationship to aGateway
, with no need to collaborate on writing to a single status.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel strongly about this, but any reason not just to use
Accepted
everywhere instead ofAttached
in most places andAccepted
in this one place?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had the same thought and am +1 to this,
Accepted
seems a little bit easier to understand for users IMO and can be consistently used across all resources.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will make UI display somewhat harder (green dots / checkmarks / exclamation points), but I realize it may align with human reading a bit more easily.
I don't have a strong opinion here, just want to highlight the cost elsewhere in the stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is
Accepted
distinct fromAttached
here? WouldAccepted
ever be true ifResolvedRefs
wasfalse
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hadn't updated, but with the change to
Accepted
everywhere, this is now correct, thanks.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My own personal preference would be for conditions with a positive polarity to represent the normal healthy state and always be present, and for conditions with a negative polarity to represent very specific error states and to only be present when true. I think we should do everything we can to avoid the presence of double negatives like "Detached == False", but recognize that there are some error states that are best represented with a negative condition. I just don't think those negative conditions should be present when false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. As I mentioned above -- as long we document and check for it in conformance, this is the only guarantee that the conditions are meaningful programmatically, which is basically what we are looking for when we build for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to have a mixed-polarity set of conditions, then I think we need to define:
Attached
,Programmed
,Ready
, andValid
if we do it)Attached
,Programmed
, andValid
if we do it).Ready
)Ready
).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if this is necessarily true -- it just needs to be explicitly documented for and tested. I think the previous attempt by K8s to force one polarity just simply didn't work out because it didn't line up with people's intuition. We should really try to keep things intuitive as much as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It means that anything consuming the Conditions will need to have a hard-coded list of the positive polarity conditions. Not a deal breaker, but it's another obstacle in the way of building a generic condition-parsing library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently the API supports partially-valid resources. I believe this is proposing a partially-valid resource would be
Attached=false
, but the config is actually 'active' (or, I would describe it, "attached").To me this feels a bit odd. The old API sort of had this issue as well.
Thinking purely from a pedantic-correctness POV (which may not be ideal - sometimes simpler is better), I would expect there to be two conditions (I use bad names intentionally to avoid confusion): "ThisResourceWasApplied" - any part of the rule had some effect on the system, and "ThisRuleIsEntirelyValid" - no errors found in the rule at all.
Its possible my concern would be addressed with a different word than "Attached", rather than more conditions, as well
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is somewhat addressed below.. but I am not sure its accurate? It seems we are defining a very small set of "failures that do not become unattached", but there are a lot more partially-accepted resources I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The guideline I've been basing the "does something become unattached?" on is "Does this produce at least some configuration in the underlying data plane?". So I guess it sounds like I need to be more explicit with that guideline, but that aside from that, we're largely in agreement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to have "Is this attached" and "is this 100% valid" as two separate things.
Part of my confusion, I think, is I am not actually sure which fields result in "not attached". It seems most issues are partially accepted. Which is fine -
Attached
just becomes basically a pretty high level acknowledgement that the controller is handling the resourceThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think I agree with @howardjohn's point here: "does this resource produce config" is distinct from "does this resource produce valid config". I'm sure some implementations don't allow invalid (e.g. duplicate route names or something) to be sent to the data plane, but it doesn't feel like that's a requirement for the Gateway API at this juncture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I wrote this pooly - by "Does this resource produce config" I meant to read "Is this resource valid enough to produce some config when combined with the rest of the resource set".
The thing I'm thinking of particularly here is that for HTTPRoute, invalid references still produce data path config because the match will respond with 500s. Whereas for TCPRoute etc, if you don't have a service to route to, there's no way you can build a forwarding path - there's nothing to forward to! So that won't produce any data path config at all, and so, in the current design, would cause the resource to fail to be attached.
I don't think anything should be allowed to produce invalid config in the underlying data plane. If so, that should mean that the resource becomes unattached.
I guess this means that we need some more clarity about what attached means. I've been assuming that if we consider a resource to be "attached" to another, that:
And so, that having a "valid" condition is not really useful because if something is not valid, it won't be attached either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for something like an HTTPRoute to define two paths, one of which is "valid enough" config, while the other is garbage?
If so, are there expectations or requirements from a conformance PoV ("always apply all or none") that we need to specify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most cases of an invalid configuration in one route rule (with others being completely valid within the same HTTPRoute) seem to be misconfigurable only in the context of a reference to another resource and fall into the
ResolvedRefs
bucket (which seems stated below and possibly agreed upon thatAttached: true
be set withResolvedRefs: false
and return HTTP code 500 for the invalid rule).One such example not covered is if you were to use a
HTTPRequestRedirectFilter
andHTTPURLRewrite
in the list of filters for an individualHTTPRouteRule
(ref). At the moment it doesn't appear we have prescribed any status code or other error handling for this error. This seems to fall into the bucket of things that cannot produce data plane configuration.Going by the statements below, if this rule were the only one present
Attached: false
should be set, since it cannot result in any configuration (with another more specific error condition). If present with other valid rules,Attached: true
should be set (with another more specific error condition again). I can definitely see the argument for settingAttached: false
however in the latter case, since stated above, theAttached
condition meansThe resource is syntatically and semantically valid, and internally consistent
.It seems that the
Valid: true
condition mentioned above ends up being somewhat equivalent to having for example for HTTPRouteResolvedRefs: true
and no error conditions (and vice versa,Valid: false
equivalent toResolvedRefs: false
or some more specific error condition). Maybe this framing helps a little with determining possible merits of having another condition?Just doing a quick pass, I don't think I saw any existing conformance tests that cover this kind of thing so I agree that would be good to add.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoiding "always apply all or none" is the reason this is tricky. There are mixed expectations about what happens if some config is present but bad, and I'm attempting to thread the needle here and make this as user-friendly as possible while not becoming really difficult to implement or understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a section on partial validity to hopefully make this clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On some impls, provisioning takes about a minute. That may mean that some errors that are immediately known are masked. Or I suppose they aren't masked, but we need to wait the entire minute to know if its valid. The message field could say "all fields valid, waiting for deployment" perhaps, but that isn't in the machine parse-able section.
That may not warrant change, just a thought
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worded this carefully, so as not to imply that the provisioning needs to have completed, just that it can be started. The completion of infra provisioning should be handled by
Programmed
orReady
, or both.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This starts painting an observability problem IMO. Even if conditions are only meant to be human-readable, how does a human look at an HTTPRoute resource and know "oops I have a typo in my service backend"? Yes, 500s indicate something's wrong, but I would expect to find a condition pointing to the specific issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should allow room for error conditions in addition to attached in this case so the intformation can be conveyed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I guess I should make this clearer, that in this case, it's absolutely expected that there be some additional, negative-polarity error conditions that tell you that the backendRef doesn't exist. (This is a distinct case from the backend existing, but having zero endpoints, which would not generate a "backendRef doesn't exist" error, but would end up with the same behavior).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might also get a
BackendRefEmpty
condition...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this. What would empty resources look like? If they're not unattached what are they?