xds: NACK causes resource to be considered non-existing #8657
Comments
If the XdsClient never receives a valid response (within the timeout period) after the initial DiscoveryRequest for a resource, it treats the resource as "resource does not exist". So that part is probably working as designed. What you are saying is that in this case the control plane sent an LDS response which was rejected/NACKed because of validation issues, so we should not treat it as "resource does not exist". When the timer fires we should differentiate between "no response ever arrived" (which means "resource does not exist") and "invalid response arrived" (which means the resource does exist but there's an error), and accordingly deliver "resource not found" or "timeout" to the watcher.
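The distinction described above can be sketched as a small state check made when the timer fires. This is only an illustration, not grpc-java's XdsClient: the class `ResourceTimeoutSketch`, its methods, and the event strings are all hypothetical names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of one watched resource; not grpc-java's XdsClient. */
public class ResourceTimeoutSketch {
  /** Events recorded in the order they would be delivered to a watcher. */
  final List<String> delivered = new ArrayList<>();

  private boolean validResourceSeen;
  private String lastRejectionReason; // non-null once a response was NACKed

  /** A response arrived and passed validation. */
  void onValidResponse() {
    validResourceSeen = true;
    delivered.add("changed");
  }

  /** A response arrived but failed validation, so it was NACKed. */
  void onInvalidResponse(String reason) {
    lastRejectionReason = reason;
  }

  /** The initial resource timer expired. */
  void onTimerFired() {
    if (validResourceSeen) {
      return; // the resource is already known; nothing to report
    }
    if (lastRejectionReason != null) {
      // The control plane did answer, so the resource exists but is broken.
      delivered.add("error: " + lastRejectionReason);
    } else {
      // Nothing ever arrived: treat it as "resource does not exist".
      delivered.add("resource does not exist");
    }
  }

  public static void main(String[] args) {
    ResourceTimeoutSketch silent = new ResourceTimeoutSketch();
    silent.onTimerFired();
    System.out.println(silent.delivered);

    ResourceTimeoutSketch nacked = new ResourceTimeoutSketch();
    nacked.onInvalidResponse("validation failed");
    nacked.onTimerFired();
    System.out.println(nacked.delivered);
  }
}
```

In this sketch the rejection reason is only remembered, and the watcher hears about it when the timer expires.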
Correct. This should be fixed in XdsServerWrapper to not set the exception in the
Ah i remember this particular issue, we somehow treated
Note that we have the same issue in CDS.
I'm familiar with that timeout, but it should be for when we don't receive a response for the resource. If it is an invalid resource, then we should just notify the watcher of the error, not that the resource is missing. After we see the bad resource, why would a timer even make sense? There's no expectation that the control plane would fix the resource before the timer expires.
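A minimal sketch of that alternative, assuming a per-resource timer backed by a `ScheduledExecutorService`: the first invalid response cancels the timer and the error goes to the watcher right away. `NackCancelsTimerSketch`, its methods, and the 15-second value are illustrative assumptions, not grpc-java code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical watcher state for one resource; not grpc-java's XdsClient. */
public class NackCancelsTimerSketch {
  /** Events recorded in the order they would be delivered to a watcher. */
  final List<String> delivered = Collections.synchronizedList(new ArrayList<>());

  private final ScheduledFuture<?> respTimer;

  NackCancelsTimerSketch(ScheduledExecutorService scheduler, long timeoutSeconds) {
    // The timer only covers the "no response ever arrived" case.
    respTimer = scheduler.schedule(
        () -> { delivered.add("resource does not exist"); },
        timeoutSeconds, TimeUnit.SECONDS);
  }

  /** A response arrived but failed validation, so it was NACKed. */
  void onInvalidResponse(String reason) {
    // A response did arrive, so waiting for one no longer makes sense.
    respTimer.cancel(false);
    delivered.add("error: " + reason);
  }

  boolean timerCancelled() {
    return respTimer.isCancelled();
  }

  public static void main(String[] args) {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    try {
      // 15 seconds is an arbitrary value chosen for this sketch.
      NackCancelsTimerSketch watch = new NackCancelsTimerSketch(scheduler, 15);
      watch.onInvalidResponse("validation failed");
      System.out.println(watch.delivered + " cancelled=" + watch.timerCancelled());
    } finally {
      scheduler.shutdownNow();
    }
  }
}
```

With this shape the watcher never sees "resource does not exist" for a resource the control plane actually sent, and there is no delay between the NACK and the error.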
Yes, makes sense. Initially I thought not failing
A36 didn't say
I say nothing about a reasonable amount of time. The design of
I'm dealing with a server returning broken configuration. There are three breakages.
Log:
The first is that the resource is considered not to exist. That is not right. The watcher should have been delivered an error and the resource wait timer cancelled. I'll note that in this case we had previously gotten a lot of "UNAVAILABLE: Credentials failed to obtain metadata" failures, and this is the first response to arrive (not included because that log was painful to copy).
Even assuming that the resource is properly determined to not exist, it shouldn't cause start() to fail. From A36 xDS for Servers:
And then there's a bug in XdsTestServer if start() throws, since server was never assigned.
CC @sanjaypujare