-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incremental XDS proposal. #3470
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. One minor comment and some copy editing. (I'm happy to take a closer copy editing pass later, if you'd like.)
api/envoy/api/v2/discovery.proto
Outdated
// This map must be populated when the IncrementalDiscoveryRequest is the | ||
// first in a stream. The keys are know resources names to the CDS client and | ||
// the value is the correcponding resource level version info. | ||
resource_versions map<string,string> = 6; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typos: known and corresponding. CDS should be XDS, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the type/field name are also swapped here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re "I think the type/field name are also swapped here.": I'm not sure what you mean here. My intention is to have keys be resource names and values be their version.
api/envoy/api/v2/discovery.proto
Outdated
// | ||
// In Incremental XDS the nonce field is required and used to pair | ||
// IncrementalDiscoveryResponse to a IncrementalDiscoveryRequest ACK or NACK. | ||
// A message level system_version_info is present for debugging purposes only. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the intention here to try to keep implementations from relying on Envoy setting this value in requests?
I think this documentation should just describe the client behavior (e.g., the client returns the value it saw in the last ACKed response) and leave off the "debugging purposes only" statements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment below, I don't think it is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the comment to match having it in responses only.
Please merge master to make sure you pick up the formatter fixes. |
api/envoy/api/v2/discovery.proto
Outdated
// The system_version_info provided in the request messages will be the | ||
// system_version_info received with the most recent successfully processed | ||
// response or empty on the first request. | ||
string system_version_info = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to supply this at all? I realize this is what we do in v1, but what is the function in v2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this is still useful even in the incremental case to cover the add/remove case. What version are we at if we have no resources? Obviously a deployment can leave this empty, but I would suggest we don't remove the possibility of populating it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, this is the request. Yes probably don't need that. We should leave in the response though...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, kept it in the response.
api/envoy/api/v2/discovery.proto
Outdated
// | ||
// In Incremental XDS the nonce field is required and used to pair | ||
// IncrementalDiscoveryResponse to a IncrementalDiscoveryRequest ACK or NACK. | ||
// A message level system_version_info is present for debugging purposes only. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment below, I don't think it is needed.
@@ -93,3 +93,92 @@ message DiscoveryResponse { | |||
// required for non-stream based xDS implementations. | |||
string nonce = 5; | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we get agreement on the basic protos and sequencing, we should add sequence diagrams and a section to https://github.com/envoyproxy/envoy/blob/master/api/XDS_PROTOCOL.md as part of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack.
api/envoy/api/v2/discovery.proto
Outdated
@@ -93,3 +93,92 @@ message DiscoveryResponse { | |||
// required for non-stream based xDS implementations. | |||
string nonce = 5; | |||
} | |||
|
|||
// IncrementalDiscoveryRequest and IncrementalDiscoveryResponse are used in a | |||
// new GRPC endpoint for Incremental XDS. The feature is not supported for REST |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: s/GRPC/gRPC/, s/XDS/xDS/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
api/envoy/api/v2/discovery.proto
Outdated
// With Incremental XDS, the IncrementalDiscoveryResponses do not need to | ||
// include a full snapshot of the tracked resources. Instead | ||
// IncrementalDiscoveryResponses are a diff to the state of a XDS client. | ||
// In incremental XDS there are per resource versions which allows to track |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: be consistent with whether it is "incremental xDS" or "Incremental xDS".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
api/envoy/api/v2/discovery.proto
Outdated
string type_url = 3; | ||
|
||
// A list of Resource names to add to the list of tracked resources. | ||
// The empty list is a subscription to all resources. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this work in an incremental sense? If an incremental request has the empty list does that then become "subscribe all"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed that comment and made the field resource_name_subscribe.
api/envoy/api/v2/discovery.proto
Outdated
// This map must be populated when the IncrementalDiscoveryRequest is the | ||
// first in a stream. The keys are know resources names to the CDS client and | ||
// the value is the correcponding resource level version info. | ||
resource_versions map<string,string> = 6; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the type/field name are also swapped here.
api/envoy/api/v2/discovery.proto
Outdated
} | ||
|
||
message IncrementalDiscoveryResponse { | ||
// The version of the response data used for debugging only. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a better name than "system version"? It's more like "upstream config pipeline provenance version", which is a bit of a mouthful..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm open to suggestions, if we can move away form the version keyword that might help avoid confusion. Something like short_description
maybe?
@nt can you please merge master, fix DCO, and look at CI issues? |
api/envoy/api/v2/discovery.proto
Outdated
// ACKed Response. | ||
// 3. Spontaneous IncrementalDiscoveryRequest from the client. | ||
// This can be done to dynamically add or remove elements from the tracked | ||
// resource_names set. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you describe the nonce value in this case here?Should it match the last nonce we received from the server? If not, how do we avoid it being conflated with ACK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the PR. The nonce should only be included when the Requests is an ACK / NACK. otherwise it is omitted.
api/envoy/api/v2/discovery.proto
Outdated
// to a previous IncrementalDiscoveryResponse, the response_nonce must be the | ||
// nonce in the IncrementalDiscoveryResponse. | ||
string response_nonce = 6; | ||
// This is populated when the previous :ref:`DiscoveryResponse <envoy_api_msg_DiscoveryResponse>` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: needs blank line separator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
api/envoy/api/v2/discovery.proto
Outdated
// The response resources. These are typed resources that match the type url | ||
// in the IncrementalDiscoveryRequest. | ||
repeated Resource resources = 2 [(gogoproto.nullable) = false]; | ||
// Resources names of resources that have be deleted and to be removed from the xDS Client. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: needs blank line separator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Signed-off-by: Nicolas Thiebaud <[email protected]>
Signed-off-by: Nicolas Thiebaud <[email protected]>
re: @mattklein123 I rebased on master and fixed DCO. PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks great. Just some docs nits and more comment asks. @htuch can you also take a look?
api/envoy/api/v2/discovery.proto
Outdated
// 2. As a ACK or NACK response to a previous IncrementalDiscoveryResponse. | ||
// In this case the response_nonce is set to the nonce value in the Response. | ||
// ACK or NACK is determined by the absence or presence of error_detail. | ||
// In the case of a ACK the system_version_info must match the value in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is out of date, right? We no longer send system_version_info?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, removed.
api/envoy/api/v2/discovery.proto
Outdated
repeated string resource_names_unsubscribe = 4; | ||
|
||
// This map must be populated when the IncrementalDiscoveryRequest is the | ||
// first in a stream. The keys are know resources names to the xDS client and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo "keys are know" / a bit confusing in general. Clarify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rewrote this, let me know if it make better sense now.
api/envoy/api/v2/discovery.proto
Outdated
// required for ADS. | ||
string type_url = 2; | ||
|
||
// A list of Resource names to add to the list of tracked resources. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we clarify here when this should be empty? For example, on an initial request for CDS when Envoy wants to be told about all relevant clusters by the management server and isn't doing any Envoy side dynamic loading stuff? I guess in general, given that we send this message in multiple different situations, it would be useful for each field to have more detail on when it is and is not set. I think it would make the documentation a lot better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a paragraph to clarify the use of these two fields. Let me know if it makes sense to you.
Signed-off-by: Nicolas Thiebaud <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. Two more comments for discussion but I think in general LGTM.
// All resource names in the resource_names_subscribe list are added to the | ||
// set of tracked resources and all resource names in the resource_names_unsubscribe | ||
// list are removed from the set of tracked resources. | ||
// Unlike in non incremental xDS, an empty resource_names_subscribe or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should still describe here this is effectively a hint, right? Meaning, the management server is still allowed to send additional resources? (Though it must send the requested ones) I think it's worth clarifying this as it's similar to non-incremental in this regard?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
api/envoy/api/v2/discovery.proto
Outdated
// first in a stream. The keys are the resources names of the xDS resources | ||
// known to the xDS client. The values in the map are the associated resource | ||
// level version info. | ||
map<string, string> resource_versions = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT about calling this initial_local_resource_versions
just so the intent is more clear? I don't feel strongly about this. Open to other names too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is a great idea. Renamed it initial_resource_versions. I think the "local" part should be obvious.
Signed-off-by: Nicolas Thiebaud <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, nice work! Will defer to @htuch for further review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but would like to get the protocol spelled out in detail in XDS_PROTOCOL.md.
// ACK or NACK is determined by the absence or presence of error_detail. | ||
// 3. Spontaneous IncrementalDiscoveryRequest from the client. | ||
// This can be done to dynamically add or remove elements from the tracked | ||
// resource_names set. In this case response_nonce must be omitted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to convince and explain why this doesn't have the racy behavior that originally motivated the nonce. I.e. we need sequence diagrams considering races. Can we make the changes to https://github.com/envoyproxy/envoy/blob/master/api/XDS_PROTOCOL.md as part of this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, I think that even if a nonce-less incremental subscribe races with some other request/response, it's largely harmless; the set of resources being tracked by the management server should eventually converge on what the client is interested in tracking.
Consider this example (M is management server, E is Envoy, X/Y/Z are resources):
- M->E (CDS): {X}
- E-> M (CDS): subscribe Y
- E->M (CDS): unsubscribe Y
- M->E (CDS): {X, Y, Z}
What if steps 3 and 4 race? I don't think there is a problem here, since we eventually converge on the right thing, but there will be periods of inconsistency perhaps.
If someone can persuade me that the above is valid, I think we're good to ship, but want to just raise this.
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
Signed-off-by: Nicolas Thiebaud <[email protected]>
Signed-off-by: Nicolas Thiebaud <[email protected]>
api/XDS_PROTOCOL.md
Outdated
all 100k clusters when a single cluster is modified, the management server | ||
only needs to deliver the single cluster that changed. | ||
|
||
An xDS Incremental session is always in the context of a gRPC bidirectional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: s/Incremental/incremental/ here, incremental is an adjective not a proper noun.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
api/XDS_PROTOCOL.md
Outdated
connected to it. There is no REST version of Incremental xDS. | ||
|
||
In Incremental xDS the nonce field is required and used to pair | ||
IncrementalDiscoveryResponse to a IncrementalDiscoveryRequest ACK or NACK. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add docs links (as done with DiscoveryRequest/Response) above? Also, use quote style for proto message names as done for these messages above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
api/XDS_PROTOCOL.md
Outdated
|
||
IncrementalDiscoveryRequest can be sent in 3 situations: | ||
1. Initial message in a xDS bidirectional gRPC stream. | ||
2. As a ACK or NACK response to a previous IncrementalDiscoveryResponse. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: s/a ACK/an ACK/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
api/XDS_PROTOCOL.md
Outdated
IncrementalDiscoveryRequest can be sent in 3 situations: | ||
1. Initial message in a xDS bidirectional gRPC stream. | ||
2. As a ACK or NACK response to a previous IncrementalDiscoveryResponse. | ||
In this case the response_nonce is set to the nonce value in the Response. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: use quote style for response_nonce
and error_detail
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
api/XDS_PROTOCOL.md
Outdated
This can be done to dynamically add or remove elements from the tracked | ||
resource_names set. In this case response_nonce must be omitted. | ||
|
||
In this first example the client connects as receive a first update that it ACKs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/as receive/and receives/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
api/XDS_PROTOCOL.md
Outdated
|
||
In this first example the client connects as receive a first update that it ACKs. | ||
The second update fails and the client NACKs the update. Later the xDS client | ||
spontaneously requests the "wc" resource. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the wc
resource_list_subscribe
arrow is the wrong way around in the diagram.
// ACK or NACK is determined by the absence or presence of error_detail. | ||
// 3. Spontaneous IncrementalDiscoveryRequest from the client. | ||
// This can be done to dynamically add or remove elements from the tracked | ||
// resource_names set. In this case response_nonce must be omitted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, I think that even if a nonce-less incremental subscribe races with some other request/response, it's largely harmless; the set of resources being tracked by the management server should eventually converge on what the client is interested in tracking.
Consider this example (M is management server, E is Envoy, X/Y/Z are resources):
- M->E (CDS): {X}
- E-> M (CDS): subscribe Y
- E->M (CDS): unsubscribe Y
- M->E (CDS): {X, Y, Z}
What if steps 3 and 4 race? I don't think there is a problem here, since we eventually converge on the right thing, but there will be periods of inconsistency perhaps.
If someone can persuade me that the above is valid, I think we're good to ship, but want to just raise this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, LGTM modulo remaining comments.
Signed-off-by: Nicolas Thiebaud <[email protected]>
For some reason I can not directly reply to your race comment about nonce-less spontaneous subscribes/unsubscribes:
I do not think this is a problem as these are "hints" to the XDS server. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, this is great, modulo two nits. @mattklein123 any further thoughts before merging?
api/XDS_PROTOCOL.md
Outdated
[`IncrementalDiscoveryResponse`]](https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/discovery.proto#discoveryrequest) | ||
to a [`IncrementalDiscoveryRequest`](https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/discovery.proto#discoveryrequest) | ||
ACK or NACK. | ||
Optionaly, a response message level system_version_info is present for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: s/Optionaly/Optionally/
api/XDS_PROTOCOL.md
Outdated
connected to it. There is no REST version of Incremental xDS. | ||
|
||
In incremental xDS the nonce field is required and used to pair a | ||
[`IncrementalDiscoveryResponse`]](https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/discovery.proto#discoveryrequest) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional bracket here - you can preview MD generation by clicking the View button on GH (pro tip).
Signed-off-by: Nicolas Thiebaud <[email protected]>
fixed both nits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, one question. Thank you!
api/XDS_PROTOCOL.md
Outdated
|
||
![Incremental reconnect example](diagrams/incremental-reconnect.svg) | ||
|
||
The incremental xDS method may be added to CDS and RDS in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why wouldn't this just be supported out of the box? It's not clear to me how any of this is ADS specific?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought adding it on a need-for basis made more sense to keep complexity down. We can add them now if you prefer. Let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO I would just add it now. Unless I'm misunderstanding the implementation, I think it will basically "just work." @htuch thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, it will Just Work (tm).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
api/envoy/api/v2/discovery.proto
Outdated
// | ||
// A list of Resource names to add to the list of tracked resources. | ||
repeated string resource_names_subscribe = 3; | ||
// A list of Resource names to remove from the list of tracked resources. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: newline before this line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Signed-off-by: Nicolas Thiebaud <[email protected]>
Signed-off-by: Nicolas Thiebaud <[email protected]>
@nt can you fix_format? Ready to merge when that's done. |
Signed-off-by: Nicolas Thiebaud <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks!
This PR includes proposed additions to the XDS discovery protos as discussed in this design doc.
The implementation is not included and will be done in a follow up PRs.
The tacking bug id is #3229