Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incremental XDS proposal. #3470

Merged
merged 11 commits into from
Jul 2, 2018
Merged

Incremental XDS proposal. #3470

merged 11 commits into from
Jul 2, 2018

Conversation

nt
Copy link
Member

@nt nt commented May 23, 2018

This PR includes proposed additions to the XDS discovery protos as discussed in this design doc.

The implementation is not included and will be done in a follow up PRs.

The tacking bug id is #3229

@htuch htuch self-assigned this May 23, 2018
Copy link
Member

@zuercher zuercher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. One minor comment and some copy editing. (I'm happy to take a closer copy editing pass later, if you'd like.)

// This map must be populated when the IncrementalDiscoveryRequest is the
// first in a stream. The keys are know resources names to the CDS client and
// the value is the correcponding resource level version info.
resource_versions map<string,string> = 6;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typos: known and corresponding. CDS should be XDS, I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the type/field name are also swapped here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re "I think the type/field name are also swapped here.": I'm not sure what you mean here. My intention is to have keys be resource names and values be their version.

//
// In Incremental XDS the nonce field is required and used to pair
// IncrementalDiscoveryResponse to a IncrementalDiscoveryRequest ACK or NACK.
// A message level system_version_info is present for debugging purposes only.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the intention here to try to keep implementations from relying on Envoy setting this value in requests?

I think this documentation should just describe the client behavior (e.g., the client returns the value it saw in the last ACKed response) and leave off the "debugging purposes only" statements.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment below, I don't think it is needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the comment to match having it in responses only.

@mattklein123
Copy link
Member

Please merge master to make sure you pick up the formatter fixes.

// The system_version_info provided in the request messages will be the
// system_version_info received with the most recent successfully processed
// response or empty on the first request.
string system_version_info = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to supply this at all? I realize this is what we do in v1, but what is the function in v2?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this is still useful even in the incremental case to cover the add/remove case. What version are we at if we have no resources? Obviously a deployment can leave this empty, but I would suggest we don't remove the possibility of populating it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, this is the request. Yes probably don't need that. We should leave in the response though...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, kept it in the response.

//
// In Incremental XDS the nonce field is required and used to pair
// IncrementalDiscoveryResponse to a IncrementalDiscoveryRequest ACK or NACK.
// A message level system_version_info is present for debugging purposes only.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment below, I don't think it is needed.

@@ -93,3 +93,92 @@ message DiscoveryResponse {
// required for non-stream based xDS implementations.
string nonce = 5;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we get agreement on the basic protos and sequencing, we should add sequence diagrams and a section to https://github.com/envoyproxy/envoy/blob/master/api/XDS_PROTOCOL.md as part of this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack.

@@ -93,3 +93,92 @@ message DiscoveryResponse {
// required for non-stream based xDS implementations.
string nonce = 5;
}

// IncrementalDiscoveryRequest and IncrementalDiscoveryResponse are used in a
// new GRPC endpoint for Incremental XDS. The feature is not supported for REST
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s/GRPC/gRPC/, s/XDS/xDS/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// With Incremental XDS, the IncrementalDiscoveryResponses do not need to
// include a full snapshot of the tracked resources. Instead
// IncrementalDiscoveryResponses are a diff to the state of a XDS client.
// In incremental XDS there are per resource versions which allows to track
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: be consistent with whether it is "incremental xDS" or "Incremental xDS".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

string type_url = 3;

// A list of Resource names to add to the list of tracked resources.
// The empty list is a subscription to all resources.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this work in an incremental sense? If an incremental request has the empty list does that then become "subscribe all"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed that comment and made the field resource_name_subscribe.

// This map must be populated when the IncrementalDiscoveryRequest is the
// first in a stream. The keys are know resources names to the CDS client and
// the value is the correcponding resource level version info.
resource_versions map<string,string> = 6;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the type/field name are also swapped here.

}

message IncrementalDiscoveryResponse {
// The version of the response data used for debugging only.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better name than "system version"? It's more like "upstream config pipeline provenance version", which is a bit of a mouthful..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to suggestions, if we can move away form the version keyword that might help avoid confusion. Something like short_description maybe?

@mattklein123
Copy link
Member

@nt can you please merge master, fix DCO, and look at CI issues?

// ACKed Response.
// 3. Spontaneous IncrementalDiscoveryRequest from the client.
// This can be done to dynamically add or remove elements from the tracked
// resource_names set.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you describe the nonce value in this case here?Should it match the last nonce we received from the server? If not, how do we avoid it being conflated with ACK?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the PR. The nonce should only be included when the Requests is an ACK / NACK. otherwise it is omitted.

// to a previous IncrementalDiscoveryResponse, the response_nonce must be the
// nonce in the IncrementalDiscoveryResponse.
string response_nonce = 6;
// This is populated when the previous :ref:`DiscoveryResponse <envoy_api_msg_DiscoveryResponse>`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: needs blank line separator.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// The response resources. These are typed resources that match the type url
// in the IncrementalDiscoveryRequest.
repeated Resource resources = 2 [(gogoproto.nullable) = false];
// Resources names of resources that have be deleted and to be removed from the xDS Client.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: needs blank line separator.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Nicolas Thiebaud added 2 commits June 5, 2018 13:24
Signed-off-by: Nicolas Thiebaud <[email protected]>
Signed-off-by: Nicolas Thiebaud <[email protected]>
@nt
Copy link
Member Author

nt commented Jun 5, 2018

re: @mattklein123 I rebased on master and fixed DCO. PTAL.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks great. Just some docs nits and more comment asks. @htuch can you also take a look?

// 2. As a ACK or NACK response to a previous IncrementalDiscoveryResponse.
// In this case the response_nonce is set to the nonce value in the Response.
// ACK or NACK is determined by the absence or presence of error_detail.
// In the case of a ACK the system_version_info must match the value in the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is out of date, right? We no longer send system_version_info?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, removed.

repeated string resource_names_unsubscribe = 4;

// This map must be populated when the IncrementalDiscoveryRequest is the
// first in a stream. The keys are know resources names to the xDS client and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo "keys are know" / a bit confusing in general. Clarify?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote this, let me know if it make better sense now.

// required for ADS.
string type_url = 2;

// A list of Resource names to add to the list of tracked resources.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we clarify here when this should be empty? For example, on an initial request for CDS when Envoy wants to be told about all relevant clusters by the management server and isn't doing any Envoy side dynamic loading stuff? I guess in general, given that we send this message in multiple different situations, it would be useful for each field to have more detail on when it is and is not set. I think it would make the documentation a lot better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a paragraph to clarify the use of these two fields. Let me know if it makes sense to you.

@mattklein123 mattklein123 self-assigned this Jun 7, 2018
Signed-off-by: Nicolas Thiebaud <[email protected]>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. Two more comments for discussion but I think in general LGTM.

// All resource names in the resource_names_subscribe list are added to the
// set of tracked resources and all resource names in the resource_names_unsubscribe
// list are removed from the set of tracked resources.
// Unlike in non incremental xDS, an empty resource_names_subscribe or
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should still describe here this is effectively a hint, right? Meaning, the management server is still allowed to send additional resources? (Though it must send the requested ones) I think it's worth clarifying this as it's similar to non-incremental in this regard?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// first in a stream. The keys are the resources names of the xDS resources
// known to the xDS client. The values in the map are the associated resource
// level version info.
map<string, string> resource_versions = 5;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT about calling this initial_local_resource_versions just so the intent is more clear? I don't feel strongly about this. Open to other names too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is a great idea. Renamed it initial_resource_versions. I think the "local" part should be obvious.

Signed-off-by: Nicolas Thiebaud <[email protected]>
mattklein123
mattklein123 previously approved these changes Jun 7, 2018
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice work! Will defer to @htuch for further review.

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but would like to get the protocol spelled out in detail in XDS_PROTOCOL.md.

// ACK or NACK is determined by the absence or presence of error_detail.
// 3. Spontaneous IncrementalDiscoveryRequest from the client.
// This can be done to dynamically add or remove elements from the tracked
// resource_names set. In this case response_nonce must be omitted.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to convince and explain why this doesn't have the racy behavior that originally motivated the nonce. I.e. we need sequence diagrams considering races. Can we make the changes to https://github.com/envoyproxy/envoy/blob/master/api/XDS_PROTOCOL.md as part of this PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Let me know what you think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I think that even if a nonce-less incremental subscribe races with some other request/response, it's largely harmless; the set of resources being tracked by the management server should eventually converge on what the client is interested in tracking.

Consider this example (M is management server, E is Envoy, X/Y/Z are resources):

  1. M->E (CDS): {X}
  2. E-> M (CDS): subscribe Y
  3. E->M (CDS): unsubscribe Y
  4. M->E (CDS): {X, Y, Z}

What if steps 3 and 4 race? I don't think there is a problem here, since we eventually converge on the right thing, but there will be periods of inconsistency perhaps.

If someone can persuade me that the above is valid, I think we're good to ship, but want to just raise this.

@stale
Copy link

stale bot commented Jun 18, 2018

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Jun 18, 2018
@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Jun 21, 2018
Nicolas Thiebaud added 2 commits June 21, 2018 16:22
Signed-off-by: Nicolas Thiebaud <[email protected]>
Signed-off-by: Nicolas Thiebaud <[email protected]>
all 100k clusters when a single cluster is modified, the management server
only needs to deliver the single cluster that changed.

An xDS Incremental session is always in the context of a gRPC bidirectional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s/Incremental/incremental/ here, incremental is an adjective not a proper noun.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

connected to it. There is no REST version of Incremental xDS.

In Incremental xDS the nonce field is required and used to pair
IncrementalDiscoveryResponse to a IncrementalDiscoveryRequest ACK or NACK.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add docs links (as done with DiscoveryRequest/Response) above? Also, use quote style for proto message names as done for these messages above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


IncrementalDiscoveryRequest can be sent in 3 situations:
1. Initial message in a xDS bidirectional gRPC stream.
2. As a ACK or NACK response to a previous IncrementalDiscoveryResponse.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s/a ACK/an ACK/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

IncrementalDiscoveryRequest can be sent in 3 situations:
1. Initial message in a xDS bidirectional gRPC stream.
2. As a ACK or NACK response to a previous IncrementalDiscoveryResponse.
In this case the response_nonce is set to the nonce value in the Response.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: use quote style for response_nonce and error_detail here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

This can be done to dynamically add or remove elements from the tracked
resource_names set. In this case response_nonce must be omitted.

In this first example the client connects as receive a first update that it ACKs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/as receive/and receives/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


In this first example the client connects as receive a first update that it ACKs.
The second update fails and the client NACKs the update. Later the xDS client
spontaneously requests the "wc" resource.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the wc resource_list_subscribe arrow is the wrong way around in the diagram.

// ACK or NACK is determined by the absence or presence of error_detail.
// 3. Spontaneous IncrementalDiscoveryRequest from the client.
// This can be done to dynamically add or remove elements from the tracked
// resource_names set. In this case response_nonce must be omitted.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I think that even if a nonce-less incremental subscribe races with some other request/response, it's largely harmless; the set of resources being tracked by the management server should eventually converge on what the client is interested in tracking.

Consider this example (M is management server, E is Envoy, X/Y/Z are resources):

  1. M->E (CDS): {X}
  2. E-> M (CDS): subscribe Y
  3. E->M (CDS): unsubscribe Y
  4. M->E (CDS): {X, Y, Z}

What if steps 3 and 4 race? I don't think there is a problem here, since we eventually converge on the right thing, but there will be periods of inconsistency perhaps.

If someone can persuade me that the above is valid, I think we're good to ship, but want to just raise this.

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM modulo remaining comments.

Signed-off-by: Nicolas Thiebaud <[email protected]>
@nt
Copy link
Member Author

nt commented Jun 28, 2018

For some reason I can not directly reply to your race comment about nonce-less spontaneous subscribes/unsubscribes:

  • subscribes: optionally the client can wait for the update for that specific resource before sending more sub/unsub. but really I should just wait to eventually receive the update.
  • unsubscribes: the client can locally ignore resources it no longer cares about.

I do not think this is a problem as these are "hints" to the XDS server.

htuch
htuch previously approved these changes Jun 29, 2018
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, this is great, modulo two nits. @mattklein123 any further thoughts before merging?

[`IncrementalDiscoveryResponse`]](https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/discovery.proto#discoveryrequest)
to a [`IncrementalDiscoveryRequest`](https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/discovery.proto#discoveryrequest)
ACK or NACK.
Optionaly, a response message level system_version_info is present for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s/Optionaly/Optionally/

connected to it. There is no REST version of Incremental xDS.

In incremental xDS the nonce field is required and used to pair a
[`IncrementalDiscoveryResponse`]](https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/discovery.proto#discoveryrequest)
Copy link
Member

@htuch htuch Jun 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional bracket here - you can preview MD generation by clicking the View button on GH (pro tip).

Signed-off-by: Nicolas Thiebaud <[email protected]>
@nt
Copy link
Member Author

nt commented Jun 29, 2018

fixed both nits.

htuch
htuch previously approved these changes Jun 29, 2018
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one question. Thank you!


![Incremental reconnect example](diagrams/incremental-reconnect.svg)

The incremental xDS method may be added to CDS and RDS in the future.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why wouldn't this just be supported out of the box? It's not clear to me how any of this is ADS specific?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought adding it on a need-for basis made more sense to keep complexity down. We can add them now if you prefer. Let me know.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO I would just add it now. Unless I'm misunderstanding the implementation, I think it will basically "just work." @htuch thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, it will Just Work (tm).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

//
// A list of Resource names to add to the list of tracked resources.
repeated string resource_names_subscribe = 3;
// A list of Resource names to remove from the list of tracked resources.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: newline before this line

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Signed-off-by: Nicolas Thiebaud <[email protected]>
Signed-off-by: Nicolas Thiebaud <[email protected]>
mattklein123
mattklein123 previously approved these changes Jul 2, 2018
@htuch
Copy link
Member

htuch commented Jul 2, 2018

@nt can you fix_format? Ready to merge when that's done.

Signed-off-by: Nicolas Thiebaud <[email protected]>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants