-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make the mesh gateway changes to allow local
mode for cluster peering data plane traffic
#14817
Conversation
49df83c
to
c497ea4
Compare
c497ea4
to
89033ea
Compare
agent/xds/clusters.go
Outdated
return nil, fmt.Errorf("couldn't get SNI for peered service %s", sn.String()) | ||
} | ||
clusterName = node.Service.Connect.PeerMeta.PrimarySNI() | ||
isRemote = !cfgSnap.Locality.Matches(node.Node.Datacenter, node.Node.PartitionOrDefault()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this right? I copied the outline of this function from makeGatewayServiceClusters
and am not 100% sure what to do here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isRemote
could just be set to true on L565 since we already know we're dialing a different cluster
Though an issue here and in endpoints.go
is that the hostname handling is missing. We need to check whether the addresses attached to these services are hostnames.
Take a look at makeUpstreamLoadAssignmentForPeerService
and makeUpstreamClusterForPeerService
.
Since the remote addresses are mesh gateway addresses they're likely to have a hostname address (cloud provider LB) and we can't send hostnames via EDS. We instead attach them to a cluster with DNS as the discovery type.
agent/xds/testdata/clusters/ingress-with-defaults-service-max-connections.latest.golden
Show resolved
Hide resolved
@@ -56,18 +56,18 @@ func TestNodeServiceWithName(t testing.T, name string) *NodeService { | |||
const peerTrustDomain = "1c053652-8512-4373-90cf-5a7f6263a994.consul" | |||
|
|||
func TestNodeServiceWithNameInPeer(t testing.T, name string, peer string) *NodeService { | |||
service := "payments" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change fixed some incorrect failover golden snapshots.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice progress! I left some comments below.
My main request is that the cluster/endpoint generation needs some adjustment to account for hostname endpoints, since they can't be sent over EDS. Let me know if you have any questions!
agent/xds/testdata/clusters/ingress-with-defaults-service-max-connections.latest.golden
Show resolved
Hide resolved
agent/xds/endpoints.go
Outdated
for _, serviceGroups := range cfgSnap.MeshGateway.PeeringServices { | ||
for sn, serviceGroup := range serviceGroups { | ||
var clusterName string | ||
if len(serviceGroup) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't seem like we should proceed if len(serviceGroup) == 0
, since there won't be any endpoints and the clusterName won't be populated
agent/xds/clusters.go
Outdated
for sn, serviceGroup := range serviceGroups { | ||
var clusterName string | ||
var isRemote bool | ||
if len(serviceGroup) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as with endpoints, if we can't get a clusterName we probably shouldn't attempt to make a cluster
Could do:
if len(serviceGroup) == 0 {
continue
}
agent/xds/clusters.go
Outdated
return nil, fmt.Errorf("couldn't get SNI for peered service %s", sn.String()) | ||
} | ||
clusterName = node.Service.Connect.PeerMeta.PrimarySNI() | ||
isRemote = !cfgSnap.Locality.Matches(node.Node.Datacenter, node.Node.PartitionOrDefault()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isRemote
could just be set to true on L565 since we already know we're dialing a different cluster
Though an issue here and in endpoints.go
is that the hostname handling is missing. We need to check whether the addresses attached to these services are hostnames.
Take a look at makeUpstreamLoadAssignmentForPeerService
and makeUpstreamClusterForPeerService
.
Since the remote addresses are mesh gateway addresses they're likely to have a hostname address (cloud provider LB) and we can't send hostnames via EDS. We instead attach them to a cluster with DNS as the discovery type.
@@ -55,3 +55,7 @@ load helpers | |||
@test "s1 upstream made 1 connection to s2" { | |||
assert_envoy_metric_at_least 127.0.0.1:19000 "cluster.s2.default.primary-to-alpha.external.*cx_total" 1 | |||
} | |||
|
|||
@test "s1 upstream made 1 connection to s2 through the primary mesh gateway" { | |||
assert_envoy_metric_at_least 127.0.0.1:19001 "cluster.s2.default.default.alpha-to-primary.external.*cx_total" 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems unusual, why is alpha-to-primary
here? That's the name that alpha
uses to refer to primary
, so why would the local primary gateway be aware of that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was confused by this but was hoping there was some logic behind it. This matches the SNI created by the sidecar proxy 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see what's going on. That's how the remote side can know that this traffic is for an exported service as opposed to the usual mesh gateway route for s2
. See this comment: https://github.com/hashicorp/consul/blob/main/agent/xds/clusters.go#L1284-L1298
I'm a bit worried about whether there could be collisions on the dialing side though.
This SNI encodes: (service, namespace, partition, peerName, trustDomain)
so I think there could be a collision if there is a separate peering with all the same data but different DC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually that would require overriding the trust domain to make it the same on multiple non-WAN-federated DCs. We already need people not to do this because it would also lead to control-plane MGW collisions. So maybe just something to document.
…ng data plane traffic
edccae0
to
ebe0a48
Compare
…r peering data plane traffic
ebe0a48
to
62b548f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description
This PR creates Envoy clusters and envoys on mesh gateways that forward requests to a mesh gateway in another admin partition. These Envoy resources are used to enable mesh gateway local mode with cluster peering.
Testing & Reproduction steps
PR Checklist