Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the mesh gateway changes to allow local mode for cluster peering data plane traffic #14817

Merged
merged 2 commits into from
Oct 6, 2022

Conversation

erichaberkorn
Copy link
Contributor

Description

This PR creates Envoy clusters and envoys on mesh gateways that forward requests to a mesh gateway in another admin partition. These Envoy resources are used to enable mesh gateway local mode with cluster peering.

Testing & Reproduction steps

  • Manually set up cluster peering and verified that this works
  • Unit tests
  • Integration tests

PR Checklist

  • updated test coverage
  • external facing docs updated
  • not a security concern

@github-actions github-actions bot added the theme/envoy/xds Related to Envoy support label Sep 30, 2022
return nil, fmt.Errorf("couldn't get SNI for peered service %s", sn.String())
}
clusterName = node.Service.Connect.PeerMeta.PrimarySNI()
isRemote = !cfgSnap.Locality.Matches(node.Node.Datacenter, node.Node.PartitionOrDefault())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this right? I copied the outline of this function from makeGatewayServiceClusters and am not 100% sure what to do here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isRemote could just be set to true on L565 since we already know we're dialing a different cluster

Though an issue here and in endpoints.go is that the hostname handling is missing. We need to check whether the addresses attached to these services are hostnames.

Take a look at makeUpstreamLoadAssignmentForPeerService and makeUpstreamClusterForPeerService.

Since the remote addresses are mesh gateway addresses they're likely to have a hostname address (cloud provider LB) and we can't send hostnames via EDS. We instead attach them to a cluster with DNS as the discovery type.

@@ -56,18 +56,18 @@ func TestNodeServiceWithName(t testing.T, name string) *NodeService {
const peerTrustDomain = "1c053652-8512-4373-90cf-5a7f6263a994.consul"

func TestNodeServiceWithNameInPeer(t testing.T, name string, peer string) *NodeService {
service := "payments"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change fixed some incorrect failover golden snapshots.

@erichaberkorn erichaberkorn marked this pull request as ready for review October 3, 2022 14:03
@erichaberkorn erichaberkorn requested a review from a team October 3, 2022 14:03
Copy link
Contributor

@freddygv freddygv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice progress! I left some comments below.

My main request is that the cluster/endpoint generation needs some adjustment to account for hostname endpoints, since they can't be sent over EDS. Let me know if you have any questions!

for _, serviceGroups := range cfgSnap.MeshGateway.PeeringServices {
for sn, serviceGroup := range serviceGroups {
var clusterName string
if len(serviceGroup) > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't seem like we should proceed if len(serviceGroup) == 0, since there won't be any endpoints and the clusterName won't be populated

for sn, serviceGroup := range serviceGroups {
var clusterName string
var isRemote bool
if len(serviceGroup) > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as with endpoints, if we can't get a clusterName we probably shouldn't attempt to make a cluster

Could do:

if len(serviceGroup) == 0 { 
	continue
}

return nil, fmt.Errorf("couldn't get SNI for peered service %s", sn.String())
}
clusterName = node.Service.Connect.PeerMeta.PrimarySNI()
isRemote = !cfgSnap.Locality.Matches(node.Node.Datacenter, node.Node.PartitionOrDefault())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isRemote could just be set to true on L565 since we already know we're dialing a different cluster

Though an issue here and in endpoints.go is that the hostname handling is missing. We need to check whether the addresses attached to these services are hostnames.

Take a look at makeUpstreamLoadAssignmentForPeerService and makeUpstreamClusterForPeerService.

Since the remote addresses are mesh gateway addresses they're likely to have a hostname address (cloud provider LB) and we can't send hostnames via EDS. We instead attach them to a cluster with DNS as the discovery type.

@@ -55,3 +55,7 @@ load helpers
@test "s1 upstream made 1 connection to s2" {
assert_envoy_metric_at_least 127.0.0.1:19000 "cluster.s2.default.primary-to-alpha.external.*cx_total" 1
}

@test "s1 upstream made 1 connection to s2 through the primary mesh gateway" {
assert_envoy_metric_at_least 127.0.0.1:19001 "cluster.s2.default.default.alpha-to-primary.external.*cx_total" 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unusual, why is alpha-to-primary here? That's the name that alpha uses to refer to primary, so why would the local primary gateway be aware of that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was confused by this but was hoping there was some logic behind it. This matches the SNI created by the sidecar proxy 🤔

Copy link
Contributor

@freddygv freddygv Oct 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see what's going on. That's how the remote side can know that this traffic is for an exported service as opposed to the usual mesh gateway route for s2. See this comment: https://github.com/hashicorp/consul/blob/main/agent/xds/clusters.go#L1284-L1298

I'm a bit worried about whether there could be collisions on the dialing side though.

This SNI encodes: (service, namespace, partition, peerName, trustDomain) so I think there could be a collision if there is a separate peering with all the same data but different DC.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually that would require overriding the trust domain to make it the same on multiple non-WAN-federated DCs. We already need people not to do this because it would also lead to control-plane MGW collisions. So maybe just something to document.

@erichaberkorn erichaberkorn force-pushed the mesh-gw-dp-peer branch 2 times, most recently from edccae0 to ebe0a48 Compare October 5, 2022 16:26
Copy link
Contributor

@freddygv freddygv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@erichaberkorn erichaberkorn merged commit 1633cf2 into main Oct 6, 2022
@erichaberkorn erichaberkorn deleted the mesh-gw-dp-peer branch October 6, 2022 13:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme/envoy/xds Related to Envoy support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants