Manual Backport of Fix resolution of service resolvers with subsets for external upstreams into release/1.13.x #16559

Merged 4 commits on Mar 7, 2023

andrewstucki

Backport

Manual backport from #16499 to release/1.13.x.

The below text is copied from the body of the original PR.


Description

While fixing #16498 I noticed that applying a ServiceResolver with subsets wasn't functional when referencing an external service proxied through a TerminatingGateway as an upstream. I'm not too familiar with the way we return service health for external services, but the problem appears to be that in our health check materializer we:

  1. Grab the CheckServiceNode values from our subscription, and then
  2. Apply any filters that were in our initial subscription request

Since we return the gateway associated with the service when we're using external services in conjunction with a TerminatingGateway:

// Look up gateway nodes associated with the service
// TODO(peering): we'll have to do something here
gwIdx, nodes, err := serviceGatewayNodes(tx, ws, serviceName, structs.ServiceKindTerminatingGateway, entMeta, structs.DefaultPeerKeyword)
if err != nil {
	return 0, nil, fmt.Errorf("failed gateway nodes lookup: %v", err)
}
idx = lib.MaxUint64(idx, gwIdx)
for i := 0; i < len(nodes); i++ {
	results = append(results, nodes[i])
	name := structs.NewServiceName(nodes[i].ServiceName, &nodes[i].EnterpriseMeta)
	serviceNames[name] = struct{}{}
}

The filter never passes and we are never able to resolve the upstream endpoint properly.
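
To make the failure mode concrete, here is a minimal standalone sketch of the two materializer steps as I understand them. It is not the actual Consul code: the `instance` type, `applyFilter`, and `subsetV1` are hypothetical stand-ins, and the `version == v1` condition is only an assumed example of a subset filter. The point is that a filter written against the external service's metadata is evaluated against the terminating gateway's instance, which does not carry that metadata.

```go
package main

import "fmt"

// instance is a hypothetical, simplified stand-in for a CheckServiceNode.
type instance struct {
	ServiceName string
	Kind        string
	Meta        map[string]string
}

// applyFilter keeps only the instances that satisfy pred (step 2 in the list above).
func applyFilter(in []instance, pred func(instance) bool) []instance {
	var out []instance
	for _, inst := range in {
		if pred(inst) {
			out = append(out, inst)
		}
	}
	return out
}

// subsetV1 is an assumed subset filter that matches on service metadata.
func subsetV1(inst instance) bool {
	return inst.Meta["version"] == "v1"
}

func main() {
	// For an external service, the lookup yields the gateway instance,
	// which has none of the external service's metadata.
	fromStore := []instance{
		{ServiceName: "terminating-gateway", Kind: "terminating-gateway"},
	}

	// The gateway never matches the subset filter, so no endpoints remain.
	fmt.Println(len(applyFilter(fromStore, subsetV1))) // prints 0
}
```

With an instance that does carry `version: v1` in its metadata, the same filter returns one result, which is what a non-gateway upstream would see.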

I'm not entirely sure whether this is the only change needed, but from what I could tell all of the health checks initiated via proxycfg go through this code path since they leverage either the gRPC endpoints or a direct subscription to the in-memory store.

Testing & Reproduction steps

Create a set of external services that has a ServiceResolver with subsets as in #16498 and a local service that leverages those services as an upstream. Hit the local proxy's admin cluster listing endpoint.

Without the fix (no IP address is ever associated with the endpoint):

~ curl -s localhost:9092/clusters | grep v1.external | sort | head -n 1
v1.external.default.dc1.internal.cba29ba8-8796-2c26-cacd-0ee5dee70b82.consul::added_via_api::true

With the fix (the endpoint contains the terminating gateway's IP address):

~ curl -s localhost:9092/clusters | grep v1.external | sort | head -n 1
v1.external.default.dc1.internal.ea87fe29-2a6c-bd80-e248-5ebdbfed0a7a.consul::127.0.0.1:8443::canary::false
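
If you want to check the two outputs above in a script rather than by eye, something like the following rough sketch could work. `endpointAddr` is a hypothetical helper, not part of Consul or Envoy; it assumes the `::`-separated line layout shown above and IPv4 endpoints only, since an IPv6 address would itself contain `::`.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// endpointAddr reports the host:port carried by one line of the admin
// cluster listing, if the line describes an endpoint rather than a
// cluster-level attribute. Hypothetical helper; assumes IPv4 endpoints.
func endpointAddr(line string) (string, bool) {
	parts := strings.Split(line, "::")
	// Endpoint lines look like: cluster::host:port::stat::value
	if len(parts) < 4 {
		return "", false
	}
	if _, _, err := net.SplitHostPort(parts[1]); err != nil {
		return "", false
	}
	return parts[1], true
}

func main() {
	lines := []string{
		"v1.external.default.dc1.internal.cba29ba8-8796-2c26-cacd-0ee5dee70b82.consul::added_via_api::true",
		"v1.external.default.dc1.internal.ea87fe29-2a6c-bd80-e248-5ebdbfed0a7a.consul::127.0.0.1:8443::canary::false",
	}
	for _, l := range lines {
		addr, ok := endpointAddr(l)
		fmt.Println(ok, addr)
	}
}
```

The first line (without the fix) yields no address, while the second (with the fix) yields `127.0.0.1:8443`.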

PR Checklist

  • updated test coverage
  • external facing docs updated
  • not a security concern

Overview of commits

@nathancoleman nathancoleman left a comment

Note: Diff is slightly different due to missing helper functions in the base branch

@andrewstucki andrewstucki merged commit b81d63f into release/1.13.x Mar 7, 2023
@andrewstucki andrewstucki deleted the release-1.13.x-backport-resolver-fix-2 branch March 7, 2023 20:31