Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport of Fix gateway services cleanup where proxy deregistration happens after service deregistration into release/1.16.x #18855

Conversation

hc-github-team-consul-core
Copy link
Collaborator

Backport

This PR is auto-generated from #18831 to be assessed for backporting due to the inclusion of the label backport/1.16.

The below text is copied from the body of the original PR.


Description

While trying to debug a customer issue, I noticed that when:

  1. A user specifies an ingress gateway with a target of "*"
  2. They boot up the ingress gateway and a service in the target namespace
  3. They've registered the target and its sidecar proxy individually
  4. They then deregister the target followed by the corresponding sidecar
  5. The gateway still attempts to route to the service

This happens because of a bug in the store that was causing wildcard-targeted gateway services to be pruned from tableGatewayServices only when they don't have existing sidecar proxies. In the above case, when the sidecar is deregistered second, the cleanup routine never triggers for the main service (instead a routine runs that attempts to prune the sidecar service name from tableGatewayServices rather than the service it's acting as a sidecar for).

Testing & Reproduction steps

I added a unit test to exercise this case, but I also threw together a gist that manually recreates this. I did so using an enterprise Consul build, but the steps can be adjusted for non-enterprise as well.

  1. Clone the repro gist
  2. build a local consul-enterprise build with make dev-docker and tag it with docker tag consul:local consul-enterprise:local
  3. shove an enterprise license in the environment variable CONSUL_LICENSE
  4. run ./reset.sh in the gist
  5. run kubectl delete deployment static-server
  6. run kubectl exec -it consul-server-0 -- consul catalog services and see that static-server is not longer in the catalog
  7. run kubectl exec -it consul-server-0 -- curl https://localhost:8501/v1/catalog/gateway-services/consul-dc1-ingress-gateway -k | jq

Without the fix you'll see something like (notice static-server still in the payload):

[
  {
    "Gateway": {
      "Name": "consul-dc1-ingress-gateway",
      "Partition": "default",
      "Namespace": "default"
    },
    "Service": {
      "Name": "service-two",
      "Partition": "default",
      "Namespace": "default"
    },
    "GatewayKind": "ingress-gateway",
    "Port": 8080,
    "Protocol": "http",
    "FromWildcard": true,
    "ServiceKind": "service",
    "CreateIndex": 76,
    "ModifyIndex": 76
  },
  {
    "Gateway": {
      "Name": "consul-dc1-ingress-gateway",
      "Partition": "default",
      "Namespace": "default"
    },
    "Service": {
      "Name": "static-server",
      "Partition": "default",
      "Namespace": "default"
    },
    "GatewayKind": "ingress-gateway",
    "Port": 8080,
    "Protocol": "http",
    "FromWildcard": true,
    "ServiceKind": "service",
    "CreateIndex": 57,
    "ModifyIndex": 57
  }
]

With the fix you should see something like:

[
  {
    "Gateway": {
      "Name": "consul-dc1-ingress-gateway",
      "Partition": "default",
      "Namespace": "default"
    },
    "Service": {
      "Name": "service-two",
      "Partition": "default",
      "Namespace": "default"
    },
    "GatewayKind": "ingress-gateway",
    "Port": 8080,
    "Protocol": "http",
    "FromWildcard": true,
    "CreateIndex": 607,
    "ModifyIndex": 607
  }
]

PR Checklist

  • updated test coverage
  • external facing docs updated
  • appropriate backport labels added
  • not a security concern

Overview of commits

NiniOak and others added 30 commits July 19, 2023 18:24
u[date readme.md
Align all our internal use of submodules on the latest versions.
Result of tsccr-helper -log-level=info -pin-all-workflows .

Co-authored-by: hashicorp-tsccr[bot] <hashicorp-tsccr[bot]@users.noreply.github.com>
* Fix Backport Assistant failure PR commenting

For general comments on a PR, it looks like you have to use the `/issue`
endpoint rather than `/pulls`, which requires commit/other
review-specific target details.

This matches the endpoint used in `backport-reminder.yml`.

* Remove Backport Reminder workflow

This is noisy (even when adding multiple labels, individual comments per
label are generated), and likely no longer needed: we haven't had this
work in a long time due to an expired GH token, and we now have better
automation for backport PR assignment.
This PR explicitly enables WebSocket upgrades in Envoy's UpgradeConfig for all
proxy types. (API Gateway, Ingress, and Sidecar.)

Fixes #8283
test: improve xDS cluster code coverage
Net 4222 take config file consul container
* [CONSUL-395] Update check_hostport and Usage (#40)

* [CONSUL-397] Copy envoy binary from Image (#41)

* [CONSUL-382] Support openssl in unique test dockerfile (#43)

* [CONSUL-405] Add bats to single container (#44)

* [CONSUL-414] Run Prometheus Test Cases and Validate Changes (#46)

* [CONSUL-410] Run Jaeger in Single container (#45)

* [CONSUL-412] Run test-sds-server in single container (#48)

* [CONSUL-408] Clean containers (#47)

* [CONSUL-384] Rebase and sync fork (#50)

* [CONSUL-415] Create Scenarios Troubleshooting Docs (#49)

* [CONSUL-417] Update Docs Single Container (#51)

* [CONSUL-428] Add Socat to single container (#54)

* [CONSUL-424] Replace pkill in kill_envoy function (#52)

* [CONSUL-434] Modify Docker run functions in Helper script (#53)

* [CONSUL-435] Replace docker run in set_ttl_check_state & wait_for_agent_service_register functions (#55)

* [CONSUL-438] Add netcat (nc) in the Single container Dockerfile (#56)

* [CONSUL-429] Replace Docker run with Docker exec (#57)

* [CONSUL-436] Curl timeout and run tests (#58)

* [CONSUL-443] Create dogstatsd Function (#59)

* [CONSUL-431] Update Docs Netcat (#60)

* [CONSUL-439] Parse nc Command in function (#61)

* [CONSUL-463] Review curl Exec and get_ca_root Func (#63)

* [CONSUL-453] Docker hostname in Helper functions (#64)

* [CONSUL-461] Test wipe volumes without extra cont (#66)

* [CONSUL-454] Check ports in the Server and Agent containers (#65)

* [CONSUL-441] Update windows dockerfile with version (#62)

* [CONSUL-466] Review case-grpc Failing Test (#67)

* [CONSUL-494] Review case-cfg-resolver-svc-failover (#68)

* [CONSUL-496] Replace docker_wget & docker_curl (#69)

* [CONSUL-499] Cleanup Scripts - Remove nanoserver (#70)

* [CONSUL-500] Update Troubleshooting Docs (#72)

* [CONSUL-502] Pull & Tag Envoy Windows Image (#73)

* [CONSUL-504] Replace docker run in docker_consul (#76)

* [CONSUL-505] Change admin_bind

* [CONSUL-399] Update envoy to 1.23.1 (#78)

* [CONSUL-510] Support case-wanfed-gw on Windows (#79)

* [CONSUL-506] Update troubleshooting Documentation (#80)

* [CONSUL-512] Review debug_dump_volumes Function (#81)

* [CONSUL-514] Add zipkin to Docker Image (#82)

* [CONSUL-515] Update Documentation (#83)

* [CONSUL-529] Support case-consul-exec (#86)

* [CONSUL-530] Update Documentation (#87)

* [CONSUL-530] Update default consul version 1.13.3

* [CONSUL-539] Cleanup (#91)

* [CONSUL-546] Scripts Clean-up (#92)

* [CONSUL-491] Support admin_access_log_path value for Windows (#71)

* [CONSUL-519] Implement mkfifo Alternative (#84)

* [CONSUL-542] Create OS Specific Files for Envoy Package (#88)

* [CONSUL-543] Create exec_supported.go (#89)

* [CONSUL-544] Test and Build Changes (#90)

* Implement os.DevNull

* using mmap instead of disk files

* fix import in exec-unix

* fix nmap open too many arguemtn

* go fmt on file

* changelog file

* fix go mod

* Update .changelog/17694.txt

Co-authored-by: Dhia Ayachi <[email protected]>

* different mmap library

* fix bootstrap json

* some fixes

* chocolatey version fix and image fix

* using different library

* fix Map funciton call

* fix mmap call

* fix tcp dump

* fix tcp dump

* windows tcp dump

* Fix docker run

* fix tests

* fix go mod

* fix version 16.0

* fix version

* fix version dev

* sleep to debug

* fix sleep

* fix permission issue

* fix permission issue

* fix permission issue

* fix command

* fix command

* fix funciton

* fix assert config entry status command not found

* fix command not found assert_cert_has_cn

* fix command not found assert_upstream_missing

* fix command not found assert_upstream_missing_once

* fix command not found get_upstream_endpoint

* fix command not found get_envoy_public_listener_once

* fix command not found

* fix test cases

* windows integration test workflow github

* made code similar to unix using npipe

* fix go.mod

* fix dialing of npipe

* dont wait

* check size of written json

* fix undefined n

* running

* fix dep

* fix syntax error

* fix workflow file

* windows runner

* fix runner

* fix from json

* fix runs on

* merge connect envoy

* fix cin path

* build

* fix file name

* fix file name

* fix dev build

* remove unwanted code

* fix upload

* fix bin name

* fix path

* checkout current branch

* fix path

* fix tests

* fix shell bash for windows sh files

* fix permission of run-test.sh

* removed docker dev

* added shell bash for tests

* fix tag

* fix win=true

* fix cd

* added dev

* fix variable undefined

* removed failing tests

* fix tcp dump image

* fix curl

* fix curl

* tcp dump path

* fix tcpdump path

* fix curl

* fix curl install

* stop removing intermediate containers

* fix tcpdump docker image

* revert -rm

* --rm=false

* makeing docker image before

* fix tcpdump

* removed case consul exec

* removed terminating gateway simple

* comment case wasm

* removed data dog

* comment out upload coverage

* uncomment case-consul-exec

* comment case consul exec

* if always

* logs

* using consul 1.17.0

* fix quotes

* revert quotes

* redirect to dev null

* Revert version

* revert consul connect

* fix version

* removed envoy connect

* not using function

* change log

* docker logs

* fix logs

* restructure bad authz

* rmeoved dev null

* output

* fix file descriptor

* fix cacert

* fix cacert

* fix ca cert

* cacert does not work in windows curl

* fix func

* removed docker logs

* added sleep

* fix tls

* commented case-consul-exec

* removed echo

* retry docker consul

* fix upload bin

* uncomment consul exec

* copying consul.exe to docker image

* copy fix

* fix paths

* fix path

* github workspace path

* latest version

* Revert "latest version"

This reverts commit 5a7d7b8.

* commented consul exec

* added ssl revoke best effort

* revert best effort

* removed unused files

* rename var name and change dir

* windows runner

* permission

* needs setup fix

* swtich to github runner

* fix file path

* fix path

* fix path

* fix path

* fix path

* fix path

* fix build paths

* fix tag

* nightly runs

* added matrix in github workflow, renamed files

* fix job

* fix matrix

* removed brackes

* from json

* without using job matrix

* fix quotes

* revert job matrix

* fix workflow

* fix comment

* added comment

* nightly runs

* removed datadog ci as it is already measured in linux one

* running test

* Revert "running test"

This reverts commit 7013d15.

* pr comment fixes

* running test now

* running subset of test

* running subset of test

* job matrix

* shell bash

* removed bash shell

* linux machine for job matrix

* fix output

* added cat to debug

* using ubuntu latest

* fix job matrix

* fix win true

* fix go test

* revert job matrix

---------

Co-authored-by: Jose Ignacio Lorenzo <[email protected]>
Co-authored-by: Franco Bruno Lavayen <[email protected]>
Co-authored-by: Ivan K Berlot <[email protected]>
Co-authored-by: Ezequiel Fernández Ponce <[email protected]>
Co-authored-by: joselo85 <[email protected]>
Co-authored-by: Ezequiel Fernández Ponce <[email protected]>
Co-authored-by: Dhia Ayachi <[email protected]>
* fix typos and update ecs compat table

* real info for the ecs compat matrix table

* Update website/content/docs/ecs/compatibility.mdx

Co-authored-by: Chris Thain <[email protected]>

---------

Co-authored-by: Chris Thain <[email protected]>
* proxystate: add proxystate protos to pbmesh and resolve imports and conflicts between message names
* [CC-5718] Remove HCP token requirement during bootstrap

* Re-add error for loading HCP management token

* Remove old comment

* Add changelog entry

* Remove extra validation line

* Apply suggestions from code review

Co-authored-by: lornasong <[email protected]>

---------

Co-authored-by: lornasong <[email protected]>
Doc guidance for federation with externalServers

Add guidance for proper configuration when joining to a secondary
cluster using WAN fed with external servers also enabled.

Also clarify federation requirements and fix formatting for an
unrelated value.

Update both the Helm chart reference (synced from `consul-k8s`, see
hashicorp/consul-k8s#2583) and the docs on using `externalServers`.
test: improve xDS endpoints code coverage
Add Alicia's edits to clarify log timing and other details
* [CONSUL-382] Support openssl in unique test dockerfile (#43)

* [CONSUL-405] Add bats to single container (#44)

* [CONSUL-414] Run Prometheus Test Cases and Validate Changes (#46)

* [CONSUL-410] Run Jaeger in Single container (#45)

* [CONSUL-412] Run test-sds-server in single container (#48)

* [CONSUL-408] Clean containers (#47)

* [CONSUL-384] Rebase and sync fork (#50)

* [CONSUL-415] Create Scenarios Troubleshooting Docs (#49)

* [CONSUL-417] Update Docs Single Container (#51)

* [CONSUL-428] Add Socat to single container (#54)

* [CONSUL-424] Replace pkill in kill_envoy function (#52)

* [CONSUL-434] Modify Docker run functions in Helper script (#53)

* [CONSUL-435] Replace docker run in set_ttl_check_state & wait_for_agent_service_register functions (#55)

* [CONSUL-438] Add netcat (nc) in the Single container Dockerfile (#56)

* [CONSUL-429] Replace Docker run with Docker exec (#57)

* [CONSUL-436] Curl timeout and run tests (#58)

* [CONSUL-443] Create dogstatsd Function (#59)

* [CONSUL-431] Update Docs Netcat (#60)

* [CONSUL-439] Parse nc Command in function (#61)

* [CONSUL-463] Review curl Exec and get_ca_root Func (#63)

* [CONSUL-453] Docker hostname in Helper functions (#64)

* [CONSUL-461] Test wipe volumes without extra cont (#66)

* [CONSUL-454] Check ports in the Server and Agent containers (#65)

* [CONSUL-441] Update windows dockerfile with version (#62)

* [CONSUL-466] Review case-grpc Failing Test (#67)

* [CONSUL-494] Review case-cfg-resolver-svc-failover (#68)

* [CONSUL-496] Replace docker_wget & docker_curl (#69)

* [CONSUL-499] Cleanup Scripts - Remove nanoserver (#70)

* [CONSUL-500] Update Troubleshooting Docs (#72)

* [CONSUL-502] Pull & Tag Envoy Windows Image (#73)

* [CONSUL-504] Replace docker run in docker_consul (#76)

* [CONSUL-505] Change admin_bind

* [CONSUL-399] Update envoy to 1.23.1 (#78)

* [CONSUL-510] Support case-wanfed-gw on Windows (#79)

* [CONSUL-506] Update troubleshooting Documentation (#80)

* [CONSUL-512] Review debug_dump_volumes Function (#81)

* [CONSUL-514] Add zipkin to Docker Image (#82)

* [CONSUL-515] Update Documentation (#83)

* [CONSUL-529] Support case-consul-exec (#86)

* [CONSUL-530] Update Documentation (#87)

* [CONSUL-530] Update default consul version 1.13.3

* [CONSUL-539] Cleanup (#91)

* [CONSUL-546] Scripts Clean-up (#92)

* [CONSUL-491] Support admin_access_log_path value for Windows (#71)

* [CONSUL-519] Implement mkfifo Alternative (#84)

* [CONSUL-542] Create OS Specific Files for Envoy Package (#88)

* [CONSUL-543] Create exec_supported.go (#89)

* [CONSUL-544] Test and Build Changes (#90)

* Implement os.DevNull

* using mmap instead of disk files

* fix import in exec-unix

* fix nmap open too many arguemtn

* go fmt on file

* changelog file

* fix go mod

* Update .changelog/17694.txt

Co-authored-by: Dhia Ayachi <[email protected]>

* different mmap library

* fix bootstrap json

* some fixes

* chocolatey version fix and image fix

* using different library

* fix Map funciton call

* fix mmap call

* fix tcp dump

* fix tcp dump

* windows tcp dump

* Fix docker run

* fix tests

* fix go mod

* fix version 16.0

* fix version

* fix version dev

* sleep to debug

* fix sleep

* fix permission issue

* fix permission issue

* fix permission issue

* fix command

* fix command

* fix funciton

* fix assert config entry status command not found

* fix command not found assert_cert_has_cn

* fix command not found assert_upstream_missing

* fix command not found assert_upstream_missing_once

* fix command not found get_upstream_endpoint

* fix command not found get_envoy_public_listener_once

* fix command not found

* fix test cases

* windows integration test workflow github

* made code similar to unix using npipe

* fix go.mod

* fix dialing of npipe

* dont wait

* check size of written json

* fix undefined n

* running

* fix dep

* fix syntax error

* fix workflow file

* windows runner

* fix runner

* fix from json

* fix runs on

* merge connect envoy

* fix cin path

* build

* fix file name

* fix file name

* fix dev build

* remove unwanted code

* fix upload

* fix bin name

* fix path

* checkout current branch

* fix path

* fix tests

* fix shell bash for windows sh files

* fix permission of run-test.sh

* removed docker dev

* added shell bash for tests

* fix tag

* fix win=true

* fix cd

* added dev

* fix variable undefined

* removed failing tests

* fix tcp dump image

* fix curl

* fix curl

* tcp dump path

* fix tcpdump path

* fix curl

* fix curl install

* stop removing intermediate containers

* fix tcpdump docker image

* revert -rm

* --rm=false

* makeing docker image before

* fix tcpdump

* removed case consul exec

* removed terminating gateway simple

* comment case wasm

* removed data dog

* comment out upload coverage

* uncomment case-consul-exec

* comment case consul exec

* if always

* logs

* using consul 1.17.0

* fix quotes

* revert quotes

* redirect to dev null

* Revert version

* revert consul connect

* fix version

* removed envoy connect

* not using function

* change log

* docker logs

* fix logs

* restructure bad authz

* rmeoved dev null

* output

* fix file descriptor

* fix cacert

* fix cacert

* fix ca cert

* cacert does not work in windows curl

* fix func

* removed docker logs

* added sleep

* fix tls

* commented case-consul-exec

* removed echo

* retry docker consul

* fix upload bin

* uncomment consul exec

* copying consul.exe to docker image

* copy fix

* fix paths

* fix path

* github workspace path

* latest version

* Revert "latest version"

This reverts commit 5a7d7b8.

* commented consul exec

* added ssl revoke best effort

* revert best effort

* removed unused files

* rename var name and change dir

* windows runner

* permission

* needs setup fix

* swtich to github runner

* fix file path

* fix path

* fix path

* fix path

* fix path

* fix path

* fix build paths

* fix tag

* nightly runs

* added matrix in github workflow, renamed files

* fix job

* fix matrix

* removed brackes

* from json

* without using job matrix

* fix quotes

* revert job matrix

* fix workflow

* fix comment

* added comment

* nightly runs

* removed datadog ci as it is already measured in linux one

* running test

* Revert "running test"

This reverts commit 7013d15.

* pr comment fixes

* running test now

* running subset of test

* running subset of test

* job matrix

* shell bash

* removed bash shell

* linux machine for job matrix

* fix output

* added cat to debug

* using ubuntu latest

* fix job matrix

* fix win true

* fix go test

* revert job matrix

* Fix tests

---------

Co-authored-by: Ivan K Berlot <[email protected]>
Co-authored-by: Jose Ignacio Lorenzo <[email protected]>
Co-authored-by: Franco Bruno Lavayen <[email protected]>
Co-authored-by: Ezequiel Fernández Ponce <[email protected]>
Co-authored-by: joselo85 <[email protected]>
Co-authored-by: Ezequiel Fernández Ponce <[email protected]>
Co-authored-by: Dhia Ayachi <[email protected]>
Ensure that OSS remains in sync w/ Enterprise by aligning the format of
arch matrix args for various build jobs.
…m running on docs only and ui only changes" (#18248)

Revert "NET-4996 - filter go-tests and test-integration workflows from running on docs only and ui only changes (#18236)"

This reverts commit a11dba7.
Enables querying a resource type's registration to determine if a resource is cluster, partition, or partition and namespace scoped.
Backfill changelog entry for c2bbe67 and 7402d06

Add a changelog entry for the follow-up PR since it was specific to the
fix and references the original change.
* member cli: add -filter expression to flags

* changelog

* update doc

* Add test cases

* use quote
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto approved Consul Bot automated PR

@github-actions github-actions bot added type/docs Documentation needs to be created/updated/clarified theme/api Relating to the HTTP API interface theme/health-checks Health Check functionality theme/acls ACL and token generation theme/cli Flags and documentation for the CLI interface theme/config Relating to Consul Agent configuration, including reloading theme/ui Anything related to the UI theme/connect Anything related to Consul Connect, Service Mesh, Side Car Proxies theme/telemetry Anything related to telemetry or observability type/ci Relating to continuous integration (CI) tooling for testing or releases pr/dependencies PR specifically updates dependencies of project theme/envoy/xds Related to Envoy support theme/contributing Additions and enhancements to community contributing materials theme/internals Serf, Raft, SWIM, Lifeguard, Anti-Entropy, locking topics theme/consul-terraform-sync Relating to Consul Terraform Sync and Network Infrastructure Automation labels Sep 18, 2023
@vercel vercel bot temporarily deployed to Preview – consul September 18, 2023 20:24 Inactive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr/dependencies PR specifically updates dependencies of project theme/acls ACL and token generation theme/api Relating to the HTTP API interface theme/cli Flags and documentation for the CLI interface theme/config Relating to Consul Agent configuration, including reloading theme/connect Anything related to Consul Connect, Service Mesh, Side Car Proxies theme/consul-terraform-sync Relating to Consul Terraform Sync and Network Infrastructure Automation theme/contributing Additions and enhancements to community contributing materials theme/envoy/xds Related to Envoy support theme/health-checks Health Check functionality theme/internals Serf, Raft, SWIM, Lifeguard, Anti-Entropy, locking topics theme/telemetry Anything related to telemetry or observability theme/ui Anything related to the UI type/ci Relating to continuous integration (CI) tooling for testing or releases type/docs Documentation needs to be created/updated/clarified
Projects
None yet
Development

Successfully merging this pull request may close these issues.