Support Rotating ACL Tokens #53

Open
lornasong opened this issue Jan 8, 2020 · 0 comments · May be fixed by #229

When a consul-esm instance's token is revoked, for example while rotating ACL tokens, consul-esm behaves in several unexpected ways:

  • The instance's status remains passing/healthy and is never marked critical. This can be seen at /v1/health/node/:node.
  • The instance's assigned external health checks are not executed successfully. Because the instance stays "passing"/"healthy", those checks are not reassigned to other, actually healthy instances with appropriate tokens.
  • The instance is not able to deregister successfully.
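
To make the first symptom easier to check, here is a minimal sketch that queries /v1/health/node/:node and reports the most severe check status. The helper names, the default agent address, and the severity ordering are assumptions made for illustration; they are not part of consul-esm.

```python
import json
import urllib.request

# Assumed severity ordering; unknown statuses are treated as most severe.
SEVERITY = {"passing": 0, "warning": 1, "critical": 2}


def worst_status(checks):
    """Return the most severe Status among check entries, or None if empty.

    `checks` is the decoded JSON list returned by /v1/health/node/:node.
    """
    statuses = [c.get("Status", "critical") for c in checks]
    return max(statuses, key=lambda s: SEVERITY.get(s, 2), default=None)


def node_status(node, token, base_url="http://127.0.0.1:8500"):
    """Fetch the node's checks from a Consul agent (address is an assumption)."""
    req = urllib.request.Request(
        f"{base_url}/v1/health/node/{node}",
        headers={"X-Consul-Token": token},
    )
    with urllib.request.urlopen(req) as resp:
        return worst_status(json.load(resp))
```

With the behavior described above, node_status would be expected to keep returning "passing" for the revoked-token instance even after its TTL check has failed.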

Updating the health check and deregistering both require the token that was revoked, so both operations fail. This is expected as a result of anti-entropy.

The larger issue around supporting rotating ACL tokens is already captured in hashicorp/consul#4372. The recommendation there is to re-register the application (consul-esm in this case) with the new token.

Currently, consul-esm doesn't have a way to re-register itself. When consul-esm is stopped and restarted, the stopped instance fails to deregister while the newly started instance obtains a new ID. This leaves 'dead', floating consul-esm instances in the cluster. A serious consequence is that these dead consul-esm instances retain responsibility for their external health checks, since they remain marked as healthy/passing in the catalog.
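
As a rough illustration of what re-registering with a new token could look like at the HTTP level, the sketch below builds a request against Consul's agent service registration endpoint that reuses an existing service ID. This is not how consul-esm works today; the function name, the service name "consul-esm", the example ID, and the default agent address are all assumptions.

```python
import json
import urllib.request


def build_register_request(service_id, new_token, base_url="http://127.0.0.1:8500"):
    """Build (but do not send) a re-registration request using a new ACL token.

    Reusing the same service ID is the point of the sketch: the instance would
    keep its identity instead of obtaining a new ID after a restart.
    """
    body = {"ID": service_id, "Name": "consul-esm"}  # service name is an assumption
    return urllib.request.Request(
        f"{base_url}/v1/agent/service/register",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "X-Consul-Token": new_token,
            "Content-Type": "application/json",
        },
    )


# Hypothetical usage; the request would be sent with urllib.request.urlopen(req).
req = build_register_request("consul-esm:example-id", "new-token")
```

A real implementation would also need to re-register the instance's TTL check with the new token, which is left out here.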

This issue arises from comment: #39 (comment)

Steps to reproduce

  1. Start Consul (I used v1.6.2) with ACLs enabled
  2. Register two external health checks
  3. Start consul-esm (I used v0.3.3) with the token it needs to operate and log_level=DEBUG
  4. Start another consul-esm with a different token that it needs to operate and log_level=DEBUG
  5. Observe that each consul-esm is executing one of the external health checks
  6. Delete the token for one of the consul-esm instances
  7. Observe in the Consul logs that the revoked-token consul-esm has failed its TTL check
  8. Query /v1/health/node/<revoked-token-consul-esm-id> and see that the status is still passing
  9. Stop the revoked-token consul-esm instance (Ctrl+C)
  10. Observe in the Consul logs that consul-esm was not able to deregister successfully
  11. Observe in the remaining healthy consul-esm instance that it is executing only one external health check - the one it was originally assigned - and it did not inherit the other external health check
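
For step 2, a registration payload for an external node with one health check might be built as sketched below. The node name, address, check ID, and intervals are placeholders, and the payload shape (including the external-node and external-probe node metadata) reflects my understanding of the consul-esm README rather than anything stated in this issue.

```python
import json


def external_check_payload(node, address, check_id, tcp_target):
    """Build a body for PUT /v1/catalog/register describing an external node.

    All values passed in are placeholders chosen by the caller.
    """
    return {
        "Node": node,
        "Address": address,
        # Metadata assumed to mark the node as one consul-esm should monitor.
        "NodeMeta": {"external-node": "true", "external-probe": "true"},
        "Checks": [
            {
                "Node": node,
                "CheckID": check_id,
                "Name": f"external check {check_id}",
                "Status": "passing",
                "Definition": {
                    "TCP": tcp_target,
                    "Interval": "5s",
                    "Timeout": "1s",
                },
            }
        ],
    }


# Two payloads, one per external health check, as in step 2.
payloads = [
    external_check_payload("ext-1", "10.0.0.1", "ext-1-tcp", "10.0.0.1:80"),
    external_check_payload("ext-2", "10.0.0.2", "ext-2-tcp", "10.0.0.2:80"),
]
bodies = [json.dumps(p) for p in payloads]
```

Each body would then be sent to /v1/catalog/register with a token that is allowed to write to the catalog.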