Backport of Stop processing ACME verifications when active node is stepped down into release/1.14.x #23285

hc-github-team-secure-vault-core
Collaborator

Backport

This PR is auto-generated from #23278 to be assessed for backporting due to the inclusion of the label backport/1.14.x.

The below text is copied from the body of the original PR.


  • Do not load existing ACME challenges persisted in storage on non-active nodes. This was the main culprit of the issues: secondary nodes would load the persisted challenges and try to resolve them, but their writes would fail, leading to excessive logging.

    • We now handle this by not starting the ACME background thread on non-active nodes, and by checking for this condition within the scheduling loop and breaking out. Breaking out forces a re-read of the Closing channel, which should have been signaled by the PKI plugin's Cleanup method.
  • If a node is stepped down from being the active node while it is processing a verification, we could get into an infinite loop due to an ErrReadOnly error when attempting to clean up a challenge entry.

  • Add a maximum number of retries for errors encountered while decoding or fetching challenge/authorization entries from disk. For these errors we use double the "normal" maximum number of ACME retry attempts, to avoid collision issues. Note that these additional retry attempts are not persisted to disk and will restart on every node start.

  • Add a 1-second backoff after any disk-related error so that we do not immediately spin on disk/IO errors for challenges.
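
The changes listed above can be illustrated with a minimal Go sketch. It is not the code from this PR: every identifier (processChallenge, errReadOnly, errDiskIO, maxAcmeRetries, maxDiskRetries) is hypothetical, the value 5 for the "normal" retry limit is an assumed placeholder, and treating a read-only error as a signal to stop is an assumption about how the loop might break out. The caller would pass a 1-second backoff to match the description.

```go
package main

import (
	"errors"
	"time"
)

// Hypothetical stand-ins; these are not Vault's identifiers.
var (
	errReadOnly = errors.New("read-only storage")    // plays the role of ErrReadOnly
	errDiskIO   = errors.New("simulated disk error") // any decode/fetch failure
)

const (
	maxAcmeRetries = 5                  // assumed "normal" ACME retry limit
	maxDiskRetries = 2 * maxAcmeRetries // doubled budget for disk-related errors
)

type outcome int

const (
	done    outcome = iota // the entry was loaded successfully
	gaveUp                 // the in-memory retry budget was exhausted
	stopped                // the node is no longer active, or shutdown was signaled
)

// processChallenge calls load until it succeeds, the retry budget runs out,
// the node stops being active, or closing is signaled. The attempt counter
// lives only in memory, so it starts from zero again after a restart.
func processChallenge(load func() error, isActive func() bool,
	closing <-chan struct{}, backoff time.Duration) (outcome, int) {
	attempts := 0
	for {
		// Re-check the closing channel and active state on every iteration.
		select {
		case <-closing:
			return stopped, attempts
		default:
		}
		if !isActive() {
			return stopped, attempts
		}

		attempts++
		err := load()
		if err == nil {
			return done, attempts
		}
		// Assumption: a read-only error means we were stepped down, so
		// retrying can never succeed on this node.
		if errors.Is(err, errReadOnly) {
			return stopped, attempts
		}
		if attempts >= maxDiskRetries {
			return gaveUp, attempts
		}

		// Back off before retrying instead of spinning on the error.
		select {
		case <-closing:
			return stopped, attempts
		case <-time.After(backoff):
		}
	}
}
```

In this sketch the three exit paths map onto the bullets: the active-node check and closing channel cover the step-down case, the doubled in-memory counter bounds disk-error retries, and the timed select provides the backoff while still reacting to shutdown.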


Overview of commits

@hc-github-team-secure-vault-core hc-github-team-secure-vault-core force-pushed the backport/stevendpclark/vault-20315-acme-active-node-change/highly-flowing-mastiff branch from b8b4339 to 7cbe83f Compare September 26, 2023 17:59
@github-actions github-actions bot added the hashicorp-contributed-pr If the PR is HashiCorp (i.e. not-community) contributed label Sep 26, 2023
@stevendpclark stevendpclark self-assigned this Sep 26, 2023
@stevendpclark stevendpclark added this to the 1.14.5 milestone Sep 26, 2023
@stevendpclark stevendpclark enabled auto-merge (squash) September 26, 2023 18:02
@github-actions

Build Results:
All builds succeeded! ✅

@stevendpclark stevendpclark merged commit 46d9c55 into release/1.14.x Sep 26, 2023
@stevendpclark stevendpclark deleted the backport/stevendpclark/vault-20315-acme-active-node-change/highly-flowing-mastiff branch September 26, 2023 18:18
@github-actions

CI Results:
All Go tests succeeded! ✅

Labels
hashicorp-contributed-pr If the PR is HashiCorp (i.e. not-community) contributed