-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix early rotation for roles with WALs, handle multiple WALs per role #28
Conversation
* Respect previous WAL's new password * Ensure only 1 WAL per role * Reinstate finding a WAL => rotate immediately * Remove patch check that would stop manual rotations when popping from the queue * Add a warning to manual rotation response when rotation not immediately successful * Remove re-storing of mount config when rotating unrelated roles * Discard all WALs with a previous rotation time of 0 * Remove deleted WAL IDs from queue items * Delete unused struct fields
19f32ab
to
e3eb497
Compare
Rebased against |
Can we consider renaming the function |
* Take exclusive lock before reading config * Validate input before creating output struct * Comment updates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of smaller follow-up comments, but the overall implementation looks good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few minor things, otherwise the changes look good
merr = multierror.Append(merr, err) | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we want to do the WAL delete before the role deletion, that way we have an avenue to retry the operation if the WAL cleanup fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried implementing this, but I think I'd prefer to leave WAL cleanup as a best effort, because although it has its own consistency problems, the other order does as well. If we delete WALs before the role and fail, we'll need to push the item back onto the queue, which itself could fail. In practice though, deleting roles and WALs both rely on the same storage layer, so at least these partial failure scenarios should be very rare.
We could address the inconsistency of leaving WALs lying around fairly robustly by keying WALs on a UUID whose lifetime is attached to a role's storage entry instead of keying WALs on the role name. That obviously wouldn't be a backport candidate though. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, let's leave it as is for now and revisit any improvements in a subsequent PR
* Add more WAL logging * Fix early rotation for roles with WALs, handle multiple WALs per role * Respect previous WAL's new password * Ensure only 1 WAL per role * Add a warning to manual rotation response when rotation not immediately successful * Remove re-storing of mount config when rotating unrelated roles * Discard all WALs with a previous rotation time of 0 * Remove deleted WAL IDs from queue items * Delete unused struct fields * Switch from warning to error to correct HTTP status code from 400 -> 500 * Delete WALs on failed role creation or role deletion * Take exclusive lock before reading config * Fix manual rotate not respecting WAL ID * Add tests for processing of stored WALs * Remove unnecessary multierror * Add last check on newPassword
* Add more WAL logging * Fix early rotation for roles with WALs, handle multiple WALs per role * Respect previous WAL's new password * Ensure only 1 WAL per role * Add a warning to manual rotation response when rotation not immediately successful * Remove re-storing of mount config when rotating unrelated roles * Discard all WALs with a previous rotation time of 0 * Remove deleted WAL IDs from queue items * Delete unused struct fields * Switch from warning to error to correct HTTP status code from 400 -> 500 * Delete WALs on failed role creation or role deletion * Take exclusive lock before reading config * Fix manual rotate not respecting WAL ID * Add tests for processing of stored WALs * Remove unnecessary multierror * Add last check on newPassword
* Add more WAL logging * Fix early rotation for roles with WALs, handle multiple WALs per role * Respect previous WAL's new password * Ensure only 1 WAL per role * Add a warning to manual rotation response when rotation not immediately successful * Remove re-storing of mount config when rotating unrelated roles * Discard all WALs with a previous rotation time of 0 * Remove deleted WAL IDs from queue items * Delete unused struct fields * Switch from warning to error to correct HTTP status code from 400 -> 500 * Delete WALs on failed role creation or role deletion * Take exclusive lock before reading config * Fix manual rotate not respecting WAL ID * Add tests for processing of stored WALs * Remove unnecessary multierror * Add last check on newPassword
…#31) * Add more WAL logging * Fix early rotation for roles with WALs, handle multiple WALs per role * Respect previous WAL's new password * Ensure only 1 WAL per role * Add a warning to manual rotation response when rotation not immediately successful * Remove re-storing of mount config when rotating unrelated roles * Discard all WALs with a previous rotation time of 0 * Remove deleted WAL IDs from queue items * Delete unused struct fields * Switch from warning to error to correct HTTP status code from 400 -> 500 * Delete WALs on failed role creation or role deletion * Take exclusive lock before reading config * Fix manual rotate not respecting WAL ID * Add tests for processing of stored WALs * Remove unnecessary multierror * Add last check on newPassword
…#30) * Add more WAL logging * Fix early rotation for roles with WALs, handle multiple WALs per role * Respect previous WAL's new password * Ensure only 1 WAL per role * Add a warning to manual rotation response when rotation not immediately successful * Remove re-storing of mount config when rotating unrelated roles * Discard all WALs with a previous rotation time of 0 * Remove deleted WAL IDs from queue items * Delete unused struct fields * Switch from warning to error to correct HTTP status code from 400 -> 500 * Delete WALs on failed role creation or role deletion * Take exclusive lock before reading config * Fix manual rotate not respecting WAL ID * Add tests for processing of stored WALs * Remove unnecessary multierror * Add last check on newPassword
…#29) * Add more WAL logging * Fix early rotation for roles with WALs, handle multiple WALs per role * Respect previous WAL's new password * Ensure only 1 WAL per role * Add a warning to manual rotation response when rotation not immediately successful * Remove re-storing of mount config when rotating unrelated roles * Discard all WALs with a previous rotation time of 0 * Remove deleted WAL IDs from queue items * Delete unused struct fields * Switch from warning to error to correct HTTP status code from 400 -> 500 * Delete WALs on failed role creation or role deletion * Take exclusive lock before reading config * Fix manual rotate not respecting WAL ID * Add tests for processing of stored WALs * Remove unnecessary multierror * Add last check on newPassword
Early rotation of credentials
Bug and repro description
When the plugin finds a WAL for a role on initialisation, it (understandably) assumes it needs to rotate the password for the role ASAP. However, the plugin also assumes there can be at most 1 WAL per role at any time, and that's not true. We can have arbitrarily old and stale WALs in storage generated by failures during manual attempts to set up or rotate a role. These only ever get a chance to be cleaned up one at a time for each initialisation the plugin goes through. As a result, we can see early password rotations occur when the plugin runs initialisation.
To illustrate, here is a scenario to reproduce an early rotation:
Fix
There are a few meaningful changes: