Panic after upgrade to 1.2.4. #51
Comments
Glancing at the code, there is nothing to make me believe this was due to the 1.2.4 upgrade. It is merely the last major thing to happen to this cluster. It seems that getRoleSet can return
@kalafut: thank you for the quick fix. Could you elaborate on this code path? Just trying to understand what happened.
@mberhault Thanks for the bug report! This occurred in the WAL (write-ahead log) system, so it's a bit tough to say what the root cause was. Possibly some operation involving account keys crashed or panicked earlier (but after the WAL was written), and the roleset of that operation was deleted before the WAL rollback operation ran. Was there any other panic prior to this one? Another possibility is that after the WAL was written, there was a GCP-related error such that the WAL couldn't be removed.
We had some GCS unavailability on Monday, then the minor upgrade yesterday, but I don't recall seeing a panic in quite some time.
If your storage is on GCS then it could definitely be related, even without a panic. For example, one "success" sequence could be like:
There are other ways too, but if GCS was having issues then this isn't unexpected. Normally the WALs would just get cleaned up, but the bug you found prevented it in this case.
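To make the rollback path described above a bit more concrete, here is a minimal Go sketch. It is not the actual fix and not the plugin's real code. `RoleSet`, `walEntry`, `store`, and `rollbackKey` are hypothetical names, and the sketch assumes that looking up a deleted roleset yields a nil value with no error, which the thread does not spell out. The point is only that a rollback handler has to check for a missing roleset before reading its fields.

```go
package main

import "fmt"

// RoleSet is a hypothetical stand-in for a stored roleset.
type RoleSet struct {
	Name       string
	AccountKey string
}

// walEntry is a hypothetical WAL record written before a key operation.
type walEntry struct {
	RoleSet string
	KeyName string
}

// store is a toy in-memory replacement for the storage backend.
type store map[string]*RoleSet

// getRoleSet returns (nil, nil) when the roleset no longer exists.
// ASSUMPTION: this mirrors the behaviour hinted at in the thread.
func (s store) getRoleSet(name string) (*RoleSet, error) {
	return s[name], nil
}

// rollbackKey reports whether the key named in the WAL entry should be
// deleted during rollback. It never dereferences a nil roleset.
func rollbackKey(s store, e walEntry) (bool, error) {
	rs, err := s.getRoleSet(e.RoleSet)
	if err != nil {
		return false, err
	}
	if rs == nil {
		// The roleset was deleted after the WAL was written, so nothing
		// can still reference the key: clean it up.
		return true, nil
	}
	// Keep the key only if the roleset still uses it.
	return rs.AccountKey != e.KeyName, nil
}

func main() {
	s := store{"proj-a": &RoleSet{Name: "proj-a", AccountKey: "key-1"}}

	entries := []walEntry{
		{RoleSet: "deleted-roleset", KeyName: "key-9"}, // roleset is gone
		{RoleSet: "proj-a", KeyName: "key-1"},          // key still in use
	}
	for _, e := range entries {
		del, err := rollbackKey(s, e)
		fmt.Println(e.RoleSet, e.KeyName, del, err)
	}
}
```

Without the `rs == nil` branch, the comparison on `rs.AccountKey` would dereference a nil pointer for the first entry, which is the kind of failure a leftover WAL entry for a deleted roleset could trigger.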
Makes sense. Thank you for the explanation.
We just upgraded our 3 node cluster (backed by GCS) and received the following panic in the gcp secrets engine about 4 hours later:
Prior to this, we were running 1.2.3 for about a week and 1.2.2 for two months.
The secrets engine has been there for quite some time. It currently has over 1K rolesets defined (this is intentional, we make use of a lot of projects and create two rolesets per project).
The last request to this node in the logs was 3 minutes before the crash on a completely different secrets engine.