-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add option 'elide_list_responses' to audit backends #18128
Conversation
This PR relates to a feature request logged through HashiCorp commercial support. Vault lacks pagination in its APIs. As a result, certain list operations can return **very** large responses. The user's chosen audit sinks may experience difficulty consuming audit records that swell to tens of megabytes of JSON. In our case, one of the systems consuming audit log data could not cope, and failed. The responses of list operations are typically not very interesting, as they are mostly lists of keys, or, even when they include a "key_info" field, are not returning confidential information. They become even less interesting once HMAC-ed by the audit system. Some example Vault "list" operations that are prone to becoming very large in an active Vault installation are: auth/token/accessors/ identity/entity/id/ identity/entity-alias/id/ pki/certs/ In response, I've coded a new option that can be applied to audit backends, `elide_list_responses`. When enabled, response data is elided from audit logs, only when the operation type is "list". For added safety, the elision only applies to the "keys" and "key_info" fields within the response data - these are conventionally the only fields present in a list response - see logical.ListResponse, and logical.ListResponseWithInfo. However, other fields are technically possible if a plugin author writes unusual code, and these will be preserved in the audit log even with this option enabled. The elision replaces the values of the "keys" and "key_info" fields with an integer count of the number of entries. This allows even the elided audit logs to still be useful for answering questions like "Was any data returned?" or "How many records were listed?".
Hi Max, This makes sense, and indeed I was in a discussion just last week re this issue. I wonder if rather than doing it globally on a per audit device basis, it would be better to make it configurable per request path? So just as you can tune a mount to specify keys in the request/response object that shouldn't be hmac'd, maybe you could tune a mount to specify LIST endpoints whose responses should be suppressed. I'm not asking you to make this change at this point, this is just me thinking out loud. It's not mutually exclusive with your global approach anyway, maybe we want both. |
That's an interesting question... In one way, it would make things easier for me, in that I'm already dreading how we manage to deploy this in production, even if it's merged ... because currently Vault has no way to reconfigure an existing audit device. That's a problem, since toggling this new option would require fully removing the existing audit device and re-adding it. This causes two bad things to happen: 1) loss of audit data whilst the removal and addition is ongoing; 2) loss of the previous audit salt, so old audit logs cannot have values looked up via the sys/audit-hash/ API. (I have been wondering whether it would be practical to add live reconfiguration to audit devices...) However, there are other reasons I think setting this at the audit device level is the better:
I do agree that more flexibility is often a good thing - people do tend to come up with use-cases that beyond what designers originally imagined - but if this was to be able to be set in multiple places, we'd have to figure out overriding/inheritance semantics, which could make it complex to understand how to use. Ultimately, I think the reason that a simple boolean per audit device could be sufficient in this case, is because |
I'm persuaded. We can leave finer-grained configuration for a future iteration if needed.
We're open to the idea of this, i.e. I don't think there's any reason why it hasn't been done until now other than no one's asked for it. Should be its own PR of course, and since it will require changes on the enterprise repo side, we'd have to do some work even if you got the ball rolling, but I don't think that'll be a problem, it just may not happen as quickly as one might like. |
Just a note, but in the PKI plugin, not only would this be a good idea (as cert lists can get long), but finer grained config would be good too: we recommend un-HMACing the However, various endpoints like |
@cipherboy Perhaps we should deprecate those endpoints in the PKI secrets engine and add replacements that are separate from the |
@maxb There are already replacements, under the newer I fear that any automation which pulls from |
@maxb is attempting to deploy a commit to the HashiCorp Team on Vercel. A member of the Team first needs to authorize it. |
Well, I meant, deprecate and never remove, similar to how there's no plan that I know of to ever remove Back on topic for this PR, I've written tests... and, ahem, fixed the bug they revealed. I'll work on the documentation update next. |
Also fix existing documentation which erroneously claimed the `mode` option was usable with the `socket` and `syslog` backends.
Documentation added. I believe this PR is now ready for review and merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, I have no issues with the code, and I appreciate the nice tests and docs. We can merge as is, but I have one suggestion for you to consider first: maybe rather than replace the values for keys
and key_info
, it would be better to suppress them and introduce a new field, e.g. keys_count
. Rationale: some folks might prefer if the type of these fields didn't change from list to int when the new config option is toggled, potentially breaking audit log parsing or ingestion.
Thanks for the feedback! Let me explain why I did it this way, and then you can tell me which trade-off you would prefer to prioritise. Currently, there is a loose convention that every list response will contain a If I was to write a new field I also quite like the appeal of still being able to tell whether a particular response contained a I do like the algorithmic simplicity of the current approach, but I'm OK with swapping it out for something a bit more complex and verbose, if you feel it's going to confuse some users too much. |
That's fair, and it's certainly a concern I had too in proposing this. In my mind it seemed unlikely, but the future is always uncertain.
If we go this route, I'd say overwriting keys_count is a nonstarter, skipping elision is an option, and putting it outside the data map would be my first choice.
No, I'm persuaded. The concern I have may not arise (people may not care enough about list responses to be digesting them in their pipelines), the new behaviour you're introducing is opt-in, so people using it can always adapt their ingestion code/schemas, and I don't want to impose extra complexity before we know that it's warranted. |
This PR relates to a feature request logged through HashiCorp commercial support. Vault lacks pagination in its APIs. As a result, certain list operations can return **very** large responses. The user's chosen audit sinks may experience difficulty consuming audit records that swell to tens of megabytes of JSON. In our case, one of the systems consuming audit log data could not cope, and failed. The responses of list operations are typically not very interesting, as they are mostly lists of keys, or, even when they include a "key_info" field, are not returning confidential information. They become even less interesting once HMAC-ed by the audit system. Some example Vault "list" operations that are prone to becoming very large in an active Vault installation are: auth/token/accessors/ identity/entity/id/ identity/entity-alias/id/ pki/certs/ In response, I've coded a new option that can be applied to audit backends, `elide_list_responses`. When enabled, response data is elided from audit logs, only when the operation type is "list". For added safety, the elision only applies to the "keys" and "key_info" fields within the response data - these are conventionally the only fields present in a list response - see logical.ListResponse, and logical.ListResponseWithInfo. However, other fields are technically possible if a plugin author writes unusual code, and these will be preserved in the audit log even with this option enabled. The elision replaces the values of the "keys" and "key_info" fields with an integer count of the number of entries. This allows even the elided audit logs to still be useful for answering questions like "Was any data returned?" or "How many records were listed?".
This PR relates to a feature request logged through HashiCorp commercial support: https://app.asana.com/0/112957393038545/1201850823939444/f
Vault lacks pagination in its APIs. As a result, certain list operations can return very large responses. The user's chosen audit sinks may experience difficulty consuming audit records that swell to tens of megabytes of JSON.
In our case, one of the systems consuming audit log data could not cope, and failed.
The responses of list operations are typically not very interesting, as they are mostly lists of keys, or, even when they include a "key_info" field, are not returning confidential information. They become even less interesting once HMAC-ed by the audit system.
Some example Vault "list" operations that are prone to becoming very large in an active Vault installation are:
In response, I've coded a new option that can be applied to audit backends,
elide_list_responses
. When enabled, response data is elided from audit logs, only when the operation type is "list".For added safety, the elision only applies to the "keys" and "key_info" fields within the response data - these are conventionally the only fields present in a list response - see logical.ListResponse, and logical.ListResponseWithInfo. However, other fields are technically possible if a plugin author writes unusual code, and these will be preserved in the audit log even with this option enabled.
The elision replaces the values of the "keys" and "key_info" fields with an integer count of the number of entries. This allows even the elided audit logs to still be useful for answering questions like "Was any data returned?" or "How many records were listed?".
There are obviously a few things missing in this PR - changelog, documentation, tests.For now, the intent is to start a conversation with HashiCorp about the proposed implementation, and get that validated before filling in the rest.