-
Notifications
You must be signed in to change notification settings - Fork 238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A69: Certificate Revocation List Enhancements #382
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, suggested a bunch of changes related to wording / moving statements around
AXX-crl-enhancements.md
Outdated
Per RFC5280, during revocation checking one of three status should be returned - `RevocationUnrevoked`, `RevocationRevoked`, and `RevocationUndetermined`. The user must specify whether `RevocationUndetermined` should be treated as `RevocationRevoked` or `RevocationUnrevoked` (failing open vs. failing closed). | ||
|
||
Assumptions: | ||
1. There should be no direct linkage between this interface and the creation/distribution of CRLs. gRPC is a user of CRLs and credentials, not a PKI itself. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add some 'high level overview' section (maybe as a comment to a diagram?) saying that existing impls use map as an in-memory storage and a provider to update it using some IOs. PKIs are completely out of scope of this gRFC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also let's add an assumption that we can't guess CRL provider by a name of a file / metadata of strings. However if someone wants to rely on it they can do a specific impl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the second
For the first, this document has a structure defined by a template. I could have a High Level Overview sub-section in the Proposal section.
existing impls use map as an in-memory storage and a provider to update it using some IOs
I'm a little confused at this comment - the existing CRL impls don't do this?
AXX-crl-enhancements.md
Outdated
If a CRL is not found for an issuer this is fine - the verification code will return RevocationUndetermined per RFC5280, then the user will have a setting to fail-open or fail-closed indicating whether RevocationUndetermined should be treated as Unrevoked or Revoked respectively. | ||
|
||
#### API Outcomes and Error Handling | ||
Because CRLs involve reading/writing from the filesystem, we will have to deal with potential edge cases of bad files, bad updates, etc. From a high-level perspective, we are going to follow the Authz Policy design patterns - use existing information in case an update contains bad data. Startup behavior is not different from updates. Particularly in the case of updating CRLs, we don't want to error and crash the server if there is a bad update. However, we still want this error to be known, so we will have an optional and overridable error callback when CRLs fail to be read. This will let users tie in whatever alerting/monitoring they may want in these failure cases. This callback will be for notifications/alerting, it will not be a decision-making callback (it is expected the IO for CRLs will already be done in the background, so these errors would not be on the main path by design anyways). Users should use `RevocationUndertermined` combined with the FailOpen/FailClosed knob for decision making on uncertain CRLs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's add a link to Authz gRFC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, and added to the related proposals at the top as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few minor things
A69-crl-enhancements.md
Outdated
|
||
Users can also implement the interface with whatever behaviors they specifically desire. | ||
|
||
Following [RFC5280](https://datatracker.ietf.org/doc/html/rfc5280), during revocation checking one of three status should be returned - `RevocationUnrevoked`, `RevocationRevoked`, and `RevocationUndetermined` ([RFC5280 defines `Undetermined` in the CRL Processing section](https://datatracker.ietf.org/doc/html/rfc5280#section-6.3.3). We ). There are many reasons a certificate can be revoked, but in the end gRPC cares about a certificate being revoked or not, thus we differentiate between `RevocationUnrevoked` and `RevocationRevoked`. The user must specify whether `RevocationUndetermined` should be treated as `RevocationRevoked` or `RevocationUnrevoked` (failing open vs. failing closed). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some pending leftovers
We ).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleaned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more comments on top of @erm-g's. But looks good.
@markdroth, @dfawley, can y'all take a look?
The basic APIs for the CRL Reloading features. This adds external types to represent CRL Providers, CRLs, and CertificateInfo. Internally we will use `CrlImpl` - this layer is needed to hide OpenSSL details from the user. GRFC - grpc/proposal#382 Things Done * Add external API for `CrlProvider`, `Crl`, `CertInfo` (`CertInfo` is used during CRL lookup rather than passing the entire certificate). * Add code paths in `ssl_transport_security` to utilize CRL providers * Add `StaticCrlProvider` * Refactor `crl_ssl_transport_security_test.cc` so it is more extensible and can be used with providers
This adds the directory reloader implementation of the CrlProvider. This will periodically reload CRL files in a directory per [gRFC A69](grpc/proposal#382) Included in this is the following: * A public API to create the `DirectoryReloaderCrlProvider` * A basic directory interface in gprpp and platform specific impls for getting the list of files in a directory (unfortunately prior C++17, there is no std::filesystem, so we have to have platform specific impls) * The implementation of `DirectoryReloaderCrlProvider` takes an event_engine and a directory interface. This allows us to test using the fuzzing event engine for time mocking, and to implement a test directory interface so we avoid having to make temporary directories and files in the tests. This is notably not in `include`, and the `CreateDirectoryReloaderCrlProvider` is the only way to construct one from the public API, so we don't expose the event engine and directory details to the user. --------- Co-authored-by: gtcooke94 <[email protected]>
) This adds the directory reloader implementation of the CrlProvider. This will periodically reload CRL files in a directory per [gRFC A69](grpc/proposal#382) Included in this is the following: * A public API to create the `DirectoryReloaderCrlProvider` * A basic directory interface in gprpp and platform specific impls for getting the list of files in a directory (unfortunately prior C++17, there is no std::filesystem, so we have to have platform specific impls) * The implementation of `DirectoryReloaderCrlProvider` takes an event_engine and a directory interface. This allows us to test using the fuzzing event engine for time mocking, and to implement a test directory interface so we avoid having to make temporary directories and files in the tests. This is notably not in `include`, and the `CreateDirectoryReloaderCrlProvider` is the only way to construct one from the public API, so we don't expose the event engine and directory details to the user. --------- Co-authored-by: gtcooke94 <[email protected]>
) This adds the directory reloader implementation of the CrlProvider. This will periodically reload CRL files in a directory per [gRFC A69](grpc/proposal#382) Included in this is the following: * A public API to create the `DirectoryReloaderCrlProvider` * A basic directory interface in gprpp and platform specific impls for getting the list of files in a directory (unfortunately prior C++17, there is no std::filesystem, so we have to have platform specific impls) * The implementation of `DirectoryReloaderCrlProvider` takes an event_engine and a directory interface. This allows us to test using the fuzzing event engine for time mocking, and to implement a test directory interface so we avoid having to make temporary directories and files in the tests. This is notably not in `include`, and the `CreateDirectoryReloaderCrlProvider` is the only way to construct one from the public API, so we don't expose the event engine and directory details to the user. --------- Co-authored-by: gtcooke94 <[email protected]>
A69-crl-enhancements.md
Outdated
type CrlDirectoryReloadingProvider struct { | ||
ReloadInterval time.Duration | ||
CrlDirectory string | ||
// maps issuer hash to crl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Tabs vs spaces. Make this a tab?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The overall design looks good! My comments here are mostly just details that need to be spelled out a little more clearly.
Please let me know if you have any questions. Thanks!
1. We cannot know users' PKI details, so our implementations should not enforce non-X509 metadata mattering (for example, file names). A user could certainly implement their own provider that cares about filenames. | ||
1. We will not support OSCP-style checking (OSCP is an alternative to CRLs, not another form of CRLs). | ||
|
||
![Basic Flow Diagram](A69_graphics/basic_diagram.png) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also include the source form (e.g., SVG files) for these diagrams, so that we can modify them in future if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added SVG files
A69-crl-enhancements.md
Outdated
This is a simple provider that takes in all CRLs to be used as raw strings during initialization, then returns them in the appropriate format when called. The CRLs will be stored in a mapping of a hash of the issuer to the CRL content. | ||
|
||
### Provider Implementation - Directory Provider | ||
This provider will periodically read CRL files in a given directory and update gRPCs internal representation of those CRLs. We expect this will be heavily used, as a directory of CRLs is very common for X509 CRL files. The CRLs will be stored in a mapping of a hash of the issuer to the CRL content - `map<issuer_hash, crl_object>`. The initial read of the directory and filling of this map will be synchronous during server startup when the provider is created. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably worth explicitly stating that the reason that the initial read is synchronous is to ensure that the CRL data is already loaded before the server starts up, so that we don't delay handshakes while waiting for the CRLs to load. We should explicitly state that the creation does not fail if there is an error in this initial load.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added text
// Implementation for getting and storing CRLs | ||
``` | ||
|
||
Example feature use: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please extend this example to show setting the fail open/closed knob, so that it's clear that this knob is intended to be part of the TlsCreds API, not part of a specific CRL provider implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added opts.FailRevocationUndetermined = false
A69-crl-enhancements.md
Outdated
``` | ||
<pseudo code in the handshaker when verifying cert> | ||
|
||
RevocationStatus GetRevocationStatus(cert, crlProvider, settings) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest changing this pseudo-code as follows:
- Rename
settings
toopts
, to clarify that this is the same options structure created in the previous pseudo-code block. - Instead of passing in
crlProvider
as a separate parameter, useopts.provider
, since that's where we set the provider in the previous pseudo-code block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
A69-crl-enhancements.md
Outdated
|
||
In the case that the directory and all files within are read completely during an update, we will treat the directory as the exact truth. This means if a CRL was in-memory and is no longer in the directory after the update, it will be removed in-memory. | ||
|
||
In the case that there is an issue reading a file, we cannot be certain what exactly is happening. In particular, we enforce no naming convention on the CRL files, so if it cannot be read, we cannot know what it contains or what issuer it belongs to. We will do a best-effort safe update - for files which are read correctly, we can update those individual entries. Since we can't know the issuer if a file can't be read, it is unsafe to do deletion in this case. An overridable error callback will be called so the user can receive a signal that something has gone wrong with CRLs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please explicitly state that this error callback will be passed as a parameter when instantiating the directory-based provider, so that it's clear that this is not part of the TlsCreds API itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
A69-crl-enhancements.md
Outdated
Example behavior for a directory reloader: | ||
|
||
At startup we read 3 CRL files (A, B, C) and construct a map as described above. During the handshakes, we return the CRLs from the map. | ||
Then, a bad update happens and CRL C ends up corrupted, but A and B are good. When this directory is reloaded, the resulting map will have the newer A and B CRL, but the older C CRL (as this will not see any update). In addition, the CrlReadErrorCallback would be invoked when CRL C fails to be read. There is no linkage between the map's CRL C and the fact that there was a read failure. It is the user's responsibility to act on file issues after being notified via the error callback. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The text "but the older C CRL (as this will not see any update)" is confusing. Suggest changing that to something like "but it will also include the original C CRL, since we do not delete entries from the map when there is a failure reading any of the files in the directory".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed, PTAL
If a CRL is not found for an issuer this is fine - the verification code will return RevocationUndetermined per RFC5280, then the user will have a setting to fail-open or fail-closed indicating whether RevocationUndetermined should be treated as Unrevoked or Revoked respectively. | ||
|
||
|
||
#### API Outcomes and Error Handling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to specify the behavior of this callback a little more precisely:
- The callback will be invoked at most once per update pass. If there are errors reading more than one file in a given pass, all of those errors will be returned in a single invocation of the callback.
- The callback will be passed only one parameter, which is an indication of the error in a language-specific way (e.g.,
absl::Status
in C++). The error text should include the list of files for which there were errors and what those errors were, but that information will not be passed in a structured way for the callback to understand those individual underlying errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
A69-crl-enhancements.md
Outdated
|
||
This approach adds complexity to the user facing API, and a user overriding the provider interface poorly could result in undefined or inefficient behaviors. To counteract this, we will provide generally useful implementations and good documentation on requirements if a user is writing their own implementation. | ||
|
||
We explicitly will not be supporting OSCP-style revocation . We want to stay inside the realm of RFC5280 and X509-style CRLs, and OSCP represents a departure from that standard. Notably, the API surface for OSCP is different - for OSCP one makes a call to an external service that returns the revocation status. Rather, we expect to be given CRLs and will check the revocation status in gRPC. In addition, OSCP requires external calls during the handshake, whereas with X509 CRL files everything can done locally. Handshakes are on a path for which performance in critical, so making external calls during them is an approach to avoid. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In "performance in critical", change "in" to "is".
Also, I would say "important" instead of "critical" -- we do care about it, but handshakes are not on the per-call path, so their performance is not truly what I would consider critical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
A69-crl-enhancements.md
Outdated
|
||
> "sets the function to get the crl for a given certificate x. When found, the crl must be assigned to *crl. This function must return 0 on failure and 1 on success. If no function to get the issuer is provided, the internal default function will be used instead." | ||
|
||
We will set this function to get the CRL from the CRLProvider if it is configured. We will add a `grpc::experimental::CrlProvider` that can be configured in `grpc::experimental::TlsChannelCredentialOptions`. This value and `set_crl_directory` cannot both be configured. The Provider will be configured in these options by the user and supplied as a pointer to grpc. Its lifetime needs to be managed by the user, and any handshake happening with a provider configured should share this provider, so the user can configure this to be shared across many connections if desired. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The statement "Its lifetime needs to be managed by the user" is inconsistent with the fact that we're passing it in as a shared_ptr<>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed that statement
* [IdentityCertificateOptions](https://github.com/grpc/grpc-go/blob/2aa261560586eab6795301a3670d9dfdd7308625/security/advancedtls/advancedtls.go#L115) | ||
* [certificateListExt](https://github.com/grpc/grpc-go/blob/2aa261560586eab6795301a3670d9dfdd7308625/security/advancedtls/crl.go#L88) | ||
|
||
### C/C++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this section is probably a little too detailed. I think the main things we need to cover here are:
- APIs that we're adding, both at the C++ and C-core layers.
- A brief description of where the functionality will be implemented (i.e., the provider and the fail open/closed option will be passed down from TlsCreds to the security connector and from there to the TSI layer, where they will be used when interacting with the SSL library).
I don't think we need to cover the deeper implementation details in this doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplified implementation specifics significantly, and added text about us doing the CRL checks relating to RFC 5280 6.3.3
No description provided.