-
Notifications
You must be signed in to change notification settings - Fork 155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multi-validation & true async issuance #17
Conversation
This commit splits the operation of the CA and VA across multiple go-routines instead of doing all work on the WFE's handler routine. The VA now performs multiple validations, sleeping a random amount of time between each. This forces the ACME client to poll the authorizations to learn when they have switched from pending to invalid or valid. All validation attempts must succeed for the authorization to be valid. The WFE no longer communicates directly with the CA, instead when the VA updates the last authorization on an order to valid status it invoke's the CA's CompleteOrder function in a separate goroutine. This will complete the order & update it with a certificate URL.
Note: CI is going to fail until Chisel is updated to handle multiple validations & async issuance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mention locking for orders in the comments below. I think we also need locking for Authzs and Challenges. I noticed you flagged a race condition with a TODO, but I think it's worth handling the race conditions now, since they're hard to debug when they actually happen. Better to add the concurrency safety while we're thinking about it.
ca/ca.go
Outdated
} | ||
|
||
if order.Status != acme.StatusPending { | ||
ca.log.Printf("Error: Asked to complete orrder %s is not status pending, was status %s", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*order %s, which is not
ca/ca.go
Outdated
|
||
ca.log.Printf("Order %s is fully authorized. Ready to issue", order.ID) | ||
// Update the order to reflect that we're now processing it | ||
order.Status = acme.StatusProcessing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to hold a lock on the order.
ca/ca.go
Outdated
order.Status = acme.StatusProcessing | ||
|
||
csr := order.ParsedCSR | ||
domains := make([]string, len(csr.DNSNames)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to make a copy?
core/types.go
Outdated
@@ -19,6 +19,7 @@ type Order struct { | |||
ParsedCSR *x509.CertificateRequest | |||
ExpiresDate time.Time | |||
AuthorizationObjects []*Authorization | |||
CertPathPrefix string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than embedding a public-facing path (WFE's responsibility) in the internal Order object, let's embed a certificate ID, and make WFE responsible for rendering that into a certificate path.
va/va.go
Outdated
va.log.Printf("Pulled a task from the Tasks queue: %#v", task) | ||
va.log.Printf("Starting %d validations.", concurrentValidations) | ||
|
||
var wg sync.WaitGroup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This effectively serializes the tasks; If one authorization is validating, another authorization can't be validating at the same time. I think that might potentially mask some client bugs. Also, this loop seems a little long, and I think the goto may not be necessary.
What if instead, this was a very short loop:
for task := range va.tasks {
go process(task)
}
And the bulk of this function could be moved into process(..)
.
va/va.go
Outdated
// concurrent validations into a slice for processing. | ||
for i := 0; i < concurrentValidations; i++ { | ||
va.log.Printf("Reading a result from task.Results") | ||
result := <-task.Results |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we wait on the WaitGroup above, task.Results
is guaranteed to be exactly filled, and there is guaranteed to be no concurrent access. So there's no reason to make it a channel. It can just be a slice, and then you can range
over it. Alternately you could get rid of the WaitGroup and use just the channel, which would allow you to surface errors without waiting for all validations to come in. Basically, we need just one of these concurrency primitives here.
Also, I'd recommend factoring this out into a firstError
function or similar, that returns either an error or nil. That way you can call that function and have an if/else that very clearly handles both the error and non-error cases in one place.
if err != nil { | ||
return err | ||
result.Error = err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think switching this function from one that returns an error, to one that mutates its input to set an error, is a step backwards in clarity. We had similar code in Boulder, but moved towards the "return an error" approach.
Stepping back a bit, this is related to the choice to have a vaTask
contain both input and output. What if we defined vaTask
to be only the input to a task, and the output from a task to be a ValidationRecord? Or in other words, treat vaTask as immutable. This might be an argument to use a chan *ValidationRecord
in place of a WaitGroup
. The wg
argument to performValidation
could be replaced with a chan
, and the switch statement could do ch <- va.validateHTTP01(task)
.
Based on review feedback this commit updates the CA to stash a certificate ID in the internal order object instead of a path prefix. The WFE then constructs a certificate URL for the certificate based on the ID in the Order() handler. This keeps public facing paths as a WFE responsibility.
This commit updates the core internal types to embed RWMutex's to protect against concurrent reads/updates. The relevant wfe, db, va, and ca functions were updated to acquire read/write locks as appropriate. This commit also reworks parts of the CA and VA to have shorter, easier to read functions. The VA is updated to not serialize the validation work as it reads vaTasks off of a channel. The WaitGroup was removed and results from validation attempts are processed as they occur.
@jsha I believe all of your feedback is addressed, can you give this PR another pass when you have a chance? Thanks! |
ca/ca.go
Outdated
authz.RUnlock() | ||
} | ||
|
||
order.RLock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to lock earlier, before you fetch order.AuthorixationObjects, Also I think a full lock would be fine; we're not worried about performance here, and there's no deadlock possibility, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch on locking too late here. Thanks.
ca/ca.go
Outdated
// Update the order to valid status and store a cert ID for the wfe to use to | ||
// render the certificate URL for the order | ||
order.Status = acme.StatusValid | ||
order.CertID = cert.ID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need to store a copy of cert.ID
? Why not use it directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Why not use it directly" -> You mean a pointer to the certificate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah; I wrote this comment before the earlier one. Is the certificate stored in memory separately from the order? If so, then yes a pointer to the certificate would be good. Otherwise, DER should work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, should have read the latest diffs before replying. This looks great.
ca/ca.go
Outdated
order.Status = acme.StatusValid | ||
order.Certificate = order.CertPathPrefix + cert.ID | ||
ca.log.Printf("Order %s has Certificate %s", order.ID, order.Certificate) | ||
order.CertID = cert.ID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this field is only used to check whether a certificate is done, maybe it would be better to create a field that contains the cert DER, since we'll actually need that later. Otherwise, it's a bit confusing why there are two "ID" fields.
Per review feedback its strange to have two IDs when one is just used to indicate that an order is complete. Instead the internal Order type should hold a pointer to the internal Certificate type. The ID and DER encoding can be read from that as required for delivering the order.
Per review feedback the `order.RLock()` on L38 was happening too late, after the loop over the `order.AuthorizationObjects`. This commit moves the lock earlier & simplifies by acquiring one write lock for the duration of the function vs a read lock and a subsequent write lock.
@jsha Another round of feedback applied - thanks! |
Updates the README
5491a24 updates the README to include Chisel usage instructions
& reference the current-most draft.
Adds multi-validation and true async issuance.
7d615b5 splits the operation of the CA and VA across multiple
go-routines instead of doing all work on the WFE's handler routine.
The VA now performs multiple validations, sleeping a random amount of
time between each. This forces the ACME client to poll the
authorizations to learn when they have switched from pending to invalid
or valid. All validation attempts must succeed for the authorization to
be valid.
The WFE no longer communicates directly with the CA, instead when the VA
updates the last authorization on an order to valid status it invokes
the CA's CompleteOrder function in a separate goroutine. This will
complete the order & update it with a certificate pointer. The WFE uses
this pointer to construct the certificate URL as required.