Add implementation details for DELETE #14
Conversation
2. (Optional) If the keymanager keeps a database of known keys on disk (i.e. metadata),
   remove or disable the key in this database. Disabling the key first may
   be useful for certain storage schemes, to provide crash safety.
   This is implementation dependent.

   If the delete/disable fails, set the key's status to `error` with the
   reason for the failure and continue to the next key.
This stuff about disabling/deleting may be too Lighthouse-specific to include here. I acknowledge that not all clients will have a separate on-disk record as we do, nor the same semantics. The reason we have to do the disable-then-delete dance is:
- Lighthouse will attempt to load any key listed as enabled in the metadata DB, so we have to delete from the metadata DB before deleting the keystore file. Otherwise, if the call crashed or was aborted after the keystore was deleted but before the metadata entry could be deleted, Lighthouse would try to load the deleted keystore.
- On the other hand, we can't just delete the metadata and then the keystore, because Lighthouse will reload keystores that are found on disk and re-add them to the metadata DB. In case of a crash before deleting the keystore, the key may remain active (although no response would have been sent in this case, so this would technically be OK, but we err on the side of caution).
Happy to remove these sections if they're deemed too complicated
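For concreteness, here is a minimal sketch of one ordering consistent with the crash-safety rationale above. `MetadataDb` and the helper names are toy stand-ins, not Lighthouse's actual types, and this is only an illustration, not a definitive implementation:

```python
# Hypothetical sketch of a crash-safe disable-then-delete ordering for a
# keymanager that keeps an on-disk metadata DB alongside keystore files.
# All names here are illustrative stand-ins.

class MetadataDb:
    """Toy in-memory stand-in for an on-disk metadata database."""
    def __init__(self):
        self.entries = {}  # pubkey -> {"enabled": bool}

    def add(self, pubkey):
        self.entries[pubkey] = {"enabled": True}

    def disable(self, pubkey):
        self.entries[pubkey]["enabled"] = False

    def remove(self, pubkey):
        del self.entries[pubkey]

def delete_key(db, keystore_files, pubkey):
    """Delete one key with a crash-safe ordering.

    Disabling first means a crash at any later point leaves the key
    disabled: it will neither be loaded as enabled (keystore deleted,
    metadata still present) nor re-added as active (keystore still on
    disk).
    """
    if pubkey not in db.entries:
        return "not_found"
    db.disable(pubkey)              # 1. disable in the metadata DB
    keystore_files.discard(pubkey)  # 2. delete the keystore file
    db.remove(pubkey)               # 3. remove the metadata entry
    return "deleted"
```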
Marking it optional is fine, I think - this is a flow guideline and it's never going to cover everyone's implementation...
The more important thing I was hoping to capture is basically - batch process first, export slashing data second, and the whole idea of failure conditions...
The rest is a guide and concrete examples like you've added certainly make it a lot clearer...
I think this is very helpful regardless
- If the export fails, set the final status for all provisionally `deleted` or `not_found`
  keys to `error`, with the `error` message describing the slashing protection failure.
  The response should include a slashing protection object with valid `metadata` and
  `data == []`.
I'm uncertain whether a 500 response is preferable to this. Currently Lighthouse will send a 500 in this case, which is simpler than having to craft an empty `slashing_protection` object with valid metadata. However I think returning a 200 with statuses is more consistent.
Would appreciate your thoughts on this @rolfyone
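As a sketch of the "200 with statuses" shape under discussion: a failed batch export downgrades provisional statuses to `error` and returns a valid-but-empty slashing protection object. The function and field values here are illustrative assumptions, not any client's actual code:

```python
# Sketch only: if the batch slashing protection export fails, downgrade
# provisional `deleted`/`not_found` statuses to `error` and return an
# interchange object with valid metadata but no records.

def finalize_response(statuses, export_data, export_error=None):
    """statuses: list of {"status": ..., "message": ...}, one per key."""
    if export_error is not None:
        for s in statuses:
            if s["status"] in ("deleted", "not_found"):
                s["status"] = "error"
                s["message"] = f"slashing protection export failed: {export_error}"
        export_data = []  # valid metadata, but no records
    return {
        "data": statuses,
        "slashing_protection": {
            # Metadata "pulled out of thin air": illustrative value.
            "metadata": {"interchange_format_version": "5"},
            "data": export_data,
        },
    }
```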
So I think, given that it's a batch API, exporting slashing data isn't great, but it's still a 200 with statuses of removal of keys, potentially without slashing protection data populated; otherwise corrective action via list would be needed to even know which keys got removed...
I think for delete, given we have to process the batch first, the only non-200 response I can see at the moment is when the request body is invalid or empty...
It would mean they couldn't get the slashing data, but it's arguably more important that they know they've deleted a bunch of keys at a bare minimum...
My hope is the slashing protection data export could be fairly robust, and exporting as many as can be found is better than erroring out of that part of the process...
If no keys are found during that export, I'd probably not include the key in the response object...
    {
      "data": []
    }
I think this description pretty much sells me on the idea that `slashing_protection` in the response has to be optional; otherwise we may have to export empty slashing exports, which would be sub-optimal
One thing that may be worth a thought/comment:
The rest of the logic here is order-important, but slashing protection data isn't. I'd be applying set logic to the keys being passed in: we only need one set of slashing export data per key, but we still need to apply the `not_active` logic to all the duplicates, if that makes sense...
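That set logic might look something like the following sketch, assuming (per the comment above) that duplicates within one request report `not_active` while slashing data is exported once per unique key; the function and status names here are taken from the discussion, the rest is illustrative:

```python
# Sketch of per-request deduplication: process each unique pubkey once,
# but emit one status per request entry, marking repeats `not_active`.

def delete_batch(requested_pubkeys, active_keys):
    """active_keys: set of currently-loaded pubkeys (mutated in place)."""
    statuses = []
    seen = set()
    for pk in requested_pubkeys:
        if pk in seen:
            statuses.append("not_active")  # already handled earlier in batch
            continue
        seen.add(pk)
        if pk in active_keys:
            active_keys.remove(pk)
            statuses.append("deleted")
        else:
            statuses.append("not_found")
    # `seen` is the set to export slashing protection for: once per key.
    return statuses, seen
```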
Yeah I think I'm coming around to the "always return 200 unless the request is completely malformed" approach as well.
> still a 200 with statuses of removal of keys, potentially without slashing protection data populated, otherwise corrective action via list would be needed to even know what keys got removed...
>
> It would mean they couldn't get the slashing data, but its arguably more important that they know they've deleted a bunch of keys at a bare minimum...
I'm not sure if I'm understanding this correctly, but it sounds like you're saying in the case of slashing protection export error we should still return `deleted` as the status and just omit each failed key from the `slashing_protection` data. I think this is unsafe because it looks identical to a successful delete for an inactive key (i.e. one with no slashing protection data). I think if there's an error during slashing protection export we have to convey that to the caller, and the only mechanism we have for doing that is the `error` status (or a 500 error code). I.e. the semantics should be "if slashing protection export explicitly failed for a key, then its status is `error` regardless of whether it was actually deleted". In the case of a batch export failing, it sets an `error` status for all keys that hadn't already errored. This is unfortunate in that it leaves the system in a quasi-unknown state, but as you said, we should make sure that slashing protection export is as robust as possible so that this doesn't occur often.
These semantics are also more consistent with an implementation that wants to export slashing protection one-at-a-time for each key, where each export may fail or succeed independently.
> I think this description pretty much sells to me that slashing_protection in the response has to be optional, otherwise we may have to export empty slashing exports, which would be sub-optimal
I think given the above, the slashing protection data with `data == []` is an OK compromise. The metadata can be pulled out of thin air and doesn't pose any risk.
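These per-key semantics can be sketched as follows, with `export_one` a hypothetical per-key export callable that raises on failure; this is an illustration of the rule "export failure implies `error` status", not any client's actual code:

```python
# Sketch: a key whose slashing protection export explicitly failed gets
# status `error`, regardless of whether the keystore was actually deleted.
# `export_one` is a hypothetical callable returning interchange records.

def finalize_statuses(provisional, export_one):
    """provisional: list of (pubkey, status) pairs after the delete phase."""
    final, records = [], []
    for pubkey, status in provisional:
        if status not in ("deleted", "not_found"):
            final.append((pubkey, status))  # e.g. already `error`
            continue
        try:
            records.extend(export_one(pubkey))
            final.append((pubkey, status))
        except Exception:
            # Export failed: surface it via the status, even though the
            # key may already be gone from disk.
            final.append((pubkey, "error"))
    return final, records
```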
- The flow above is only one example of a conformant implementation; implementations are free to
  implement the API in any way they wish while remaining compatible. E.g. if a keymanager stores all
  key data in a single database and is able to perform an atomic delete for all keys, then it may do
  this. Likewise if an implementation would like to export slashing protection data for each key one
  at a time and then merge these into the final response, then it may do this.
I think this is a good disclaimer...
The main thing to take away from these docs is more like
- order of processing for delete means batch first
- order of processing for insert means batch LAST.
This has huge implications for erroring out, and that's what is particularly helpful to align on, without being overly prescriptive about delete/disable, except to say that if a key is deleted, it is no longer able to be retrieved by the validator client that previously had access to the keystore...
I also think fleshing this out has made a few implementation details a lot clearer, so it's been a good exercise.
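The ordering takeaway above can be summarised in a sketch like this; all the callables are placeholders, and the only point being illustrated is the order of the two phases in each direction:

```python
# Sketch of the ordering takeaway: DELETE processes the key batch first
# and exports slashing data second; import applies slashing data first
# and loads the key batch LAST. All callables are placeholders.

def delete_flow(pubkeys, delete_keys, export_slashing):
    statuses = delete_keys(pubkeys)      # 1. batch delete first
    slashing = export_slashing(pubkeys)  # 2. export slashing data second
    return statuses, slashing

def import_flow(keystores, slashing_data, import_slashing, load_keys):
    import_slashing(slashing_data)       # 1. slashing data first
    return load_keys(keystores)          # 2. batch load LAST
```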
- If the export fails, set the final status for all provisionally `deleted` or `not_found`
  keys to `error`, with the `error` message describing the slashing protection failure.
  The response should include a slashing protection object with valid `metadata` and
  `data == []`.
I'm ok with the `slashing_protection` field being optional - it does make sense, if the data is empty, to have none passed rather than requiring an empty box with metadata...
LGTM. I think it's at the point where we merge it and maybe tweak anything we need to as we go
Add a flow for deleting keys that draws on experience with Lighthouse's implementation (see: sigp/lighthouse#2736).