Add implementation details for DELETE #14
Conversation
2. (Optional) If the keymanager keeps a database of known keys on disk (i.e. metadata),
   remove or disable the key in this database. Disabling the key first may
   be useful for certain storage schemes, to provide crash safety.
   This is implementation dependent.

   If the delete/disable fails, set the key's status to `error` with the
   reason for the failure and continue to the next key.
This stuff about disabling/deleting may be too Lighthouse-specific to include here. I acknowledge that not all clients will have a separate on-disk record as we do, nor the same semantics. The reason we have to do the disable-then-delete dance is:
- Lighthouse will attempt to load any key listed as enabled in the metadata DB, so we have to delete from the metadata DB before deleting the keystore file. Otherwise, if the call crashed or was aborted after the keystore was deleted but before the metadata entry could be deleted, Lighthouse would try to load the deleted keystore.
- On the other hand, we can't just delete the metadata and then the keystore, because Lighthouse will reload keystores that are found on disk and re-add them to the metadata DB. In case of a crash before deleting the keystore, the key may remain active (although no response would have been sent in this case, so this would technically be OK, but we err on the side of caution).
Happy to remove these sections if they're deemed too complicated
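For concreteness, here is a minimal sketch of one ordering consistent with the crash-safety rationale above. `MetadataDb` and the helper names are toy stand-ins, not Lighthouse's actual types, and this is only an illustration, not a definitive implementation:

```python
# Hypothetical sketch of a crash-safe disable-then-delete ordering for a
# keymanager that keeps an on-disk metadata DB alongside keystore files.
# All names here are illustrative stand-ins.

class MetadataDb:
    """Toy in-memory stand-in for an on-disk metadata database."""
    def __init__(self):
        self.entries = {}  # pubkey -> {"enabled": bool}

    def add(self, pubkey):
        self.entries[pubkey] = {"enabled": True}

    def disable(self, pubkey):
        self.entries[pubkey]["enabled"] = False

    def remove(self, pubkey):
        del self.entries[pubkey]

def delete_key(db, keystore_files, pubkey):
    """Delete one key with a crash-safe ordering.

    Disabling first means a crash at any later point leaves the key
    disabled: it will neither be loaded as enabled (keystore deleted,
    metadata still present) nor re-added as active (keystore still on
    disk).
    """
    if pubkey not in db.entries:
        return "not_found"
    db.disable(pubkey)              # 1. disable in the metadata DB
    keystore_files.discard(pubkey)  # 2. delete the keystore file
    db.remove(pubkey)               # 3. remove the metadata entry
    return "deleted"
```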
Marking it optional is fine, I think - this is a flow guideline and it's never going to cover everyone's implementation...
The more important thing I was hoping to capture is basically - batch process first, export slashing data second, and the whole idea of failure conditions...
The rest is a guide and concrete examples like you've added certainly make it a lot clearer...
I think this is very helpful regardless
- If the export fails, set the final status for all provisionally `deleted` or `not_found`
  keys to `error`, with the `error` message describing the slashing protection failure.
  The response should include a slashing protection object with valid `metadata` and
  `data == []`.
I'm uncertain whether a 500 response is preferable to this. Currently Lighthouse will send a 500 in this case, which is simpler than having to craft an empty `slashing_protection` object with valid metadata. However I think returning a 200 with statuses is more consistent.
Would appreciate your thoughts on this @rolfyone
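As a sketch of the "200 with statuses" shape under discussion: a failed batch export downgrades provisional statuses to `error` and returns a valid-but-empty slashing protection object. The function and field values here are illustrative assumptions, not any client's actual code:

```python
# Sketch only: if the batch slashing protection export fails, downgrade
# provisional `deleted`/`not_found` statuses to `error` and return an
# interchange object with valid metadata but no records.

def finalize_response(statuses, export_data, export_error=None):
    """statuses: list of {"status": ..., "message": ...}, one per key."""
    if export_error is not None:
        for s in statuses:
            if s["status"] in ("deleted", "not_found"):
                s["status"] = "error"
                s["message"] = f"slashing protection export failed: {export_error}"
        export_data = []  # valid metadata, but no records
    return {
        "data": statuses,
        "slashing_protection": {
            # Metadata "pulled out of thin air": illustrative value.
            "metadata": {"interchange_format_version": "5"},
            "data": export_data,
        },
    }
```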
So I think, given that it's a batch API, exporting slashing data isn't great, but it's still a 200 with statuses of removal of keys, potentially without slashing protection data populated; otherwise corrective action via list would be needed to even know which keys got removed...
I think for delete, given we have to process the batch first, the only non-200 response I can see at the moment is when the request body is invalid or empty...
It would mean they couldn't get the slashing data, but it's arguably more important that they know they've deleted a bunch of keys at a bare minimum...
My hope is the slashing protection data export could be fairly robust, and exporting as many as can be found is better than erroring out of that part of the process...
If no keys are found during that export, I'd probably not include the key in the response object...
    {
      "data": []
    }
I think this description pretty much sells me on the idea that `slashing_protection` in the response has to be optional; otherwise we may have to export empty slashing exports, which would be sub-optimal
One thing that may be worth a thought/comment:
The rest of the logic here is order-important, but slashing protection data isn't. I'd be applying set logic to the keys being passed in: we only need one set of slashing export data per key, but we still need to apply the `not_active` logic to all the duplicates, if that makes sense...
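That set logic might look something like the following sketch, assuming (per the comment above) that duplicates within one request report `not_active` while slashing data is exported once per unique key; the function and status names here are taken from the discussion, the rest is illustrative:

```python
# Sketch of per-request deduplication: process each unique pubkey once,
# but emit one status per request entry, marking repeats `not_active`.

def delete_batch(requested_pubkeys, active_keys):
    """active_keys: set of currently-loaded pubkeys (mutated in place)."""
    statuses = []
    seen = set()
    for pk in requested_pubkeys:
        if pk in seen:
            statuses.append("not_active")  # already handled earlier in batch
            continue
        seen.add(pk)
        if pk in active_keys:
            active_keys.remove(pk)
            statuses.append("deleted")
        else:
            statuses.append("not_found")
    # `seen` is the set to export slashing protection for: once per key.
    return statuses, seen
```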
Yeah I think I'm coming around to the "always return 200 unless the request is completely malformed" approach as well.
> still a 200 with statuses of removal of keys, potentially without slashing protection data populated, otherwise corrective action via list would be needed to even know what keys got removed...
>
> It would mean they couldn't get the slashing data, but its arguably more important that they know they've deleted a bunch of keys at a bare minimum...
I'm not sure if I'm understanding this correctly, but it sounds like you're saying in the case of slashing protection export error we should still return `deleted` as the status and just omit each failed key from the `slashing_protection` data. I think this is unsafe because it looks identical to a successful delete for an inactive key (i.e. one with no slashing protection data). I think if there's an error during slashing protection export we have to convey that to the caller, and the only mechanism we have for doing that is the `error` status (or a 500 error code). I.e. the semantics should be "if slashing protection export explicitly failed for a key, then its status is `error` regardless of whether it was actually deleted". In the case of a batch export failing, it sets an `error` status for all keys that hadn't already errored. This is unfortunate in that it leaves the system in a quasi-unknown state, but as you said, we should make sure that slashing protection export is as robust as possible so that this doesn't occur often.
These semantics are also more consistent with an implementation that wants to export slashing protection one-at-a-time for each key, where each export may fail or succeed independently.
> I think this description pretty much sells to me that slashing_protection in the response has to be optional, otherwise we may have to export empty slashing exports, which would be sub-optimal
I think given the above, the slashing protection data with `data == []` is an OK compromise. The metadata can be pulled out of thin air and doesn't pose any risk.
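These per-key semantics can be sketched as follows, with `export_one` a hypothetical per-key export callable that raises on failure; this is an illustration of the rule "export failure implies `error` status", not any client's actual code:

```python
# Sketch: a key whose slashing protection export explicitly failed gets
# status `error`, regardless of whether the keystore was actually deleted.
# `export_one` is a hypothetical callable returning interchange records.

def finalize_statuses(provisional, export_one):
    """provisional: list of (pubkey, status) pairs after the delete phase."""
    final, records = [], []
    for pubkey, status in provisional:
        if status not in ("deleted", "not_found"):
            final.append((pubkey, status))  # e.g. already `error`
            continue
        try:
            records.extend(export_one(pubkey))
            final.append((pubkey, status))
        except Exception:
            # Export failed: surface it via the status, even though the
            # key may already be gone from disk.
            final.append((pubkey, "error"))
    return final, records
```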
- The flow above is only one example of a conformant implementation; implementations are free to
  implement the API in any way they wish while remaining compatible. E.g. if a keymanager stores all
  key data in a single database and is able to perform an atomic delete for all keys, then it may do
  this. Likewise if an implementation would like to export slashing protection data for each key one
  at a time and then merge these into the final response, then it may do this.
I think this is a good disclaimer...
The main thing to take away from these docs is more like
- order of processing for delete means batch first
- order of processing for insert means batch LAST.
This has huge implications for erroring out, and that's what is particularly helpful to align on, without being overly prescriptive about delete/disable, except to say that if a key is deleted, it is no longer able to be retrieved by the validator client that previously had access to the keystore...
I also think fleshing this out has made a few implementation details a lot clearer, so it's been a good exercise.
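The ordering takeaway above can be summarised in a sketch like this; all the callables are placeholders, and the only point being illustrated is the order of the two phases in each direction:

```python
# Sketch of the ordering takeaway: DELETE processes the key batch first
# and exports slashing data second; import applies slashing data first
# and loads the key batch LAST. All callables are placeholders.

def delete_flow(pubkeys, delete_keys, export_slashing):
    statuses = delete_keys(pubkeys)      # 1. batch delete first
    slashing = export_slashing(pubkeys)  # 2. export slashing data second
    return statuses, slashing

def import_flow(keystores, slashing_data, import_slashing, load_keys):
    import_slashing(slashing_data)       # 1. slashing data first
    return load_keys(keystores)          # 2. batch load LAST
```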
- If the export fails, set the final status for all provisionally `deleted` or `not_found`
  keys to `error`, with the `error` message describing the slashing protection failure.
  The response should include a slashing protection object with valid `metadata` and
  `data == []`.
I'm ok with the `slashing_protection` field being optional - it does make sense, if the data is empty, to have none passed rather than requiring an empty box with metadata...
LGTM. I think it's at the point where we merge it and maybe tweak anything we need to as we go
Add a flow for deleting keys that draws on experience with Lighthouse's implementation (see: sigp/lighthouse#2736).