Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: simpler and safer workflow to move validators from one lighthouse vc to another #2557

Open
winksaville opened this issue Sep 1, 2021 · 4 comments

Comments

@winksaville
Copy link
Contributor

Description

There should be a workflow to move one or more validators from one lighthouse vc to another. This workflow should minimize missed attestations, ideally zero, and should make all best attempts to guarantee no slashing.

Version

v1.5.1-stable

Present Behavior

Assume I have two lighthouse vc instances running on the same or different computers, VC1 and VC2. Each with a different set of enabled validators in their respective validator_definitions.yml file. I want to move a subset of the validators from VC1 to VC2. To do so I have to perform the following steps:

  1. Stop VC1.
  2. Edit VC1 valdiator_definitions.yml and disable all validators that are to be moved to VC2
  3. Export VC1 slashing protection data.
  4. Start VC1.
  5. Start count down timer for 2 epochs (approx 13 min)
  6. Stop VC2.
  7. If the validators from VC1 do not exist in VC2 validator_definitions.yml then import them to VC2.
  8. Edit VC2 validator_definitions.yml and ENABLE the validators that are to be moved from VC1 to VC2 and leave all current VC2 validators enabled.
  9. Import the slashing protection data exported in step 3 to VC2.
  10. Edit VC2 validator_definitions.yml and DISABLE only the validators that are to be moved from VC1 to VC2.
  11. Start VC2.
  12. Edit VC2 validator_definitions.yml and ENABLE the validators that are to be moved from VC1 to VC2.
  13. Wait until timer from step 5 has counted down to zero.
  14. Restart VC2 (i.e. stop VC2 and then start VC2).
  15. The validators will now be running on VC2 and if no mistakes were made in the above steps "no slashings** will occur. But there will have been a number of missed attestations that would be unnecessary in an ideal case.

Expected Behavior

Ideally, the necessary steps should be able to be accomplished by issuing a single command specifying source VC, destination VC and a list of validators to move. That command should be able handle all details as specified above and atomically move the validators without user intervention. Plus, the there should be no missed attestations or slashings.

If the ideal case is not possible than the next best solution is that the only missed attestations would be for those validators that actually moved and only a minimal possibility of slashings.

Steps to resolve

I'm sure there is a non-trivial amount of work to do, but based on feedback from a question I posted on discord, https://discord.com/channels/605577013327167508/710978433823277099/882429074436407357, @michaelsproul suggests it might be possible.

@michaelsproul
Copy link
Member

I think this is a great idea and a good candidate for inclusion in the standard validator client API being developed (ethereum/beacon-APIs#151).

On the implementation side, there are a few things that spring to mind:

  1. We should fix issues with false positives on import: Import slash protection without checking slashability #2419. I'll start on that this week.
  2. We need to restructure the slashing protection database internals so that it supports exporting a subset of validators.
  3. We should spec out the /import and /export end points, describing the structure of requests/responses and the guarantees e.g. that export should atomically disable the selected validators and return a JSON object with keystores and EIP-3076 slashing protection data.
  4. We could add a prototype implementation under the /lighthouse/ prefix until our proposal is accepted into the standard API

@winksaville
Copy link
Contributor Author

A short discussion on discord it was mentioned an API might look something like:

struct ValidtatorInfo {
  ValidatorPubkey, // Maybe Kestore has ValidatorPubkey ???
  Keystore,
  Password,
  Interchange,
  ...
}

stopValidators(vlist: Vec<ValidatorPubkey>) -> Result<Vec<ValidatorInfo>, Error>;
startValidators(Vec<ValidatorInfo>) -> Vec<(ValidatorPubkey, Result<(), Error>)>;

Note; the Vec<ValidatorInfo> passed to startValidators is not necessarily the same as returned by stopValidators as the entity moving validators may be moving them among a set of Validator Clients.

@michaelsproul
Copy link
Member

A standard API has been specced by Dapplion that would be suitable for our use-case.

As noted in my review, it is slightly different from the original vision described here in that it does not support exporting the keystores or extracting them from the validator. I think we can still make the atomic migrate work by having the CLI coordinator read the keystores from disk (possibly from the Lighthouse datadir, or elsewhere).

@winksaville
Copy link
Contributor Author

My knee jerk reaction is that we can support the standard API but create a private API that does the complete job.

Your point about security is very prudent, I'd suggest that all of API's require a separate PKI keypair to be used to establish a connection. Of course a user can always use an SSH tunnel, but I think having it built-in would be preferred.

Note: These are the opinion of a neophyte :)

bors bot pushed a commit that referenced this issue Oct 19, 2021
## Issue Addressed

Part of #2557

## Proposed Changes

Refactor the slashing protection export so that it can export data for a subset of validators.

This is the last remaining building block required for supporting the standard validator API (which I'll start to build atop this branch)

## Additional Info

Built on and requires #2598
bors bot pushed a commit that referenced this issue Jan 30, 2022
## Issue Addressed

Implements the standard key manager API from https://ethereum.github.io/keymanager-APIs/, formerly ethereum/beacon-APIs#151
Related to #2557

## Proposed Changes

- [x] Add all of the new endpoints from the standard API: GET, POST and DELETE.
- [x] Add a `validators.enabled` column to the slashing protection database to support atomic disable + export.
- [x] Add tests for all the common sequential accesses of the API
- [x] Add tests for interactions with remote signer validators
- [x] Add end-to-end tests for migration of validators from one VC to another
- [x] Implement the authentication scheme from the standard (token bearer auth)

## Additional Info

The `enabled` column in the validators SQL database is necessary to prevent a race condition when exporting slashing protection data. Without the slashing protection database having a way of knowing that a key has been disabled, a concurrent request to sign a message could insert a new record into the database. The `delete_concurrent_with_signing` test exercises this code path, and was indeed failing before the `enabled` column was added.

The validator client authentication has been modified from basic auth to bearer auth, with basic auth preserved for backwards compatibility.
bors bot pushed a commit that referenced this issue Aug 7, 2023
## Issue Addressed

Addresses #2557

## Proposed Changes

Adds the `lighthouse validator-manager` command, which provides:

- `lighthouse validator-manager create`
    - Creates a `validators.json` file and a `deposits.json` (same format as https://github.com/ethereum/staking-deposit-cli)
- `lighthouse validator-manager import`
    - Imports validators from a `validators.json` file to the VC via the HTTP API.
- `lighthouse validator-manager move`
    - Moves validators from one VC to the other, utilizing only the VC API.

## Additional Info

In 98bcb94 I've reduced some VC `ERRO` and `CRIT` warnings to `WARN` or `DEBG` for the case where a pubkey is missing from the validator store. These were being triggered when we removed a validator but still had it in caches. It seems to me that `UnknownPubkey` will only happen in the case where we've removed a validator, so downgrading the logs is prudent. All the logs are `DEBG` apart from attestations and blocks which are `WARN`. I thought having *some* logging about this condition might help us down the track.

In 856cd7e I've made the VC delete the corresponding password file when it's deleting a keystore. This seemed like nice hygiene. Notably, it'll only delete that password file after it scans the validator definitions and finds that no other validator is also using that password file.
bors bot pushed a commit that referenced this issue Aug 7, 2023
## Issue Addressed

Addresses #2557

## Proposed Changes

Adds the `lighthouse validator-manager` command, which provides:

- `lighthouse validator-manager create`
    - Creates a `validators.json` file and a `deposits.json` (same format as https://github.com/ethereum/staking-deposit-cli)
- `lighthouse validator-manager import`
    - Imports validators from a `validators.json` file to the VC via the HTTP API.
- `lighthouse validator-manager move`
    - Moves validators from one VC to the other, utilizing only the VC API.

## Additional Info

In 98bcb94 I've reduced some VC `ERRO` and `CRIT` warnings to `WARN` or `DEBG` for the case where a pubkey is missing from the validator store. These were being triggered when we removed a validator but still had it in caches. It seems to me that `UnknownPubkey` will only happen in the case where we've removed a validator, so downgrading the logs is prudent. All the logs are `DEBG` apart from attestations and blocks which are `WARN`. I thought having *some* logging about this condition might help us down the track.

In 856cd7e I've made the VC delete the corresponding password file when it's deleting a keystore. This seemed like nice hygiene. Notably, it'll only delete that password file after it scans the validator definitions and finds that no other validator is also using that password file.
bors bot pushed a commit that referenced this issue Aug 8, 2023
## Issue Addressed

Addresses #2557

## Proposed Changes

Adds the `lighthouse validator-manager` command, which provides:

- `lighthouse validator-manager create`
    - Creates a `validators.json` file and a `deposits.json` (same format as https://github.com/ethereum/staking-deposit-cli)
- `lighthouse validator-manager import`
    - Imports validators from a `validators.json` file to the VC via the HTTP API.
- `lighthouse validator-manager move`
    - Moves validators from one VC to the other, utilizing only the VC API.

## Additional Info

In 98bcb94 I've reduced some VC `ERRO` and `CRIT` warnings to `WARN` or `DEBG` for the case where a pubkey is missing from the validator store. These were being triggered when we removed a validator but still had it in caches. It seems to me that `UnknownPubkey` will only happen in the case where we've removed a validator, so downgrading the logs is prudent. All the logs are `DEBG` apart from attestations and blocks which are `WARN`. I thought having *some* logging about this condition might help us down the track.

In 856cd7e I've made the VC delete the corresponding password file when it's deleting a keystore. This seemed like nice hygiene. Notably, it'll only delete that password file after it scans the validator definitions and finds that no other validator is also using that password file.
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this issue Jan 6, 2024
Addresses sigp#2557

Adds the `lighthouse validator-manager` command, which provides:

- `lighthouse validator-manager create`
    - Creates a `validators.json` file and a `deposits.json` (same format as https://github.com/ethereum/staking-deposit-cli)
- `lighthouse validator-manager import`
    - Imports validators from a `validators.json` file to the VC via the HTTP API.
- `lighthouse validator-manager move`
    - Moves validators from one VC to the other, utilizing only the VC API.

In 98bcb94 I've reduced some VC `ERRO` and `CRIT` warnings to `WARN` or `DEBG` for the case where a pubkey is missing from the validator store. These were being triggered when we removed a validator but still had it in caches. It seems to me that `UnknownPubkey` will only happen in the case where we've removed a validator, so downgrading the logs is prudent. All the logs are `DEBG` apart from attestations and blocks which are `WARN`. I thought having *some* logging about this condition might help us down the track.

In sigp@856cd7e I've made the VC delete the corresponding password file when it's deleting a keystore. This seemed like nice hygiene. Notably, it'll only delete that password file after it scans the validator definitions and finds that no other validator is also using that password file.
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this issue Jan 6, 2024
Addresses sigp#2557

Adds the `lighthouse validator-manager` command, which provides:

- `lighthouse validator-manager create`
    - Creates a `validators.json` file and a `deposits.json` (same format as https://github.com/ethereum/staking-deposit-cli)
- `lighthouse validator-manager import`
    - Imports validators from a `validators.json` file to the VC via the HTTP API.
- `lighthouse validator-manager move`
    - Moves validators from one VC to the other, utilizing only the VC API.

In 98bcb94 I've reduced some VC `ERRO` and `CRIT` warnings to `WARN` or `DEBG` for the case where a pubkey is missing from the validator store. These were being triggered when we removed a validator but still had it in caches. It seems to me that `UnknownPubkey` will only happen in the case where we've removed a validator, so downgrading the logs is prudent. All the logs are `DEBG` apart from attestations and blocks which are `WARN`. I thought having *some* logging about this condition might help us down the track.

In sigp@856cd7e I've made the VC delete the corresponding password file when it's deleting a keystore. This seemed like nice hygiene. Notably, it'll only delete that password file after it scans the validator definitions and finds that no other validator is also using that password file.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants