Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement sovereign -> consumer chain changeover #288

Closed
jtremback opened this issue Aug 23, 2022 · 20 comments
Closed

Implement sovereign -> consumer chain changeover #288

jtremback opened this issue Aug 23, 2022 · 20 comments
Assignees
Labels
type: feature-request New feature or request improvement

Comments

@jtremback
Copy link
Contributor

jtremback commented Aug 23, 2022

We know this should be theoretically possible, but we have never actually tested (to my knowledge) the transition of a sovereign chain to a consumer chain.

Conditions of satisfaction:

  • Consumer chain with same state etc. as old sovereign chain is created
  • Changeover is not noticeable to users
  • Changeover does not result in IBC clients on other chains losing connection and needing to be updated by governance

There are a few different ways this could be done, we should use this thread to plan them out.

@mpoke mpoke moved this to Todo in Replicated Security Aug 24, 2022
@mpoke mpoke added the scope: testing Code review, testing, making sure the code is following the specification. label Aug 24, 2022
@jtremback
Copy link
Contributor Author

jtremback commented Aug 30, 2022

AFAIK, the important thing for IBC clients to keep their connection is for the next_validators_hash in the last update from the sovereign validator set to be the hash of the new consumer chain's validator set, the provider validator set. There was talk about no more than a 1/3 change being allowed, but this is probably not an issue if relayers are configured correctly (need someone to weigh in here, have gotten lots of conflicting info).

Here is @zmanian's idea:

  1. We need to have a governance proposal type that basically overwrite the entire staking module with validators from the cosmos hub and then freeze the valset
  2. When this happens the chain it signed a next valset which is the hub valset.
  3. The chain then halts until the hub valset starts up nodes on the chain.
  4. Then you upgrade the state machine to support ICS after the hub valset has started their nodes

As stated in point 2, replacing the validator set results in a light client state being created where the old validator set signs a next_validators_hash of the new validator set. This provides continuity and should allow the IBC connection to be maintained. I'm not sure if you need the upgrade to ICS to be a different step, or if the hub valset can just start up a chain with the ICS logic already in it. Based on my limited understanding of IBC, this sounds like it should work. It will require a new governance proposal type that can replace the entire valset, but this does not need to be part of ICS itself.

Here's an alternate technique which should work with no code outside of CCV:

  1. Export the genesis of the sovereign chain
  2. Paste the ccv consumer section into the genesis file, with the InitialValSet section set to the validator set of the sovereign chain, not the provider chain.
  3. Normally, this will fail in the consumer's GenesisState.Validate() function since it checks that the NextValidatorsHash is equal to the InitialValSet. This check will need to be disabled, either conditionally for this special sort of startup case, or in general (are we sure it's really necessary)?
  4. The sovereign validator set starts the chain back up, and lets it go through the CCV handshake. When the first VSC packet comes in, the standard CCV logic changes the validator set over to the provider validator set. This accomplishes the same thing that happens in Zaki's step 2 above.

The benefit of this technique is that it requires very little extra code to be written. The only thing that needs to happen is for us to disable the check here @mpoke is this safe to do?

@mpoke
Copy link
Contributor

mpoke commented Aug 31, 2022

There was talk about no more than a 1/3 change being allowed, but this is probably not an issue if relayers are configured correctly (need someone to weigh in here, have gotten lots of conflicting info).

That is correct. The "no more than a 1/3 change" is a requirement for the bisection protocol of light clients.

This was what @josef-widder said about this:

you can have arbitrary changes of validator sets in sequential. The verification logic is defined in [LCV-FUNC-VALID.2]. In case of sequential verification (search “immediate successor”), one checks trusted.Header.NextValidators against untrusted.Header.Validators. However, the change in validators would be visible between trusted.Header.NextValidators and trusted.Header.Validators, which is not constrained at all. This is because sequential is treated as a special case here.

The overlap is only for skipping verification and it is defined in “Returns SUCCESS” below.

@mpoke
Copy link
Contributor

mpoke commented Aug 31, 2022

The sovereign validator set starts the chain back up, and lets it go through the CCV handshake. When the first VSC packet comes in, the standard CCV logic changes the validator set over to the provider validator set.

Some concerns / comments I have re. this design:

  • Why would the sovereign validator set even bother creating blocks that will put them out of business?
  • This means that until the CCV handshake is done (until the first VSC packet is received), the staking module on the consumer chain MUST be enabled and provide validator set updates to Tendermint. Afterwards, the consumer CCV module MUST provide validator set updates to Tendermint.
  • When creating a client to a new consumer chain, the provider needs the validator set on the consumer. Currently, the assumption is that it is the same set as on the provider (see here). This will need to be changed.

@mpoke
Copy link
Contributor

mpoke commented Aug 31, 2022

Normally, this will fail in the consumer's GenesisState.Validate() function since it checks that the NextValidatorsHash is equal to the InitialValSet.

The only thing that needs to happen is for us to disable the check here @mpoke is this safe to do?

This code doesn't check that the NextValidatorsHash of the previous block equals the InitialValSet, but rather that the NextValidatorsHash of the provider consensus state (passes via the genesis file) quals the InitialValSet. The ProviderConsensusState is used to create a client to the provider (see here).

This check is actually not needed. There is no requirement for the local validator set to match the NextValidatorsHash in the consensus state of a remote chain.

@mpoke
Copy link
Contributor

mpoke commented Aug 31, 2022

Note that UpdateClient doesn't work across different revision numbers (see https://github.com/cosmos/ibc-go/blob/a83bcd5af71f3121e97141f797e4970419925992/modules/light-clients/07-tendermint/update.go#L61). I'd assume that the sovereign chain will increment its revision number once it becomes a consumer chain.

@jtremback
Copy link
Contributor Author

jtremback commented Aug 31, 2022

After talking with @mpoke today, I think that my suggested technique has too many problems. The handshake will not be able to work because the provider chain expects the consumer chain IBC client to have the provider chain validator set, not the sovereign validator set. We should probably go with Zaki's idea, which I will flesh out more here:

  1. A "validator switch" governance proposal is made on the sovereign chain. This contains:
    a. The block or time after which the switch will take place
    b. The IBC light client whose validator set will be switched to (this will be the provider chain)
  2. At or after the switch time, anyone can make a transaction supplying the validator set corresponding to the valset hash in the designated light client. This transaction won't have any effect unless the IBC light client of the provider has been updated recently.
  3. This validator set is verified by comparing it to the hash, and if valid, is sent to Tendermint, replacing the entire validator set.
  4. When this happens, an IBC light client update is produced, effectively signing over control of the chain to the new validator set. At this point the chain halts, since the existing validator set no longer has the ability to produce new blocks.
  5. The new validator set (in our case, the Cosmos Hub validator set) starts running the chain, producing new blocks.
  6. The new validator set does a chain upgrade adding and enabling the CCV consumer module (this might also be possible to do as part of step 5)

@mpoke
Copy link
Contributor

mpoke commented Sep 1, 2022

I agree that this approach is the way to go, but there are some things that need to be sorted out.

Problem statement:

  • let chainA be a sovereign chain
  • let chainB be the provider chain
  • chainA and chainB are already connected via IBC (e.g., there are transfer channel between the two chains)
  • chainA has IBC channels with other chains
  • chainA wants to become a consumer chain of chainB without breaking any IBC channels

Note: We distinguish between a sovereign chain wanting to become a consumer chain and a chain that wants to start from the beginning as a consumer chain. Here we focus only on the former.

Requirements:

  • [Hard] chainA is down for a limited amount of time (similar to other upgrades).
  • [Hard] Let h be the height of the last block committed by the validator set of chainA (as a sovereign chain). Then, block(h).NextValidatorsHash is the hash of the validator set on chainB that will validate the first block of chainA (as a consumer chain), i.e., block(h+1). This will ensure tat all IBC clients of chainA can be updated even after it becomes a consumer chain.
  • [Hard] The transfer channel used for IS reward distribution is the same as the existing transfer channel from chainA to chainB.
  • [Soft] The CCV channel is build on the existing clients.

Approach:

  • chainA upgrades and adds the consumer CCV module. The GenesisState of this module contains a clientID and a connectionID to chainB.
  • chainA starts the handshake for the CCV channel between chainA and chainB.
  • A gov proposal is submitted to chainB for adding chainA as a consumer chain. The proposal contains the ID of the client to chainA on chainB (maybe also the connection). A continues to run as a sovereign chain.
  • The gov proposal passes on chainB at height Bh. The validators in valsetB s.t. hash(valsetB) == block(Bh).NextValidatorsHash are responsible to validate the first block of chainA as a consumer chain.
  • the CCV module on chainB sends a Packet to the CCV module on chainA requesting to start the transition towards becoming a consumer chain. The packet contains the un-hashed valsetB.
  • At height hA, the CCV module on chainA receives this packet. As a result, it requests the staking module to replace the entire validator set with valsetB. At height hA + 1 Tendermint commits a block with NextValidatorsHash == hash(valsetB). At height hA + 2, in BeginBlock(), chainA stops and upgrades by removing the staking module and updating the CCV module to enable it to submit validator set updates to Tendermint.
  • At this moment, the validator in valsetB can start running the new binary for chainA starting with block hA + 2.

Note: The first two steps can happen also after the gov proposal passes.

As a side effect, the entire Channel Initialization protocol is no longer needed. In other words, once chainA starts running as a consumer chain, it already has an established CCV channel.

@jtremback
Copy link
Contributor Author

jtremback commented Sep 2, 2022

Implementation plan for Marius's proposal above:

  • become-consumer gov proposal on ccvconsumer

    • opens a channel to the provider according to governance proposal
      • using existing clients to future provider
      • using portId of provider ccv module
    • logic to receive becomeConsumerPacket:
      • send initialValidatorSet from packet to tendermint to replace current valset (this will take effect the block after next)
      • we have to make sure that the last sovereign block can't mess anything up
  • modify consumer chain creation proposal on the provider

    • add optional client id and connection id to pre-consumer chain
    • add info about desired transfer channel (what channel id it has on the consumer)
    • when spawntime is reached:
      • send initial valset from ccvconsumer genesis to pre-consumer module "becomeConsumerPacket"
  • modify consumer module/genesis

    • add a field to the genesis for the desired distribution transfer channel
    • add fields for client id and connection id
    • skip parts of handshake depending on info above

@danwt
Copy link
Contributor

danwt commented Sep 8, 2022

Is there any worry about needing to trust what is being received over ibc from the sovereign chain? (Because it will have its own valset at that time)

@mpoke
Copy link
Contributor

mpoke commented Sep 8, 2022

Is there any worry about needing to trust what is being received over ibc from the sovereign chain? (Because it will have its own valset at that time)

Nothing is received over IBC from the sovereign chain. The IBC channel is used to send the provider valset (aka the consumer initial valset) to the sovereign chain so that the control of the sovereign chain can be passed to this valset.

@jackzampolin
Copy link
Member

jackzampolin commented Sep 8, 2022

How do the client updates work for other chains connected to the sovereign -> consumer chain?

@mpoke
Copy link
Contributor

mpoke commented Sep 9, 2022

How do the client updates work for other chains connected to the sovereign -> consumer chain?

@jackzampolin This should work directly by relaying a ClientUpdate message, i.e., https://github.com/cosmos/ibc-go/blob/b601462fbce9dd34620cd9129ff9b9057ee5c186/modules/core/keeper/msg_server.go#L62. Since the last block of the sovereign has NextValidatorsHash pointing to the valset validating the first block of the consumer, this transition is the same as the valset on a chain changing completely between two subsequent blocks.

@jtremback
Copy link
Contributor Author

@mpoke is this in the spec yet?

@jtremback jtremback changed the title Test sovereign -> consumer chain changeover Implement sovereign -> consumer chain changeover Sep 20, 2022
@mpoke
Copy link
Contributor

mpoke commented Oct 5, 2022

@jtremback Yes. There is a PR open on the spec repo cosmos/ibc#840

@mpoke mpoke moved this from Todo to In Progress in Replicated Security Nov 11, 2022
@asalzmann
Copy link

We've started to work on this cc @jstr1121 @jtremback

@mpoke
Copy link
Contributor

mpoke commented Dec 13, 2022

We've started to work on this cc @jstr1121 @jtremback

@asalzmann please check the latest version of the spec, i.e., cosmos/ibc#840

@mpoke
Copy link
Contributor

mpoke commented Dec 15, 2022

We've started to work on this cc @jstr1121 @jtremback

@asalzmann please check the latest version of the spec, i.e., cosmos/ibc#840

Especially the comment in the BeginBlockInit method, i.e.,

  // pre-CCV state is over; upgrade chain to consumer chain
  //  - set preCCV to false
  //  - the existing staking module no longer provides 
  //    validator updates to the underlying consensus engine
  //  - the CCV module starts providing validator updates 
  //    to the underlying consensus engine
  //  - for safety, the existing staking module must be kept 
  //    for at least the unbonding period

@shaspitz
Copy link
Contributor

@asalzmann @jstr1121 lmk if there's anyway I can help guide any efforts! Happy to hop on a call

@mpoke mpoke added type: feature-request New feature or request improvement and removed scope: testing Code review, testing, making sure the code is following the specification. labels Jan 4, 2023
@mpoke mpoke moved this from In Progress to Waiting for review in Replicated Security Jan 10, 2023
@mpoke
Copy link
Contributor

mpoke commented Jan 10, 2023

Implementation: Stride-Labs#1

@mpoke mpoke moved this from Waiting for review to In Progress in Replicated Security Jan 25, 2023
@mpoke mpoke added this to Cosmos Hub Jan 25, 2023
@mpoke mpoke added this to the ICS v1.1.0 milestone Jan 25, 2023
@github-project-automation github-project-automation bot moved this to 🩹 Triage in Cosmos Hub Jan 25, 2023
@mpoke mpoke moved this from 🩹 Triage to 🏗 In progress in Cosmos Hub Jan 25, 2023
@mpoke mpoke modified the milestones: ICS v1.1.0, Consumer onboarding and offboarding Jan 27, 2023
@mpoke mpoke removed this from the Consumer onboarding and offboarding milestone Mar 5, 2023
@shaspitz
Copy link
Contributor

shaspitz commented Mar 6, 2023

Superseded by #756

@shaspitz shaspitz closed this as completed Mar 6, 2023
@github-project-automation github-project-automation bot moved this from 🏗 In progress to ✅ Done in Cosmos Hub Mar 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: feature-request New feature or request improvement
Projects
Status: ✅ Done
Development

No branches or pull requests

6 participants