ICS28: Timeout behaviour #669
Comments
What's the attack scenario if the provider chain will no longer wait for a consumer chain that timed out, i.e., just consider all unbonding operations to have matured on that consumer chain? Possible scenario: I find this very unlikely to happen, since it entails there is nobody to relay (i.e., not a single correct validator). I think that it's more likely that due to some non-malicious event (e.g., the network is overloaded), a packet may time out. I don't think we should punish the delegators for that. |
1/3 of validators could time out a channel through an eclipse attack. However, CCV clearly works under the assumption that less than 1/3 are Byzantine. |
Thanks for opening this issue! I think this is a central question in IBC: Who is responsible for relaying? I think the timeout is currently set to something close to the unbonding period. To my understanding, a timeout (not relaying a packet for two weeks) in the interchain is thus not due to overloads or disconnections etc. If the system is overloaded for two weeks, we have more severe issues. The reason for a timeout is that no one cares to relay a packet or there is one who doesn't understand that it is in their interest to relay the packet. We should make explicit that whoever has interest in a validator set change (e.g., validator, delegator) in the limit is responsible to make the operation complete, that is, get the packets relayed. (In my view, validator operators should run relayers). To my understanding IBC is designed in this open way on purpose: anyone who wants a packet delivered can relay it. On the other hand, returning stake when chains got disconnected due to timeout is definitely unsafe. So I think we should keep stake locked when a channel closes and prepare provisions within CCV for social consensus (governance) in case of a timeout to
|
Also we should keep in mind that in an ordered channel, a timeout does not only affect one packet A. All packets that were sent after A also do not make it, and everyone who is interested in relaying a packet B (that was sent after A) is also interested in relaying A. |
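To make the ordered-channel point concrete, here is a minimal toy model. It is only a sketch, not the IBC implementation: the class `OrderedChannel` and its methods `recv` and `timeout` are hypothetical names invented for illustration, and the model assumes (as stated in the issue description) that a packet timeout on an ordered channel closes the channel.

```python
from dataclasses import dataclass, field


@dataclass
class OrderedChannel:
    """Toy model of an ordered channel (hypothetical, not the IBC API)."""
    open: bool = True
    next_seq_recv: int = 1
    delivered: list = field(default_factory=list)

    def recv(self, seq: int) -> bool:
        # An ordered channel only accepts the next expected sequence number,
        # so packet B cannot be delivered before an earlier packet A.
        if not self.open or seq != self.next_seq_recv:
            return False
        self.delivered.append(seq)
        self.next_seq_recv += 1
        return True

    def timeout(self, seq: int) -> None:
        # Assumption: timing out an undelivered packet closes the channel.
        if seq >= self.next_seq_recv:
            self.open = False


ch = OrderedChannel()
ch.recv(1)      # delivered
ch.recv(3)      # rejected: packet 2 (A) has not been delivered yet
ch.timeout(2)   # A times out, the channel closes
ch.recv(3)      # rejected: B, sent after A, can no longer be delivered
```

In this model, anyone who wants packet 3 delivered has to get packet 2 relayed first, which is the incentive argument made above.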
Why do you say that? Do you have a possible attack in mind? I see not returning the stake as a method to punish (i.e., disincentivize) certain behaviour. The questions are what behaviour are we punishing and whom are we punishing. What behaviour are we punishing? IMO, that's hard to tell. Unlike other misbehaviours (e.g., double signing), one validator not relaying will not affect the system; you need all validators to stop relaying for a timeout to happen. Thus, I'd rather go in the direction of incentivising validators to relay. What are validators incentivised by... money 🤑, i.e., rewards which are proportional to the voting power (see below). Whom are we punishing? I believe that by not returning the stake we are not punishing the validator operators, but rather the delegators. We shouldn't expect delegators to run relayers.
I agree. That's why I don't think it's feasible to intentionally time out a packet (as long as the 1/3 assumption holds). The CCV channel is bidirectional - the provider sends
In general though, I think validator operators would be incentivised to relay both ways by the consumer chain rewards. If the channel times out, no more consumer chain, and thus, no more rewards for any of the operators. |
@jtremback @okwme Any views on this issue from a business perspective? |
I guess we would like to ensure a property somewhat like:
As far as I understand, both properties cannot be achieved if we unlock stake on timeout. |
Yeah, but some of the CCV properties (including the ones mentioned by you) rely on the Correct Relayer assumption. The spec is written from the perspective that this assumption holds. IMO handling timeouts is out of the scope of the specification, at least out of the scope of the properties. The question is: What can we do to ensure (at least with a high probability) that the Correct Relayer assumption holds. |
I think CCV will be used if someone wants to earn money from running a second chain. Thus the incentives are aligned to ensure the correct relayer assumption: you need to keep the channel open, e.g.,
In my view, in the unlikely case of channel closing, we should still ensure safety. I would guess that these cases are so rare that we can rely on governance to eventually figure out what to do and postpone liveness to governance intervention. |
The question of whether someone bothers to relay a packet is a distraction, IMO. We should assume that someone attempts to relay all packets*. The real question is whether a validator set censors packets. The following scenarios are based on a model where there may be differences in the provider and consumer chain validator sets. They do not make as much sense for the 100% overlap v1. A. Here is an attack that is possible if tokens are automatically released on channel timeout:
B. Here is an attack that is possible if tokens are NOT automatically released on channel timeout:
C. The most "correct" way to handle this is to slash 100% of the consumer chain validator's packets if the channel times out. Of course, this would result in destruction of the provider chain in v1. I think that the safest way for us to handle this right now is keeping tokens locked if the channel times out, like @josef-widder suggests. However, I suspect that given the 100% overlap of validator sets in v1, it should be possible to relax this and maybe allow tokens to be automatically unlocked. However, I would like to see a more rigorous analysis of this***. * I wonder if the heavy packet load we are intending for v1 (once a block) makes this less of a safe assumption than it was before. Can we waive the gas fee for ccv packets? ** In my analysis of double signing here, I am assuming that we would have code that allowed the provider chain to independently verify double signing evidence. *** I think maybe we need a more rigorous analysis of v1's 100% validator set overlap across the board. |
We already do that through the Correct Relayer assumption.
Each consumer chain entails two extra transactions per block (on each side) except slashing which should be rare. The consumer receives in each block a VSCPacket and an ACK for a VSCMaturedPacket. The provider receives in each block an ACK for a VSCPacket and a VSCMaturedPacket. (I assume that in every block there is a change in the val set).
I don't think so, since it would enable DOS attacks.
It needs at least 1/3 of the voting power. If that happens, the light clients cannot trust headers. I think we need to assume < 1/3 Byzantine voting power.
Let's focus on the next versions afterwards. Once we move away from V1, many things change. I find it difficult to discuss possible attacks on a system when I don't yet know what it will look like or what properties it will have.
That will be the safest indeed. Not yet convinced that it's necessary, but if nobody complains (e.g., Cosmos Hub validators), then we can go with it.
What do you mean by a more rigorous analysis of the validator set overlap? Do you have something specific in mind? |
@josef-widder If every operator relays, we'll get 300 IBC packets per block, and only 2 are actually needed. And this is per consumer chain.
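The arithmetic behind that figure can be sketched as follows. This is a rough illustration only: the operator count of 150 is an assumption inferred from the 300-versus-2 numbers above, and `relay_attempts_per_block` is a hypothetical helper, not part of any relayer.

```python
from typing import Tuple


def relay_attempts_per_block(operators: int, needed_packets: int = 2) -> Tuple[int, int]:
    """Return (submitted, redundant) relay transactions per block
    for one consumer chain, if every operator relays every packet."""
    submitted = operators * needed_packets
    redundant = submitted - needed_packets
    return submitted, redundant


# Assumed: 150 operators, 2 needed packets per block per consumer chain.
submitted, redundant = relay_attempts_per_block(150)
print(submitted, redundant)  # 300 298
```

Under these assumptions, all but 2 of the 300 submissions per block are redundant, and the total scales linearly with the number of consumer chains.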
Then we need to shift the discussion on who wants what relayed. |
Just discussed this with @mpoke. The main issue is that our protocol expects altruistic relaying, and it currently generates 2-3 packets per block, per consumer chain. This high(?) traffic volume, combined with the lack of relayer incentivization, makes it more likely that the correct relayer assumption will be broken. @AdityaSripal has previously pointed out that compared to the altruistic relaying load currently, the additional load from CCV should be minor. If the altruistic relaying problem for transfers is fixed, however, CCV will remain a vestige of the problem. One piece of information that would inform this discussion: how much will it cost to relay a CCV packet? Do you have any guesses @AdityaSripal? Possible courses of action:
IMO, ultimately, something like 4 is probably the most comprehensive solution, but needs more work. 2 may be more feasible, but depending on what the real costs are, we may be able to launch with 1. |
+1. The conversation here seems focused on malicious and/or deliberate timeouts induced by bad actors in the network. But unintentional timeouts due to network, host, configuration, etc. errors are the (much) more common case. Network errors like timeouts are normal, and should be expected and accommodated by higher layers. |
Thinking about it again, I think we should allow all tokens to unbond and clean up other consumer chain state when the channel times out. This does potentially allow some kind of attack where the validators intentionally censor packets to let the channel time out, but given that we are talking about a 100% validator set overlap between provider and consumer, it's hard to imagine what this attack would be. The option suggested by @josef-widder, where the bonded tokens stay locked until some active governance action is taken, is definitely the safer option, but would result in a much longer effective unbonding period (governance period + channel timeout) for some people and I think we need to have a really good reason to do this. |
The current consensus is to support both options: Introduce a CCV parameter (i.e., This would be combined with a mechanism that allows the provider chain to remove a consumer chain through governance proposals. In case @josef-widder @jovankomatovic @jtremback @AdityaSripal What do you think about this approach? |
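A minimal sketch of how such a parameter could select between the two behaviours is shown below. Everything here is hypothetical: the parameter name `unlock_on_timeout`, the `Provider` class and its methods are invented for illustration (the actual parameter name is not given above), and the bookkeeping is a simplification of the real unbonding logic.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    # Hypothetical parameter name; the real one is not stated in the thread.
    unlock_on_timeout: bool
    # Unbonding op id -> set of consumer chain ids it is still waiting on.
    pending: dict = field(default_factory=dict)
    # Unbonding ops whose tokens have been released, in release order.
    released: list = field(default_factory=list)

    def _drop_consumer(self, chain_id: str) -> None:
        # Consider every unbonding op as matured on this consumer chain.
        for op_id in list(self.pending):
            self.pending[op_id].discard(chain_id)
            if not self.pending[op_id]:
                del self.pending[op_id]
                self.released.append(op_id)

    def on_channel_timeout(self, chain_id: str) -> None:
        if self.unlock_on_timeout:
            self._drop_consumer(chain_id)
        # Otherwise the stake stays locked until governance acts.

    def on_governance_removal(self, chain_id: str) -> None:
        # Removing a consumer chain via governance always unlocks.
        self._drop_consumer(chain_id)


p = Provider(unlock_on_timeout=False, pending={1: {"c1"}, 2: {"c1", "c2"}})
p.on_channel_timeout("c1")      # nothing released, tokens stay locked
p.on_governance_removal("c1")   # op 1 released; op 2 still waits on "c2"
```

With the flag set to true, the timeout handler alone releases the tokens, which is the behaviour questioned at the start of the thread. With it set to false, the effective unbonding period grows by however long governance takes.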
Sounds good. I would just set the default to |
Sorry I didn't stay up on this issue, but it seems to me that making this a variable has just kicked the can down the road. We don't actually have consensus on which one is the "safer" option, and which should be the default. Also, we now have to support more complicated code that handles both. I personally think that I think the alternative, that a malicious 1/3+ of the validator set stops packets being sent from the consumer, but also doesn't use that same power to halt or censor the provider as well, is much more of an edge case. |
I think we should probably only have the |
IMO,
I disagree, see cosmos/interchain-security#261 (comment)
|
@jtremback Do we still need this issue opened? Are you in agreement re. a solution? |
What should be the CCV behaviour in the case the assumption of a Correct Relayer is violated and a packet does time out? In the current version of IBC, since the CCV channel is ordered, a packet timeout results in the CCV channel being closed.
Closing the CCV channel has the following implications.
On the consumer chain, … VSCPackets); … SlashPackets).
Thus, the validator set can no longer be trusted and the consumer chain should shut down.
Note that the provider chain will close its own endpoint either on timeoutOnClose or, in the worst case, once its own sent packets time out.
On the provider chain, … UnbondingPeriod to elapse on the consumer chain will never complete.
Consequently, some tokens will be permanently locked on the provider chain.
On the one hand, locking these tokens would incentivise validators to relay packets in order to avoid timeouts (i.e., ensures that the Correct Relayer assumption is practical).
On the other hand, these tokens are usually staked by delegators. This means that validators could, for example, opt not to relay packets in order to stop delegators from reducing their voting power. In other words, locking these tokens would punish delegators and not validators.