EIP-1955: Specify the Cliquey proof-of-authority engine. #1955
…g block period, ref #9
* spelling and grammar fixes, nothing major
* Update EIPS/eip-cliquey.md
Removing the SIGNER_LIMIT is dangerous, as it would lead straight to a number of potential attacks.
With variable block length and a minimum wait time, we will always wait for the in-turn block for the maximum time plus the wait time, which makes the attack from my previous comment even easier.
Both the min wait time and publishing in-turn blocks even when out-of-turn blocks already exist were implemented in Nethermind. The latter strengthens the main chain, while the former allows malicious signers to gain greater power (by not following the min wait time rule). What we could do is include the difference in the consensus: for any out-of-turn block, the timestamp has to be at least BLOCK_PERIOD + 1 instead of BLOCK_PERIOD.
Also, as suggested in the original EIP discussion, I would suggest co-prime numbers for InTurn and OutOfTurn blocks: 7 and 3, respectively.
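A minimal Python sketch of the timestamp rule suggested above, assuming Clique's existing header checks: the in-turn signer is the one at index number % len(signers) of the ascending signer list, and every header must already satisfy timestamp >= parent timestamp + BLOCK_PERIOD. The helper names in_turn and timestamp_ok are hypothetical, and the extra second for out-of-turn blocks is only the reviewer's proposal, not part of Clique.

```python
def in_turn(number, signer, signers):
    """Clique turn rule: the signer at index number % len(signers) of the
    ascending signer list is in turn for block `number`."""
    ordered = sorted(signers)
    return ordered[number % len(ordered)] == signer


def timestamp_ok(parent_time, header_time, is_in_turn, block_period):
    """Hypothetical check for the proposed consensus rule.

    Clique already requires header_time >= parent_time + block_period.
    The proposal raises the minimum gap by one second for out-of-turn blocks.
    """
    min_gap = block_period if is_in_turn else block_period + 1
    return header_time >= parent_time + min_gap
```

With a 15-second period and a parent stamped 100, the in-turn block may use timestamp 115, while an out-of-turn block would need at least 116 under this proposal.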
Thanks for your review.
Can you expand on that? Aura does not have a signer limit and I'm not aware of any attack vectors there, but maybe it has other measures in place.
Yes, very good catch. I restored the block timestamp constraints in b97026c; this should mitigate it entirely.
I don't think we need to make this part of the consensus; I don't want to make it more complex than necessary. Otherwise, I'm not really opposed, though.
Can you explain why prime numbers? Also, why 3 and 7, and not 2 and 7? I see that 7 would work, but I don't see why we should increase the baseline constant. See 75bbc88.
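To make the difficulty trade-off concrete, here is a small Python sketch, assuming DIFF_NOTURN stays at Clique's value of 1 unless stated otherwise. Two competing branches tie when DIFF_INTURN · Δin = DIFF_NOTURN · Δout, where Δin is how many more in-turn blocks one branch has and Δout how many more out-of-turn blocks the other has. The smallest such imbalance is (DIFF_NOTURN / g, DIFF_INTURN / g) with g = gcd(DIFF_INTURN, DIFF_NOTURN). The function name smallest_tie is hypothetical.

```python
from math import gcd


def smallest_tie(diff_inturn, diff_noturn):
    """Smallest in-turn/out-of-turn imbalance at which two branches tie.

    Branches tie when diff_inturn * d_in == diff_noturn * d_out. The smallest
    positive solution divides both difficulties by their greatest common divisor.
    """
    g = gcd(diff_inturn, diff_noturn)
    return diff_noturn // g, diff_inturn // g


for name, d_in, d_out in [
    ("Clique", 2, 1),
    ("Cliquey", 7, 1),  # assumes DIFF_NOTURN stays at 1
    ("suggested 7/3", 7, 3),
    ("non-co-prime 6/3", 6, 3),
]:
    in_blocks, out_blocks = smallest_tie(d_in, d_out)
    print(f"{name}: {in_blocks} in-turn block(s) tie {out_blocks} out-of-turn block(s)")
```

Under these assumptions, Clique ties one in-turn block against two out-of-turn blocks, 7/1 needs seven, and 7/3 needs a 3:7 imbalance. A pair with a shared factor such as 6/3 collapses back to Clique's 1:2 ratio, which is why co-prime values are the ones worth comparing.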
It's not entirely clear to me whether this is A) Clique v2 or B) an additional consensus engine.
It seems to be mostly about fixing issues in Clique as opposed to changing the design drastically, so I would agree with @nicksavers that naming this Clique 2.0 and having the
Sounds more like Clique 1.1 to me, tbh, but that's beside the point.
Thanks for your review.
It's more like a patch for Clique. Existing networks can activate EIP-1955 via a hard fork to benefit from the proposed improvements.
If there were semantic versioning, this would indeed be 1.1; however, it was previously discussed as "v2" on various channels without actually implying a versioning scheme.
It does not specify the entire Clique engine, though; therefore, I used the … I updated the spec to clarify and reflect the feedback.
I haven't yet thought through all the implications, but here are a few thoughts I have based on a couple of quick read-throughs:
My general concern with the EIP is that it changes the block producing/acceptance logic but provides no backing to guarantee the censorship resistance of the proposed scheme. I think the most important thing for this EIP to move forward is to explore attack scenarios, and whether a minority of signers could take over the network (or grief it offline). I'm not saying that we need to make everything absolutely bulletproof, but we should be able to prove that the proposal is better and won't just blow up in a similar way. Another thing that would be of interest is to explain why Rinkeby is stable but Görli keeps falling apart. If we could explain exactly what goes wrong with Görli, we might figure out which parts need fixing and which parts can be left alone. If you already have a clear picture of this, please provide some more details.
Thanks Peter, your feedback is very valuable for our work.
I agree. I will clarify that this is an optional feature to be considered by clients but does not affect the signer consensus logic.
I understand your design decision to stabilize block times compared to Aura. However, this is a trade-off we are willing to accept. A healthy network should have all signers online and sealing. If one or a few signers are offline, this causes a lot of reorgs and, in rare cases, causes the network to get stuck. The drawback of some block times exceeding 22 seconds is acceptable in our opinion and is not far off from main network block time fluctuations. The ultimate goal is to always give in-turn blocks a significant head start. Offline or unhealthy signers should be released from their duty anyway.
The problem is that these reorgs are not only frequent but predictable. On Görli, for example, every block is basically sealed by 2-4 signers at the same time. This is barely comparable to uncles on mainnet; rather, it creates a lot of edge cases where the network gets stuck (the Kotti testnet halts every week). There is a research document written by the Pantheon developers explicitly outlining these edge cases. I didn't include it in the spec but will link it here for reference: https://docs.google.com/document/d/1tmsr66sAPJmIZfSy5zck1uxnLzaXYxvzTNL5CF1pJQ0/
Clique currently defines a delay of
Aha, I see. We will spend some more thought on
Indeed, initially we picked
This is a good point. The problem with
This is something we should totally do. I'm personally willing to implement this in Parity once we have agreed on the basic specification. That way, we can write and run tests and simulations. In general, are you interested in implementing Cliquey in Geth for testing/research purposes?
To be honest, I only know that Rinkeby is stable because you said so. I spend much more time on Görli and Kotti. There could be various reasons; my main suspects are always things that are not (or not clearly) part of the spec, or different clients' implementations not being 100% accurate. There is still a lot to consider. The fact is that Görli and Kotti are much less stable. An interesting observation we have now on Görli versus Kotti is that Görli stabilized once we had many more validators (8 now), while Kotti with 3 keeps getting stuck.
From Gitter
I think that MIN_WAIT should be introduced (if at all) on the acceptor side and not on the producer side. I mean that no validator should accept OutOfTurn blocks earlier than MIN_WAIT after the block time. This way, malicious validators cannot push their blocks just to increase the number of reorgs.
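A minimal Python sketch of this acceptor-side variant, assuming "the block time" means the moment the block is due (parent timestamp + BLOCK_PERIOD) and that acceptors compare it against their local clock. MIN_WAIT = 2 is a placeholder, since the discussion does not fix a value, and the function names are hypothetical.

```python
import time

MIN_WAIT = 2  # seconds; placeholder value, not specified in the discussion


def earliest_acceptance(parent_time, block_period, is_in_turn, min_wait=MIN_WAIT):
    """Earliest local time (UNIX seconds) at which an acceptor imports a block.

    In-turn blocks are due at parent_time + block_period; out-of-turn blocks
    are held back for an extra min_wait seconds on the receiving side.
    """
    due = parent_time + block_period
    return due if is_in_turn else due + min_wait


def should_accept_now(parent_time, block_period, is_in_turn, now=None, min_wait=MIN_WAIT):
    """Return True if the block may be imported at local time `now`."""
    now = time.time() if now is None else now
    return now >= earliest_acceptance(parent_time, block_period, is_in_turn, min_wait)
```

Because the delay is enforced by whoever receives the block, a signer that broadcasts an out-of-turn block early gains nothing; a producer-side MIN_WAIT, by contrast, relies on every signer voluntarily waiting.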
Needs some work. ⏳
Simple Summary
This document proposes a new proof-of-authority consensus engine that could be used by Ethereum testing and development networks in the future.
Abstract
Cliquey is the second iteration of the Clique proof-of-authority consensus protocol, previously discussed as "Clique v2". It comes with some usability and stability optimizations gained from creating the Görli and Kotti Classic cross-client proof-of-authority networks that were implemented in Geth, Parity Ethereum, Pantheon, Nethermind, and various other clients.
Motivation
The Kotti Classic and Görli testnets, running different implementations of the Clique engine, got stuck multiple times due to minor issues that were discovered. These issues were partially addressed on the mono-client Rinkeby network by optimizing the Geth code.
However, optimizations across multiple clients should be adequately specified and discussed. This working document is the result of a couple of months of testing and running cross-client Clique networks, especially with the feedback gathered from several Pantheon, Nethermind, Parity Ethereum, and Geth engineers on different channels.
The overall goal is to simplify the setup and configuration of proof-of-authority networks, ensure that testnets do not get stuck, and mimic mainnet conditions.
Rationale
The following changes were introduced over Clique (EIP-225) and are discussed briefly here.
MIN_WAIT is a period that an out-of-turn block must wait before it is published; Clique has no such period. This addresses the issue of out-of-turn blocks often being pushed into the network too fast, which causes many short reorganizations and, in some rare cases, brings the network to a halt. By holding back out-of-turn blocks, Cliquey allows in-turn validators to seal blocks even under non-optimal network conditions, such as high network latency or validators with unsynchronized clocks.
DIFF_INTURN was increased from 2 to 7 to avoid situations where two different chain heads have the same total difficulty. This prevents the network from getting stuck by making in-turn blocks significantly heavier than out-of-turn blocks.
SIGNER_LIMIT was removed from the block sealing logic and is only required for voting. This allows the network to continue sealing blocks even if all but one of the validators are offline. The voting governance is not affected and still requires a signer majority.
… [-BLOCK_PERIOD/4, BLOCK_PERIOD/4]. With this, the average block time will still hover around BLOCK_PERIOD.
Finally, without changing any consensus logic, we propose the ability to specify an initial list of validators in the genesis configuration without tampering with the extraData.
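The following Python sketch illustrates why dropping SIGNER_LIMIT from sealing keeps a chain alive with a single online validator. It assumes Clique's rule from EIP-225, SIGNER_LIMIT = floor(SIGNER_COUNT / 2) + 1, under which a signer that sealed block seen may sign again only once number - seen >= SIGNER_LIMIT. The signer-selection policy (the allowed signer that sealed least recently goes next) is a hypothetical simplification that ignores timing, difficulty, and reorgs, and all function names are illustrative.

```python
def signer_limit(n_signers):
    """Clique's SIGNER_LIMIT from EIP-225: floor(SIGNER_COUNT / 2) + 1."""
    return n_signers // 2 + 1


def may_seal(signer, number, last_signed, n_signers, enforce_limit=True):
    """Return True if `signer` may seal block `number`.

    last_signed maps each signer to the last block number it sealed. With
    enforce_limit=True (Clique), a signer that sealed block `seen` must wait
    until number - seen >= SIGNER_LIMIT. With enforce_limit=False (the
    Cliquey sealing logic described above) the check is skipped entirely.
    """
    if not enforce_limit:
        return True
    seen = last_signed.get(signer)
    return seen is None or number - seen >= signer_limit(n_signers)


def blocks_sealed(n_signers, online, horizon, enforce_limit=True):
    """Count blocks sealed by the `online` signers before the chain stalls (max `horizon`)."""
    last_signed = {}
    for number in range(1, horizon + 1):
        allowed = [s for s in online if may_seal(s, number, last_signed, n_signers, enforce_limit)]
        if not allowed:
            return number - 1  # no online signer may seal: the chain is stuck
        # Simplification: the allowed signer that sealed least recently goes next.
        signer = min(allowed, key=lambda s: last_signed.get(s, 0))
        last_signed[signer] = number
    return horizon
```

In this model, three signers with only one online stall after a single block under Clique but keep going without the limit, and with eight signers Clique needs at least five of them online to make progress.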
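For comparison, this is the genesis extraData layout Clique (EIP-225) currently requires: 32 bytes of vanity, the concatenated 20-byte signer addresses, and 65 zero bytes reserved for the seal. The Python sketch below builds and parses that field; sorting the addresses in ascending order mirrors how Clique lists signers in checkpoint blocks. The helper names are hypothetical, and the sketch says nothing about how the proposed genesis validator list would be named or encoded.

```python
EXTRA_VANITY = 32   # fixed number of vanity bytes (EIP-225)
EXTRA_SEAL = 65     # bytes reserved for the secp256k1 signature (EIP-225)
ADDRESS_LENGTH = 20


def encode_genesis_extra_data(signers, vanity=b""):
    """Build a Clique genesis extraData hex string for the given signer addresses."""
    if len(vanity) > EXTRA_VANITY:
        raise ValueError("vanity must fit in 32 bytes")
    addresses = []
    for signer in signers:
        raw = bytes.fromhex(signer[2:] if signer.startswith("0x") else signer)
        if len(raw) != ADDRESS_LENGTH:
            raise ValueError(f"not a 20-byte address: {signer}")
        addresses.append(raw)
    addresses.sort()  # ascending order, as Clique uses for checkpoint signer lists
    data = vanity.ljust(EXTRA_VANITY, b"\x00") + b"".join(addresses) + b"\x00" * EXTRA_SEAL
    return "0x" + data.hex()


def decode_genesis_signers(extra_data):
    """Recover the signer addresses embedded in a Clique genesis extraData field."""
    raw = bytes.fromhex(extra_data[2:] if extra_data.startswith("0x") else extra_data)
    body = raw[EXTRA_VANITY:len(raw) - EXTRA_SEAL]
    if len(raw) < EXTRA_VANITY + EXTRA_SEAL or len(body) % ADDRESS_LENGTH:
        raise ValueError("malformed extraData")
    return ["0x" + body[i:i + ADDRESS_LENGTH].hex() for i in range(0, len(body), ADDRESS_LENGTH)]
```

Hand-assembling this hex string is the step that an explicit validator list in the genesis configuration would make unnecessary.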