This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Error: DoubleVote #5907

Closed
phahulin opened this issue Jun 22, 2017 · 11 comments
Labels
F5-documentation 📑 Documentation needs fixing, improving or augmenting. M3-docs 📑 Documentation.

Comments

@phahulin
Contributor

Hi everyone,
we're trying to set up a PoA test network where the list of validators is provided by a contract.
Initially there is a single node, and then 12 more are added one by one via the contract. These nodes connect to the initial one as a bootnode; they don't connect to each other directly, so there is a 12-to-1 correspondence.
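Under this setup, each validator would point only at the bootnode. A minimal sketch of what such a [network] section of a Parity TOML config might look like (the enode URL is a placeholder, not a real key):

```toml
# Hypothetical sketch: each of the 12 validators lists only the single
# bootnode, producing the 12-to-1 star topology described above.
[network]
bootnodes = ["enode://<bootnode-public-key>@<bootnode-ip>:30303"]
```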

We run on Parity/v1.6.8-beta-c396229-20170608/x86_64-linux-gnu/rustc1.17.0.
At some transaction (we can't tell exactly which one, it seems to be random) the following error occurs:

...
2017-06-22 12:39:04 UTC Verifier #0 TRACE ethcore::state_db  Cache lookup skipped for 9154f5f8c33dfa92cc49a5cdae504cd82d4b0952: no parent hash
2017-06-22 12:39:04 UTC Verifier #0 TRACE account_bloom  Check account bloom: 9154f5f8c33dfa92cc49a5cdae504cd82d4b0952
2017-06-22 12:39:04 UTC Verifier #0 TRACE state  add_balance(9154…0952, 0): 0
2017-06-22 12:39:04 UTC Verifier #0 TRACE ethcore::state_db  Cache lookup skipped for 9154f5f8c33dfa92cc49a5cdae504cd82d4b0952: no parent hash
2017-06-22 12:39:04 UTC Verifier #0 DEBUG engine  Set of validators obtained: [dd0bb0e2a1594240fed0c2f2c17c1e9ab4f87126, 9154f5f8c33dfa92cc49a5cdae504cd82d4b0952, 12a71bc07e8ae18019f343261e017d07a5217494, ad42a938bef94b804149bf35a854eb85505116df, 2b1d1b8cb055d300acf065b357be9e3e308a0daf, 9184a9e9d098b91ad5c2a64ceade8fe33f59eedb, abeeee1006e2a2a016e1cb668bbfbfacf40008ca, 1d5366bb34d4837991f7192e0f53093bb76d8c85, 1f3b85de5b011d6ff931aed076417b2642afbdc6, a8f7eaab62744dac30e636bf875ea456de097456, 9b16beff3641f0c96bc3cb1b01637baa64fd161b, 96e58e6c9a0d1ed9b52dc67eaf586143997edae0, f7eed6f9592d5c99d184af6bd9aae05046b2ca6d]
2017-06-22 12:39:04 UTC Verifier #0 TRACE engine  Multiple blocks proposed for step 299617384.
2017-06-22 12:39:04 UTC Verifier #0 WARN client  Stage 3 block verification failed for #4589 (63e3…d402)
Error: Engine(DoubleVote(dd0bb0e2a1594240fed0c2f2c17c1e9ab4f87126))
2017-06-22 12:39:04 UTC Verifier #0 TRACE perf  import_verified_blocks: 21.10ms
2017-06-22 12:39:04 UTC  TRACE mio::timer  tick_to; now=564; tick=564
2017-06-22 12:39:04 UTC  TRACE mio::timer  ticking; curr=Token(18446744073709551615)
2017-06-22 12:39:04 UTC  TRACE mio::event_loop  event loop tick
2017-06-22 12:39:04 UTC  TRACE mio::timer  tick_to; now=565; tick=565
2017-06-22 12:39:04 UTC  TRACE mio::timer  ticking; curr=Token(18446744073709551615)
2017-06-22 12:39:04 UTC  TRACE mio::event_loop  event loop tick
...

This causes nodes to disconnect from the bootnode and continue on their own. However, some of them may later reconnect, and then disconnect again when the same error happens on a new transaction.
Also note that we have force_sealing = true in the config files; can this be the source of the problem? Maybe we should keep it only on one node (the initial one)?
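For reference, a hedged sketch of the relevant [mining] section as it might look on one node (the address is this network's initial validator; whether force_sealing belongs on every node is exactly the open question here):

```toml
# Sketch only; exact option names may vary between Parity versions.
[mining]
# Must be unique per node and match an unlocked key in parity/keys/
engine_signer = "0xdd0bb0e2a1594240fed0c2f2c17c1e9ab4f87126"
# The option under suspicion; possibly only the initial node should set it.
force_sealing = true
```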

@keorn keorn added the M4-core ⛓ Core client code / Rust. label Jun 22, 2017
@keorn

keorn commented Jun 22, 2017

Are you sure that no two nodes run with the same --engine-signer?

@phahulin
Contributor Author

phahulin commented Jun 22, 2017

Yes, I double-checked that, and also that it matches the address in the key file from parity/keys/NetworkName/.

@keorn

keorn commented Jun 22, 2017

Can you post the JSON chain spec and longer -lengine=trace logs indicating which node they are from? Does the issue only occur with a larger number of nodes, or can you replicate it with only 2?

@phahulin
Contributor Author

Here is a link to the spec.
It is probably worth noting that the process of adding new validators has two steps: we first issue initial keys and then exchange them for mining keys via this DApp.

Here is the log file. It is not from a live network, because right now there is none. Instead, on one of the validator nodes I cleared the chains and network folders, so this file logs syncing with the bootnode starting from block 0. The error reoccurs at block 4589.

I've set up a smaller network without force_sealing and will see if the problem occurs there too.

@5chdn 5chdn added the F2-bug 🐞 The client fails to follow expected behavior. label Jun 23, 2017
@phahulin
Contributor Author

The error reproduces with 4 nodes (the initial mining node used as the bootnode + 3 validators added later via the contract).
It occurred while I was generating keys for the 4th validator (this procedure requires 3 transactions); I was connected to the bootnode. I had to turn on force_sealing = true, otherwise there were just too few blocks in the network.

So there are:

  1. bootnode == initial mining node
    0xdd0bb0e2a1594240fed0c2f2c17c1e9ab4f87126
    log file, error appears at blocks 1181, 1183 and 1185

  2. new validator 1
    0x55e89e03d0c207322db5b83a6cd12f74b93cff2b
    log file, error appears at block 1372

  3. new validator 2
    0x518d6446a6e766a03cfe39ac44f4f97f5da0fb43
    log file, error appears at block 1372

  4. new validator 3
    0x99478bd03d632023831aa64143c16060c0128653
    log file, error appears at block 1372. Unfortunately, on this node I forgot to add -lengine sync before I started to generate the new validator and the error occurred, so it's just a plain log.

@phahulin
Contributor Author

The above was still tested on 1.6.8.

@rstormsf

Any update on this?

@keorn

keorn commented Jun 26, 2017

Could you please check the latest master or a nightly build?

@phahulin
Contributor Author

Hi guys,
it looks like we've made it work on 1.6.8: the problem seems to be that the 12 mining nodes did not communicate with each other, only with the bootnode (so each had 1 peer, the bootnode, while the bootnode had 12 peers). Now that all of them have 12 peers, the error does not reproduce.
In this regard, I think the docs need clarification: it's not mentioned there that port 30303 should be opened via UDP as well (docker ... -p 30303:30303/udp ...), and that it is necessary to provide the actual public IP in --nat extip:111.111.111.111; otherwise a node can't find other nodes unless they are provided via --bootnodes or added by an RPC call.
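A hedged sketch of a docker invocation combining the flags mentioned above (the image name, tag, and extip address are placeholders, not a tested command):

```sh
# Publish 30303 over both TCP and UDP and advertise the real public IP,
# so peer discovery works without manual --bootnodes entries.
docker run -d \
  -p 30303:30303 \
  -p 30303:30303/udp \
  parity/parity:v1.6.8 \
  --nat extip:111.111.111.111
```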

I also tried to test master, but couldn't connect to DApps: it asks for a security token each time. If I generate the token, it accepts it (no error) but then asks again. I tried from different machines, and the time seems to be in sync with the server (at most 1 second difference). When I start the node, there are these messages:

Option '--dapps-port' is deprecated. Please use '--jsonrpc-port' instead.
Option '--dapps-interface' is deprecated. Please use '--jsonrpc-interface' instead.
Option '--dapps-hosts' is deprecated. Please use '--jsonrpc-hosts' instead.
Option '--dapps-cors' is deprecated. Please use '--jsonrpc-cors' instead.
2017-06-27 22:43:36 UTC Starting Parity/v1.7.0-unstable-02edc95-20170623/x86_64-linux-gnu/rustc1.18.0
2017-06-27 22:43:36 UTC Keys path parity/keys/OraclesPoA
2017-06-27 22:43:36 UTC DB path parity/chains/OraclesPoA/db/490278e0adc06935
2017-06-27 22:43:36 UTC Path to dapps parity/dapps
2017-06-27 22:43:36 UTC State DB configuration: fast
2017-06-27 22:43:36 UTC Operating mode: active
2017-06-27 22:43:37 UTC Updated conversion rate to Ξ1 = US$270.72 (439744420 wei/gas)
2017-06-27 22:43:38 UTC Configured for OraclesPoA using AuthorityRound engine
2017-06-27 22:43:39 UTC Error generating epoch change proof for block 4ecb…5c5e: Engine error (Insufficient validation proof: Caller insufficient to generate validator proof.)
2017-06-27 22:43:39 UTC Snapshots generated by this node will be incomplete.

Maybe some configuration needs to be changed when switching from 1.6?

@artjoma

artjoma commented Jul 5, 2017

Parity: Starting Parity/v1.6.8-beta-c396229-20170608/x86_64-linux-gnu/rustc1.17.0
Same issue:
2/25 peers 132 KiB db 120 KiB chain 0 bytes queue 10 KiB sync RPC: 0 conn, 0 req/s, 0 µs
2017-07-05 20:29:32 UTC Stage 3 block verification failed for #24 (1c28…5e4c)
Error: Engine(DoubleVote(0052fb1ca9659ada7db23796bae3041b5dbb9507))

Private network with AuRa and 2 peers. One peer was stopped for maintenance; after it was set up again, this error appeared.

@rphmeier
Contributor

rphmeier commented Jul 6, 2017

We are fairly certain this is addressed with 1a6f4f6#diff-3b2a71ed5fccb75dfe326f3b122487cdR783

The state of authority contracts in 1.6.* is very experimental. 1.7.0 will include changes for light-client friendliness and warp sync, with only a slightly different ABI. When migrating to master you will need to restart your chain, as the database will be incompatible.

@5chdn 5chdn added F5-documentation 📑 Documentation needs fixing, improving or augmenting. M3-docs 📑 Documentation. and removed F2-bug 🐞 The client fails to follow expected behavior. M4-core ⛓ Core client code / Rust. labels Jul 24, 2017
@5chdn 5chdn closed this as completed Aug 7, 2017
6 participants