Node crashes on quick restart after being killed #2838

ulope · 2018-10-18T21:46:01Z

Problem Definition

(This may be Parity-related; I haven't tested with Geth yet.)

Steps to reproduce:

Kill a node (with SIGKILL, -9) right after a new block was received
Restart node within a few seconds

The node crashes with a ValueError from within web3:

ValueError: {'code': -32000, 'message': 'One of the blocks specified in filter (fromBlock, toBlock or blockHash) cannot be found', 'data': '0x8b2045'}

Full logs
As you can see, the data attribute in the error increases from one occurrence to the next and seems to be a block number.

0x8b2045 -> 9.117.765
0x8b2047 -> 9.117.767
0x8b2048 -> 9.117.768
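For reference, the conversion above can be reproduced with a few lines of Python. This is only a small sketch; the variable names are made up for illustration, and the three hex values are the ones quoted from the errors.

```python
# Decode the `data` field of the observed errors into decimal block numbers.
observed = ["0x8b2045", "0x8b2047", "0x8b2048"]

# int(value, 16) accepts the "0x" prefix when the base is given explicitly.
block_numbers = [int(value, 16) for value in observed]

for raw, number in zip(observed, block_numbers):
    print(f"{raw} -> {number}")
```

The values grow with each occurrence, which is consistent with them being block numbers.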

This error persists for a while but disappears after a few minutes.


hackaugusto · 2018-10-19T10:20:55Z

Maybe this was fixed by #2671

ulope · 2018-10-19T18:39:28Z

@hackaugusto Unfortunately not, this was with latest master as of yesterday evening (e0f3e6f) which already includes that PR.

hackaugusto · 2018-10-24T12:55:35Z

Ahhh, I think I know what the problem is:

A node learns about a new block, updates its state, then crashes
On restart, the block number is recovered and the filters are "installed" with the latest known block (this will eventually be a stateless filter instance)
The race: The node finishes the above before a new block is mined
The bug: The filter is polled on first run, which always executes the callbacks, and this eventually polls the filter for a block in the future:

raiden/raiden/utils/filters.py

Line 111 in 97820b6

self._last_block + 1,
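The race described above can be sketched in isolation. This is a simplified model, not the actual Raiden code: `FakeChain`, `PollingFilter`, and the simulated error message are hypothetical stand-ins, and the only detail taken from the source is that the filter queries starting at `self._last_block + 1`.

```python
class FakeChain:
    """Hypothetical stand-in for an Ethereum client that knows blocks up to `head`."""

    def __init__(self, head):
        self.head = head

    def get_logs(self, from_block, to_block):
        # Assumption: the client rejects a query that starts past its head,
        # similar in spirit to the error reported in this issue.
        if from_block > self.head:
            raise ValueError(
                {"message": "block not found (simulated)", "data": hex(from_block)}
            )
        return []


class PollingFilter:
    """Hypothetical filter that remembers the last block it has queried."""

    def __init__(self, chain, last_block):
        self.chain = chain
        self._last_block = last_block

    def poll(self, target_block):
        # The query always starts one block after the last known block.
        logs = self.chain.get_logs(self._last_block + 1, target_block)
        self._last_block = target_block
        return logs


# The node restarts and recovers block 100 before block 101 has been mined.
chain = FakeChain(head=100)
restarted_filter = PollingFilter(chain, last_block=100)

try:
    restarted_filter.poll(100)  # first run polls unconditionally
except ValueError as error:
    print("poll failed:", error)
```

Under these assumptions the first poll asks for a range starting at block 101, which the chain does not have yet. Once a new block is mined the same call succeeds, which would match the error disappearing after a while.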

The race happened under this circumstance:
- A node learns about a new block, updates its state, then crashes
- On restart, the block number is recovered, the filters are installed with the latest known block.
- The race: The node finishes the above before a new block is mined
- The bug: The filter is polled during start of the RaidenService, by calling the AlarmTask.first_run, which always executes the callbacks, eventually using the StateFilter instances to poll for new events from a block in the future.

The fix was to give the latest known block number to the alarm task as an argument for first_run, and only execute the callbacks if there is a new block.

fixes raiden-network#2838
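A minimal sketch of the idea behind that fix, assuming a much-simplified alarm task. `AlarmTaskSketch`, `get_latest_block`, and `register_callback` are hypothetical names for illustration and do not reproduce the real `AlarmTask` interface; only the `first_run` argument and the "new block only" condition come from the commit message.

```python
class AlarmTaskSketch:
    """Hypothetical, simplified alarm task that runs callbacks on new blocks."""

    def __init__(self, get_latest_block):
        # `get_latest_block` is an assumed callable returning the chain head.
        self.get_latest_block = get_latest_block
        self.callbacks = []
        self.known_block_number = None

    def register_callback(self, callback):
        self.callbacks.append(callback)

    def first_run(self, known_block_number):
        # The caller passes in the block number recovered from its stored state.
        self.known_block_number = known_block_number
        latest = self.get_latest_block()
        # Only run the callbacks if the chain has advanced past the known block.
        if latest > known_block_number:
            self._run_callbacks(latest)

    def _run_callbacks(self, latest):
        for callback in self.callbacks:
            callback(latest)
        self.known_block_number = latest


seen = []
task = AlarmTaskSketch(get_latest_block=lambda: 100)
task.register_callback(seen.append)
task.first_run(known_block_number=100)  # no new block, so nothing is polled
```

With the guard in place, a restart that completes before the next block is mined does nothing on the first run instead of polling a filter for a block that does not exist yet.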

LefterisJP · 2018-10-28T17:20:41Z

As discussed in Rocketchat, this should no longer happen after #2895, which now updates the block state only with confirmed blocks.
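A rough sketch of why tracking only confirmed blocks sidesteps the race, assuming "confirmed" means a fixed number of blocks behind the chain head. The function name and the confirmation count below are hypothetical and are not taken from #2895.

```python
# Hypothetical confirmation depth, chosen only for illustration.
DEFAULT_CONFIRMATIONS = 5


def confirmed_block_number(latest_block, confirmations=DEFAULT_CONFIRMATIONS):
    """Return the newest block considered confirmed, never below zero."""
    return max(0, latest_block - confirmations)


# If the stored state only ever holds a confirmed block number, a filter that
# starts at `stored + 1` after a restart still points at a block that exists.
stored = confirmed_block_number(latest_block=100)
next_query_start = stored + 1
```

Under this assumption the block after the stored one is already on the chain whenever the confirmation depth is at least one, so the query range is never in the future.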


ulope added the Severity / Minor label Oct 18, 2018

ulope mentioned this issue Oct 24, 2018

Payments wait forever to be conducted #2779

Closed

hackaugusto self-assigned this Oct 24, 2018

hackaugusto mentioned this issue Oct 24, 2018

Fix race condition with new blocks and filters #2884

Merged

LefterisJP closed this as completed in #2884 Nov 8, 2018
