Skip to content
This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Treat blocks, that already downloaded, as synced #11264

Merged
merged 1 commit into from
Dec 3, 2019

Conversation

grbIzl
Copy link
Collaborator

@grbIzl grbIzl commented Nov 15, 2019

Important: it's recreated #10864 I've decided to re-create it from the scratch, because it had a lot of not relevant commits, that were made during its testing. Now the fix is verified (in duet with #10895) and ready for review.

(Copypasted description)

This PR aims to fix the following scenario in the block sync:

  1. Node downloads the next block number N and starts importing it. Due to block's complexity or some configuration issues block's import takes more time (I've seen this issue, when import took 5-10 seconds).
  2. Meanwhile the next sync round has begun and Parity tries to sync block N once more time, obviously import of this block has no success with error AlreadyQueued. As a result the sync round has completed with 0 blocks synced
  3. As no new blocks were synced during sync round Parity starts descending in block number, treating this situation as error during blocks download: https://github.com/paritytech/parity-ethereum/blob/master/ethcore/sync/src/block_sync.rs#L429 and repeats sync for (already downloaded) blocks
  4. Attempt to sync block number N-1 returns the same error and causes attempt to download block N-2 and so on

See the sequence diagram:
image

This downfall only finishes, when (if) block number N is imported and Parity catches up the current best block
As a result huge DDOS of GetBlock requests outcomes from the node. This also can affect current block's verification and exacerbates the issue.
Suggested solution:
Treat downloaded blocks on step 2 (marked * in the diagram) and further, that exist already in the chain or in the queue, as actually synced on this round and (as result) do not start error handling procedure with blocks descending.

Comments:

Relates to #10724.
Also such DDOS can cause the hanging of other node (see #10821 ), that replies to these massive GetBlocks requests.
This issue has similarities to the described in #9531 but has different scenario and root causes are not connected.

@grbIzl grbIzl added A0-pleasereview 🤓 Pull request needs code review. M4-core ⛓ Core client code / Rust. labels Nov 15, 2019
@ordian ordian requested a review from ascjones November 15, 2019 11:35
@ordian ordian added this to the 2.7 milestone Nov 15, 2019
@dvdplm
Copy link
Collaborator

dvdplm commented Nov 17, 2019

Thank you for the great PR description.

Attempt to sync block number N-1 returns the same error

I don't understand this part. Even if block N takes a very long time to import, why does this mean that N-1 will also fail? Actually, isn't N-1 already imported?

This downfall only finishes, when (if) block number N is imported and Parity catches up the current best block

What are the conditions that could lead block N to not finish the import at all?

@grbIzl
Copy link
Collaborator Author

grbIzl commented Nov 18, 2019

I don't understand this part. Even if block N takes a very long time to import, why does this mean that N-1 will also fail? Actually, isn't N-1 already imported?

I've created a sequence diagram in order to illustrate the scenario.

What are the conditions that could lead block N to not finish the import at all?

Any block import can fail.

@dvdplm
Copy link
Collaborator

dvdplm commented Nov 18, 2019

I've created a sequence diagram in order to illustrate the scenario.

Neat, thank you. Questions:

image

Any block import can fail.

Sure, but specifically in the scope of the bug this is fixing, what is causing the import to be so slow, do we know?

@grbIzl
Copy link
Collaborator Author

grbIzl commented Nov 18, 2019

Should this be N?

I think, I've named last synced block parameter not correctly. It's actually last_imported_block.The diagram is modified accordingly. See retraction logic here: https://github.com/paritytech/parity-ethereum/blob/master/ethcore/sync/src/block_sync.rs#L432

Sure, but specifically in the scope of the bug this is fixing, what is causing the import to be so slow, do we know?

I think, that import time is pretty standard (depends on block ofc). The reason of such behavior is not the slowness of blocks' import, but network latency. Or it would be better to say, lack of network latency. In regular configurations (when archive runs on the separate node) requesting and downloading takes more time, so this retraction doesn't happens so fast. In the issue, archive and full node run on the same machine, that makes the issue easier to reproduce.

Copy link
Contributor

@ascjones ascjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From memory this code is difficult to reason about (and test) - so the sequence diagram is super useful.

Indeed it looks as if this will solve the immediate issue, so we just need to make sure there are no unintended consequences.

So the question is: can we think of a scenario where having imported.len() == 0 for AlreadyQueued is required, in order to trigger a retraction?

@grbIzl
Copy link
Collaborator Author

grbIzl commented Dec 3, 2019

So the question is: can we think of a scenario where having imported.len() == 0 for AlreadyQueued is required, in order to trigger a retraction?

From my point of view this was a bug in the general logic of retraction. imported vector is converted directly into imported_this_round and the main point, where it's used and considered, is here: https://github.com/paritytech/parity-ethereum/blob/master/ethcore/sync/src/block_sync.rs#L432 So let's look into this place of code: do we want to start retraction, if on this round we do download blocks but they are in importing queue already? It's kind of intermediate state for block sync, its status is not defined yet: do we need to proceed further or do we need to retract back? I suggest, that because of its "intermediateness" we should stay on this place until blocks are imported. IMHO it's the only solution for such state.

Copy link
Contributor

@ascjones ascjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@ordian ordian merged commit 2895e3b into master Dec 3, 2019
@ordian ordian deleted the PreventBlocksReimport2 branch December 3, 2019 14:59
@ordian ordian added A8-looksgood 🦄 Pull request is reviewed well. and removed A0-pleasereview 🤓 Pull request needs code review. labels Dec 3, 2019
@ordian
Copy link
Collaborator

ordian commented Dec 3, 2019

@grbIzl please add patch labels if you think it should be backported

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A8-looksgood 🦄 Pull request is reviewed well. M4-core ⛓ Core client code / Rust.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants