Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When validation is waiting for parent chain daemon, "stall". #1022

Conversation

gwillen
Copy link
Contributor

@gwillen gwillen commented Jul 20, 2021

Currently, if -validatepegin is given, and block validation can't proceed
because the parent chain is not synced, we mark the block invalid and put
it in a queue to be "revalidated" later. Unfortunately, marking a block
invalid has downstream consequences, in particular causing descendant blocks
to be marked invalid, which are not currently fixed by the queue.

Instead, we'll use a different strategy: if the mainchain daemon isn't
sufficiently synced to validate a block, we will "stall" connecting that
block to the chain, and have ActivateBestChain simply keep the tip at the
previous block until we're ready.

We can still download and validate (partly) blocks past this point while
we're waiting. They will be connected once the parent chain daemon catches
up.

@gwillen gwillen force-pushed the feature-debug-revalidation-v18 branch 2 times, most recently from 8fb9283 to 841fd14 Compare July 20, 2021 18:26
@gwillen gwillen force-pushed the feature-debug-revalidation-v18 branch from 469948b to 9c5ea27 Compare August 30, 2021 23:59
src/validation.cpp Outdated Show resolved Hide resolved
UniValue error = reply["error"];
if (!error.isNull()) {
// On the first call, it's possible to node is still in
// warmup; in that case, just wait and retry.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should drop On the first call from this comment since you removed the init flag

Copy link
Member

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack 9c5ea27

@apoelstra
Copy link
Member

rescinding ACK

@apoelstra
Copy link
Member

I rescind my rescindation. My previous ACK is still good.

@apoelstra
Copy link
Member

I rescind my ACK for real this time.

We can't combine the schedulers, cf #647

@apoelstra
Copy link
Member

To expand on the need for two schedulers: this code puts ActivateBestChain on a 30-second timer. The old code did something similar though more involved with its 2-minute "reverification scheduler". However, ActivateBestChain calls LimitValidationInterfaceQueue before taking any other action, which blocks on the validation queue being cleared. But the actions in this queue are executed on the scheduler thread ... the same thread that is sitting blocked! So if we want to call ABC periodically we need to give it its own thread.

Currently, if -validatepegin is given, and block validation can't proceed
because the parent chain is not synced, we mark the block invalid and put
it in a queue to be "revalidated" later. Unfortunately, marking a block
invalid has downstream consequences, in particular causing descendant blocks
to be marked invalid, which are not currently fixed by the queue.

Instead, we'll use a different strategy: if the mainchain daemon isn't
sufficiently synced to validate a block, we will "stall" connecting that
block to the chain, and have ActivateBestChain simply keep the tip at the
previous block until we're ready.

We can still download and validate (partly) blocks past this point while
we're waiting. They will be connected once the parent chain daemon catches
up.
- Finish removing all references to 'recheckpeginblockinterval', including
  documentation and tests.
- Remove periodic calls to MainchainRPCCheck; use it only at startup (and
  refactor accordingly to simplify logic.)
- Move MainchainRPCCheck from validation.h/cpp (public) to an internal
  helper function of init.cpp.
- Comment out definition of 'revalidation queue' type in txdb, to suppress
  "unused variable" warning. (Leave it visible to avoid future reuse.)
Add a regression test for ElementsProject#891 .

This checks that we can sync successfully when making a bunch of new blocks
just as we have transient loss of parent daemon connectivity. This reliably
fails without the fix, and reliably succeeds with it. (It stands in for the
situation, more common in production, where we sync faster than the parent
daemon can keep up after a long outage.)
@apoelstra apoelstra force-pushed the feature-debug-revalidation-v18 branch from 9c5ea27 to e5ba4e8 Compare September 7, 2021 15:58
@apoelstra
Copy link
Member

cc @gwillen I squashed a couple commits and un-removed the second scheduling thread

@apoelstra apoelstra added this to the 0.21 milestone Sep 8, 2021
@gwillen
Copy link
Contributor Author

gwillen commented Sep 9, 2021

@apoelstra Thanks, looks great. If that fixes your tests then I'm 100% happy with it.

@apoelstra apoelstra merged commit 28ba02e into ElementsProject:elements-0.18 Sep 9, 2021
apoelstra added a commit that referenced this pull request Sep 14, 2021
…in daemon, "stall"

b2e1cc3 Regression test for pegin validation issues during sync (Glenn Willen)
d5042b4 Finish removing 'recheckpeginblockinterval'; move MainchainRPCCheck (Glenn Willen)
313f73d When validation is waiting for parent chain daemon, "stall". (Andrew Poelstra)

Pull request description:

  Forward-port of #1022

ACKs for top commit:
  gwillen:
    utACK b2e1cc3, verified that it contains only the requested changes from da11d7b.

Tree-SHA512: 971b6a137efdc54f84995b17595346bf3d3ebd5a04eef1676225c664923e0b52393550bc51ea366c4d3010aa35a9de16da0643bb5bfb40c7eb788a180c99b1a0
gwillen added a commit to gwillen/elements that referenced this pull request Apr 21, 2022
When PR 1022 was merged, an & got dropped from the declarations of the two
"fStall" out-parameters, rendering them unused (and triggering a compiler
warning.) Restore them, restoring the functionality of ElementsProject#1022.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants