Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design block-by-block API #1245

Closed
LLFourn opened this issue Dec 19, 2023 · 6 comments
Closed

Design block-by-block API #1245

LLFourn opened this issue Dec 19, 2023 · 6 comments
Labels
discussion There's still a discussion ongoing
Milestone

Comments

@LLFourn
Copy link
Contributor

LLFourn commented Dec 19, 2023

Originally posted by @evanlinjin in #1172 (comment)

Requirements

  1. Ability to ONLY insert relevant checkpoints. Relevant means checkpoints to blocks which
    contain relevant transactions. This is important for doing block-by-block syncing. I.e. full node
    without CBF, a CBF node (we still want to filter out false-positives), and silent payments (for the
    future).

  2. Ability for the block-source to handle reorgs (mid-sync) without requesting data from
    bdk::Wallet.

Why apply_block_connected_to and apply_block_assume_connected Does Not Satisfy

Both these methods applies all block checkpoints, no matter if they contain relevant transactions or
not. To solve this, we can have another method, apply_block_relevant_assume_connected, that does
not apply checkpoints of blocks containing no relevant transctions. However, this cannot handle
reorgs mid-sync in an elegant way.

Let's assume we have apply_block_relevant_assume_connected which filters checkpoints, and the
folowing scenario plays out:

block_height        | 1 | 2 | 3 |
emitted_initially     A   B   C
relevant              A       C
emitted_post_reorg    A   B'

emitted_initially is the checkpoints that the chain-source has emitted. relevant is the
checkpoints that end up being stored in LocalChain. emitted_post_reorg is the checkpoints that
the chain-source re-emits due to reorg. As can be seen, A, B' (the update) cannot connect with
A, C. We need A, B', C' as the update.

My proposal

/// Introduces transactions of the given block to the wallet.
///
/// Only relevant transactions are inserted. Transactions are inserted alongside their anchors.
fn introduce_block_txs(&mut self, block: &Block, height: u32) { todo!() }

/// Introduces a chain `tip` to the wallet as a `CheckPoint`.
///
/// This updates the `last_synced_to_height: u32` parameter to the height of `tip` if `tip` can
/// connect with the internal `LocalChain`.
///
/// This method attempts to insert the `tip`, but only if it contains relevant transactions.
fn introduce_tip(&mut self, tip: CheckPoint) -> Result<(), CannotConnectError> { todo!() }

The chain-source emitter is responsible for emitting full blocks and checkpoints (that connects the
current block to previously emitted blocks).

The block is first processed by introduce_block_txs. This inserts relevant transactions and
associated anchors into the wallet.

Then we call introduce_tip. If the block contains relevent transactions, the LocalChain is
updated with this new tip (and only the tip, since we want to skip irrelevant checkpoints). If the
tip is irrelevant, we only update the last_synced_to_height: u32 value.

How does introduce_tip work?

Imagine a situation where the emitter has emitted block height 1 (with hash A) and height 2 (with
hash B). 1:A is considered relevant and 2:B is not. The state of the wallet's LocalChain
would be a single checkpoint [1:A].

If the next emission is 3:C and it contains relevant transactions, the tip input of
introduce_tip may contain the update chain [1:A, 2:B, 3:C] (but of course, we only want to
insert 3:C and not 2:B). The logic of introduce_tip will iterate the update chain backwards
to determine whether 3:C can connect with the wallet chain (in this case, it can via 1:A). With
this knowledge, try_apply_tip will create a trimmed update chain [1:A, 3:C] that is then applied
to just apply the new tip.

introduce_tip also needs to keep checkpoints that are needed to invalidate original checkpoints.
I.e. with a original chain [1:A, 3:C, 4:D] and an update chain [1:A, 3:C, 4:D', 5:E].
apply_tip still needs to keep height 4 in the update even though it is not the tip. The
stripped-update will be [3:C, 4:D', 5:E].

Some tips cannot connect, but we don't return error

Given a scenario with an original chain [1:A, 3:C, 5:E] (where the original tip is at height 5).
If there is a 2-block reorg, the chain-source emitter will be tempted to emit block at height 4 (as
that is earliest block reorged). The update chain may be [1:A, 2:B, 3:C, 4:D']. This cannot be
connect because we cannot know if 4:D' and 5:E belongs in the same chain. However, it is not the
end of the world as the next block emitted will be of height 5. So we just ignore this and wait for
the next emission instead of returning an error.

Initiating syncs

Because we are only include checkpoints which contain relevant transactions, we need somewhere
else to track the last-synced-to-height. This value is used when creating a new instance of a
chain-source-emitter. We need to track the last-synced-to-height in the wallet's changeset.

impl Emitter {
    /// We use `last_cp` for reorg detection. Otherwise, we start emitting from
    /// `last_height - assume_final_depth` where the first emission will connect to `last_cp`.
    fn new(last_height: u32, last_cp: CheckPoint) -> Self { todo!() }
}

Optimizing introduce_tip

Because we are only inserting checkpoints with relevant transactions, inserted checkpoints will be
few and far apart.

I.e. If the original chain is [1:A, 4:D] and the next relevant checkpoint is at height 4000,
this means we need to do 3996 (4000-4) iterations just to find out if checkpoint at height 4000 can
connect to 4:D.

A solution is to cache the most recent irrelevant checkpoints. For example, when we introduce
checkpoint at height 5 (which is irrelevant), we cache it and associate it with our highest relevant
checkpoint 4:D. We keep doing this so when we get to height 4000, we can iterate from the
introduced tip and find out that the previous node (at height 3999) is the same as the height 3999
that is cached. We also know that the cached checkpoint of height 3999 is connected to 4:D. We can
safely create a trimmed-update of [4:D, 4000:X].

Changes to Bitcoind RPC chain source

We need to emit CheckPoints alongside blocks.

@LLFourn
Copy link
Contributor Author

LLFourn commented Dec 19, 2023

Thanks for this proposal @evanlinjin. I don't agree with the requirements.

Requirements

1. Ability to **ONLY** insert _relevant_ checkpoints. _Relevant_ means checkpoints to blocks which
   contain relevant transactions. This is important for doing block-by-block syncing. I.e. full node
   without CBF, a CBF node (we still want to filter out false-positives), and silent payments (for the
   future).

This is not an optimization we require. Even when you do every single block it's around 19mb a year on disk. In 10 years 190mb. Consumer devices like phones will likely have quadruple their current storage by then. If you want to have a constant size LocalChain then you can just consolidate everything into a single ChangeSet and prune the irrelevant ones and write back out to disk. Applications that care about this can do this every time the app is opened if they want.

This keeps things simple.

2. Ability for the block-source to handle reorgs (mid-sync) without requesting data from
   `bdk::Wallet`.

I think you'll agree with restating this like this but I want to check:

The sole responsibility of the block source should be to emit blocks in topological order along with a block that it is connected to that it has emitted before or that we have passed in on the creation of the block source. In the case of a block source that emits every single block, it can just emit the block since the prev_blockchash will satisfy the constraints.

I don't think there is a problem with making this work with block-by-block updating. Any re-org will contradict a block and will work. The problem is with scenarios like CBF where not every block is emitted. A re-org may take place and a non-contradicting block connected before the tip may be emitted that now matches the filter. This won't connect.

However, rather than introducing complexity in the API I think the solution is just to do #1005 keeping this problem in mind. Making LocalChain monotone means that you can insert this block which can be assumed to be on a separate chain and the LocalChain will accept it. Once this other chain becomes longer than the previous tip chain it will be orphaned in favor of it. If indeed there was not a re-org then the two separate chains can be merged into one once this fact is established.

So in summary I think #1172 will be fine with things the way they are but #1005 should be done before CBF or scanblocks is attempted with Wallet/LocalChain (it's fine to use them with a custom ChainOracle).

@evanlinjin
Copy link
Member

LocalChain then you can just consolidate everything into a single ChangeSet and prune the irrelevant ones and write back out to disk. Applications that care about this can do this every time the app is opened if they want.

Okay I agree with this. This is a simpler approach.

Making LocalChain monotone

This means we need to change local_chain::ChangeSet since some checkpoints may not connect. This is not a problem, but I'm just pointing it out. Maybe ChangeSets will look like this:

pub type ChangeSet = BTreeSet<(BlockId, BlockId)>;

I think making LocalChain monotone is the simplest solution (API-wise).

@notmandatory notmandatory added this to BDK Dec 20, 2023
@notmandatory notmandatory moved this to Todo in BDK Dec 20, 2023
@notmandatory notmandatory added the discussion There's still a discussion ongoing label Dec 20, 2023
@evanlinjin
Copy link
Member

evanlinjin commented Dec 28, 2023

The sole responsibility of the block source should be to emit blocks in topological order along with a block that it is connected to that it has emitted before or that we have passed in on the creation of the block source. In the case of a block source that emits every single block, it can just emit the block since the prev_blockchash will satisfy the constraints.

@LLFourn This can only be done if LocalChain is monotone, so we should make it priority. I.e. a 2-block reorg (where the reorg happens at height 100) means the next block introduced needs to be connected to block height 99. The LocalChain update will be [99, 100'] and the original chain will be [.., 99, 100, 101]. This cannot connect.

The alternative for now (without the larger change of making LocalChain monotone), would be to use the following API:

fn introduce_block(&mut self, block: &Block, cp: CheckPoint) -> Result<(), CannotConnectError> { todo!() }

@nondiremanuel nondiremanuel added this to the 1.0.0-alpha.3 milestone Jan 2, 2024
@nondiremanuel nondiremanuel moved this from Todo to In Progress in BDK Jan 2, 2024
@LLFourn
Copy link
Contributor Author

LLFourn commented Jan 3, 2024

The sole responsibility of the block source should be to emit blocks in topological order along with a block that it is connected to that it has emitted before or that we have passed in on the creation of the block source. In the case of a block source that emits every single block, it can just emit the block since the prev_blockchash will satisfy the constraints.

@LLFourn This can only be done if LocalChain is monotone, so we should make it priority. I.e. a 2-block reorg (where the reorg happens at height 100) means the next block introduced needs to be connected to block height 99. The LocalChain update will be [99, 100'] and the original chain will be [.., 99, 100, 101]. This cannot connect.

The alternative for now (without the larger change of making LocalChain monotone), would be to use the following API:

fn introduce_block(&mut self, block: &Block, cp: CheckPoint) -> Result<(), CannotConnectError> { todo!() }

Just noting that @evanlinjin and I had a call about this and the claim made above is wrong. The [99,100'] chain will connect to the other one since it unambiguously connects and splits off at block 99. We were both confused about this a bit. We do not need LocalChain monotonicity PR for #1005 or any other tricks to do block-by-block syncing to a LocalChain. @evanlinjin can you confirm you agree with this and maybe revisit your design.

@evanlinjin
Copy link
Member

Yes, this is correct thank you @LLFourn for this comment.

@nondiremanuel nondiremanuel modified the milestones: 1.0.0-alpha.3, 1.0.0 Jan 6, 2024
@notmandatory notmandatory moved this from In Progress to Needs Review in BDK Jan 8, 2024
@evanlinjin
Copy link
Member

Implemented in #1172

@github-project-automation github-project-automation bot moved this from Needs Review to Done in BDK Jan 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion There's still a discussion ongoing
Projects
Archived in project
Development

No branches or pull requests

4 participants