Design handshake between CN and BN to determine what Block to start streaming from. #120

Closed
Tracked by #170
a-saksena opened this issue Aug 22, 2024 · 2 comments · Fixed by #267

@a-saksena
Contributor

a-saksena commented Aug 22, 2024

Design a basic protocol for a Consensus Node (CN) or simulator to initiate a stream to a Block Node (BN), such that the BN never ends up with gaps in the block stream.

Initial Strawman

Block Node Connect

  • On connect, CN sends a block header, which contains the block number.
    • If it is the next expected block, no problem: start streaming.
    • If less than last known verified block, respond with "DuplicateBlock"
      • Response includes last known block, so CN can perhaps do its own catch up or reconnect.
      • This REQUIRES CN to check and resend block header, or end the stream and restart.
      • This also applies when CN sends a block number below the last known block for a
        block that this block node, for some reason, does not actually hold.
        • In this case the block node must retrieve the missing block(s) from another
          block node to fill the gaps, but shall always respond to CN with the very
          latest known and verified block. The streaming API is only for current data,
          not for filling gaps.
    • If greater than next block, we missed block(s)
      • Respond with "Behind"
        • This includes last known block number.
        • CN will either send from the block after that one, or send "EndOfStream" and retry with exponential backoff.
        • CN will include its earliest known block with the end of stream, so we have an idea of the range to catch up.
          • This is advisory, and will almost certainly change before we finish "catching up".
        • If CN retries before we "catch up", we record the offered block number, and continue trying to "catch up" to that. Response is still "Behind" with last known block number.
          • This allows CN to jump in to "catch us up" directly if we're behind, but close enough.
        • We probably need failure detection for the case where the required target block doesn't get "closer" with each connection from CN.
      • If CN ends stream, need to catch up from other BN
        • Query BN "status" API, get last available block
          • If greater than or equal to block number CN sent
            • Ask for range, last-known-block+1 to last-available-block.
            • Hopefully catch up before next CN connection.
          • If less than block number CN sent
            • Either ask for a stream from last-known-block+1 to "infinite" and quit when caught up, OR ask another BN in case all needed blocks are available elsewhere.
    • Each CN connect will send a block header, repeat above process until we get a matched block number or CN can finish catching us up.
  • Note, we can (re)enter connect any time we get a next block from CN that isn't what we expect. This simplifies logic for working out when to retry or reset a stream.
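
The connect decisions above can be sketched roughly as follows. This is only an illustration of the strawman, not an implementation from the project: the names `HandshakeStatus`, `HandshakeResponse`, `on_block_header`, and `plan_catch_up` are hypothetical, `ACCEPT` is an invented label for the "start streaming" outcome, and treating a block number equal to the last known block as a duplicate is an assumption the strawman does not spell out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class HandshakeStatus(Enum):
    ACCEPT = "Accept"                   # hypothetical label: start streaming
    DUPLICATE_BLOCK = "DuplicateBlock"  # offered block is not newer than what BN has
    BEHIND = "Behind"                   # BN is missing block(s) before the offered one


@dataclass(frozen=True)
class HandshakeResponse:
    status: HandshakeStatus
    last_known_block: int  # always the latest known and verified block


def on_block_header(offered_block: int, last_known_block: int) -> HandshakeResponse:
    """BN side: decide how to answer the block header CN sends on connect."""
    next_block = last_known_block + 1
    if offered_block == next_block:
        return HandshakeResponse(HandshakeStatus.ACCEPT, last_known_block)
    if offered_block < next_block:
        # Assumption: "equal to last known" is handled like "less than".
        # Any interstitial gap is filled from another BN, never from this stream.
        return HandshakeResponse(HandshakeStatus.DUPLICATE_BLOCK, last_known_block)
    return HandshakeResponse(HandshakeStatus.BEHIND, last_known_block)


def plan_catch_up(
    last_known_block: int, peer_last_available: int, offered_block: int
) -> Tuple[int, Optional[int]]:
    """BN side, after CN ends the stream: choose what to request from another BN.

    Returns an inclusive (start, end) range; end is None for an open-ended
    ("infinite") stream that the caller stops once it has caught up.
    """
    start = last_known_block + 1
    if peer_last_available >= offered_block:
        return (start, peer_last_available)
    return (start, None)
```

For example, a BN whose last verified block is 10 would accept a header for block 11, answer "DuplicateBlock" for block 7, and answer "Behind" for block 15, in every case reporting 10 as its last known block.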

Error Handling

  • If CN detects an error at any time
    • Next BlockItem will be an EndStream item with an appropriate error code.
    • Block Node will drop any in-progress unproven block from that CN, and, if
      no remaining active incoming streams, notify all subscribers with an
      EndStream item specifying "source error".
    • Block Node will continue streaming from other incoming stream sources, if
      any, or await a restarted stream if no other incoming stream sources.
  • If a BN detects an error at any time
    • BN will send an EndStream response to all incoming streams, with appropriate
      status code.
      • CN, on receiving the end stream, will retry publishing the stream; and will
        use exponential backoff if the BN failure continues.
        • If CN has multiple "downstream" BN options, a CN may connect to an alternate
          BN for reliability and mark the failed BN as a backup.
    • BN will send EndStream to all subscribers with appropriate status code.
    • BN will either recover or await manual recovery.
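
As a rough sketch of the CN-side retry with exponential backoff described above: the function names, the default delays, and the cap are all assumptions chosen for illustration, and `try_publish` stands in for whatever call actually re-establishes the stream.

```python
import time
from typing import Callable, Iterable, Iterator, Optional


def backoff_delays(
    base_seconds: float = 1.0,
    factor: float = 2.0,
    cap_seconds: float = 60.0,
    attempts: int = 6,
) -> Iterator[float]:
    """Yield capped, exponentially growing delays (values are illustrative)."""
    delay = base_seconds
    for _ in range(attempts):
        yield min(delay, cap_seconds)
        delay *= factor


def publish_with_retry(
    try_publish: Callable[[], bool],
    delays: Iterable[float],
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Try to (re)publish the stream, waiting between failed attempts.

    try_publish is a hypothetical callable that returns True once the BN
    accepts the stream. Returns the number of attempts used, or None if
    every attempt failed and the delays ran out.
    """
    attempts = 1
    if try_publish():
        return attempts
    for delay in delays:
        sleep(delay)
        attempts += 1
        if try_publish():
            return attempts
    return None
```

A CN with several downstream BN options could run this against the failed BN with a short schedule and switch to an alternate BN when it returns `None`.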
@jsync-swirlds jsync-swirlds self-assigned this Aug 22, 2024
@jsync-swirlds jsync-swirlds changed the title Design hand-shake between CN and BN to determine what Block to start streaming from. Design handshake between CN and BN to determine what Block to start streaming from. Aug 27, 2024
@mattp-swirldslabs
Contributor

This is an unlikely corner case but what's the behavior if the connect "next block" happens to be a missing block in the middle of the contiguous saved blocks? I imagine the BN needs to find missing gaps in the blocks it has stored. If the CN tries to send a block that happens to be less than the latest but is also missing, then it would respond with "DuplicateBlock" and the BN would send the latest block to start from. In a separate thread, it would backfill the missing block?

@jsync-swirlds
Member

jsync-swirlds commented Sep 17, 2024

BN should never have an interstitial gap that is fillable from a consensus node; if it has such a gap, it should request that block from another Block Node. Consensus Node should never send an old block when the Block Node has newer blocks, and the Block Node should refuse such streams with an appropriate error (likely "DuplicateBlock").
I added this specific case to the protocol above in the same duplicate block section.
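
To make the gap case concrete, here is a minimal sketch of how a Block Node might work out which ranges to request from another Block Node. The function name `missing_ranges` and the idea of scanning a set of stored block numbers are assumptions for illustration only, not how the Block Node actually tracks its storage.

```python
from typing import Iterable, List, Tuple


def missing_ranges(
    stored_blocks: Iterable[int], first_block: int, last_known_block: int
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) ranges within [first_block, last_known_block]
    that are absent from stored_blocks and need backfilling from another BN."""
    held = set(stored_blocks)
    ranges: List[Tuple[int, int]] = []
    start = None  # first block number of the gap currently being scanned
    for number in range(first_block, last_known_block + 1):
        if number not in held:
            if start is None:
                start = number
        elif start is not None:
            ranges.append((start, number - 1))
            start = None
    if start is not None:
        ranges.append((start, last_known_block))
    return ranges
```

The handshake response is unaffected by these gaps: the BN still answers CN with its latest known and verified block, and the backfill runs separately.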

3 participants