Lookahead collator: do not try to build upon an unknown block #4036

Closed
wants to merge 5 commits into from

s0me0ne-unkn0wn
Contributor

No description provided.

@s0me0ne-unkn0wn s0me0ne-unkn0wn added R0-silent Changes should not be mentioned in any release notes T9-cumulus This PR/Issue is related to cumulus. labels Apr 9, 2024
@@ -315,13 +315,14 @@ where
let mut parent_header = initial_parent.header;
let overseer_handle = &mut params.overseer_handle;

// We mainly call this to inform users at genesis if there is a mismatch with the
// on-chain data.
collator.collator_service().check_block_status(parent_hash, &parent_header);
Contributor

skunert commented Apr 9, 2024

Could we not directly call continue here? Why move it to the loop below?
Parent hash and parent header below will only change if we built a new block ourselves, so the check should always return true in subsequent calls anyway (unless I am missing something).

Contributor Author

parent_hash and parent_header get updated at the bottom of the inner loop. On the other hand, they are updated to values we get after a successful collation, which are unlikely to be unknown or bad 🤔 Need to think a bit more, thanks for pointing out!

Contributor

Exactly, that is what I meant. We build a block successfully and set the parent to it. Since we built it ourselves, it will certainly be in the chain. If the first parent itself is messed up, we will abort before entering the 0..2 loop.
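The reviewer's argument can be sketched as a toy model of one slot: check the initial parent once, skip the slot if it is unusable, and otherwise build two blocks, each on top of the previous one. Everything here is hypothetical (`LocalChain`, `can_build_on`, `build_on`, `collate_slot` and the integer hashes are stand-ins, not cumulus APIs); the only assumption carried over from the thread is that a block we build ourselves is imported locally and therefore always passes the status check.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a parachain block hash.
type Hash = u64;

/// Hypothetical model of the collator's local database.
struct LocalChain {
    /// Hashes of blocks this node has imported.
    known: HashSet<Hash>,
    /// Counter used to mint hashes for newly built blocks.
    next_hash: Hash,
}

impl LocalChain {
    /// Stand-in for `check_block_status`: true only if `hash` was imported locally.
    fn can_build_on(&self, hash: Hash) -> bool {
        self.known.contains(&hash)
    }

    /// Stand-in for building a block on `parent` and importing it.
    /// The new block is inserted into `known`, so it always passes the check.
    fn build_on(&mut self, _parent: Hash) -> Hash {
        let hash = self.next_hash;
        self.next_hash += 1;
        self.known.insert(hash);
        hash
    }
}

/// One slot, following the reviewer's suggestion: check the initial parent
/// once and skip the whole slot (the `continue` case) if it is unusable.
/// Returns the hashes of the blocks built during this slot.
fn collate_slot(chain: &mut LocalChain, initial_parent: Hash) -> Vec<Hash> {
    let mut built = Vec::new();
    if !chain.can_build_on(initial_parent) {
        return built; // equivalent to `continue` in the outer slot loop
    }
    let mut parent = initial_parent;
    for _n_built in 0..2 {
        // No re-check: from the second iteration on, `parent` is a block we
        // just built and imported ourselves.
        let hash = chain.build_on(parent);
        built.push(hash);
        parent = hash;
    }
    built
}
```

Under these assumptions, the PR's in-loop check and the up-front check behave the same: only the first iteration can ever see an unknown parent, and failing there builds nothing in either version.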

// This needs to change to support elastic scaling, but for continuously
// scheduled chains this ensures that the backlog will grow steadily.
for n_built in 0..2 {
    // Do not try to build upon an unknown, pruned or bad block
    if !collator.collator_service().check_block_status(parent_hash, &parent_header) {
        break;
Contributor

Does this happen often? I'd still add a trace here, just in case.

Contributor Author

Not often at all, and check_block_status() has very detailed debug-level messages inside.
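The shape of such a check (reject unknown, pruned or bad blocks and log the reason at debug level) can be sketched as follows. This is not the cumulus implementation: the enum is a local stand-in whose variant names are modeled on Substrate's `sp_consensus::BlockStatus`, treating `Queued` as not buildable is an assumption, and `eprintln!` stands in for a debug-level `tracing` macro.

```rust
/// Local stand-in; variant names modeled on Substrate's `sp_consensus::BlockStatus`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BlockStatus {
    Queued,
    InChainWithState,
    InChainPruned,
    KnownBad,
    Unknown,
}

/// Returns `true` only when it is safe to build on the block; otherwise logs
/// why candidate production is skipped and returns `false`.
fn can_build_upon(hash: &str, status: BlockStatus) -> bool {
    let reason = match status {
        BlockStatus::InChainWithState => return true,
        BlockStatus::Queued => "still queued for import",
        BlockStatus::InChainPruned => "already pruned",
        BlockStatus::KnownBad => "known to be bad",
        BlockStatus::Unknown => "unknown to the local database",
    };
    // Stand-in for a debug-level log line.
    eprintln!("Skipping candidate production on {hash}: block is {reason}");
    false
}
```

Returning a plain `bool` keeps the call site a one-line guard, while the log line carries the detail, which matches the argument above for not adding another trace at the call site.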

@sandreim
Contributor

sandreim commented Apr 9, 2024

I think it depends on the exact reason the block is not present. If for some reason we haven't seen it yet but it's present on the relay chain, we should just fetch it or expect it to be announced by another collator. @skunert, does this make any sense?

@skunert
Contributor

skunert commented Apr 9, 2024

I did not think enough about this the first time I checked; I am not sure how this scenario could even occur. The parent we are checking in the code was found via find_potential_parents, which starts at the included block and checks all child branches to find a parent. So the parents we find should generally be in the local DB. @s0me0ne-unkn0wn, did you see this problem in the wild, or was it added as a precaution?

Regarding @sandreim's question: if we see a new pending para block on the relay chain and it is not announced soon, we automatically start fetching it with the pov-recovery mechanism. This already happens.

Edit: Ah, we assume that the included and pending blocks are locally available, which might not be the case. So the change makes sense.
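The gap described here can be illustrated with a toy model, with hypothetical names throughout and much simpler than cumulus' find_potential_parents: the included and pending hashes are handed over by the relay chain and taken on trust, while further candidates are discovered by walking local child links from the included block. A candidate produced this way can still be missing from the local database, which is what a status check before building guards against.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for a parachain block hash.
type Hash = u32;

/// Toy local database: imported blocks plus parent -> children links.
struct LocalDb {
    imported: HashSet<Hash>,
    children: HashMap<Hash, Vec<Hash>>,
}

/// Toy candidate search: relay-chain-provided hashes are added without a
/// local lookup, then all local child branches of `included` are walked.
fn potential_parents(db: &LocalDb, included: Hash, pending: Option<Hash>) -> Vec<Hash> {
    let mut out = vec![included];
    out.extend(pending);
    let mut stack = vec![included];
    while let Some(h) = stack.pop() {
        for &child in db.children.get(&h).into_iter().flatten() {
            if !out.contains(&child) {
                out.push(child);
            }
            stack.push(child);
        }
    }
    out
}

/// The extra guard: keep only candidates that are actually imported locally.
fn buildable(db: &LocalDb, candidates: &[Hash]) -> Vec<Hash> {
    candidates.iter().copied().filter(|h| db.imported.contains(h)).collect()
}
```

For example, with block 1 (included) and its local child 3 imported, a pending block 2 that has not arrived yet still shows up as a candidate, and only the filter removes it.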

@s0me0ne-unkn0wn
Contributor Author

@sandreim, if you could point me to the interfaces that would help achieve that, it would be a great addition. Still, I feel it is better done as a follow-up. Testing the soundness of such a solution could be a real hell IIUC 😟

@skunert I came across that during live testing on Kusama (paritytech/devops#3261). It looked like this:
[image: screenshot of the collator logs]

@skunert
Contributor

skunert commented Apr 9, 2024

Okay, makes sense. We can see in the logs that this block was imported right after we errored, so in that situation it does make sense to skip.

@bkchr
Member

bkchr commented Apr 9, 2024

It's generally weird that the block didn't make it over the parachain network to the collator in time. However, this stuff is not predictable. Also, the fact that so many different collator implementations are now running together makes it harder to reason about what happened.

@bkchr bkchr added this pull request to the merge queue Apr 9, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 9, 2024
@s0me0ne-unkn0wn s0me0ne-unkn0wn added this pull request to the merge queue Apr 10, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 10, 2024
@s0me0ne-unkn0wn s0me0ne-unkn0wn added this pull request to the merge queue Apr 10, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Apr 10, 2024
@s0me0ne-unkn0wn s0me0ne-unkn0wn added this pull request to the merge queue Apr 10, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 10, 2024
@s0me0ne-unkn0wn
Contributor Author

Merged into #3630

@s0me0ne-unkn0wn s0me0ne-unkn0wn deleted the s0me0ne/no-build-upon-unknown-block branch April 11, 2024 13:32
5 participants