Lookahead collator: do not try to build upon an unknown block #4036
Conversation
@@ -315,13 +315,14 @@ where
	let mut parent_header = initial_parent.header;
	let overseer_handle = &mut params.overseer_handle;

	// We mainly call this to inform users at genesis if there is a mismatch with the
	// on-chain data.
	collator.collator_service().check_block_status(parent_hash, &parent_header);
Could we not directly call continue here? Why move it into the loop below? Parent hash and parent header below will only change if we built a new block ourselves, so that check should always return true in subsequent calls anyway (or I am missing something).
parent_hash and parent_header get updated at the bottom of the inner loop. On the other hand, they are updated to values we get after a successful collation, which are unlikely to be unknown or bad 🤔 Need to think a bit more, thanks for pointing it out!
Exactly, that is what I meant. We build a block successfully and set the parent to it. But since we built it ourselves, it will for sure be in-chain. If the first parent itself is messed up, we will abort before entering the 0..2 loop.
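For reference, here is a minimal, self-contained sketch of the control flow being discussed. All names here (`Hash`, `Header`, `CollatorService`, `try_build_block`) are simplified stand-ins for illustration, not the actual cumulus types or APIs:

```rust
// Minimal sketch of the loop shape under discussion; every type and helper
// below is a simplified stand-in, not the real cumulus implementation.
#[derive(Clone, Copy)]
struct Hash(u64);

#[derive(Clone)]
struct Header {
    hash: Hash,
}

struct CollatorService;

impl CollatorService {
    // Stand-in for the real check: returns false for unknown, pruned or bad blocks.
    fn check_block_status(&self, _hash: Hash, _header: &Header) -> bool {
        true
    }
}

// Hypothetical stand-in for one collation attempt on top of `parent`.
fn try_build_block(parent: &Header) -> Option<Header> {
    Some(Header { hash: Hash(parent.hash.0 + 1) })
}

fn collate_on(service: &CollatorService, mut parent_header: Header) {
    let mut parent_hash = parent_header.hash;

    for _n_built in 0..2 {
        // With the check inside the loop, the first iteration guards the externally
        // supplied parent; later iterations re-check a block we produced ourselves,
        // which should always be known locally (the reviewer's point above).
        if !service.check_block_status(parent_hash, &parent_header) {
            break;
        }

        let Some(built) = try_build_block(&parent_header) else { break };

        // Updated at the bottom of the loop: the next iteration builds on the
        // block we just produced.
        parent_hash = built.hash;
        parent_header = built;
    }
}

fn main() {
    collate_on(&CollatorService, Header { hash: Hash(0) });
}
```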
	// This needs to change to support elastic scaling, but for continuously
	// scheduled chains this ensures that the backlog will grow steadily.
	for n_built in 0..2 {
		// Do not try to build upon an unknown, pruned or bad block
		if !collator.collator_service().check_block_status(parent_hash, &parent_header) {
			break;
Does this happen often? I'd still add a trace here just in case.
Not often at all, and check_block_status() has very detailed debug-level messages inside.
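If a trace were added right at the break, as suggested above, it might look roughly like this. The log target and message wording are assumptions for illustration, and the snippet is meant to slot into the loop shown in the diff rather than stand alone:

```rust
// Hedged sketch only: the target constant and the message text are illustrative.
if !collator.collator_service().check_block_status(parent_hash, &parent_header) {
	tracing::debug!(
		target: crate::LOG_TARGET,
		?parent_hash,
		"Parent block is unknown, pruned or bad; not building on top of it.",
	);
	break;
}
```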
I think it depends on the exact reason for the block not being present. If for some reason we didn't see it yet but it's present on the relay chain, we should just fetch it or expect that it is announced by another collator. @skunert does this make any sense?
I did not think enough about this the first time I checked here; I am not sure how this scenario should even occur. The parent we are checking in the code was found via
So if we see a new pending para block in the relay chain and it is not announced anywhere soon, we will automatically start fetching it with the pov-recovery mechanism. This already happens. Edit: Aah, we assume that included and pending are locally available, which might not be the case. So the change makes sense.
@sandreim, if you could point me to the interfaces that would help me achieve that, it could be a great addition. Still, I feel it is better to have it as a follow-up; testing the soundness of such a solution could be a real hell IIUC 😟 @skunert I came across this during the live testing on Kusama (paritytech/devops#3261). It looked like this:
Okay, makes sense; we see in the logs that this block was imported right after we errored. So yeah, in that situation it makes sense to skip.
Generally weird that the block didn't make it in time over the parachain network to the collator. However, this stuff is not predictable. Also, given that there are now so many different collator implementations running together, it is more complicated to reason about what happened.
Merged into #3630