fix: switch to fallback if primary node is stuck
When Dencun fork support was released on Goerli, there was a problem
with the nodes. The primary node had not been updated properly, so in
response to the `eth/v1/beacon/headers/finalized` request it always
returned a finalized slot that was far behind the actual latest
finalized slot. In this case, the app waited for 7 minutes and then
switched to the fallback node. The fallback node worked correctly and
returned the correct finalized slot, so the app could process the next
epoch using the fallback node's response. However, the correct latest
slot returned by the fallback node was stored in `this.latestSlot`. On
the next worker cycle, the app queried the still-misbehaving primary
node and again got a slot far behind the latest one. Because this
"latest" slot was not greater than the slot stored in `this.latestSlot`,
the variable was not updated and kept the correct latest slot returned
by the fallback node on the previous worker cycle. As a result, the
condition below
```
if (processingState.epoch <= Math.trunc(this.latestSlot.slot / this.config.get('FETCH_INTERVAL_SLOTS')))
```
became true and the app no longer switched to the fallback node. The
app therefore got stuck and could not process the next epochs.

We now assume that no node (primary or fallback) ever returns a slot
lower than the slot the app already saved in `this.latestSlot` on
previous worker cycles. If a node does return such a slot, the app
assumes that the node is not working correctly and switches to the
fallback node.
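
The rule above can be sketched as a small pure function. This is a minimal illustration rather than the service's code: `SlotRecord`, `checkReportedSlot`, and the sample slot numbers are hypothetical names and values introduced here. Only the two comparisons mirror the change in this commit.

```typescript
// Hypothetical stand-in for the `this.latestSlot` field of the service.
interface SlotRecord {
  slot: number;
  fetchTime: number;
}

interface SlotCheck {
  // true when the node reported a slot lower than the saved one
  regressed: boolean;
  // the record to keep after this check
  latest: SlotRecord;
}

// Hypothetical helper: applies the "a node must never go backwards" rule.
function checkReportedSlot(saved: SlotRecord, reportedSlot: number, now: number): SlotCheck {
  if (reportedSlot < saved.slot) {
    // The node is behind what we already saw: treat it as unhealthy
    // and leave the saved record untouched.
    return { regressed: true, latest: saved };
  }
  if (reportedSlot > saved.slot) {
    // Only a strictly newer slot refreshes the record and its fetch time.
    return { regressed: false, latest: { slot: reportedSlot, fetchTime: now } };
  }
  // Same slot as before: nothing to update.
  return { regressed: false, latest: saved };
}

// Made-up values: the fallback node previously reported slot 7500000.
const savedRecord: SlotRecord = { slot: 7500000, fetchTime: 1000 };

// A stale primary node reports an older slot on the next cycle.
const staleCheck = checkReportedSlot(savedRecord, 7400000, 2000);

// A healthy node reports a newer slot.
const freshCheck = checkReportedSlot(savedRecord, 7500032, 2000);
```

Here `staleCheck.regressed` is `true` and the saved record is unchanged, which corresponds to the early `return true` in `useFallbackOnResolved`. `freshCheck` carries the updated record.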

Also, the condition in this check
```
if (processingState.epoch < Math.trunc(this.latestSlot.slot / this.config.get('FETCH_INTERVAL_SLOTS')))
```
has been changed from `<=` to `<`. If the epoch currently being
processed stays equal to the latest finalized epoch for a long time
(more than 7 minutes), the app should treat this situation as incorrect
and switch to the fallback node.
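
To see the effect of the operator change, the comparison can be looked at in isolation. The sketch below assumes `FETCH_INTERVAL_SLOTS` is 32; the actual configured value is not shown in this commit. The slot numbers are made up, and `epochOfSlot`, `skipFallbackOld`, and `skipFallbackNew` are hypothetical helper names.

```typescript
// Assumed value. The real one comes from this.config.get('FETCH_INTERVAL_SLOTS').
const ASSUMED_FETCH_INTERVAL_SLOTS = 32;

// Hypothetical helper: the epoch that a finalized slot falls into.
function epochOfSlot(slot: number, intervalSlots: number): number {
  return Math.trunc(slot / intervalSlots);
}

// Comparison before this commit: an equal epoch still counts as "no fallback needed".
function skipFallbackOld(processingEpoch: number, latestSlot: number, intervalSlots: number): boolean {
  return processingEpoch <= epochOfSlot(latestSlot, intervalSlots);
}

// Comparison after this commit: only a strictly older processing epoch skips the fallback.
function skipFallbackNew(processingEpoch: number, latestSlot: number, intervalSlots: number): boolean {
  return processingEpoch < epochOfSlot(latestSlot, intervalSlots);
}

// Made-up slot: 6400 / 32 = epoch 200.
const sampleLatestSlot = 6400;

// The app is one epoch behind: both versions skip the fallback.
const behindOld = skipFallbackOld(199, sampleLatestSlot, ASSUMED_FETCH_INTERVAL_SLOTS); // true
const behindNew = skipFallbackNew(199, sampleLatestSlot, ASSUMED_FETCH_INTERVAL_SLOTS); // true

// The app has caught up with the latest known epoch: the versions disagree.
const equalOld = skipFallbackOld(200, sampleLatestSlot, ASSUMED_FETCH_INTERVAL_SLOTS); // true
const equalNew = skipFallbackNew(200, sampleLatestSlot, ASSUMED_FETCH_INTERVAL_SLOTS); // false
```

With `<=`, the equal case always short-circuits to "no fallback", so a node that stops advancing is never questioned. With `<`, the equal case falls through to the rest of the check, which lies outside the hunk shown here and, per the commit message, switches to the fallback node after more than 7 minutes.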
AlexanderLukin committed Mar 6, 2024
1 parent eacccd9 commit afc1835
Showing 1 changed file with 8 additions and 2 deletions.
10 changes: 8 additions & 2 deletions src/common/consensus-provider/consensus-provider.service.ts
@@ -123,8 +123,14 @@ export class ConsensusProviderService {
       {
         maxRetries: this.config.get('CL_API_GET_BLOCK_INFO_MAX_RETRIES'),
         useFallbackOnResolved: (r) => {
-          if (Number(r.data.header.message.slot) != this.latestSlot.slot) {
-            this.latestSlot = { slot: Number(r.data.header.message.slot), fetchTime: Number(Date.now()) };
+          const latestSlot = Number(r.data.header.message.slot);
+
+          if (latestSlot < this.latestSlot.slot) {
+            // we assume that the node must never return a slot less than the last saved slot
+            return true;
+          }
+          if (latestSlot > this.latestSlot.slot) {
+            this.latestSlot = { slot: latestSlot, fetchTime: Number(Date.now()) };
           }
           if (processingState.epoch < Math.trunc(this.latestSlot.slot / this.config.get('FETCH_INTERVAL_SLOTS'))) {
             // if our last processed epoch is less than last, we shouldn't use fallback