produce a consensus outcome for fatal errors #508

raulk · 2022-04-22T12:42:50Z

Context

Fatal errors are raised when we encounter unexpected system-level conditions during message execution. They are severe and usually indicate a correctness flaw: either something is found to be broken at runtime (e.g. the state tree cannot be decoded, or the init actor cannot be found), or we've hit a programming error.

Fatal errors occur in the FVM itself, outside actor code. Panics in actor code are properly handled by emitting exit code USR_ASSERTION_FAILED. See the FVM error spec.

Currently, on a fatal error, the Executor fails to apply the message and returns the error to the caller (the Filecoin client). However, the caller has no meaningful course of action it can take.
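For contrast, the current behaviour can be sketched as follows. All names here (`Failure`, `apply_message`, the `ExitCode` variants) are hypothetical simplifications, not the ref-fvm API. An actor panic is contained and turned into a receipt, while a fatal error escapes as an `Err`, leaving the client with no receipt.

```rust
// Hypothetical, simplified types for illustration; not the ref-fvm API.
#[derive(Debug, PartialEq, Eq)]
enum ExitCode {
    Ok,
    /// Stand-in for USR_ASSERTION_FAILED (numeric value omitted).
    UsrAssertionFailed,
}

#[derive(Debug, PartialEq, Eq)]
struct Receipt {
    exit_code: ExitCode,
    gas_used: u64,
}

/// What went wrong during execution (hypothetical classification).
enum Failure {
    /// A panic inside actor code: contained by the FVM.
    ActorPanic,
    /// An unexpected system-level condition outside actor code.
    Fatal(String),
}

/// Pre-proposal behaviour: actor panics become receipts, fatal errors escape.
fn apply_message(failure: Option<Failure>, gas_used: u64) -> Result<Receipt, String> {
    match failure {
        None => Ok(Receipt { exit_code: ExitCode::Ok, gas_used }),
        Some(Failure::ActorPanic) => Ok(Receipt { exit_code: ExitCode::UsrAssertionFailed, gas_used }),
        // No receipt at all: the caller gets an error it cannot act on.
        Some(Failure::Fatal(detail)) => Err(detail),
    }
}

fn main() {
    assert!(apply_message(Some(Failure::ActorPanic), 42).is_ok());
    let fatal = Failure::Fatal("state tree cannot be decoded".to_string());
    assert!(apply_message(Some(fatal), 42).is_err());
}
```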

Depending on how widely the failure is reproduced, there are several possible outcomes:

If the failure is caused by a local condition, the node will fork off from the network.
If the failure is caused by a condition reproducible in a subset of nodes, the chain will fork.
If the failure is caused by a generalised error, the network could halt.

Proposal

The goal is to allow the chain to make progress in the presence of network-wide fatal errors.

Convert fatal errors into receipts with a designated SYS_INTERNAL_ERROR exit code.
Revert all state tree changes.
Consume all gas. It doesn't matter at which point the error happened (it could have been at different points on different nodes); the resulting gas consumption will be identical.

This results in deterministic behaviour that the network can agree on in the presence of fatal errors.
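The receipt side of these rules can be sketched as a pure mapping from what a node observed to the receipt it emits. This is a minimal sketch under stated assumptions: `ExitCode`, `Receipt`, `Outcome` and `to_receipt` are hypothetical simplified types, not the ref-fvm API; "consume all gas" is interpreted as charging the message's full gas limit; and reverting state is assumed to happen elsewhere. Once the exit code is fixed and gas used is pinned to the limit, the node-local details (where and why execution failed) drop out of the receipt.

```rust
// Hypothetical, simplified types for illustration; not the ref-fvm API.
#[derive(Debug, PartialEq, Eq)]
enum ExitCode {
    Ok,
    /// Stand-in for the proposed SYS_INTERNAL_ERROR code (numeric value omitted).
    SysInternalError,
}

#[derive(Debug, PartialEq, Eq)]
struct Receipt {
    exit_code: ExitCode,
    return_data: Vec<u8>,
    gas_used: u64,
}

/// What one node observed while executing a message (hypothetical).
enum Outcome {
    Success { return_data: Vec<u8>, gas_used: u64 },
    /// A fatal error: where it happened and why may differ from node to node.
    Fatal { gas_used_so_far: u64, detail: String },
}

/// Map a node-local outcome to a receipt that all affected nodes agree on.
fn to_receipt(outcome: Outcome, gas_limit: u64) -> Receipt {
    match outcome {
        Outcome::Success { return_data, gas_used } => Receipt {
            exit_code: ExitCode::Ok,
            return_data,
            gas_used,
        },
        Outcome::Fatal { gas_used_so_far, detail } => {
            // Node-local diagnostics only; none of this enters the receipt.
            eprintln!("fatal error after {} gas: {}", gas_used_so_far, detail);
            Receipt {
                exit_code: ExitCode::SysInternalError,
                return_data: Vec::new(),
                // Charge the full gas limit so the failure point doesn't matter.
                gas_used: gas_limit,
            }
        }
    }
}

fn main() {
    let gas_limit = 1_000_000;
    // Two nodes hit different fatal errors at different points in execution...
    let node_a = to_receipt(
        Outcome::Fatal { gas_used_so_far: 12_345, detail: "state tree cannot be decoded".to_string() },
        gas_limit,
    );
    let node_b = to_receipt(
        Outcome::Fatal { gas_used_so_far: 987_654, detail: "init actor not found".to_string() },
        gas_limit,
    );
    // ...yet they emit identical receipts.
    assert_eq!(node_a, node_b);
    println!("{:?}", node_a);

    // Successful executions are unaffected.
    let ok = to_receipt(Outcome::Success { return_data: vec![1, 2], gas_used: 500 }, gas_limit);
    assert_eq!(ok.exit_code, ExitCode::Ok);
}
```

Pinning `gas_used` to the limit is what makes the result independent of the failure point; recording the actual gas consumed so far would leak node-local timing into consensus.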

Result

If the fatal error affects a single node, that node will produce the above message receipt, leading to a consensus fault with the rest of the network, i.e. the node strays off just like before.
If the fatal error affects the entire network, the network agrees that an internal error happened while processing that message, and moves on without halting. (Different nodes may observe different internal errors at different points in the execution, yet they will arrive at the same result; this is intended.)

Implementation notes

This change can be entirely self-contained inside the DefaultExecutor.
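Keeping the state reversion inside the executor can be sketched with a toy stand-in. This is a minimal sketch under stated assumptions: a `BTreeMap` replaces the real state tree, `DefaultExecutor` here is a hypothetical struct merely named after the real one, and `apply_message` taking the execution as a closure is an illustrative signature, not ref-fvm's API. It snapshots the state before execution and restores it on a fatal error; charging the consumed gas to the sender is omitted.

```rust
use std::collections::BTreeMap;

// Toy state tree (actor ID -> balance); hypothetical, not the FVM's HAMT-backed tree.
type StateTree = BTreeMap<u64, u64>;

#[derive(Debug, PartialEq, Eq)]
enum ExitCode {
    Ok,
    /// Stand-in for the proposed SYS_INTERNAL_ERROR code.
    SysInternalError,
}

#[derive(Debug, PartialEq, Eq)]
struct Receipt {
    exit_code: ExitCode,
    gas_used: u64,
}

/// A system-level failure raised outside actor code (hypothetical).
#[derive(Debug)]
struct FatalError(String);

/// Hypothetical stand-in named after the executor mentioned above.
struct DefaultExecutor {
    state: StateTree,
}

impl DefaultExecutor {
    /// Run `exec` against the state; on a fatal error, roll back and emit a receipt.
    fn apply_message<F>(&mut self, gas_limit: u64, exec: F) -> Receipt
    where
        F: FnOnce(&mut StateTree) -> Result<u64, FatalError>,
    {
        // Snapshot so every change made during execution can be undone.
        let snapshot = self.state.clone();
        match exec(&mut self.state) {
            Ok(gas_used) => Receipt { exit_code: ExitCode::Ok, gas_used },
            Err(FatalError(detail)) => {
                eprintln!("fatal error: {}", detail);
                self.state = snapshot; // revert all state tree changes
                Receipt { exit_code: ExitCode::SysInternalError, gas_used: gas_limit }
            }
        }
    }
}

fn main() {
    let initial: StateTree = BTreeMap::from([(100, 50), (101, 0)]);
    let mut executor = DefaultExecutor { state: initial.clone() };

    let receipt = executor.apply_message(10_000, |st| {
        // A partially applied transfer...
        *st.get_mut(&100).unwrap() -= 20;
        *st.entry(101).or_insert(0) += 20;
        // ...followed by a system-level failure.
        Err(FatalError("init actor not found".to_string()))
    });

    assert_eq!(receipt, Receipt { exit_code: ExitCode::SysInternalError, gas_used: 10_000 });
    assert_eq!(executor.state, initial); // the partial transfer was rolled back
}
```

Cloning the whole tree is only reasonable for a toy; a real executor would more likely rely on the state tree's own snapshot or revert mechanism.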

This resolves the "Panics during message execution" area of investigation under #428.


jennijuju assigned Stebalien May 2, 2022

raulk mentioned this issue May 5, 2022 in nv16 development checklist #531 (closed, 48 tasks)

raulk added a commit to filecoin-project/fvm-specs that referenced this issue May 10, 2022: "errors: add SYS_INTERNAL_ERROR exit code. filecoin-project/ref-fvm#508" (8f91341)

raulk mentioned this issue May 10, 2022 in errors: explain SYS_ASSERTION_FAILED exit code in more detail. filecoin-project/fvm-specs#87 (merged)

raulk added a commit to filecoin-project/fvm-specs that referenced this issue May 10, 2022: "linkify reference to filecoin-project/ref-fvm#508" (c8199bc)

raulk mentioned this issue May 11, 2022 in transmute fatal errors into SYS_ASSERTION_FAILED exit code. #548 (merged)

raulk closed this as completed in #548 May 11, 2022
