"Previous epoch attestation(s)" errors and warnings—how to remedy or at least reduce? #3413

Closed
JamesCropcho opened this issue Aug 2, 2022 · 2 comments


JamesCropcho commented Aug 2, 2022

Description

The log of lighthouse beacon_node (when restricted to ERRO and WARN) has clumps of entries like:

13:01:08.219 ERRO Previous epoch attestation(s) missing   validators: ["val1", "val2"], epoch: 137070, service: val_mon
13:01:08.219 WARN Previous epoch attestation(s) failed to match head, validators: ["val1", "val3", "val2"], epoch: 137070, service: val_mon
13:01:08.219 WARN Previous epoch attestation(s) failed to match target, validators: ["val1", "val2"], epoch: 137070, service: val_mon
13:01:08.219 WARN Previous epoch attestation(s) had sub-optimal inclusion delay, validators: ["val3"], epoch: 137070, service: val_mon

These clumps appear perhaps once every two hours on a beacon node whose validator client has ~100 validators. Notable configuration includes:

--validator-monitor-auto
--target-peers 18
--http-disable-legacy-spec

--block-cache-size is left to the default.
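For context, the full invocation is roughly along these lines (a sketch rather than the exact command; the network, datadir and HTTP values are placeholders):

  lighthouse beacon_node \
    --network mainnet \
    --datadir /path/to/datadir \
    --http \
    --http-disable-legacy-spec \
    --validator-monitor-auto \
    --target-peers 18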

--target-peers was lowered in the interest of reducing compute costs by reducing outbound traffic (as an aside, the change does not appear to have noticeably affected the volume of outbound traffic).

The clumps do not always include the ERRO entry.

Version

https://github.com/sigp/lighthouse/releases/download/v2.5.0/lighthouse-v2.5.0-aarch64-unknown-linux-gnu.tar.gz

Expected Behaviour

I have a lot of experience administering decentralized systems, and I understand that a certain frequency and variety of warnings is a normal part of healthy operation. However, 1) I want to optimize this staking infrastructure both to maximize rewards and to minimize compute costs; and 2) errors are generally more severe than warnings, so if the same one occurs regularly I at least need to understand it.

Steps to Resolve

If anyone can tell me whether any of the tactics below (or others) might reduce the frequency or severity of these events (e.g. remedy the ERRO) and thereby resolve the issue, or, just as valuable, whether any or all of them are likely to have no helpful effect, I would appreciate it:

  1. Increase/decrease target-peers
  2. Increase/decrease block-cache-size
  3. Increase the IOPS (I/O operations per second) available to the SSD storage backing the data-dir
  4. Increase the throughput (MB per second) available to the SSD storage backing the data-dir
  5. Increase the number of CPU/vCPU cores of the cloud instance running lighthouse beacon_node
  6. Increase the allotted network performance capacity of the cloud instance running lighthouse beacon_node

Thanks for reading.

@pawanjay176
Member

I think the first thing to do is increase your target peers back to the default value (80). Keeping your target peers at such a low value will lead to your node not finding enough peers on the required attestation subnets, which in turn leads to your attestations not reaching enough aggregators and eventually not being included in blocks.

Increasing the target peers will not have that big an effect on bandwidth; that depends on the gossipsub protocol configuration. See this comment for more details: #3005 (comment)
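If you want to confirm the change once the node has had time to find peers, the standard beacon node HTTP API exposes a peer count endpoint; assuming the HTTP API is enabled on Lighthouse's default port (5052), something like this should show the connected peer count:

  curl -s http://localhost:5052/eth/v1/node/peer_count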

@JamesCropcho
Author

Thank you for your insight and helpful reply, Pawan, as well as the link to the earlier comment.

Okay, I have increased target-peers to the default.

Also, "for fun" I have increased block-cache-size to 15—my lighthouse beacon_node seems to require 8 cores and I have a suplus of RAM. If you believe I should not mess with block-cache-size kindly let me know and I'll restore the default value. My thinking here is along the lines of, "eh, couldn't hurt," but I may be mistaken.

I have nothing to report yet, as I'll have to wait at least 24 hours to see what happens, but I can report back upon request.

Thank you again.
