-
-
Notifications
You must be signed in to change notification settings - Fork 318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: update execution client state change logs #5771
Conversation
Performance Report✔️ no performance regression detected Full benchmark results
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, I agree with you re the log level discussion
break; | ||
case ExecutionEngineState.SYNCING: | ||
this.logger.info("ExecutionEngine is syncing"); | ||
this.logger.warn("Execution client is syncing"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should not be a warning. It's a normal scenario that will happen a lot of times and syncing is a normal process not something a user should be worried about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My assumption here is that this is only printed out if the state changes, therefore this should not happen often during normal operation, if the EL gets into syncing mode there is degraded performance, the node might not be able to perform correct head votes / miss block proposals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think warn level is good here because it will only come if execution goes into syncing from synced, which is actually not supposed to happen often once execution syncs up with the CL, and users getting notified that node won't be able to perform its duties till execution is back up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's the sync stats among the EL nodes, not the with the CL. And EL nodes can get unsynced and synced a lot?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sync stats among the EL nodes
That's an interesting topic, how does this actually work if you specify multiple ELs in --execution.urls
.
I see issues with this
- if one beacon node has multiple ELs only one gets payload and fork choice updates, how are the fallback ELs able to sync? I thought this requires some sort of multiplexer / middleware that sends request to all ELs.
- if a execution client is connected to multiple beacon nodes it causes frequent reorgs as it might received different fork choice updates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EL nodes can get unsynced and synced a lot
At least in the normal case with just one execution client I have not yet seen that the EL is getting back to syncing mode if it was already synced before and has been up 24/7
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's the sync stats among the EL nodes, not the with the CL. And EL nodes can get unsynced and synced a lot?
no they can't once synced they should stay synced with the CL head as long as CL doesn't go out of sync and then CL doesn't has to do batch updates to EL (our req/resp is batch processing) to get back to head
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think warn should be better then, let's keep the change
break; | ||
case ExecutionEngineState.SYNCING: | ||
this.logger.info("ExecutionEngine is syncing"); | ||
this.logger.warn("Execution client is syncing"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this.logger.warn("Execution client is syncing"); | |
this.logger.warn("Execution client is syncing hence client would not be able to fully participate in the network till its synced"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really sure about adding this, we are not printing this out when the beacon node itself is syncing, my general feeling about this is that users/operators generally understand what a syncing client implies (I guess it becomes more complicated in case of EL, see my other comment below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hence client would not be able to fully participate in the network till its synced
@g11tech on this topic, what does "not fully" mean in that case, I saw users complain about missed head votes which was due to EL lacking behind / unsynced.
My understanding is
- if EL is really far behind --> missed attestations / proposals
- if EL is close to being synced (~ 1-2 epochs behind) --> incorrect / missing head votes, att target / source votes still work, missed proposals
- if EL is slow to process blocks (can't quite keep up), same as degraded performance as point 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if el is not synced with the cl head then except gossip we do not participate in the network at all including attests, proposals aggregates etc. so none of validator duties
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do not participate in the network at all
In that case I think it should be fine to just print "Execution client is syncing" as warn. Maybe the scenario where all head votes are missed is related to the CL not being able to keep up with head (but close to it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan really of long explanations in the logs. I think something more concise would be: Execution client is syncing. Validator duties on hold.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan really of long explanations in the logs
Yeah, I think we should keep logs simple and use commonly understood terms. For errors that crash the process or fail a CLI command we can add more details as it is a one time message and we should help the user to fix the issue without having to guess what went wrong or having to ask in discord because the error is unclear.
Validator duties on hold
hmm personally not a fan, would have to add this to offline state change message as well and then add another message to online, synced that indiciates that those are not "on hold" anymore.
break; | ||
case ExecutionEngineState.SYNCED: | ||
this.logger.info("ExecutionEngine is synced"); | ||
this.logger.info("Execution client is synced"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this.logger.info("Execution client is synced"); | |
this.logger.info("Execution client is synced and client can now participate fully in the network"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note on this log "ExecutionEngine is synced", right now it is printed out any time the beacon node starts because the default state is online
and then it goes to synced
state if EL is synced.
Therefore, printing "can now participate" is not that accurate in that case. I would suggest we keep the logs simple for now and revisit it if we get user feedback on it or if we notice that there is confusion around what this means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How long does it take usually for Lodestar to know whether the EL is actually in a synced
state or if this is just the default online
state due to beacon startup? If there's some sort of way to delay/timeout this message to ensure it actually is synced
, that might work better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How long does it take usually for Lodestar to know whether the EL is actually in a synced state
After first request to EL we know the actual status and print out a status change message
ensure it actually is synced
We know the correct state at this point in time since we made an request to EL, "can now participate" is just unnecessary here because beacon node was just starting up and EL was likely always synced.
The message only makes sense to me if EL goes from syncing to synced.
🎉 This PR is included in v1.10.0 🎉 |
Motivation
These logs are meant for end users as they are info/warn/error and not debug only. The term "ExecutionEngine" is quite technical in my opinion and to avoid any confusion would recommend we just stick to "Execution client" which is the term used by most users.
The second change of this PR is the update of log level to warn from info if execution client is syncing. This better aligns it with the validator log if beacon node is syncing. We have discussed in #5467 if info or warn should be used there.
Description