[Merged by Bors] - Fix possible deadloop in beacon #6451

fasmat · 2024-11-13T13:24:47Z

Motivation

I found a problem in the beacon protocol: if the node wakes up from sleep or syncs its clock via NTP at the wrong moment, the goroutine running the beacon protocol might end up in a deadloop without ever recovering.

Description

listenEpochs is supposed to wait until the start of an epoch and then start the beacon protocol for that epoch. When the new epoch starts, the select case <-pd.clock.AwaitLayer(layer) unblocks. This can happen any time after that layer was reached: usually within milliseconds, but if the runtime was very busy or the host was hibernating, the delay between reaching the layer and this case unblocking can be much longer. The problem is the if statement a bit further below:

if !current.FirstInEpoch() {
	continue
}

If for any reason the signal from the select case is received after the second layer of the epoch has already started, this if statement continues the for loop without updating current or layer, resulting in a deadloop.

I fixed the issue by updating layer before the check, so that a node that is late skips participating in the beacon protocol for this epoch and waits for the next one.
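To illustrate the difference, here is a minimal simulation of the loop. It is only a sketch, not the code from this PR: simClock, listen, LayerID, layersPerEpoch = 4, and the exact form of the update (layer = current + 1) are assumptions made for the example. The simulated clock only advances when await really has to block, so the unfixed variant never makes progress here; on a real node the wall clock keeps running while the loop spins.

```go
package main

import "fmt"

// layersPerEpoch is an assumed value chosen for illustration only.
const layersPerEpoch = 4

// LayerID is a simplified stand-in for the node's layer type.
type LayerID uint32

// FirstInEpoch reports whether l is the first layer of its epoch.
func (l LayerID) FirstInEpoch() bool { return l%layersPerEpoch == 0 }

// simClock is a hypothetical simulated clock. Time only moves forward when
// await has to block, or when a late wake-up is injected via lateBy.
type simClock struct {
	now    LayerID
	lateBy LayerID // extra layers that pass before the first wake-up is delivered
}

// await models <-clock.AwaitLayer(layer): it returns at once when layer is
// already in the past, otherwise it advances the clock to layer.
func (c *simClock) await(layer LayerID) {
	if c.now < layer {
		c.now = layer
	}
	c.now += c.lateBy
	c.lateBy = 0
}

// listen models the epoch loop. It returns the number of wake-ups, whether the
// protocol was started, and the layer at which the loop ended. maxWakeups caps
// the simulation so that the unfixed variant terminates.
func listen(c *simClock, start LayerID, fixed bool, maxWakeups int) (wakeups int, started bool, at LayerID) {
	layer := start
	for wakeups < maxWakeups {
		c.await(layer)
		wakeups++
		current := c.now
		if fixed {
			// Assumed form of the fix: move the awaited layer forward
			// before the check, so the next await blocks again.
			layer = current + 1
		}
		if !current.FirstInEpoch() {
			continue // unfixed variant: layer is stale, await returns immediately
		}
		return wakeups, true, current
	}
	return wakeups, false, c.now
}

func main() {
	for _, fixed := range []bool{false, true} {
		// The node waits for layer 4 (start of an epoch) but wakes up one layer late.
		c := &simClock{now: 3, lateBy: 1}
		n, ok, at := listen(c, 4, fixed, 1000)
		fmt.Printf("fixed=%v wakeups=%d started=%v layer=%d\n", fixed, n, ok, at)
	}
}
```

In this model the fixed variant wakes up once per layer and starts at the first layer of the following epoch, while the unfixed variant uses up its whole wake-up budget without ever blocking.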

Test Plan

Existing tests pass

TODO

Explain motivation or link existing issue(s)
Test changes and document test plan
Update documentation as needed
Update changelog as needed

codecov · 2024-11-13T13:50:33Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.9%. Comparing base (fbca87d) to head (7f8251a).
Report is 1 commits behind head on develop.

Additional details and impacted files

@@           Coverage Diff           @@
##           develop   #6451   +/-   ##
=======================================
  Coverage     79.9%   79.9%           
=======================================
  Files          352     352           
  Lines        46099   46098    -1     
=======================================
+ Hits         36850   36867   +17     
+ Misses        7154    7143   -11     
+ Partials      2095    2088    -7

poszu · 2024-11-13T13:55:23Z

What do you mean by "deadloop"? As far as I understand, it would spin on:

		case <-pd.clock.AwaitLayer(layer):
			current := pd.clock.CurrentLayer()
			if !current.FirstInEpoch() {
				continue
			}

for a long time, until the next epoch (when current finally becomes the first layer of an epoch), because layer is in the past and is not updated.

fasmat · 2024-11-13T14:19:14Z

Right, it will only loop for a full epoch.
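As a rough worked example of that bound: the spinning ends when CurrentLayer() returns the first layer of the next epoch, so its duration is the time left until then. The helper below is a hypothetical sketch; spinBound, the 4 layers per epoch, and the 5 minute exampleLayerDuration are assumptions for illustration, not values taken from the node configuration.

```go
package main

import (
	"fmt"
	"time"
)

// exampleLayerDuration is an assumed layer length, for illustration only.
const exampleLayerDuration = 5 * time.Minute

// spinBound returns how many layers remain until the first layer of the next
// epoch, and the matching wall-clock time. It is meant for a current layer
// that is not the first in its epoch (the case in which the loop spins).
func spinBound(current, layersPerEpoch uint32, layerDuration time.Duration) (uint32, time.Duration) {
	nextEpochStart := (current/layersPerEpoch + 1) * layersPerEpoch
	remaining := nextEpochStart - current
	return remaining, time.Duration(remaining) * layerDuration
}

func main() {
	// Waking up in layer 5 with 4 layers per epoch: the next epoch starts at layer 8.
	layers, d := spinBound(5, 4, exampleLayerDuration)
	fmt.Println(layers, d) // prints "3 15m0s"
}
```

The later in the epoch the late wake-up happens, the shorter the spin; waking up in the second layer of an epoch gives the worst case of almost a full epoch.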

fasmat · 2024-11-13T14:19:54Z

bors merge

spacemesh-bors · 2024-11-13T15:08:51Z

Pull request successfully merged into develop.

Build succeeded:

Fix possible deadloop in beacon

e9584b6

fasmat self-assigned this Nov 13, 2024

fasmat requested review from dshulyak, poszu, ivan4th, acud and jellonek as code owners November 13, 2024 13:24

Update CHANGELOG

7f8251a

poszu approved these changes Nov 13, 2024

spacemesh-bors bot changed the title ~~Fix possible deadloop in beacon~~ [Merged by Bors] - Fix possible deadloop in beacon Nov 13, 2024

spacemesh-bors bot closed this Nov 13, 2024

spacemesh-bors bot deleted the fix-possible-deadloop branch November 13, 2024 15:08
