-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
prioritized client updates #17354
prioritized client updates #17354
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
```release-note:improvement | ||
client: prioritize allocation updates to reduce Raft and RPC load | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1420,30 +1420,36 @@ func (ar *allocRunner) persistLastAcknowledgedState(a *state.State) { | |
} | ||
} | ||
|
||
// LastAcknowledgedStateIsCurrent returns true if the current state matches the | ||
// state that was last acknowledged from a server update. This is called from | ||
// the client in the same goroutine that called AcknowledgeState so that we | ||
// can't get a TOCTOU error. | ||
func (ar *allocRunner) LastAcknowledgedStateIsCurrent(a *structs.Allocation) bool { | ||
// GetUpdatePriority returns the update priority based the difference between | ||
// the current state and the state that was last acknowledged from a server | ||
// update. This is called from the client in the same goroutine that called | ||
// AcknowledgeState so that we can't get a TOCTOU error. | ||
func (ar *allocRunner) GetUpdatePriority(a *structs.Allocation) cstructs.AllocUpdatePriority { | ||
ar.stateLock.RLock() | ||
defer ar.stateLock.RUnlock() | ||
|
||
last := ar.lastAcknowledgedState | ||
if last == nil { | ||
return false | ||
return cstructs.AllocUpdatePriorityTypical | ||
} | ||
|
||
switch { | ||
case last.ClientStatus != a.ClientStatus: | ||
return false | ||
return cstructs.AllocUpdatePriorityUrgent | ||
case last.ClientDescription != a.ClientDescription: | ||
return false | ||
return cstructs.AllocUpdatePriorityTypical | ||
case !last.DeploymentStatus.Equal(a.DeploymentStatus): | ||
return false | ||
return cstructs.AllocUpdatePriorityTypical | ||
Comment on lines
1441
to
+1442
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Technically deployments are gated by this field, so it could considered critical since it can cause a scheduling decision... ...but nothing about deployments is concerned with sub-second latencies, so I think it's fine to leave this as If you're in this code again maybe add a comment pointing out that while deployment status changes are not urgent, they can affect scheduling but not in a way that sub-second skew is significant. If you really want to tidy things up the PR description misses this too:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Somehow I missed that, so yeah I would've set it to urgent based on the reasoning I had in the PR. I'll keep (for now at least) and I'll add some commentary here around reasoning for things. |
||
case !last.NetworkStatus.Equal(a.NetworkStatus): | ||
return false | ||
return cstructs.AllocUpdatePriorityTypical | ||
} | ||
return maps.EqualFunc(last.TaskStates, a.TaskStates, func(st, o *structs.TaskState) bool { | ||
|
||
if !maps.EqualFunc(last.TaskStates, a.TaskStates, func(st, o *structs.TaskState) bool { | ||
return st.Equal(o) | ||
}) | ||
|
||
}) { | ||
return cstructs.AllocUpdatePriorityTypical | ||
} | ||
|
||
return cstructs.AllocUpdatePriorityNone | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't know what it was before, how can we assume the change is typical? Seems worth a comment especially since all of the other code in this method must check from Highest Priority to Lowest in order to ensure a change to a low priority field doesn't demote an actually high priority update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right that we don't know for sure. In practice an allocation will never become healthy quickly enough that the first update we send is that update. That being said we probably should account for allocations that quickly fail because there's a bunch of things that can go unrecoverably wrong on the client before we ever hit the task runner, and it'd be nice to be able to send those failure states to the server more quickly.