-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
func: add new picker dependency #20029
Changes from all commits
87be0e8
366b9e8
678e329
1da63fb
f5a7145
d94fb84
8000e15
7a2740a
7793b9f
2edcfc1
333b61f
4e5ff84
c43ee72
899b4e0
1e55846
f2bc833
2ef6f90
b75211f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
```release-note:improvement | ||
server: Add new options for reconcilation in case of disconnected nodes | ||
``` |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -19,6 +19,7 @@ import ( | |||||
"github.com/hashicorp/nomad/helper" | ||||||
"github.com/hashicorp/nomad/helper/uuid" | ||||||
"github.com/hashicorp/nomad/nomad/structs" | ||||||
reconnectingpicker "github.com/hashicorp/nomad/scheduler/reconnecting_picker" | ||||||
) | ||||||
|
||||||
const ( | ||||||
|
@@ -32,6 +33,10 @@ const ( | |||||
rescheduleWindowSize = 1 * time.Second | ||||||
) | ||||||
|
||||||
type ReconnectingPicker interface { | ||||||
PickReconnectingAlloc(disconnect *structs.DisconnectStrategy, original *structs.Allocation, replacement *structs.Allocation) *structs.Allocation | ||||||
} | ||||||
Comment on lines
+36
to
+38
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why do we need this interface? I could see a case where the drain strategy could be an interface with one implementation per strategy, but There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Im aiming at removing the responsibility of the picking from the allocReconciler, that is the main reason of te interface :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Interfaces in Go are a bit strange, and work differently than in other languages, like Java or C#. Since they're implicitly defined pre-creating them can often lead to a confusing implementation. Specially for people new to the code base, trying to "jump to definition" in an interface is quite frustrating 😬 Sometimes that's unavoidable because we want to have multiple implementations (like a real thing and a test implementation), or if want to restrict what to expose (like only exposing certain methods of a struct). But in this case I can't really think of a different implementations of So we don't need this interface to remove responsibility from the reconciler. Using the concrete struct already accomplishes that. And having it creates unnecessary indirection that makes the code harder to follow. If, later on, we do find a need for an interface, the implicit nature of Go's interface can help. If the reconciler ends up calling X(), Y(), and Z(), where each behaves differently under some conditions, we can easily add them to a new interface without much code change. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Having that interface in place gives us a define frontier between the alloc reconciler and the reconnecting picker: Imagine we refactor the reconciler and we want to unit test the logic. If we remove the interface, we remove the possibility of having a look at the internals of the reconciliation, we will need to write tests that also take into account the picker logic, no way to isolate components. If we leave the interface where it is, it is the perfect cutting point where we can add a test implementation of the picker just for the purpose of verifying the behaviour of the reconciliation. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The logic in But nevertheless, the nice things about implicit interfaces is that, in case we do find ourselves needing alternative implementation, the only line we would need to change is the type of Line 110 in 7a2740a
The part of the code that has the biggest impact (and it's not even that big) on the scenario you mentioned is this one: Line 227 in 7a2740a
This is what is preventing us from feeding the reconciler with different implementation because the reconciler itself is making a decision on what to instantiate, instead of the dependency being fed externally, so it's violating the isolation you mentioned. But, for the same reason we don't need this interface, we don't need to change this logic. If the need does arise to provide alternate implementation, it's pretty straightforward to update the code. But until then, this interfaces creates unnecessary indirection, making the code hard to read. https://100go.co/5-interface-pollution/ has some good discussion about interfaces in Go. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is meant to isolate the logic of the reconciler, not of the picker. When we get to the point of refactoring the reconciler, we can add a configuration option to change the picker, as done in other parts of the code. I you ask me, there are not enough interfaces in the nomad code, isolating components is almost impossible in some parts :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we're refactoring the reconciler then adding this interface if necessary won't be any extra work. But until then, this interface is not bringing any benefits, and it's making the code harder to read. |
||||||
|
||||||
// allocUpdateType takes an existing allocation and a new job definition and | ||||||
// returns whether the allocation can ignore the change, requires a destructive | ||||||
// update, or can be inplace updated. If it can be inplace updated, an updated | ||||||
|
@@ -102,6 +107,8 @@ type allocReconciler struct { | |||||
// defaults to time.Now, and overridden in unit tests | ||||||
now time.Time | ||||||
|
||||||
reconnectingPicker ReconnectingPicker | ||||||
|
||||||
// result is the results of the reconcile. During computation it can be | ||||||
// used to store intermediate state | ||||||
result *reconcileResults | ||||||
|
@@ -195,6 +202,7 @@ func NewAllocReconciler(logger log.Logger, allocUpdateFn allocUpdateType, batch | |||||
jobID string, job *structs.Job, deployment *structs.Deployment, | ||||||
existingAllocs []*structs.Allocation, taintedNodes map[string]*structs.Node, evalID string, | ||||||
evalPriority int, supportsDisconnectedClients bool, opts ...AllocReconcilerOption) *allocReconciler { | ||||||
|
||||||
ar := &allocReconciler{ | ||||||
logger: logger.Named("reconciler"), | ||||||
allocUpdateFn: allocUpdateFn, | ||||||
|
@@ -216,6 +224,7 @@ func NewAllocReconciler(logger log.Logger, allocUpdateFn allocUpdateType, batch | |||||
desiredFollowupEvals: make(map[string][]*structs.Evaluation), | ||||||
taskGroupAllocNameIndexes: make(map[string]*allocNameIndex), | ||||||
}, | ||||||
reconnectingPicker: reconnectingpicker.New(logger), | ||||||
} | ||||||
|
||||||
for _, op := range opts { | ||||||
|
@@ -461,7 +470,7 @@ func (a *allocReconciler) computeGroup(groupName string, all allocSet) bool { | |||||
if len(reconnecting) > 0 { | ||||||
// Pass all allocations because the replacements we need to find may be | ||||||
// in any state, including themselves being reconnected. | ||||||
reconnect, stop := a.reconcileReconnecting(reconnecting, all) | ||||||
reconnect, stop := a.reconcileReconnecting(reconnecting, all, tg) | ||||||
|
||||||
// Stop the reconciled allocations and remove them from the other sets | ||||||
// since they have been already handled. | ||||||
|
@@ -1145,7 +1154,7 @@ func (a *allocReconciler) computeStop(group *structs.TaskGroup, nameIndex *alloc | |||||
// - If the reconnecting allocation is to be stopped, its replacements may | ||||||
// not be present in any of the returned sets. The rest of the reconciler | ||||||
// logic will handle them. | ||||||
func (a *allocReconciler) reconcileReconnecting(reconnecting allocSet, all allocSet) (allocSet, allocSet) { | ||||||
func (a *allocReconciler) reconcileReconnecting(reconnecting allocSet, all allocSet, tg *structs.TaskGroup) (allocSet, allocSet) { | ||||||
stop := make(allocSet) | ||||||
reconnect := make(allocSet) | ||||||
|
||||||
|
@@ -1199,8 +1208,8 @@ func (a *allocReconciler) reconcileReconnecting(reconnecting allocSet, all alloc | |||||
continue | ||||||
} | ||||||
|
||||||
// Pick which allocation we want to keep. | ||||||
keepAlloc := pickReconnectingAlloc(reconnectingAlloc, replacementAlloc) | ||||||
// Pick which allocation we want to keep using the disconnect reconcile strategy | ||||||
keepAlloc := a.reconnectingPicker.PickReconnectingAlloc(tg.Disconnect, reconnectingAlloc, replacementAlloc) | ||||||
if keepAlloc == replacementAlloc { | ||||||
// The replacement allocation is preferred, so stop the one | ||||||
// reconnecting if not stopped yet. | ||||||
|
@@ -1235,44 +1244,6 @@ func (a *allocReconciler) reconcileReconnecting(reconnecting allocSet, all alloc | |||||
return reconnect, stop | ||||||
} | ||||||
|
||||||
// pickReconnectingAlloc returns the allocation to keep between the original | ||||||
// one that is reconnecting and one of its replacements. | ||||||
// | ||||||
// This function is not commutative, meaning that pickReconnectingAlloc(A, B) | ||||||
// is not the same as pickReconnectingAlloc(B, A). Preference is given to keep | ||||||
// the original allocation when possible. | ||||||
func pickReconnectingAlloc(original *structs.Allocation, replacement *structs.Allocation) *structs.Allocation { | ||||||
// Check if the replacement is newer. | ||||||
// Always prefer the replacement if true. | ||||||
replacementIsNewer := replacement.Job.Version > original.Job.Version || | ||||||
replacement.Job.CreateIndex > original.Job.CreateIndex | ||||||
if replacementIsNewer { | ||||||
return replacement | ||||||
} | ||||||
|
||||||
// Check if the replacement has better placement score. | ||||||
// If any of the scores is not available, only pick the replacement if | ||||||
// itself does have scores. | ||||||
originalMaxScoreMeta := original.Metrics.MaxNormScore() | ||||||
replacementMaxScoreMeta := replacement.Metrics.MaxNormScore() | ||||||
|
||||||
replacementHasBetterScore := originalMaxScoreMeta == nil && replacementMaxScoreMeta != nil || | ||||||
(originalMaxScoreMeta != nil && replacementMaxScoreMeta != nil && | ||||||
replacementMaxScoreMeta.NormScore > originalMaxScoreMeta.NormScore) | ||||||
|
||||||
// Check if the replacement has better client status. | ||||||
// Even with a better placement score make sure we don't replace a running | ||||||
// allocation with one that is not. | ||||||
replacementIsRunning := replacement.ClientStatus == structs.AllocClientStatusRunning | ||||||
originalNotRunning := original.ClientStatus != structs.AllocClientStatusRunning | ||||||
|
||||||
if replacementHasBetterScore && (replacementIsRunning || originalNotRunning) { | ||||||
return replacement | ||||||
} | ||||||
|
||||||
return original | ||||||
} | ||||||
|
||||||
// computeUpdates determines which allocations for the passed group require | ||||||
// updates. Three groups are returned: | ||||||
// 1. Those that require no upgrades | ||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to have a godoc comment on this one. And do we have a path to deprecate the previous behaviour?