Skip to content

Commit

Permalink
Remove "activeDeadlineSeconds" setting on pre-/post- upgrade job.
Browse files Browse the repository at this point in the history
Why:
Currently we set both backoffLimit and activeDeadlineSeconds on pre-/post-
upgrade job. A job's activeDeadlineSeconds takes precedence over
backoffLimit, thus pre-/post- upgrade job will not deploy additional pods
once it reaches the time limit even if the backoffLimit is not hit.

When this happens, the following condition is set on the job, but
we are not checking that condition in version upgrade logic, thus
leading to operator stucks in upgrade inprogress state.

  conditions:
  - lastProbeTime: "2020-04-01T09:48:21Z"
    lastTransitionTime: "2020-04-01T09:48:21Z"
    message: Job was active longer than specified deadline
    reason: DeadlineExceeded
    status: "True"
    type: Failed
  failed: 5
  startTime: "2020-04-01T09:36:36Z"

What:
this change is to remove deadline seconds, thus simplify the logic
to cap the retries solely based on counts. If we ever needed to
have both time and count limits, then adding checking for deadline
exceeded condition in upgrade logic is required.
  • Loading branch information
wyzhang committed Apr 1, 2020
1 parent 2a561f1 commit 2a00fc5
Show file tree
Hide file tree
Showing 4 changed files with 10 additions and 3 deletions.
1 change: 0 additions & 1 deletion controllers/testdata/post_upgrade_job.json
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,6 @@
"uid": "a0afd92d-4b0d-11ea-8611-42010a800022"
},
"spec": {
"activeDeadlineSeconds": 600,
"backoffLimit": 10,
"completions": 1,
"parallelism": 1,
Expand Down
1 change: 0 additions & 1 deletion controllers/testdata/pre_upgrade_job.json
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,6 @@
"uid": "401828eb-4b1a-11ea-8611-42010a800022"
},
"spec": {
"activeDeadlineSeconds": 600,
"backoffLimit": 10,
"completions": 1,
"parallelism": 1,
Expand Down
10 changes: 10 additions & 0 deletions controllers/version_update.go
Original file line number Diff line number Diff line change
Expand Up @@ -154,6 +154,11 @@ func upgradeForBackend(master *v1alpha1.CDAPMaster, labels map[string]string, ob
}

// First, run pre-upgrade job
//
// Note that pre-upgrade job doesn't have an "activeDeadlineSeconds" set it on, so it will
// try as many as imageVersionUpgradeJobMaxRetryCount times before giving up. If we ever
// needed to set an overall deadline for the pre-upgrade job, the logic below needs to check
// deadline exceeded condition on job's status
if !isConditionTrue(master, updateStatus.PreUpgradeSucceeded) {
log.Printf("Version update: pre-upgrade job not completed")
preJobName := getPreUpgradeJobName(master.Status.UpgradeStartTimeMillis)
Expand Down Expand Up @@ -192,6 +197,11 @@ func upgradeForBackend(master *v1alpha1.CDAPMaster, labels map[string]string, ob
}

// At last, run post-upgrade job
//
// Note that post-upgrade job doesn't have an "activeDeadlineSeconds" set it on, so it will
// try as many as imageVersionUpgradeJobMaxRetryCount times before giving up. If we ever
// needed to set an overall deadline for the post-upgrade job, the logic below needs to check
// deadline exceeded condition on job's status
if !isConditionTrue(master, updateStatus.PostUpgradeSucceeded) {
log.Printf("Version update: post-upgrade job not completed")
postJobName := getPostUpgradeJobName(master.Status.UpgradeStartTimeMillis)
Expand Down
1 change: 0 additions & 1 deletion templates/upgrade-job.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -74,4 +74,3 @@ spec:
{{end}}
restartPolicy: Never
backoffLimit: {{.BackoffLimit}}
activeDeadlineSeconds: 600

0 comments on commit 2a00fc5

Please sign in to comment.