Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle Jobs with ttl_seconds_after_finished = 0 correctly #2596

Merged
merged 10 commits into from
Oct 30, 2024

Conversation

JaylonmcShan03
Copy link
Contributor

@JaylonmcShan03 JaylonmcShan03 commented Oct 1, 2024

Description

Fixes #2531
When a Kubernetes Job has ttl_seconds_after_finished set to 0, Kubernetes deletes the Job immediately after it completes. This can cause Terraform to plan the recreation of the Job in subsequent runs, which is sort of undesirable behavior. The expected behavior is for Terraform to recognize that the Job was deleted intentionally and not attempt to recreate it or show any diffs.

Changes made
- Read function:
- When the read crud func encounters a NotFound error for a job, it now checks if ttl is set to 0, if so it keeps the
resource in the state without marking it as destroyed, this helps prevent the recreation of the job. If it is not 0, it
behaves as before by removing the resource from the state.

CustomizeDiff function introduced:
- Supressing diffs when the job has been deleted due to due to ttl = 0

Acceptance tests

  • Have you added an acceptance test for the functionality being added?
  • Have you run the acceptance tests on this branch?

Output from acceptance testing:

└─(10:13:50 on fix-job-ttl-zero-handling)──> make test TESTARGS="-run TestAccKubernetesJobV1_customizeDiff_ttlZero"
==> Checking that code complies with gofmt requirements...
go vet ./...
go test "/Users/mau/Dev/terraform-provider-kubernetes/kubernetes" -vet=off -run TestAccKubernetesJobV1_customizeDiff_ttlZero -parallel 8 -timeout=30s
ok      github.com/hashicorp/terraform-provider-kubernetes/kubernetes   0.924s
...

Release Note

Release note for CHANGELOG:

Properly handle Kubernetes Jobs with ttl_seconds_after_finished = 0 to prevent unnecessary recreation.

References

Fixes #2531

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

@github-actions github-actions bot added size/L and removed size/M labels Oct 11, 2024
kubernetes/resource_kubernetes_job_v1.go Outdated Show resolved Hide resolved
var oldTTLStr, newTTLStr string

if oldTTLRaw != nil {
oldTTLStr, _ = oldTTLRaw.(string)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just omit the , _ here if you're not going to check the type assertion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

,_ is not omitted, thanks for the feedback!

kubernetes/resource_kubernetes_job_v1.go Outdated Show resolved Hide resolved
`, name, imageName)
}

func testAccKubernetesJobV1Config_customizeDiff_ttlFive(name, imageName string) string {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having 2 configs, you could just make the ttl value of a parameter to the function and template it in.

Comment on lines 54 to 97
func resourceKubernetesJobV1CustomizeDiff(ctx context.Context, d *schema.ResourceDiff, meta interface{}) error {
if d.Id() == "" {
log.Printf("[DEBUG] Resource ID is empty, resource not created yet.")
return nil
}

// Retrieve old and new TTL values
oldTTLRaw, newTTLRaw := d.GetChange("spec.0.ttl_seconds_after_finished")

// If both TTL values are not set or are empty, skip TTL diff
if (oldTTLRaw == nil && newTTLRaw == nil) || (oldTTLRaw == "" && newTTLRaw == "") {
log.Printf("[DEBUG] No ttl_seconds_after_finished provided or both values are empty; skipping TTL diff")
return nil
}

// Only check TTL if present
var oldTTLStr, newTTLStr string
if oldTTLRaw != nil {
oldTTLStr = oldTTLRaw.(string)
}
if newTTLRaw != nil {
newTTLStr = newTTLRaw.(string)
}

oldTTLInt, err := strconv.Atoi(oldTTLStr)
if err != nil {
log.Printf("[DEBUG] Invalid old TTL value: %s, skipping diff", oldTTLStr)
return nil
}
newTTLInt, err := strconv.Atoi(newTTLStr)
if err != nil {
log.Printf("[DEBUG] Invalid new TTL value: %s, skipping diff", newTTLStr)
return nil
}

// Suppress the diff if the old and new TTL values are the same
if oldTTLInt == newTTLInt {
log.Printf("[DEBUG] ttl_seconds_after_finished has not changed; suppressing diff")
d.Clear("spec")
d.Clear("metadata")
}

return nil
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't make sense to me why we would have a diff function that checks if the value of ttl_seconds_after_finished hasn't changed and then suppress the diff?

I tried removing this function completely and your tests still pass.

Comment on lines 234 to 255
_, err = conn.BatchV1().Jobs(namespace).Get(ctx, name, metav1.GetOptions{})
if err != nil {
if apierrors.IsNotFound(err) {
// Job is missing; check TTL
ttlAttr := d.Get("spec.0.ttl_seconds_after_finished")
ttlStr, _ := ttlAttr.(string)
ttlInt, err := strconv.Atoi(ttlStr)
if err != nil {
ttlInt = 0
}

if ttlInt >= 0 {
// Job was deleted due to TTL nothing to update
log.Printf("[INFO] Job %s not found but ttl_seconds_after_finished = %v; nothing to update", d.Id(), ttlInt)
return nil
}

// Job was deleted unexpectedly; return an error
return diag.Errorf("Job %s not found; cannot update because it has been deleted", d.Id())
}
return diag.Errorf("Error retrieving Job: %s", err)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need this logic either. In the read function it's going to check if the job exists and then set the ID to "" if it doesn't which will force recreation and so the update function wont get called.

I tried commenting this code out and all your tests still pass too.

@github-actions github-actions bot added size/M and removed size/L labels Oct 23, 2024
kubernetes/resource_kubernetes_job_v1_test.go Outdated Show resolved Hide resolved
kubernetes/resource_kubernetes_job_v1_test.go Outdated Show resolved Hide resolved
@JaylonmcShan03 JaylonmcShan03 merged commit a443c94 into main Oct 30, 2024
21 of 22 checks passed
@JaylonmcShan03 JaylonmcShan03 deleted the fix-job-ttl-zero-handling branch October 30, 2024 20:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Terraform lifecycle of a job that has ttl_seconds_after_finished set
3 participants