Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix retries #2241

Merged
Merged

Conversation

Tom-Newton
Copy link
Contributor

@Tom-Newton Tom-Newton commented Oct 13, 2024

Purpose of this PR

Fix retries for submission failure and application. Closes: #2237 and apparently maybe #2118 too.

Proposed changes:

  • Convert isNextRetryDue into timeUntilNextRetryDue so we know how to set RequeueAfter.
  • After completing a reconcile where the outcome is to wait longer before retrying, make sure to set a requeue time. Without this the controller won't attempt to reconcile the SparkApplication again until 10 hours later after default Cache.SyncPeriod.
  • Remove the bool return from runSparkSubmit and just return error, because the bool argument is equivalent to err != nil. Also remove the related code in submitSparkApplication since that now seems to be out of date.
  • Refactor submitSparkApplication and how its called a bit to ensure that submission and execution attempt counts, logging of errors and state are updated consistently.
  • Always return an error with meaningful name in reconcileFailedSubmissionSparkApplication if validateSparkResourceDeletion returns false. Previously if was returning err set from a prior and possibly unrelated function call.
  • Added new e2e tests for submission failure and application failure.

Change Category

Indicate the type of change by marking the applicable boxes:

  • Bugfix (non-breaking change which fixes an issue)
  • Feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that could affect existing functionality)
  • Documentation update

Checklist

Before submitting your PR, please review the following:

  • I have conducted a self-review of my own code.
  • I have updated documentation accordingly. - I don't think any change is required
  • I have added tests that prove my changes are effective or that my feature works
  • Existing unit tests pass locally with my changes.

Additional Notes

I'm a golang noob

@Tom-Newton
Copy link
Contributor Author

I wasn't expecting to get reviewers assigned after opening as draft. This isn't ready for review yet.

@Tom-Newton Tom-Newton force-pushed the tomnewton/fix_requeues_to_retry branch from a1b6431 to bc13f30 Compare October 13, 2024 14:03
@Tom-Newton Tom-Newton marked this pull request as ready for review October 13, 2024 14:17
@Tom-Newton Tom-Newton force-pushed the tomnewton/fix_requeues_to_retry branch 2 times, most recently from 6a5eb85 to a5ed2e5 Compare October 13, 2024 14:31
Signed-off-by: Thomas Newton <[email protected]>
Signed-off-by: Thomas Newton <[email protected]>
Signed-off-by: Thomas Newton <[email protected]>
Signed-off-by: Thomas Newton <[email protected]>
Signed-off-by: Thomas Newton <[email protected]>
Signed-off-by: Thomas Newton <[email protected]>
Signed-off-by: Thomas Newton <[email protected]>
Signed-off-by: Thomas Newton <[email protected]>
Signed-off-by: Thomas Newton <[email protected]>
Signed-off-by: Thomas Newton <[email protected]>
Signed-off-by: Thomas Newton <[email protected]>
Signed-off-by: Thomas Newton <[email protected]>
Signed-off-by: Thomas Newton <[email protected]>
@Tom-Newton Tom-Newton force-pushed the tomnewton/fix_requeues_to_retry branch from f80259c to 2c64edb Compare October 14, 2024 15:07
@Tom-Newton Tom-Newton mentioned this pull request Oct 15, 2024
1 task
@jacobsalway
Copy link
Member

Thanks for the tag. Sorry have been busy with work but will review this tonight.

@josecsotomorales
Copy link
Contributor

@ChenYi015 can we review this PR and create a new release if it looks good? @jacobsalway I'm happy to help with contributions

internal/controller/sparkapplication/controller.go Outdated Show resolved Hide resolved
test/e2e/sparkapplication_test.go Outdated Show resolved Hide resolved
test/e2e/sparkapplication_test.go Outdated Show resolved Hide resolved
internal/controller/sparkapplication/controller.go Outdated Show resolved Hide resolved
Tom-Newton and others added 4 commits October 18, 2024 14:14
@Tom-Newton Tom-Newton force-pushed the tomnewton/fix_requeues_to_retry branch from d45bf69 to 223687f Compare October 18, 2024 13:32
@Tom-Newton Tom-Newton requested a review from ChenYi015 October 18, 2024 13:32
Signed-off-by: Thomas Newton <[email protected]>
@Tom-Newton Tom-Newton force-pushed the tomnewton/fix_requeues_to_retry branch from cf1e2cc to f0dc727 Compare October 18, 2024 13:38
pkg/util/sparkapplication.go Outdated Show resolved Hide resolved
internal/controller/sparkapplication/controller.go Outdated Show resolved Hide resolved
internal/controller/sparkapplication/controller.go Outdated Show resolved Hide resolved
internal/controller/sparkapplication/controller.go Outdated Show resolved Hide resolved
Signed-off-by: Thomas Newton <[email protected]>
@Tom-Newton Tom-Newton requested a review from ChenYi015 October 18, 2024 17:23
@josecsotomorales
Copy link
Contributor

Hey @ChenYi015, it looks like change requests were addressed, can we get another review?

Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ChenYi015

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ChenYi015
Copy link
Contributor

/lgtm

@ChenYi015
Copy link
Contributor

@josecsotomorales Have released v2.1.0-rc.0 with this PR included.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Retries don't work
4 participants