Return -1 status code if backoff limit reached #10311

Closed
wants to merge 1 commit into from

zangell44 (Collaborator)

This PR adds logic to the Kubernetes infrastructure block to return status code -1 if a job backoff limit has been reached.

May fix #9246.
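The sketch below shows one way a backoff-limit check like the one described above could look. It is not the code from this PR. It assumes the Job's status is available as a plain dict that mirrors the JSON of a Kubernetes Job's `.status`. In that JSON, a Job whose retries are exhausted carries a condition with `type: "Failed"` and `reason: "BackoffLimitExceeded"`. The names `backoff_limit_reached`, `job_status_code`, and `UNKNOWN_EXIT_CODE` are hypothetical.

```python
from typing import Any, Dict, Optional

# Hypothetical sentinel meaning "the exit code is not known".
UNKNOWN_EXIT_CODE = -1


def backoff_limit_reached(job_status: Dict[str, Any]) -> bool:
    """Return True if the Job has a Failed condition caused by BackoffLimitExceeded.

    `job_status` is assumed to mirror the JSON of a Kubernetes Job's `.status`.
    """
    for condition in job_status.get("conditions") or []:
        if (
            condition.get("type") == "Failed"
            and condition.get("status") == "True"
            and condition.get("reason") == "BackoffLimitExceeded"
        ):
            return True
    return False


def job_status_code(job_status: Dict[str, Any], exit_code: Optional[int]) -> Optional[int]:
    """Hypothetical helper: prefer a real exit code, fall back to -1 once retries are exhausted."""
    if exit_code is not None:
        return exit_code
    if backoff_limit_reached(job_status):
        return UNKNOWN_EXIT_CODE
    # Job still running, or its outcome is not yet known.
    return None


failed = {"conditions": [{"type": "Failed", "status": "True", "reason": "BackoffLimitExceeded"}]}
print(job_status_code(failed, None))  # -1
```

A real exit code always wins in this sketch. The `-1` fallback is used only when the Job has given up and no container exit code was recovered.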

Example

Checklist

  • This pull request references any related issue by including "closes <link to issue>"
    • If no issue exists and your change is not a small fix, please create an issue first.
  • This pull request includes tests or only affects documentation.
  • This pull request includes a label categorizing the change e.g. fix, feature, enhancement, docs.

For documentation changes:

  • This pull request includes redirect settings in netlify.toml for files that are removed or renamed

zangell44 added the fix label (A fix for a bug in an existing feature) on Jul 26, 2023
zangell44 requested a review from a team as a code owner on July 26, 2023 21:04

netlify bot commented Jul 26, 2023

Deploy Preview for prefect-docs-preview failed.

🔨 Latest commit: 69c7329
🔍 Latest deploy log: https://app.netlify.com/sites/prefect-docs-preview/deploys/64c18a6a253b230008a7a7f0

zangell44 marked this pull request as draft on July 26, 2023 21:42
zanieb (Contributor) commented Jul 27, 2023

fwiw -1 is supposed to indicate that we do not know the exit code. Can the exit code be pulled from the last failure or is no container being created?
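The following is a hedged sketch of the alternative raised in this comment: pull the exit code from the last failure, and return -1 only when nothing was recorded. It assumes pod statuses are plain dicts that mirror the Kubernetes Pod `status` JSON, ordered newest first. Two fields matter here. `containerStatuses[].state.terminated.exitCode` holds the current termination. `lastState.terminated` holds the previous one after a restart. The helper names are hypothetical, and so is the container name used in the usage line.

```python
from typing import Any, Dict, List, Optional

UNKNOWN_EXIT_CODE = -1  # "we do not know the exit code"


def terminated_exit_code(container_status: Dict[str, Any]) -> Optional[int]:
    """Return the exit code from `state.terminated`, else from `lastState.terminated`."""
    for key in ("state", "lastState"):
        terminated = (container_status.get(key) or {}).get("terminated")
        if terminated is not None and terminated.get("exitCode") is not None:
            return terminated["exitCode"]
    return None


def exit_code_from_last_failure(pod_statuses: List[Dict[str, Any]], container_name: str) -> int:
    """Scan pods (assumed newest first) and return the first recorded exit code for the container."""
    for pod_status in pod_statuses:
        for container_status in pod_status.get("containerStatuses") or []:
            if container_status.get("name") != container_name:
                continue
            code = terminated_exit_code(container_status)
            if code is not None:
                return code
    # No container ever reached a terminated state (e.g. it was never created),
    # so the exit code is genuinely unknown.
    return UNKNOWN_EXIT_CODE


pods = [{"containerStatuses": [{"name": "job-container", "state": {"terminated": {"exitCode": 137}}}]}]
print(exit_code_from_last_failure(pods, "job-container"))  # 137
```

In this sketch, `-1` keeps the meaning described in the comment. It comes back only when no pod recorded a terminated container, which is the "no container being created" case.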

zangell44 (Collaborator, Author)

> Can the exit code be pulled from the last failure or is no container being created?

I'd expect it to be pulled from the last failure, but we've seen reports of this not working as expected with spot instance eviction. I'm going to reproduce the issue and understand it better instead of merging this band-aid fix.

abrookins (Contributor)

Can we merge the band-aid while you try to reproduce, @zangell44?

github-actions (Contributor)

This pull request is stale because it has been open 14 days with no activity. To keep this pull request open, remove the stale label or comment.

github-actions (Contributor)

This pull request was closed because it has been stale for 14 days with no activity. If this pull request is important or you have more to add feel free to re-open it.

github-actions bot closed this on Aug 30, 2023
Labels: fix (A fix for a bug in an existing feature)
Projects: None yet
Development: successfully merging this pull request may close these issues:
  • Agent not detecting flow crash when EC2 spot instance revoked
3 participants