Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chef: Gracefully Handle RFC062 Exit Codes #19155

Merged
merged 6 commits into from
May 12, 2020

Conversation

bdwyertech
Copy link
Contributor

@bdwyertech bdwyertech commented Oct 20, 2018

Allows exit code 35 and 37 which do not imply Chef failure.

https://github.com/chef/chef-rfc/blob/master/rfc062-exit-status.md

Should help with #15134 and #16551

Copy link
Contributor

@svanharmelen svanharmelen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, but I do have one suggestion you might have a look at...

communicator/remote/command.go Outdated Show resolved Hide resolved
@svanharmelen
Copy link
Contributor

I noticed this is an old PR, are you still willing to work with me to get the PR ready for merging? In that case please have a look at my suggestion and also please rebase the PR so that the test can (hopefully) run successfully.

Lastly it would be great if you can sign the contributor license agreement. Thanks!

@bdwyertech
Copy link
Contributor Author

@svanharmelen sure, I've also gotten quite a bit better with Go since. I'll give this another whirl.

@bdwyertech
Copy link
Contributor Author

bdwyertech commented Apr 28, 2020

@stevenoneill-jmfamily and others, would we want to attempt a retry here or is the pending reboot notification good enough? Putting this in a retry loop if we get one of those exit codes should be doable, maybe with a configurable sleep.

Thoughts for configurables:

  • retry_on_codes: []int
    • If empty, (default), then this logic is disabled.
  • max_retries: int
    • Maximum number of times to continue the loop (e.g. number of reboots or special exit codes)
  • retry_wait: string (time.Duration string e.g. 5m)
    • Time to wait before trying to reconnect again

@svanharmelen If this is good to go, I'd say merge as MVC support for this. Don't have a timeframe for being able to support those configurables, write the tests, etc.

@stevenoneill-jmfamily
Copy link

@bdwyertech thanks for the heads up on this, appreciate it!

That sounds reasonable to me, I like that the defaults result in nothing (keep existing behavior). My two bits....from a Chef user pov, I feel like I'd be expecting behavior and names like the current test-kitchen experience. eg, utilizing these variable names:

retry_on_exit_code
max_retries
wait_for_retry

It'd be cool to have the consistency across products, if possible. If not, i'm still good with it because this feature would be really frickin useful :-)

Copy link
Contributor

@svanharmelen svanharmelen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is indeed a much better solution, but it will needs some tweaks to make it work as you intended 😉

builtin/provisioners/chef/resource_provisioner.go Outdated Show resolved Hide resolved
Copy link

@shoekstra shoekstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @bdwyertech,

Thanks for this, I just ran into the same issue.. Needed a small fix but works great:

...
null_resource.default (chef): Recipe: test::windows
null_resource.default (chef):   * reboot[now] action reboot_now[2020-04-29T23:30:36+02:00] WARN: Rebooting system immediately, requested by 'now'
...
null_resource.default (chef):     Chef Infra Client finished, 1/6 resources updated in 19 seconds
null_resource.default (chef): [2020-04-29T23:30:39+02:00] WARN: Rebooting server at a recipe's request. Details: {:delay_mins=>0, :reason=>"Reboot by Chef Infra Client", :timestamp=>2020-04-29 23:30:36 +0200, :requested_by=>"now"}
...
null_resource.default (chef):     Chef Infra Client failed. 1 resources updated in 20 seconds
null_resource.default (chef): [2020-04-29T23:30:39+02:00] FATAL: Stacktrace dumped to C:/chef/cache/chef-stacktrace.out
null_resource.default (chef): [2020-04-29T23:30:39+02:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
null_resource.default (chef): [2020-04-29T23:30:39+02:00] FATAL: Chef::Exceptions::Reboot: Rebooting server at a recipe's request. Details: {:delay_mins=>0, :reason=>"Reboot by Chef Infra Client", :timestamp=>2020-04-29 23:30:36 +0200, :requested_by=>"now"}
null_resource.default (chef): Reboot has been scheduled in the run state
null_resource.default: Creation complete after 54s [id=6131578846470282695]

Apply complete! Resources: 1 added, 0 changed, 1 destroyed.

builtin/provisioners/chef/resource_provisioner.go Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Apr 29, 2020

Codecov Report

Merging #19155 into master will decrease coverage by 0.00%.
The diff coverage is 44.11%.

Impacted Files Coverage Δ
builtin/provisioners/chef/resource_provisioner.go 56.25% <43.07%> (-1.66%) ⬇️
communicator/winrm/communicator.go 53.70% <66.66%> (+1.41%) ⬆️
dag/marshal.go 53.40% <0.00%> (+1.13%) ⬆️

@bdwyertech
Copy link
Contributor Author

bdwyertech commented Apr 30, 2020

Alright, so I have nothing better to do then clean up the laundry list...

@shoekstra @svanharmelen @stevenoneill-jmfamily what do you think of this? I haven't the slightest idea how you would stub something like this in a test... Would love to see how you'd do it.

If you want to give this a spin, use a cookbook that triggers a reboot, and set the codes you want in your provisioner block.

retry_on_exit_code = [35, 213]
wait_for_retry = 30 // Default (seconds) : time to wait after receiving exit code before retrying connection & provisioning process
max_retries = 1 // Default : number of times to retry

@bdwyertech bdwyertech changed the title Chef: Allow RFC062 Exit Codes Chef: Gracefully Handle RFC062 Exit Codes Apr 30, 2020
builtin/provisioners/chef/resource_provisioner.go Outdated Show resolved Hide resolved
builtin/provisioners/chef/resource_provisioner.go Outdated Show resolved Hide resolved
builtin/provisioners/chef/resource_provisioner.go Outdated Show resolved Hide resolved
builtin/provisioners/chef/resource_provisioner.go Outdated Show resolved Hide resolved
@shoekstra
Copy link

This looks great, but it is also a considerably bigger change than your initial PR and my concern is it'll be harder to get merged in without adding tests. It would be a shame for this PR to stagnate again as a lot of Chef users would benefit from your fix.

May I suggest creating a new PR that adds the retry functionality so it doesn't hold up merging your initial change? I think would have a better chance of being merged as is because the proposed change is much smaller.

Signed-off-by: Brian Dwyer <[email protected]>
Even if MaxRetries is 0, we should still execute the loop one time in
order to run the Chef-Client at least once. Also waiting only makes
sense when we have `attempts` left. And last but not least we want to
exit immediately when the exit code is not in the retry list.

So this PR fixes three small issues to make everything work as
expected.
This example doesn't really show how these values should be used. The
default of retry_on_exit_code is now already when most people want, so
this line is not needed in most cases.

I think the docs describe the new options just fine, so lets leave this
out...
Copy link
Contributor

@svanharmelen svanharmelen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now 👍

@jbardin jbardin self-assigned this May 12, 2020
@jbardin jbardin merged commit e912dc8 into hashicorp:master May 12, 2020
@ghost
Copy link

ghost commented Jun 12, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Jun 12, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants