
Implement ExecutionHook CRDs API Types #5

Closed

Conversation

@ashish-amarnath ashish-amarnath commented Jan 13, 2020

@k8s-ci-robot

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
  • If you have done the above and are still having issues with the CLA being reported as unsigned, please log a ticket with the Linux Foundation Helpdesk: https://support.linuxfoundation.org/
  • Should you encounter any issues with the Linux Foundation Helpdesk, send a message to the backup e-mail support address at: [email protected]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Jan 13, 2020
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jan 13, 2020
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jan 13, 2020
@xing-yang

/assign @xing-yang

@prydonius

/approve

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ashish-amarnath, prydonius
To complete the pull request process, please assign xing-yang
You can assign the PR to them by writing /assign @xing-yang in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


var (
// GroupVersion is group version used to register these objects
GroupVersion = schema.GroupVersion{Group: "apps.k8s.io", Version: "v1alpha1"}


I'm checking whether it should be apps.k8s.io or app.k8s.io to be consistent with other CRDs.

@ashish-amarnath (author)

Sure. Quick pointer: the KEP also referred to this as apiVersion: apps.k8s.io/v1alpha1.


Yeah, apps.k8s.io looks right. Just want to double check as I found "app.k8s.io" is used for other CRDs.


@ashish-amarnath (author)

Is apps.k8s.io correct, or should we change to app.k8s.io?


Maybe that's why "app.k8s.io" is used for the "Application" CRD, not "apps.k8s.io"? I don't know the reason, hence raising the question. Since this ExecutionHook repo belongs to sig-apps, I thought it would be better to stay with app.k8s.io for consistency.


Yes, perhaps. Note that a SIG can and should have more than one API group; API groups are more fine-grained than SIGs.

Are you sure apps is the right SIG for this and not node? This seems more connected to me with pods and runtimes than with replicasets and deployments. The existing exec feature is SIG node.


Yes, this is under sig-apps: https://github.com/kubernetes/community/blob/master/sigs.yaml#L194. The KEP was initially done in sig-storage. @saad-ali suggested that this should be in sig-apps because we need application specific knowledge to use the execution hooks to quiesce and unquiesce an application. Of course we should get input from sig-node as well. I believe there was someone from sig-node who reviewed the KEP.


Great, just checking. lifecycle.apps.k8s.io might be a good group name.


sounds good!
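For orientation, a minimal kubebuilder-style groupversion_info.go using the lifecycle.apps.k8s.io name floated above might look like the sketch below. The group name reflects only this thread's suggestion, not necessarily what the PR finally used.

package v1alpha1

import (
    "k8s.io/apimachinery/pkg/runtime/schema"
    "sigs.k8s.io/controller-runtime/pkg/scheme"
)

var (
    // GroupVersion is the group version used to register these objects.
    // "lifecycle.apps.k8s.io" is the name suggested in this thread.
    GroupVersion = schema.GroupVersion{Group: "lifecycle.apps.k8s.io", Version: "v1alpha1"}

    // SchemeBuilder is used to add Go types to the GroupVersionKind scheme.
    SchemeBuilder = &scheme.Builder{GroupVersion: GroupVersion}

    // AddToScheme adds the types in this group-version to a scheme.
    AddToScheme = SchemeBuilder.AddToScheme
)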

@ashish-amarnath ashish-amarnath force-pushed the kubebuilder-init branch 2 times, most recently from bc71ee4 to 35cd833 Compare January 16, 2020 22:42
@ashish-amarnath ashish-amarnath changed the title Generate kubebuilder scaffolding project Implement ExecutionHook CRDs API Jan 16, 2020
@ashish-amarnath ashish-amarnath changed the title Implement ExecutionHook CRDs API Implement ExecutionHook CRDs API Types Jan 16, 2020
Signed-off-by: Ashish Amarnath <[email protected]>
@ashish-amarnath ashish-amarnath force-pushed the kubebuilder-init branch 3 times, most recently from 606932c to cdd9db1 Compare January 17, 2020 00:45
// Any action may be triggered more than once but only the latest results will be logged in status.
// As an alpha feature, only the ExecAction type in Handler will be supported, not HTTPGetAction or TCPSocketAction.
// This is required.
Action corev1.Handler `json:"action"`


Add protobuf part

@ashish-amarnath (author)

For Kubernetes clients of CRDs, protobuf serialization tags are not used. Protobuf encoding is used in Kubernetes only for internal types, mainly for efficiency; it is not extended to CRDs. For that reason, I have removed all protobuf tags.

Also See: kubernetes/kubernetes#55541
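To make that concrete, a JSON-tags-only sketch of the action type (type and field names taken loosely from the snippets quoted in this review, not verbatim from the PR) might look like:

package v1alpha1

import (
    corev1 "k8s.io/api/core/v1"
)

// HookActionSketch mirrors the action definition under review, carrying JSON tags
// only; protobuf tags are omitted because CRD serialization does not use them.
type HookActionSketch struct {
    // Action defines what to run. For alpha, only Action.Exec is supported;
    // HTTPGet and TCPSocket handlers are not.
    Action corev1.Handler `json:"action"`

    // ActionTimeoutSeconds bounds how long the action may run before the hook
    // status reports failure.
    // +optional
    ActionTimeoutSeconds *int64 `json:"actionTimeoutSeconds,omitempty"`
}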

@liggitt

liggitt commented Jan 24, 2020

/assign @thockin @lavalamp
as active API reviewers on the design in kubernetes/enhancements#705

@ashish-amarnath ashish-amarnath force-pushed the kubebuilder-init branch 2 times, most recently from 8b269e1 to c09f99c Compare January 24, 2020 18:32
Signed-off-by: Ashish Amarnath <[email protected]>
// PodContainerNames lists the containers the ExecutionHook should be executed on in a Pod.
type PodContainerNames struct {
// This field is required
PodName string `json:"podName"`


This doesn't make a lot of sense to me: if your selector identifies, e.g., all pods in a ReplicaSet, this still makes you specify each one by name (since the name has a random suffix).


Only one of "PodContainerNames" and "PodContainerSelector" will be specified, but not both. So if the selector identifies all pods via PodContainerSelector, PodContainerNames won't be used. We need to add validation to make sure only one is specified.
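A sketch of that validation, using trimmed stand-in types based on the snippets quoted in this review (the helper itself is hypothetical):

package v1alpha1

import (
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Trimmed stand-ins for the types under review.
type PodContainerNames struct {
    PodName        string   `json:"podName"`
    ContainerNames []string `json:"containerNames,omitempty"`
}

type PodContainerSelector struct {
    Selector       *metav1.LabelSelector `json:"selector,omitempty"`
    ContainerNames []string              `json:"containerNames,omitempty"`
}

// PodSelection is a one-of: exactly one of the two fields may be set.
type PodSelection struct {
    // +optional
    PodContainerNamesList []PodContainerNames `json:"podContainerNamesList,omitempty"`
    // +optional
    PodContainerSelector *PodContainerSelector `json:"podContainerSelector,omitempty"`
}

// validatePodSelection is a hypothetical helper enforcing the one-of rule.
func validatePodSelection(s PodSelection) error {
    hasNames := len(s.PodContainerNamesList) > 0
    hasSelector := s.PodContainerSelector != nil
    if hasNames == hasSelector { // both set, or neither set
        return fmt.Errorf("exactly one of podContainerNamesList or podContainerSelector must be specified")
    }
    return nil
}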

@lavalamp

I didn't realize that KEP got to the implementable stage. I recall having given significant feedback on it, and I don't think I re-reviewed the changes.

@lavalamp

The PR merging the KEP & setting it to implementable has this comment specifically calling out the API as still needing to be nailed down: kubernetes/enhancements#705 (review)

@xing-yang

Hi @lavalamp , @thockin has approved the KEP for alpha. Before moving to Beta, we'll evaluate whether to move it to kubelet. See here: https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/20190120-execution-hook-design.md#risks-and-mitigations

@lavalamp

It is more or less up to you when you get an API review. If you get it for the alpha -> beta transition, fixing whatever comes up may involve a lot more work than if you get it now and address it prior to implementation. I don't actually have time to do an API review right now; maybe @thockin does.

Anyway, I just want to be clear that the KEP went to implementable with a significant caveat (which may not have been the right thing to do, I don't know, it's hard to change now either way), and just to budget time for api review and whatever work that implies to make changes.

Since I'm not sure everyone is on the same page when it comes to API reviews, I'll add: the thing I try to do during API review is to make sure the feature's expression in the API composes well with the rest of the API surface area. This sometimes generates more and larger action items than, e.g., just asking for unions where there should be unions.

@xing-yang

Hi @lavalamp, if I remember correctly, I think Alternative option 1b has incorporated your suggestions during the reviews: https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/20190120-execution-hook-design.md#alternative-option-1b. There were many rounds of discussions on whether to do it in kubelet or use a CRD during the design. Eventually we decided to go with a CRD (the current main proposal) for alpha version, implement it and see how it works. Then evaluate it and move it to Kubelet when going from alpha to beta.

So if it is okay with you and @thockin, we'd like to stay with this plan. Thanks!

@lavalamp

@xing-yang as long as we all have the same expectation of a somewhat arduous step in the future, that's all that I'm trying to accomplish here :)

@xing-yang

Understood :)

@thockin

thockin commented Jan 27, 2020

To prep for this, I pulled up the KEP, and (surprise) I agree with my own comments there. I will try to make a quick pass on this, but I think what I was really hoping for was revisions to the KEP to nail down the API.

// HookActionName is copied to ExecutionHookSpec by the controller such as
// the Application Snapshot Controller.
type ExecutionHookSpec struct {
// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster

probably want to remove meta-comments?

ActionName string `json:"actionName"`
}

// PodSelection contains two fields, PodContainerNamesList and PodContainerSelector,

This basically copies the type contents into a comment. Don't list the fields here. Just say that this struct represents a one-of, and define the defaults.

One-of types really want a discriminator field, e.g. "type=Selector"
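As an illustration of that suggestion (names hypothetical, reusing the stand-in types from the earlier sketch):

// PodSelectionType discriminates which selection mode is in use.
type PodSelectionType string

const (
    PodSelectionTypeNames    PodSelectionType = "Names"
    PodSelectionTypeSelector PodSelectionType = "Selector"
)

// PodSelection with an explicit discriminator, e.g. type=Selector.
type PodSelection struct {
    // Type says which one of the following fields is populated.
    Type PodSelectionType `json:"type"`

    // +optional
    PodContainerNamesList []PodContainerNames `json:"podContainerNamesList,omitempty"`

    // +optional
    PodContainerSelector *PodContainerSelector `json:"podContainerSelector,omitempty"`
}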

// ExecutionHookSpec defines the desired state of ExecutionHook
// HookActionName is copied to ExecutionHookSpec by the controller such as
// the Application Snapshot Controller.
type ExecutionHookSpec struct {

This should be called "PodAction" or maybe just "Action" or "Notification"

// asynchronously. If execution ordering is required, caller has to implement the logic and create
// different hooks in order.
// This field is required.
PodSelection PodSelection `json:"podSelection"`

I think I disagree with this structure. I said something about this in the KEP. Here's my thinking.

Ideally (IMO) HookAction eventually becomes a field inside each Pod.spec.containers[] - it defines what to do when that action is triggered. Likewise, this type (ExecutionHook) should be called PodAction or similar. It triggers an action on a set of Pods. It should not be spelling out which container because the action definition itself will be attached to a container.

If/when we carry this into Beta, you'll have to change all of this to a simple pod selector. Better to change it now. This can simply be selector and be a normal pod selector. I'm not convinced we need specific pod names?

Once this is integrated to Pod, triggering an action "foo" can mean "all action specs named foo on any containers in the pod"

@xing-yang xing-yang Jan 28, 2020


We did have this proposed as part of the LifeCycle struct inside a container as described in alternative option 1a and 1b in the KEP. That was actually the main proposal when the KEP was initially submitted. https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/20190120-execution-hook-design.md#alternative-option-1a

We also talked about the pros and cons of having kubelet or an external controller take care of these hooks:
https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/20190120-execution-hook-design.md#controller-handlings-for-option-1a-and-1b

If we decided to go with alternative 1a/1b or some variation of those, then I don't think we would need this execution-hook repo, as the handling of the hooks would be in kubelet and/or in the application snapshot controller (when that's available).

After numerous discussions, however, the decision was to start with an alpha implementation with a hook CRD and an external controller to manage the hooks. This allows us to explore other features such as application snapshot which will use this hook for quiescing. We have stated in the KEP that we'll look at doing it in kubelet before moving to beta.
https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/20190120-execution-hook-design.md#risks-and-mitigations
"The security concern is that ExecutionHook controller has the authority to execute commands in any pods. For alpha and proof of concept, we propose to use external controller to handle executionhooks. But to move to beta and graduate as GA, we will evaluate it and move it to kubelet which already has the privilege to execute commands in pod/containers."

// It will be set to false if the action cannot be executed successfully after
// ActionTimeoutSeconds passes.
// +optional
Succeed *bool `json:"succeed,omitempty"`

success

// If not set, it is nil, indicating Action has not started
// If set, it means Action has started at the specified time
// +optional
Timestamp *metav1.Time `json:"timestamp,omitempty"`

What does this actually mean? What guarantees can I infer from it?

What happens if I retry the action? Under what circumstances would a retry happen?


Do we want to know how long an action took from trigger to completion? I think we would.
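One hedged way to capture that would be to record both ends of the execution in the per-container status. The field names below are a guess at the status shape implied by the snippets quoted in this review, and CompletionTimestamp in particular is hypothetical.

package v1alpha1

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ContainerExecutionHookStatus is a sketch of per-container hook status.
type ContainerExecutionHookStatus struct {
    PodName       string `json:"podName"`
    ContainerName string `json:"containerName"`

    // Timestamp records when the action started; nil means it has not started.
    // +optional
    Timestamp *metav1.Time `json:"timestamp,omitempty"`

    // CompletionTimestamp (hypothetical, not in this PR) records when the action
    // finished, so trigger-to-completion duration can be derived.
    // +optional
    CompletionTimestamp *metav1.Time `json:"completionTimestamp,omitempty"`

    // Succeed reports whether the action completed successfully within
    // ActionTimeoutSeconds.
    // +optional
    Succeed *bool `json:"succeed,omitempty"`
}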

// This is required
ErrorType ErrorType `json:"errorType"`

// Error message

Is this the result from the action (e.g. exec stderr)? Or is it fixed from the controller?

// +optional
Message *string `json:"message,omitempty"`

// More detailed reason why error happens

If any of this comes from user pod (e.g. stderr) it should be size-bounded and specified.
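If the message can carry pod output such as stderr, one conventional way to bound its size is a kubebuilder validation marker. The struct name and the 1024-character limit below are illustrative, not from this PR.

// ExecutionHookError is a trimmed sketch of the error status type.
type ExecutionHookError struct {
    // Message may include (truncated) output from the hook, e.g. stderr.
    // +kubebuilder:validation:MaxLength=1024
    // +optional
    Message *string `json:"message,omitempty"`
}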

PodSelection PodSelection `json:"podSelection"`

// Name of the HookAction. This is required.
ActionName string `json:"actionName"`

Is there any way to control how fast this happens across a set of pods? E.g. I have a deployment and I want this to wait 10 seconds between each pod?

// +kubebuilder:resource:path=executionhook,shortName=eh,scope=Namespaced,categories=executionhook
// +kubebuilder:subresource:status

// ExecutionHook is the Schema for the executionhook API

Talk about guarantees of execution (at-least-once vs exactly-once) and retries, timeouts, idempotency, etc?
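For readers less familiar with kubebuilder, the markers quoted above conventionally annotate a root object shaped roughly as below; the object:root marker and the Spec/Status wiring are standard scaffolding assumed here, not copied from the PR, and ExecutionHookSpec/ExecutionHookStatus are the types discussed in the threads above.

package v1alpha1

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// +kubebuilder:object:root=true
// +kubebuilder:resource:path=executionhook,shortName=eh,scope=Namespaced,categories=executionhook
// +kubebuilder:subresource:status

// ExecutionHook is the Schema for the executionhook API.
type ExecutionHook struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   ExecutionHookSpec   `json:"spec,omitempty"`
    Status ExecutionHookStatus `json:"status,omitempty"`
}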

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 27, 2020
@ashish-amarnath

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 27, 2020
@thockin thockin removed their assignment Oct 28, 2020
@k8s-triage-robot

The lifecycle/frozen label can not be applied to PRs.

This bot removes lifecycle/frozen from PRs because:

  • Commenting /lifecycle frozen on a PR has not worked since March 2021
  • PRs that remain open for >150 days are unlikely to be easily rebased

You can:

  • Rebase this PR and attempt to get it merged
  • Close this PR with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/remove-lifecycle frozen

@k8s-ci-robot k8s-ci-robot removed the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Aug 17, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 16, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 16, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot

@k8s-triage-robot: Closed this PR.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
9 participants