Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature request] Sidecar lifecycle control #658

Closed
miaojianwei opened this issue Jul 2, 2021 · 18 comments
Closed

[feature request] Sidecar lifecycle control #658

miaojianwei opened this issue Jul 2, 2021 · 18 comments
Assignees
Labels
kind/feature-request wontfix This will not be worked on

Comments

@miaojianwei
Copy link

What would you like to be added:
I want to be able to configure the sidecar’s lifecycle, let the sidecar to stop running when the main container stops running, with no need for the main container to provide additional interface for status detection.

Why is this needed:
I add a sidecar container to the pod running the user‘s computing task such as AI training, which is used to provide some necessary services for smart cards. However, when the computing container stops, sidecar is still running, so the pod status is still running. But the user is not aware of the existence of the sidecar container, they expect the pod status to be complete.
So I hope the sidecar container can stop itself when the computing container stops.

@miaojianwei miaojianwei changed the title [feature request] sidecar lifecycle control [feature request] Sidecar lifecycle control Jul 2, 2021
@FillZpp
Copy link
Member

FillZpp commented Jul 7, 2021

/unassign @jian-he
/assign @FillZpp

@kruise-bot kruise-bot assigned FillZpp and unassigned jian-he Jul 7, 2021
@FillZpp
Copy link
Member

FillZpp commented Jul 8, 2021

@miaojianwei Understand. For those pods with restartPolicy never or onFailure, sidecar container will remain running, even if the main container has already stopped. We suppose to solve this problem in feature releases.

@stale
Copy link

stale bot commented Oct 6, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Oct 6, 2021
@FillZpp
Copy link
Member

FillZpp commented Oct 8, 2021

/pinned

@stale stale bot removed the wontfix This will not be worked on label Oct 8, 2021
@FillZpp
Copy link
Member

FillZpp commented Nov 10, 2021

/unassign @FillZpp
/assign @veophi

@kruise-bot kruise-bot assigned veophi and unassigned FillZpp Nov 10, 2021
@trondhindenes
Copy link

We have a similar but not identical need: We're trying to find a way to control the lifecycle of sidecar containers related to the "main" container during shutdown.
We need to guarantee that the sidecar never shuts down before the main container. This is problematic in cases where the sidecar is needed to perform some operation, and the main container takes a few seconds to finish off some semi-long-running task after having received the SIGTERM signal. If the sidecar is able to exit right away, the main container will not be able to gracefully finish it's task before shutting down.

it would be very valuable to have something like a apps.kruise.io/container-termination-priority annotation to control the order in which containers are stopped, similar to apps.kruise.io/container-launch-priority (sidecar termination control is a well-known problem in Kubernetes, so if Openkruise would solve this it would be very beneficial for the community)

@FillZpp
Copy link
Member

FillZpp commented Dec 29, 2021

it would be very valuable to have something like a apps.kruise.io/container-termination-priority annotation to control the order in which containers are stopped, similar to apps.kruise.io/container-launch-priority

Yeah, we know that. But unfortunately, there is no way to control the sequence of container termination, except the preStop hook, which should be configured by users. It means you have to customize the preStop hook script of your sidecar container, it keeps waiting for the process in main container exited (maybe via identify file in shared volume) to make sure sidecar terminating after main.

What's more, now there is a new Keystone containers KEP that aims to terminate pods on the basis of keystone(main/app/essential) containers completion status. You can join its discussion, although I think it is unlikely to be accepted by K8s community.

Hope this can help you.

@trondhindenes
Copy link

thanks for the detailed explanation @FillZpp . My thinking was that it would be possible to use the same patch operation to remove a container from a running pod as (I'm assuming) openkruse uses to add containers in order, and then have a admission controller intercept pod delete commands to trigger the "ordered deletion". It might not be feasable tho, so thanks again for the reply.

@FillZpp
Copy link
Member

FillZpp commented Dec 29, 2021

@trondhindenes Aha, I understand your thought, but it does not work like that. In fact, we can not intercept the container start/stop operations executed by kubelet with admission webhook, which could only intercept update or delete to the pod objects.

So we have to control the container launch priority using configmap reference, which is some kind of a 'hook' in the progress, you can see detail in this doc.

But when you intercept a pod deletion request via admission webhook, the pod is not terminating yet (deletionTimestamp is nil) and kubelet will start the container if you trigger a container to be stopped at this time. But once the pod deletion request is passed, kubelet will begin to stop the containers immediately and there is no 'hook' here to affect the order.

Not sure if I made it clear.

@trondhindenes
Copy link

Thanks. I played a bit with the kubernetes api today, and realized that patching a pod is a lot stricted than what I'd previously thought - it seems like updating the image if a container is pretty much the only allowed mutating operation (as well as adding more containers to a running pod ofc).

Thanks again for the feedback, appreciate it!

@furykerry
Copy link
Member

We need to guarantee that the sidecar never shuts down before the main container. This is problematic in cases where the sidecar is needed to perform some operation, and the main container takes a few seconds to finish off some semi-long-running task after having received the SIGTERM signal. If the sidecar is able to exit right away, the main container will not be able to gracefully finish it's task before shutting down.

k8s lacks the ability to natively support container termination sequence, yet it does not mean it's impossible. It's just hacky. One solution comes into my mind is that openkruise can takeover the container lifecycle management such as prestop probe, and inject a dummy prestop probe for kubelet ( sleep N seconds). One problem with this solution is that when you investigate the pod yaml, the prestop probe will not be replaced, which may be confusing for some people

@trondhindenes
Copy link

trondhindenes commented Jan 4, 2022

Thanks! We ended up with a custom mutating webhook that does what we need, and like you say we're injecting a prestop hook to coordinate shutdown between containers in the pod - it works surprisingly well (https://github.com/trondhindenes/daps)

@stale
Copy link

stale bot commented Apr 4, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Apr 4, 2022
@FillZpp
Copy link
Member

FillZpp commented Apr 5, 2022

/pinned

@stale stale bot removed the wontfix This will not be worked on label Apr 5, 2022
@stale
Copy link

stale bot commented Jul 4, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Jul 4, 2022
@stale stale bot closed this as completed Jul 11, 2022
@FillZpp
Copy link
Member

FillZpp commented Jul 12, 2022

/reopen

@kruise-bot
Copy link

@FillZpp: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kruise-bot kruise-bot reopened this Jul 12, 2022
@stale stale bot removed the wontfix This will not be worked on label Jul 12, 2022
@stale
Copy link

stale bot commented Oct 10, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Oct 10, 2022
@stale stale bot closed this as completed Oct 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature-request wontfix This will not be worked on
Projects
None yet
Development

No branches or pull requests

7 participants