Make sure the user-container shuts down gracefully #2311
I am glad to handle this if needed. /assign
@lvjing2 Do you have a reproducer for this? In theory, the HTTP server is supposed to shut down gracefully, that is: it's supposed to shut down only after all connections have closed. There is, however, a wrinkle here in that K8s has a graceful shutdown timeout of 30s by default (I think?). After that it will force-kill the container and no graceful shutdown is possible. We had a discussion somewhere (maybe Slack) where this popped up as well. In theory, we'd need to set that graceful shutdown period to at least the revision's requestTimeout.
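For reference, "graceful" for a Go net/http server means roughly: stop accepting new connections, then wait for in-flight requests to finish. A minimal sketch, assuming a plain net/http server on port 8080 and a 30s drain budget (illustrative only, not the actual queue-proxy code):

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	srv := &http.Server{Addr: ":8080"}
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(5 * time.Second) // simulate a slow request
		w.Write([]byte("done"))
	})

	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatal(err)
		}
	}()

	// Wait for SIGTERM (the signal the kubelet sends first).
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGTERM)
	<-sigCh

	// Allow up to the grace period for in-flight requests to finish;
	// Shutdown stops accepting new connections and drains existing ones.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("forced shutdown: %v", err)
	}
}
```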
k8s has a graceful shutdown timeout of 30s by default. You can change it via the Pod's spec.terminationGracePeriodSeconds.
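For illustration, this is how that field looks when set with the Kubernetes Go types; the 300s value is only an example, not a recommendation:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// Example only: give the pod 300s between SIGTERM and SIGKILL.
	grace := int64(300)
	pod := corev1.Pod{
		Spec: corev1.PodSpec{
			TerminationGracePeriodSeconds: &grace,
		},
	}
	fmt.Println(*pod.Spec.TerminationGracePeriodSeconds)
}
```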
@markusthoemmes @fatkun I need to test more to make sure what the root cause of my problem is.
@markusthoemmes I tested the graceful shutdown of a web server, and I found it terminates the request before it is finished. The following are the reproduction steps:
@RequestMapping("/")
public String index(@RequestParam String sleep) throws Exception {
log.info("start to handle index");
int sleepSec = Integer.valueOf(sleep);
Thread.sleep(sleepSec * 1000);
log.info("end to handle index after " + sleep + " second");
return "Gracefully shutdown after " + sleep + " second";
}
From the log, I found my request was terminated before it finished, because only the "start to handle index" line was logged.
@fatkun The terminationGracePeriodSeconds in the pod spec is the grace period between the pod receiving the TERM signal and the KILL signal. It means the containers receive the TERM signal immediately, no matter how long terminationGracePeriodSeconds is. One more bug I found: when the pod receives a SIGTERM signal, all of its containers receive it at almost the same time, so they all start (or try) to terminate themselves. The evidence of one request being handled twice is below.
How to reproduce
@RequestMapping("/")
public String index(@RequestParam String sleep) throws Exception {
log.info("start to handle index");
int sleepSec = Integer.valueOf(sleep);
Thread.sleep(sleepSec * 1000);
log.info("end to handle index after " + sleep + " second");
return "Gracefully shutdown after " + sleep + " second";
}
```bash
set -x
SERVICE='helloworld-go-1'
OLDPOD=$(kubectl get pod -n knative-serving | grep ${SERVICE} | grep Running | awk '{print $1}')
# Fire a long-running request (58s) in the background, then delete the old pod while it is still in flight.
time curl -v -H "Host: helloworld-go-1.knative-serving.example.com" -d sleep=58 http://35.192.170.221 &
sleep 2
echo "====================old user-container in pod ${OLDPOD} logs================="
kubectl delete pod -n knative-serving ${OLDPOD} &
kubectl logs -n knative-serving ${OLDPOD} user-container
# Wait for the replacement pod to show up, then dump its logs.
NEWPOD=$(kubectl get pod -n knative-serving | grep ${SERVICE} | grep Running | awk '{print $1}')
while [ "${NEWPOD}" == "" ]
do
    NEWPOD=$(kubectl get pod -n knative-serving | grep ${SERVICE} | grep Running | awk '{print $1}')
done
echo "====================new user-container in pod ${NEWPOD} logs================="
kubectl logs -n knative-serving ${NEWPOD} user-container
```
@lvjing2 I used wrk to test performance and saw some errors while the pod was terminating. The user-container is PID 1; if you do not register a handler for SIGTERM, it will ignore the SIGTERM signal (like the sample helloworld-go does), and the user-container quits after 30s. I use …
@fatkun The preStop request should not be related to whether the istio-proxy is down or not; it doesn't need the iptables rules from istio. I can't understand how your queue-proxy preStop failed and exited. The user-container is a Docker process, so it can handle SIGTERM. Also, my app in the user-container is a Spring Boot web app with Tomcat, which contains a handler for SIGTERM. What's more, the user-container does not always quit after 30s; I checked, and it may take only 10+ seconds to kill the whole pod.
@lvjing2 What I meant initially is that the queue-proxy will only quit after it has handled all requests (the Go HTTP server does that by default, IIRC). Regarding the user-container: yes, I think we need to teach it to handle a graceful shutdown, or we need some kind of coordinated shutdown where the user-container receives the kill signal only after the queue-proxy decides it's fine to exit. As for istio, envoy is probably also able to shut down gracefully (without force-terminating connections). Maybe another signal is needed?
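A hedged sketch of that coordinated-shutdown idea, using the Kubernetes Go types: give the user-container a preStop hook that blocks on a hypothetical queue-proxy drain endpoint, so the kubelet only delivers SIGTERM to the user-container after the queue-proxy has drained its requests. The /drain path, port 8022, and image here are assumptions for illustration, not an existing Knative API:

```go
package shutdown

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// userContainer returns a container whose preStop hook polls a hypothetical
// queue-proxy drain endpoint; the kubelet sends SIGTERM to this container
// only after the preStop hook completes.
func userContainer() corev1.Container {
	return corev1.Container{
		Name:  "user-container",
		Image: "example/app:latest", // placeholder image
		Lifecycle: &corev1.Lifecycle{
			// corev1.LifecycleHandler is called corev1.Handler in older API versions.
			PreStop: &corev1.LifecycleHandler{
				HTTPGet: &corev1.HTTPGetAction{
					// Hypothetical endpoint that would block until the
					// queue-proxy has drained all in-flight requests.
					Path: "/drain",
					Port: intstr.FromInt(8022),
				},
			},
		},
	}
}
```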
@lvjing2 In our cluster, after the delete command finishes executing, the pod status is still Terminating. Here are the kubelet logs after I deleted the pod: istio-proxy and queue-proxy quit after 1 second, and the user-container quits after 30 seconds.
@fatkun From your log, we can see that queue-proxy and istio-proxy terminated immediately, and the preStop hook no longer works since queue-proxy started to terminate before preStop ran. I guess your app in the user-container didn't handle SIGTERM, so it didn't start to terminate until it received SIGKILL after the 30s timeout. So the question becomes: why does queue-proxy terminate immediately when it receives a SIGTERM signal? I'd also like to ask another question: does queue-proxy still maintain HTTP connections with the user-container when it receives SIGTERM?
The problems discovered here:
1. The user-container starts to terminate before all of its in-flight requests are finished.
2. One request can be handled twice during pod termination.
I noticed that the first problem (and also #2344) should be fixed by the workaround in #2357, so I'd like to handle the remaining second problem.
@fatkun After we make istio-proxy live longer, do you still see queue-proxy shut down quickly?
@lvjing2 Yes, we currently don't have a way to expose retry settings yet. It is certainly a tradeoff between having nice retries and having duplicates in some cases. Can you please file a separate issue to track exposing the retry policy to users? Thanks.
@tcnghia I think #2201 already tracks this.
I will close this and open another issue to track the problem of handling one request twice.
Expected Behavior
When the autoscaler scales down the number of pod replicas, some pods are terminated. Expectedly, the user-container shouldn't start to terminate until all of its requests are finished.
Actual Behavior
The user-container would start to terminate after 20s.
https://github.com/knative/serving/blob/master/cmd/queue/main.go#L57
Additional Info
Actually, a delay of about 20s is OK for many cases. But if I run a batch job that takes, for example, about 2 minutes per request, and the user-container receives a SIGTERM signal while it is still handling (or has just started handling) a request, then it is highly likely that the batch job will be interrupted.
Possible solution
We can use the tracked concurrent request count to check whether there are unfinished requests in the user-container.
https://github.com/knative/serving/blob/master/pkg/queue/stats.go#L67
The quitHandler would wait until the concurrent request count drops to 0 after the container receives a termination signal, rather than just sleeping for 20s.
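A minimal sketch of that idea, assuming an in-flight counter like the statistic the queue-proxy already tracks (the names here are illustrative, not the actual main.go):

```go
package main

import (
	"log"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

// inFlight stands in for the concurrency statistic the queue-proxy tracks.
var inFlight int64

// requestStarted / requestFinished would be called by the proxy handler
// around each forwarded request.
func requestStarted()  { atomic.AddInt64(&inFlight, 1) }
func requestFinished() { atomic.AddInt64(&inFlight, -1) }

func main() {
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGTERM)
	<-sigCh

	// Instead of sleeping for a fixed 20s, poll until no requests are in
	// flight. A real implementation would also cap this wait, e.g. at the
	// revision's request timeout, so shutdown cannot hang forever.
	for atomic.LoadInt64(&inFlight) > 0 {
		time.Sleep(100 * time.Millisecond)
	}
	log.Print("no requests in flight, exiting")
}
```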