Make sure the user-container shutdown gracefully #2311

Closed
lvjing2 opened this issue Oct 25, 2018 · 16 comments
Labels
area/autoscale area/test-and-release It flags unit/e2e/conformance/perf test issues for product features kind/bug Categorizes issue or PR as related to a bug. kind/feature Well-understood/specified features, ready for coding.

Comments

@lvjing2
Contributor

lvjing2 commented Oct 25, 2018

Expected Behavior

When the autoscaler scales down the number of replicas, some pods will be terminated. The user-container should not start to terminate until all of its in-flight requests have finished.

Actual Behavior

The user-container would start to terminate after 20s.
https://github.com/knative/serving/blob/master/cmd/queue/main.go#L57

Additional Info

In practice, a delay of about 20s is fine for many cases. But suppose I run a batch job where each request takes about 2 minutes. If the user-container receives a SIGTERM signal while it is still handling (or has just started handling) a request, it is very likely that the batch job will be interrupted.

Possible solution

We can use the tracked concurrent request count to check whether there are unfinished requests in the user-container.
https://github.com/knative/serving/blob/master/pkg/queue/stats.go#L67
After the container receives a termination signal, the quitHandler would wait until the concurrent request count drops to 0, rather than just sleeping for 20s.
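
For illustration, here is a minimal Go sketch of that idea (not the actual cmd/queue/main.go code; the counter, middleware, and handler names are made up): a middleware counts in-flight requests, and the quit handler blocks until the count reaches zero or a deadline passes.

package main

import (
	"net/http"
	"sync/atomic"
	"time"
)

// inFlight counts the requests currently being handled.
var inFlight int64

// countingMiddleware increments the counter for the lifetime of each request.
func countingMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		atomic.AddInt64(&inFlight, 1)
		defer atomic.AddInt64(&inFlight, -1)
		next.ServeHTTP(w, r)
	})
}

// quitHandler blocks until no requests are in flight (or a deadline passes),
// so a preStop hook pointed at it only returns once draining is done.
func quitHandler(w http.ResponseWriter, r *http.Request) {
	// The deadline must stay below terminationGracePeriodSeconds.
	deadline := time.Now().Add(90 * time.Second)
	for atomic.LoadInt64(&inFlight) > 0 && time.Now().Before(deadline) {
		time.Sleep(100 * time.Millisecond)
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	mux := http.NewServeMux()
	mux.Handle("/", countingMiddleware(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello"))
	})))
	mux.HandleFunc("/quitquitquit", quitHandler)
	http.ListenAndServe(":8022", mux)
}

With something like this, a preStop httpGet hook pointed at /quitquitquit only returns once the in-flight count is 0, instead of after a fixed 20s sleep.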

@knative-prow-robot knative-prow-robot added area/autoscale area/test-and-release It flags unit/e2e/conformance/perf test issues for product features kind/bug Categorizes issue or PR as related to a bug. kind/feature Well-understood/specified features, ready for coding. kind/good-first-issue labels Oct 25, 2018
@lvjing2
Contributor Author

lvjing2 commented Oct 25, 2018

I am glad to handle this if needed.
/assign

@markusthoemmes
Contributor

@lvjing2 Do you have a reproducer for this? In theory, the HTTP server is supposed to shut down gracefully, that is: it's supposed to shut down only after all connections have been closed.

There is, however, a wrinkle here in that K8s has a graceful shutdown timeout of 30s by default (I think?). After that it will force-kill the container and no graceful shutdown is possible. We had a discussion somewhere (maybe Slack) where this popped up as well. In theory, we'd need to set that graceful shutdown period to at least the revision's requestTimeout.
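
For reference, a minimal sketch of what that graceful shutdown looks like for a Go HTTP server (assumptions for illustration: a plain net/http server on :8080 and a 30s drain budget; this is not the actual queue-proxy code). On SIGTERM, srv.Shutdown stops accepting new connections and waits for active requests to finish, up to the context deadline.

package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	srv := &http.Server{Addr: ":8080"}

	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatalf("listen: %v", err)
		}
	}()

	// Block until the kubelet delivers SIGTERM to the container.
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGTERM)
	<-sigCh

	// Stop accepting new connections and wait for in-flight requests,
	// bounded by the drain budget.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("forced shutdown: %v", err)
	}
}

If requests can run longer than the grace period (like the 2-minute batch job above), terminationGracePeriodSeconds has to be raised accordingly, otherwise the kubelet force-kills the container mid-drain.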

@fatkun
Contributor

fatkun commented Oct 26, 2018

k8s has a graceful shutdown timeout of 30s by default. You can change it via the pod's spec.terminationGracePeriodSeconds.
How do we avoid the istio-proxy container quitting first? It seems istio receives the TERM signal and quits fast, and the incoming requests fail.

@lvjing2
Contributor Author

lvjing2 commented Oct 26, 2018

@markusthoemmes @fatkun I need to test more to figure out the root cause of my problem.

@lvjing2
Contributor Author

lvjing2 commented Oct 26, 2018

@markusthoemmes I tested graceful shutdown for a web server, and I found it terminates the request before it finishes. The steps to reproduce:

  1. I send a request to my Spring Boot web app with curl; the request handler sleeps for about 58s (a little less than the default timeout of 60s).
    @RequestMapping("/")
    public String index(@RequestParam String sleep) throws Exception {
        log.info("start to handle index");
        int sleepSec = Integer.valueOf(sleep);
        Thread.sleep(sleepSec * 1000);
        log.info("end to handle index after " + sleep + " second");
        return "Gracefully shutdown after " + sleep + " second";
    }
  2. Then I send a SIGTERM signal to Tomcat with the kill command.
  3. I check the Tomcat log; it shows the request was terminated quickly.
2018-10-26 16:42:39.962  INFO 52587 --- [nio-8080-exec-1] o.s.web.servlet.DispatcherServlet        : FrameworkServlet 'dispatcherServlet': initialization completed in 16 ms
2018-10-26 16:42:39.989  INFO 52587 --- [nio-8080-exec-1] com.example.demo.HelloController         : start to handle index
2018-10-26 16:42:43.222  INFO 52587 --- [      Thread-14] ConfigServletWebServerApplicationContext : Closing org.springframework.boot.web.servlet.context.AnnotationConfigServletWebServerApplicationContext@156107c2: startup date [Fri Oct 26 16:41:51 CST 2018]; root of context hierarchy
2018-10-26 16:42:43.226  INFO 52587 --- [      Thread-14] o.s.j.e.a.AnnotationMBeanExporter        : Unregistering JMX-exposed beans on shutdown
  4. The curl response:
* Empty reply from server
* Connection #0 to host localhost left intact
curl: (52) Empty reply from server

From the log, I can see my request was terminated before it finished, because there is only a start to handle index entry but no end to handle index after entry. I am not sure whether there is anything I missed in this test.

@lvjing2
Contributor Author

lvjing2 commented Oct 26, 2018

@fatkun The terminationGracePeriodSeconds in the pod spec is the grace period between the pod receiving the TERM signal and the KILL signal. It means the container receives the TERM signal immediately, no matter how long terminationGracePeriodSeconds is.
So what matters for my problem is how Knative handles the TERM signal; the above test shows the request gets terminated.

One more bug I found is that:

When the pod receives a SIGTERM signal, all of its containers receive the SIGTERM signal at almost the same time, so they all start (or try) to terminate themselves.
For queue-proxy and user-container, we added a preStop hook that waits for 20s, but istio-proxy terminates quickly, just as you said. Meanwhile the user-container may still be handling some requests. The upstream of istio-proxy will likely start to retry those requests, so the route will dispatch them to the new pod, which means those requests get executed roughly twice. This causes idempotence problems.

Evidence that one request was handled about twice:

Started DemoApplication in 5.442 seconds (JVM running for 24.989)
Initializing Spring FrameworkServlet 'dispatcherServlet'
FrameworkServlet 'dispatcherServlet': initialization started
FrameworkServlet 'dispatcherServlet': initialization completed in 38 ms
start to handle index
Closing org.springframework.boot.web.servlet.context.AnnotationConfigServletWebServerApplicationContext@3a74a538: startup date [Fri Oct 26 08:27:05 UTC 2018]; root of context hierarchy
Unregistering JMX-exposed beans on shutdown
Started DemoApplication in 5.586 seconds (JVM running for 22.603)
Initializing Spring FrameworkServlet 'dispatcherServlet'
FrameworkServlet 'dispatcherServlet': initialization started
FrameworkServlet 'dispatcherServlet': initialization completed in 19 ms
start to handle index
end to handle index after 58 second
time curl -v -H 'Host: helloworld-go-1.knative-serving.example.com' -d sleep=58 http://35.192.170.xxx
* Rebuilt URL to: http://35.192.170.221/
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 1087 (#0)
* upload completely sent off: 8 out of 8 bytes

<html>
<head>
 <title>500 Internal Privoxy Error</title>
 <link rel="shortcut icon" href="http://config.privoxy.org/error-favicon.ico" type="image/x-icon"></head>
<body>
<h1>500 Internal Privoxy Error</h1>
<p>Privoxy encountered an error while processing your request:</p>
<p><b>Could not load template file <code>connection-timeout</code> or one of its included components.</b></p>
<p>Please contact your proxy administrator.</p>
<p>If you are the proxy administrator, please put the required file(s)in the <code><i>(confdir)</i>/templates</code> directory.  The location of the <code><i>(confdir)</i></code> directory is specified in the main Privoxy <code>config</code> file.  (It's typically the Privoxy install directory, or <code>/etc/privoxy/</code>).</p>
</body>
</html>
* Connection #0 to host 127.0.0.1 left intact

real	1m0.093s
user	0m0.008s
sys	0m0.008s

How to reproduce

  1. Build a web app with this handler:
    @RequestMapping("/")
    public String index(@RequestParam String sleep) throws Exception {
        log.info("start to handle index");
        int sleepSec = Integer.valueOf(sleep);
        Thread.sleep(sleepSec * 1000);
        log.info("end to handle index after " + sleep + " second");
        return "Gracefully shutdown after " + sleep + " second";
    }
  2. Deploy this app.
  3. Run the following script:
set -x
SERVICE='helloworld-go-1'
OLDPOD=$(kubectl get pod -n knative-serving  | grep ${SERVICE} | grep Running | awk '{print $1}')
time curl -v -H "Host: helloworld-go-1.knative-serving.example.com" -d sleep=58 http://35.192.170.221 &
sleep 2
echo "====================old user-container in pod ${OLDPOD} logs================="
kubectl delete pod -n knative-serving ${OLDPOD} &
kubectl logs -n knative-serving ${OLDPOD} user-container

NEWPOD=$(kubectl get pod -n knative-serving  | grep ${SERVICE} | grep Running | awk '{print $1}')
while [ "${NEWPOD}" == "" ]
do
    NEWPOD=$(kubectl get pod -n knative-serving  | grep ${SERVICE} | grep Running | awk '{print $1}')
done

echo "====================new user-container in pod ${NEWPOD} logs================="
kubectl logs -n knative-serving ${NEWPOD} user-container

@fatkun
Contributor

fatkun commented Oct 26, 2018

@lvjing2 I used wrk to test performance and saw some errors while the pod was terminating.
I saw istio-proxy quit fast; the kubelet log shows the queue-proxy preStop request failed, maybe because istio-proxy had exited (iptables redirects the traffic to istio). queue-proxy exited too.

The user-container is PID 1. If it does not register a handler for SIGTERM, it ignores the SIGTERM signal (like the sample helloworld-go), and the user-container quits after 30s.
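
As an illustration of explicit SIGTERM handling, here is a minimal Go sketch (illustrative only, not the helloworld-go sample; whether an unmodified runtime already reacts to SIGTERM when running as PID 1 varies, so registering a handler removes the ambiguity):

package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	// Explicitly ask for SIGTERM/SIGINT so the container shuts down as soon
	// as the kubelet signals it, instead of waiting for SIGKILL after the
	// grace period.
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGTERM, syscall.SIGINT)

	log.Println("running as PID", os.Getpid())

	sig := <-sigCh
	log.Printf("received %v, shutting down cleanly", sig)
	// ... stop servers, flush in-flight work, then return ...
}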

I use kubectl exec -it xxxx -c queue-proxy sh to check whether the container has quit.

@lvjing2
Contributor Author

lvjing2 commented Oct 26, 2018

@fatkun The preStop request should not be related to whether istio-proxy is down or not; it doesn't need the iptables rules from istio. I don't understand how your queue-proxy preStop failed and the container exited.

The user-container is a docker process, and it is able to handle SIGTERM. Also, my app in the user-container is a Spring Boot web app on Tomcat, which has a handler for SIGTERM. What's more, the user-container does not always quit after 30s; I checked, and it can take only 10+ seconds to kill the whole pod.

time kubectl delete pod -n knative-serving helloworld-go-1-00001-deployment-f97fd6bb-sxbnq
pod "helloworld-go-1-00001-deployment-f97fd6bb-sxbnq" deleted
kubectl delete pod -n knative-serving   0.11s user 0.03s system 1% cpu 12.972 total

@markusthoemmes
Contributor

@lvjing2 What I meant initially is that the queue-proxy will only quit after it has handled all requests (the Go HTTP server does that by default, IIRC).

Regarding the user container: yes, I think we need to teach it to handle a graceful shutdown, or we need some kind of coordinated shutdown where the user-container receives the kill signal only after the queue-proxy decides it's fine to exit.

As for istio, Envoy is probably also able to shut down gracefully (without force-terminating connections). Maybe another signal is needed?
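
One possible sketch of that coordinated-shutdown idea (purely illustrative; the /drain endpoint, the port, and running this as a preStop exec hook are assumptions, not existing knative behavior): the user-container's preStop hook blocks until the queue-proxy reports it has drained, so the kubelet only sends SIGTERM to the user-container afterwards.

package main

import (
	"log"
	"net/http"
	"time"
)

// A tiny poller that a preStop exec hook in the user-container could run:
// it keeps asking the queue-proxy whether it has drained, and only exits
// (letting the kubelet proceed with SIGTERM) once the answer is yes.
func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	for {
		resp, err := client.Get("http://127.0.0.1:8022/drain") // hypothetical queue-proxy endpoint
		if err == nil {
			drained := resp.StatusCode == http.StatusOK
			resp.Body.Close()
			if drained {
				log.Println("queue-proxy reports no in-flight requests; safe to stop user-container")
				return
			}
		}
		time.Sleep(500 * time.Millisecond)
	}
}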

@fatkun
Contributor

fatkun commented Oct 26, 2018

@lvjing2 In our cluster, even after the delete command finishes, the pod status is still Terminating.

Here are the kubelet logs:
istio-proxy containerID: docker://dd1bf2
queue-proxy containerID: docker://c63abb
user-container containerID: docker://fc2ef1

After I delete the pod, istio-proxy and queue-proxy quit after 1 second, and user-container quits after 30 seconds.

Oct 26 19:49:56 HOSTXXX kubelet[240233]: I1026 19:49:56.753884  240233 kuberuntime_container.go:578] Killing container "docker://fc2ef1c950d2021ba043d33f815ddb1f1a73279524db9c47eccfe62ea5634e0f" with 30 second grace period
Oct 26 19:49:56 HOSTXXX kubelet[240233]: I1026 19:49:56.753949  240233 kuberuntime_container.go:578] Killing container "docker://dd1bf25dbe2b1e02a74974c1326f1d1484fa1f04d2ae4a171020be7a8697a83c" with 30 second grace period
Oct 26 19:49:56 HOSTXXX kubelet[240233]: I1026 19:49:56.754016  240233 kuberuntime_container.go:578] Killing container "docker://c63abb859ddac85599a6719204fa9e04328f0c8641a4c11ac5575ce0428d0808" with 30 second grace period

Oct 26 19:49:57 HOSTXXX kubelet[240233]: I1026 19:49:57.094450  240233 handlers.go:69] Http lifecycle hook (quitquitquit) for Container "user-container" in Pod "helloworld2-00001-deployment-7d68775d8c-4sc72_hi(d532bf90-d914-11e8-a849-6c92bf6b2d28)" failed - error: Get http://xx.xx.98.188:8022/quitquitquit: read tcp xx.xx.29.87:16089->xx.xx.98.188:8022: read: connection reset by peer, message: ""
Oct 26 19:49:57 HOSTXXX kubelet[240233]: E1026 19:49:57.094471  240233 kuberuntime_container.go:491] preStop hook for container "user-container" failed: Get http://xx.xx.98.188:8022/quitquitquit: read tcp xx.xx.29.87:16089->xx.xx.98.188:8022: read: connection reset by peer
Oct 26 19:49:57 HOSTXXX kubelet[240233]: I1026 19:49:57.118945  240233 handlers.go:69] Http lifecycle hook (quitquitquit) for Container "queue-proxy" in Pod "helloworld2-00001-deployment-7d68775d8c-4sc72_hi(d532bf90-d914-11e8-a849-6c92bf6b2d28)" failed - error: Get http://xx.xx.98.188:8022/quitquitquit: dial tcp xx.xx.98.188:8022: getsockopt: connection refused, message: ""
Oct 26 19:49:57 HOSTXXX kubelet[240233]: E1026 19:49:57.118967  240233 kuberuntime_container.go:491] preStop hook for container "queue-proxy" failed: Get http://xx.xx.98.188:8022/quitquitquit: dial tcp xx.xx.98.188:8022: getsockopt: connection refused
Oct 26 19:49:57 HOSTXXX kubelet[240233]: I1026 19:49:57.569470  240233 kubelet.go:1969] SyncLoop (PLEG): "helloworld2-00001-deployment-7d68775d8c-4sc72_hi(d532bf90-d914-11e8-a849-6c92bf6b2d28)", event: &pleg.PodLifecycleEvent{ID:"d532bf90-d914-11e8-a849-6c92bf6b2d28", Type:"ContainerDied", Data:"dd1bf25dbe2b1e02a74974c1326f1d1484fa1f04d2ae4a171020be7a8697a83c"}
Oct 26 19:49:57 HOSTXXX kubelet[240233]: I1026 19:49:57.569614  240233 kubelet.go:1969] SyncLoop (PLEG): "helloworld2-00001-deployment-7d68775d8c-4sc72_hi(d532bf90-d914-11e8-a849-6c92bf6b2d28)", event: &pleg.PodLifecycleEvent{ID:"d532bf90-d914-11e8-a849-6c92bf6b2d28", Type:"ContainerDied", Data:"c63abb859ddac85599a6719204fa9e04328f0c8641a4c11ac5575ce0428d0808"}

...

Oct 26 19:50:27 HOSTXXX kubelet[240233]: I1026 19:50:27.884705  240233 kubelet.go:1969] SyncLoop (PLEG): "helloworld2-00001-deployment-7d68775d8c-4sc72_hi(d532bf90-d914-11e8-a849-6c92bf6b2d28)", event: &pleg.PodLifecycleEvent{ID:"d532bf90-d914-11e8-a849-6c92bf6b2d28", Type:"ContainerDied", Data:"fc2ef1c950d2021ba043d33f815ddb1f1a73279524db9c47eccfe62ea5634e0f"}

@lvjing2
Contributor Author

lvjing2 commented Oct 26, 2018

@fatkun From your log, we can see that queue-proxy and istio-proxy terminated immediately, and the preStop hook no longer works since queue-proxy started terminating before the preStop hook ran. I guess your app in the user-container didn't handle SIGTERM, so it didn't start to terminate until it received SIGKILL after the 30s timeout.
In my test, my app handles SIGTERM, so it only takes about 10+ seconds to terminate the pod.

So the question becomes: why does the queue-proxy terminate immediately when it receives a SIGTERM signal? I'd also like to ask: does the queue-proxy still maintain HTTP connections to the user-container when it receives a SIGTERM?
https://github.com/knative/serving/blob/master/cmd/queue/main.go#L285

@lvjing2
Contributor Author

lvjing2 commented Oct 31, 2018

The problems discovered here:

  1. How to make sure istio-proxy terminates only after the preStop request connection has been established successfully. In short, make sure istio-proxy doesn't quit so quickly.
  2. Avoid the same user request being routed and executed twice when shutting down.

I noticed that the first problem (and also #2344) should be fixed by the workaround in #2357, so I'd like to handle the remaining second problem.
/assign

@tcnghia
Contributor

tcnghia commented Nov 1, 2018

@fatkun After we made istio-proxy live longer, do you still see queue-proxy shutting down quickly?

@tcnghia
Contributor

tcnghia commented Nov 1, 2018

@lvjing2 Yes, we currently don't have a way to expose retry settings yet.

It is certainly a tradeoff between having nice retries and having duplicates in some cases. Can you please file a separate issue to track exposing retry policy to users?

Thanks

@lvjing2
Contributor Author

lvjing2 commented Nov 1, 2018

@tcnghia I think #2201 already tracks this.
Retries are good and could fix a lot of potential problems, for example holding the request during the window when the revision is ready but the user-container is not yet ready (if no readiness probe is defined).
But I think we should avoid the duplicates, because they could cause the same data to be written twice to the DB or other bad things. I will open a new issue about avoiding handling the same request more than once.

@lvjing2
Contributor Author

lvjing2 commented Nov 2, 2018

I will close this and open another issue to track the problem of handling one request twice.
