Make sure one request should only be executed once. #2388

lvjing2 · 2018-11-02T03:57:55Z

Expected Behavior

One request should only be executed once.

Actual Behavior

Currently, after #2357, the istio-proxy, queue-proxy, and user-container each wait for 20s. But once the 20s have passed, the termination order of these three containers is still istio-proxy -> queue-proxy -> user-container in most cases, especially when the user request takes some time to finish. The request still in flight in user-container is therefore never returned to the user, and Istio then tries to reroute it to another live pod.
So one request can be executed more than once, which causes problems when, for example, the request writes data to a database.

Steps to Reproduce the Problem

Evidence of one request being handled twice

lvjing2 · 2018-11-03T11:18:17Z

From my investigation, there are several problems related to this issue.

problem with init pid in Scenario 1

For our helloworld-go example, the service in user-container is:

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.0   4292   744 ?        Ss   04:56   0:00 /bin/sh -c /go/bin/helloworld
root           6  0.0  0.0 218800  5280 ?        Sl   04:56   0:00 /go/bin/helloworld

the service in queue-proxy is:

PID   USER     TIME  COMMAND
    1 root      0:01 /ko-app -containerConcurrency=0

the service in istio-proxy is:

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
istio-p+       1  0.0  0.2  31196 17676 ?        Ssl  04:56   0:00 /usr/local/bin/pilot-agent proxy sidecar --configPath /etc/istio/proxy --binaryPath /usr/local/bin/envoy --serviceCluster helloworld-go-0
istio-p+      15  0.5  0.4 123004 36744 ?        Sl   04:56   0:08 /usr/local/bin/envoy -c /etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluste

So, because of the init PID problem, our user-container never receives the SIGTERM signal. This is consistent with my test: the user-container only starts terminating after about 30s, when it receives SIGKILL. istio-proxy and queue-proxy receive SIGTERM without problems.

I can get rid of this problem by using the tool ko.

Change the helloworld-go image to github.com/knative/docs/serving/samples/helloworld-go.
Apply it by executing ko apply -f service.yaml.
In this case, the user-container is deployed in Docker just like the queue-proxy:

PID   USER     TIME  COMMAND
    1 root      0:01 /ko-app -containerConcurrency=0

So the user-container receives the SIGTERM, but the problem remains when we don't use ko.

markusthoemmes · 2018-11-05T07:15:35Z

The "pid=1" problem is I think very well known in Docker/Containerland. It depends on how the user builds their containers. Here's a good summary of what somebody can do wrong when building their container. We should consider adding this to the runtime contract (gonna open an issue on that).

lvjing2 · 2018-11-05T13:16:17Z

After fixing the PID 1 problem, the problem in this issue is also gone, so I will close this issue. Thanks for Markus's help.

knative-prow-robot added the kind/bug, kind/cleanup, and kind/process labels Nov 2, 2018

tcnghia added the enhancement and area/networking labels and removed the kind/cleanup and kind/process labels Nov 2, 2018

lvjing2 mentioned this issue Nov 5, 2018

try to start shutdown without always waiting for a specific time #2365

Closed

markusthoemmes mentioned this issue Nov 5, 2018

Add signal handling to the runtime contract. #2404

Closed

lvjing2 mentioned this issue Nov 5, 2018

fix pid 1 problem knative/docs#493

Merged

lvjing2 closed this as completed Nov 5, 2018
