Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure one request should only be executed once. #2388

Closed
lvjing2 opened this issue Nov 2, 2018 · 3 comments
Closed

Make sure one request should only be executed once. #2388

lvjing2 opened this issue Nov 2, 2018 · 3 comments
Labels
area/networking enhancement New feature or request kind/bug Categorizes issue or PR as related to a bug.

Comments

@lvjing2
Copy link
Contributor

lvjing2 commented Nov 2, 2018

Expected Behavior

One request should only be executed once.

Actual Behavior

Currently after #2357 , the istio-proxy, queue-proxy, user-container will wait after 20s. but once 20s passed, the terminating order of this three container is still istio-proxy -> queue-proxy -> user-container in most case, especially when the user request takes some time to finish, so the request remained in user-container will not return to the user anymore, and then the istio will try to reroute this request to any other alive pod.
So one request would only be executed more than once, this would cause problems for example the request is to writing data to db.

Steps to Reproduce the Problem

the evidence about handle one request about twice

@knative-prow-robot knative-prow-robot added kind/bug Categorizes issue or PR as related to a bug. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/process Changes in how we work labels Nov 2, 2018
@tcnghia tcnghia added enhancement New feature or request area/networking and removed kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/process Changes in how we work labels Nov 2, 2018
@lvjing2
Copy link
Contributor Author

lvjing2 commented Nov 3, 2018

From my investigation, there are several problems related with this issues

problem with init pid in Scenario 1

For our helloworld-go example, the service in user-container is:

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.0   4292   744 ?        Ss   04:56   0:00 /bin/sh -c /go/bin/helloworld
root           6  0.0  0.0 218800  5280 ?        Sl   04:56   0:00 /go/bin/helloworld

the service in queue-proxy is:

PID   USER     TIME  COMMAND
    1 root      0:01 /ko-app -containerConcurrency=0

the service in istio-proxy is:

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
istio-p+       1  0.0  0.2  31196 17676 ?        Ssl  04:56   0:00 /usr/local/bin/pilot-agent proxy sidecar --configPath /etc/istio/proxy --binaryPath /usr/local/bin/envoy --serviceCluster helloworld-go-0
istio-p+      15  0.5  0.4 123004 36744 ?        Sl   04:56   0:08 /usr/local/bin/envoy -c /etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluste

so according to the init pid problem, our user-container will never received the SIGTERM signal. which is coincident with my test: the user-container always takes about 30s to start terminating only when received SIGKILL. For istio-proxy and queue-proxy, they received the SIGTERM very well.

I can get rid of this problem by using the tool ko.

  1. change the helloworld-go image to github.com/knative/docs/serving/samples/helloworld-go
  2. apply by executing ko apply -f service.yaml
    In this case, the user-container would be deployed in docker just like the queue-proxy
PID   USER     TIME  COMMAND
    1 root      0:01 /ko-app -containerConcurrency=0

So the user-container would receive the SIGTERM. but the problem remains when we didn't use ko.

@markusthoemmes
Copy link
Contributor

The "pid=1" problem is I think very well known in Docker/Containerland. It depends on how the user builds their containers. Here's a good summary of what somebody can do wrong when building their container. We should consider adding this to the runtime contract (gonna open an issue on that).

@lvjing2
Copy link
Contributor Author

lvjing2 commented Nov 5, 2018

After fixing the problem of pid 1, then the problem of this issues also gone, so I will close this issue. Thanks for markus' help.

@lvjing2 lvjing2 closed this as completed Nov 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/networking enhancement New feature or request kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

No branches or pull requests

4 participants