Queue Proxy does not exit after draining and server shutdown #12865

Closed
Legion2 opened this issue Apr 18, 2022 · 10 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.

Comments

Legion2 commented Apr 18, 2022

What version of Knative?

1.3.2

Expected Behavior

When a pod is deleted, queue-proxy should drain the connections and then exit, and after that the user container should exit.

Actual Behavior

When a pod is deleted, queue-proxy drains the connections, shuts down the server, and prints "Shutdown complete, exiting..." in the logs. However, it does not exit and is killed by the kubelet after deletionGracePeriodSeconds (300s).

Steps to Reproduce the Problem

Setup

  1. Set up Knative with Istio
  2. Create the following service:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: foo
  annotations:
    networking.knative.dev/disableAutoTLS: "true"
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "1"
        autoscaling.knative.dev/maxScale: "10"
        autoscaling.knative.dev/targetUtilizationPercentage: "70"
    spec:
      containerConcurrency: 1
      containers:
      - name: api
        image: some-image
        resources:
          requests:
            memory: 400Mi
            cpu: 50m
          limits:
            memory: 1024Mi
        ports:
        - containerPort: 8080
          name: http1
  1. Put a relatively constant load on the service: 0.1 requests per second, with a 200 ms response time and spikes up to 2 seconds (see the sketch after this list).
  2. The autoscaler should scale the number of replicas up and down.
  3. This produces many pods stuck in the Terminating state, where the queue-proxy container blocks deletion even after it has drained the connections.
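
A hypothetical sketch of the load described in step 1 (the URL is a placeholder, not from the issue; any HTTP client issuing ~0.1 requests per second works):

import time
import urllib.request

# Placeholder; substitute the Knative Service's actual external URL.
SERVICE_URL = "http://foo.default.example.com"

while True:
    try:
        # The 200 ms to 2 s response time comes from the application;
        # the client only controls the request rate.
        with urllib.request.urlopen(SERVICE_URL, timeout=5) as resp:
            resp.read()
    except OSError as exc:
        print("request failed:", exc)
    time.sleep(10)  # ~0.1 requests per second
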
Legion2 added the kind/bug label Apr 18, 2022
CharZhou commented Jul 1, 2022

same

github-actions bot commented

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions bot added the lifecycle/stale label Sep 30, 2022
knative-prow-robot (Contributor) commented

This issue or pull request is stale because it has been open for 90 days with no activity.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Close this issue or PR with /close

/lifecycle stale

github-actions bot removed the lifecycle/stale label Oct 31, 2022
github-actions bot commented

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions bot added the lifecycle/stale label Jan 30, 2023
github-actions bot closed this as completed Mar 1, 2023
dprotaso (Member) commented Mar 1, 2023

Following up here: after the queue-proxy finishes draining, Kubernetes will send a TERM signal to the user-container (your application).

The shutdown will be blocked if your application doesn't perform a graceful shutdown.

As an example, this Python server doesn't perform a graceful shutdown:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-test
spec:
  template:
    spec:
      timeoutSeconds: 150
      containers:
        - image: python:3.9-slim
          command: ["python"]
          args: ["-m", "http.server", "8080"]

In contrast, nginx does perform a graceful shutdown:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-test
spec:
  template:
    spec:
      timeoutSeconds: 150
      containers:
      - image: nginx
        ports:
        - containerPort: 80 
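
For illustration only (not from the thread), the python:3.9-slim example above can be made to shut down gracefully by handling SIGTERM; a minimal sketch:

import signal
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler

server = HTTPServer(("", 8080), SimpleHTTPRequestHandler)

def handle_sigterm(signum, frame):
    # shutdown() blocks until serve_forever() returns, so it must run on a
    # separate thread; calling it directly from this handler would deadlock.
    threading.Thread(target=server.shutdown).start()

signal.signal(signal.SIGTERM, handle_sigterm)
server.serve_forever()  # returns once shutdown() is called
server.server_close()   # release the listening socket, then exit

With a handler like this in place, the container exits once the queue-proxy drain completes instead of waiting out the grace period.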

dprotaso (Member) commented Mar 1, 2023

Also, the queue-proxy has a 30-second drain time; if a request arrives during the drain, the timer is reset.
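
To make the reset behaviour concrete, an illustrative sketch (the real queue-proxy is written in Go; this models the idea, not its implementation):

import threading

DRAIN_SECONDS = 30  # matches the drain time described above

class Drainer:
    """Calls on_drained once no request has arrived for DRAIN_SECONDS."""

    def __init__(self, on_drained):
        self._on_drained = on_drained
        self._lock = threading.Lock()
        self._timer = None

    def begin_drain(self):
        self._arm()

    def request_arrived(self):
        # A request during draining restarts the countdown.
        self._arm()

    def _arm(self):
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(DRAIN_SECONDS, self._on_drained)
            self._timer.daemon = True
            self._timer.start()

# Usage: d = Drainer(lambda: print("drain complete")); d.begin_drain()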

paoloyx commented Mar 21, 2023

We're experiencing the same problem with version 1.7.1 of Knative Serving. Deploying the services shown by @dprotaso, the nginx one shuts down gracefully, but the Python one does not.

Is there a "canonical" way to address this issue? Thank you

dprotaso (Member) commented

Is there a "canonical" way to address this issue? Thank you

I'm not familiar enough with Python, but generally user applications will want to listen for SIGTERM and perform a graceful exit.

I wouldn't feel comfortable having Knative force-quit the user container, since important things could happen during shutdown.

ashrafguitoni commented

For anyone dealing with this issue in Python, try using dumb-init
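
Some context on why dumb-init helps (my gloss, not from the thread): the container entrypoint runs as PID 1, and Linux does not apply default signal dispositions to PID 1, so a process that never installs a SIGTERM handler, such as python -m http.server, simply ignores the signal. dumb-init runs as PID 1 and forwards signals to its child, which then receives SIGTERM with normal semantics. Installing an explicit handler also works, PID 1 or not; a minimal sketch:

import os
import signal
import sys

def handle_sigterm(signum, frame):
    # Do any cleanup here, then exit so the pod can terminate promptly.
    sys.exit(0)

# An explicit handler is honoured even when the process runs as PID 1.
signal.signal(signal.SIGTERM, handle_sigterm)
print(f"running as pid {os.getpid()}, waiting for SIGTERM")
signal.pause()  # block until a signal arrives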


knative-prow bot commented Nov 20, 2023

@danielrubin1989: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

> /reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
