Support grpc keep alive server parameters #4402

Closed
anjmao opened this issue Aug 6, 2019 · 17 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@anjmao

anjmao commented Aug 6, 2019

Is this a BUG REPORT or FEATURE REQUEST? (choose one): FEATURE REQUEST

NGINX Ingress controller version:

rancher/nginx-ingress-controller:0.21.0-rancher3

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-08T16:31:10Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.6", GitCommit:"b1d75deca493a24a2f87eb1efde1a569e52fc8d9", GitTreeState:"clean", BuildDate:"2018-12-16T04:30:10Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

Environment:
AWS, Rancher

What happened:

1. I have a gRPC backend written in Go and a mobile client written in Swift that uses swift-grpc. On the Go backend I have the following keepalive policy:

```go
package server

import (
	"time"

	"google.golang.org/grpc/keepalive"
)

var keepalivePolicy = keepalive.EnforcementPolicy{
	MinTime:             5 * time.Second, // If a client pings more than once every 5s, terminate the connection.
	PermitWithoutStream: false,           // If a client pings while there are no active streams, send GOAWAY and close the connection.
}

var keepaliveParams = keepalive.ServerParameters{
	MaxConnectionIdle:     1 * time.Hour,    // If a client is idle for the given duration, send a GOAWAY.
	MaxConnectionAge:      1 * time.Hour,    // If any connection is alive for more than the given duration, send a GOAWAY.
	MaxConnectionAgeGrace: 10 * time.Second, // Allow the given duration for pending RPCs to complete before forcibly closing connections.
	Time:                  10 * time.Second, // Ping the client if it is idle for the given duration to ensure the connection is still active.
	Timeout:               5 * time.Second,  // Wait the given duration for the ping ack before assuming the connection is dead.
}
```
2. NGINX Ingress is used to load balance and terminate TLS traffic:

```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    nginx.ingress.kubernetes.io/server-snippet: |
      grpc_read_timeout 600s;
      grpc_send_timeout 600s;
      client_body_timeout 600s;
```

My goal is to have a long-lived bidirectional streaming RPC so the client can receive updates from the backend. Also, if the client disconnects (say, its internet connection drops), I want the server to detect this as fast as possible (ideally within 10 seconds). Currently my gRPC server sends a keepalive ping every 10 seconds and the NGINX proxy acknowledges the ping, but NGINX itself does not ping the client, so the server's pings only verify the server-to-NGINX connection, not the connection to the client.

What you expected to happen:
I expect the NGINX ingress to offer settings for a gRPC keepalive policy, for example proposed directives like:

```nginx
grpc_keepalive_time 10s;
grpc_keepalive_timeout 5s;
```

Or, even better, to simply forward gRPC ping frames to the client.

I found a similar issue for Envoy proxy, which seems to be fixed now: envoyproxy/envoy#2086

@aledbf aledbf added the kind/feature Categorizes issue or PR as related to a new feature. label Sep 2, 2019
@thetruechar

> nginx.ingress.kubernetes.io/server-snippet: |
>   grpc_read_timeout 600s;
>   grpc_send_timeout 600s;
>   client_body_timeout 600s;

This saved my day, thank you!

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 23, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 22, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@PI-Victor
Member

PI-Victor commented Feb 11, 2021

I think I want to take a closer look at this, since I ran into the same issue.

/remove-lifecycle rotten
/assign

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Feb 11, 2021
@PI-Victor
Member

/reopen

@k8s-ci-robot
Contributor

@PI-Victor: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Feb 11, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 13, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 12, 2021
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@nnewc

nnewc commented Nov 22, 2021

> nginx.ingress.kubernetes.io/server-snippet: |
>   grpc_read_timeout 600s;
>   grpc_send_timeout 600s;
>   client_body_timeout 600s;

Is there an alternative to this? Server snippets are disabled in my cluster (RKE2) because of kubernetes/kubernetes#126811.

@danielleiszen

I am not sure if this is related, but I TLS-terminate a gRPC upstream with NGINX Ingress. After 1 minute of inactivity, the stream stops regardless of the idle timeout, connection timeout, or any other timeout I specify on the client side when opening the stream. The gRPC server logs clearly show a 1-minute timeout that is NOT specified anywhere in the configuration or call options for the service.

Furthermore, I cannot reproduce the behaviour without NGINX Ingress, so I suspect that NGINX interferes with the call options for the gRPC stream during TLS termination. My ingress annotations are as simple as:

```yaml
annotations:
  kubernetes.io/ingress.class: "nginx"
  nginx.ingress.kubernetes.io/ssl-redirect: "true"
  nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
```

Any idea how to avoid this?
Thank you.

@chenchengfa93

Hi @danielleiszen, do you have any idea how to avoid this? I have the same problem.

@chenchengfa93

> Hi @danielleiszen, do you have any idea how to avoid this? I have the same problem.

In my case, this is caused by client_body_timeout: gRPC keepalive only sends HTTP/2 PING frames, but client_body_timeout needs actual request body data. We solved it by sending an empty message at an interval shorter than client_body_timeout.

@danielleiszen

> Hi @danielleiszen, do you have any idea how to avoid this? I have the same problem.
> In my case, this is caused by client_body_timeout: gRPC keepalive only sends HTTP/2 PING frames, but client_body_timeout needs actual request body data. We solved it by sending an empty message at an interval shorter than client_body_timeout.

Hi @chenchengfa93,

I ended up doing something similar. I created a keep-alive endpoint on my service that I call periodically from the client. The keep-alive call triggers a downstream message and keeps the channel open. The client schedules the next keep-alive call only when that downstream event arrives.

This seems to work.


9 participants