Request received RST when nginx reload #8122
Comments
@levinxo Hi. Across languages, gRPC doesn't support transparent reconnect for in-flight RPCs. If a subchannel (TCP connection) fails, all RPCs scheduled on it fail too. This prevents sending duplicated messages and keeps the routing logic simpler. The client channel should be able to reconnect after a certain backoff period. If you turn on the trace log for the Java client, you should find more information about the channel connectivity states and reconnect attempts.
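Since gRPC does not replay an in-flight RPC on a new connection, a caller that wants to survive a reload usually has to retry at the application level. The following is a minimal sketch of that idea, assuming the RPC is idempotent so that a retry cannot cause harmful duplicates. `RetryOnUnavailable`, `UnavailableException`, and `callWithRetry` are hypothetical names used for illustration and are not part of grpc-java. With the real library one would catch `io.grpc.StatusRuntimeException` and check `e.getStatus().getCode() == Status.Code.UNAVAILABLE` instead of the stand-in exception used here.

```java
import java.util.concurrent.Callable;

public final class RetryOnUnavailable {
    /** Hypothetical stand-in for a failure that maps to gRPC's UNAVAILABLE status. */
    public static final class UnavailableException extends RuntimeException {
        public UnavailableException(String message) {
            super(message);
        }
    }

    // Upper bound for the delay between attempts (an arbitrary value for this sketch).
    private static final long MAX_BACKOFF_MILLIS = 5_000L;

    private RetryOnUnavailable() {}

    /**
     * Runs the call, retrying with exponential backoff while it fails with
     * UnavailableException. Any other exception is propagated immediately.
     */
    public static <T> T callWithRetry(Callable<T> call, int maxAttempts, long initialBackoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoff = initialBackoffMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (UnavailableException e) {
                if (attempt >= maxAttempts) {
                    throw e; // Give up after the last allowed attempt.
                }
                Thread.sleep(backoff);
                backoff = Math.min(backoff * 2, MAX_BACKOFF_MILLIS);
            }
        }
    }
}
```

A call site would wrap the blocking stub call in the `Callable`, for example `callWithRetry(() -> stub.predict(request), 3, 100L)`, where `stub.predict` is only a placeholder for whatever RPC is being issued.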
The exception just means the remote closed the connection before the RPC completed, so UNAVAILABLE is appropriate. To avoid it, either the RPC needs to complete more quickly or Nginx needs to give more time before tearing down the connection.
I don't understand what's being described here; could you rephrase it? gRPC clients (both Java and Python/C-core) do not immediately close the connection after receiving GOAWAY, even if there is no ongoing RPC; the connection stays open until the channel is shut down. Closing it could be a desired behavior, but it's not happening today in all languages.
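@lidizheng @voidzcy Hi, I'm guessing you are both of Chinese descent, so I'll just write in Chinese.
What did you expect to see?
After establishing a connection with grpc-java, I send requests serially and continuously to the backend service (nginx does the load balancing; the backend service is TF Serving). When nginx is reloaded (nginx -s reload), the old nginx process exits after it finishes handling the current request. At that point I expect a new connection to be established automatically to the new nginx process. This expectation is confirmed with the Python gRPC client: with the Python client sending requests continuously, when nginx is reloaded, nginx finishes handling the current request and then sends a
What did you see instead?
On the grpc-java side, here is what I see. With grpc-java sending requests continuously, when nginx reloads, grpc-java throws an exception:
At this point nginx sent an RST packet to grpc-java, unlike the Python client above, which first sends a FIN packet to end the current request normally. Moreover, in the nginx access log, the request that threw the exception is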
Please post in English so that my other colleagues can read and provide insights. Did you see the Java client auto-reconnect after the reload? That should happen properly.
Is this Nginx's log? It doesn't seem to be gRPC's. Is that supposed to be the second GOAWAY sent by Nginx's graceful shutdown? It looks like Nginx closed the connection right after sending out the first GOAWAY. Therefore, the unfinished RPC failed with
I am still confused by this. Isn't the TCP FIN sent by Nginx, as you described above? Why are you saying grpc-python sends FIN? Btw, I've seen a grpc-python user reporting the opposite behavior in grpc/grpc#24069
Hi, I will post in English in the future.
It is the netty-codec-http2 package's error: It seems to be Netty's exception.
grpc-python sends FIN after Nginx sends FIN, but grpc-java receives an RST from Nginx, with no FIN received before or after the RST. How do I turn on the trace log for the Java gRPC client? I want to check whether the Java client is able to auto-reconnect after the nginx reload.
One thing I need to mention for Java is that, if the connection was established before receiving GOAWAY, the channel will go into the IDLE state instead of eagerly trying to reconnect. It will reconnect if there are new/pending RPCs. You can enable the ChannelLogger with
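As a rough illustration of turning on verbose channel logging, here is a minimal sketch using only `java.util.logging`. It assumes that grpc-java routes ChannelLogger output through a `java.util.logging` logger named `io.grpc.ChannelLogger` and that the most detailed entries are logged at `FINEST`; both points are assumptions to verify against the grpc-java version in use. `GrpcTraceLogging` and `enable` are hypothetical names for this example.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

public final class GrpcTraceLogging {
    // Assumption: the logger name under which grpc-java emits ChannelLogger entries.
    static final String CHANNEL_LOGGER_NAME = "io.grpc.ChannelLogger";

    // Strong reference so the configured logger is not garbage collected.
    private static Logger channelLogger;

    private GrpcTraceLogging() {}

    /** Configures the logger once and returns it; later calls return the same logger. */
    public static synchronized Logger enable() {
        if (channelLogger != null) {
            return channelLogger;
        }
        Logger logger = Logger.getLogger(CHANNEL_LOGGER_NAME);
        logger.setLevel(Level.FINEST);

        // ConsoleHandler defaults to INFO, so its level must be lowered as well.
        Handler handler = new ConsoleHandler();
        handler.setLevel(Level.FINEST);
        logger.addHandler(handler);

        // Avoid printing each record twice through the root logger's handlers.
        logger.setUseParentHandlers(false);

        channelLogger = logger;
        return logger;
    }
}
```

Calling `GrpcTraceLogging.enable()` before building the channel should, under the assumptions above, print connectivity state changes and reconnect attempts to stderr while the nginx reload is reproduced.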
If nginx uses RST instead of FIN, then it might not have finished sending the response. It might have done the write() to the OS, but the OS might never have sent it. That could be why grpc-java thinks the RPC failed: because it did; the response never arrived. Since the server normally initiates the FIN, that leaves me wondering why it uses RST for grpc-java. What client behavior could change that? I think this mostly needs an investigation of nginx's behavior.
Closing since it really seems to be more of a question about nginx behavior. If there's something specific the grpc-java implementation is doing "wrong" that triggers the different behavior in nginx, we'd be interested to hear what it is. But right now there doesn't seem to be anything actionable for us to do here. If more info becomes available, comment and we can reopen.
What version of gRPC-Java are you using?
1.35.0
What is your environment?
Client: Windows 10 / JDK 8
Server: k8s 1.14.8 cluster / tensorflow serving 2.0 / ingress-nginx 0.19, nginx version 1.15.3
What did you expect to see?
A client sends requests in a for loop to a gRPC server (TensorFlow Serving) behind nginx while nginx is reloading. nginx forks a new subprocess to handle new requests, and the prior subprocess exits once the current request finishes. The client stub should then create a new channel/connection to process the remaining requests.
For example, when the Python client sends a request to the gRPC server, nginx produces a FIN/PSH/ACK packet before the subprocess exits. The client then sends FIN/ACK to nginx and closes the request. Finally, the Python gRPC client reconnects to nginx to handle the remaining requests.
What did you see instead?
When nginx reloaded, the prior subprocess logged 200 OK to the nginx access_log, while the grpc-java client threw the exception below:
nginx sent an RST packet to the Java client, and no FIN packet appeared in the TCP capture taken with Wireshark.
Steps to reproduce the bug