Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request received RST when nginx reload #8122

Closed
levinxo opened this issue Apr 28, 2021 · 8 comments
Closed

Request received RST when nginx reload #8122

levinxo opened this issue Apr 28, 2021 · 8 comments

Comments

@levinxo
Copy link

levinxo commented Apr 28, 2021

What version of gRPC-Java are you using?

1.35.0

What is your environment?

Client: Windows 10/ JDK 8
Server: k8s 1.14.8 cluster / tensorflow serving 2.0 / ingress-nginx 0.19, nginx version 1.15.3

What did you expect to see?

When continuously requests a grpc server (tensorflow serving) behind nginx with for loop, the nginx is reloading. nginx fork new sub process to handle new request, the prior sub process exit util the current request finish, then client stub will create a new channel/connection to process remain requests.

For example, When python client sends a request to the grpc server, nginx will produce a FIN/PSH/ACK flag packet before sub process exited. Then the client will send FIN/ACK to nginx and close the request. Finally, python grpc client will reconnect to nginx to handle the remaining requests.

What did you see instead?

When nginx reloaded, the prior sub process log 200 OK to nginx access_log, while the grpc-java client threw an exception below:

Sending GOAWAY failed: lastStreamId '0', errorCode '2', debugData ''. Forcing shutdown of the connection.

UNAVAILABLE: Connection closed after GOAWAY. HTTP/2 error code: NO_ERROR

nginx sent RST flag packet to java client and has no FIN flag packet when use wireshark captured the tcp packet.

Steps to reproduce the bug

  1. use for loop to request a grpc server continuously (like tensorflow serving) behind nginx
  2. reloading nginx at the same time
@lidizheng
Copy link
Contributor

@levinxo Hi, cross languages, gRPC doesn't support transparent reconnect for in-fly RPCs. If a subchannel (TCP connection) failed, all RPC scheduled on it will fail too. This prevents sending duplicated messages, and makes routing logic simpler.

The client channel should be able to reconnect after certain backoff period. If you turn on the trace log for the Java client, you should find more information about the channel connectivity states and reconnect attempts.

@voidzcy
Copy link
Contributor

voidzcy commented Apr 28, 2021

The exception just means the remote closed the connection before the RPC is completed, UNAVAILABLE is appropriate. To avoid it, either the RPC needs to complete quicker or Ngnix needs to give more time before tearing down the connection.

When python client sends a request to the grpc server, nginx will produce a FIN/PSH/ACK flag packet before sub process exited. Then the client will send FIN/ACK to nginx and close the request. Finally, python grpc client will reconnect to nginx to handle the remaining requests.
nginx sent RST flag packet to java client and has no FIN flag packet

I don't understand what's being described here, could you rephrase it? gRPC clients (including both Java and Python/C-core) do not immediately close the connection after receiving GOAWAY even if there is no ongoing RPC, until the channel is shut down. It could be a desired behavior, but it's not happening today in all languages.

@levinxo
Copy link
Author

levinxo commented Apr 29, 2021

@lidizheng @voidzcy hi,我猜两位是华裔,就直接说中文了哈。

What did you expect to see?

使用grpc-java建立连接后持续串行请求后端服务(nginx做负载均衡,后端服务是tf serving),当nginx进行reload操作后(nginx -s reload),旧的nginx进程处理完当前请求后会退出,这时我期望新的连接会自动重新建立到nginx新的进程上。

关于上面的预期,在python的grpc client上得到了验证,使用python的客户端持续发送请求,此时对nginx操作reload,nginx在处理完当前的请求后,会发送一个 FIN/PSH/ACK 标志的包给python客户端。python客户端进行ACK确认后关闭tcp连接,nginx会退出当前的旧进程。最后python客户端会重新和nginx建立新的连接处理后续的请求。这整个过程中,python的grpc客户端是能正常处理所有请求的。

What did you see instead?

在grpc-java这边,我看到的情况如下。使用grpc-java持续发送请求,当nginx进行reload时,grpc-java会抛出异常:

Sending GOAWAY failed: lastStreamId '0', errorCode '2', debugData ''. Forcing shutdown of the connection.

UNAVAILABLE: Connection closed after GOAWAY. HTTP/2 error code: NO_ERROR

此时发现nginx发送了 RST 包给 grpc-java,不像上面python客户端会先发送FIN包来正常结束当前请求。而且该抛异常的请求在nginx的access日志中,是200 OK

@voidzcy
Copy link
Contributor

voidzcy commented May 3, 2021

Please post in English so that my other colleagues can read and provide insights.

Did you see Java client being able to auto reconnect after the reload? That should happen properly.

Sending GOAWAY failed: lastStreamId '0', errorCode '2', debugData ''. Forcing shutdown of the connection.

Is this Nginx's log? It doesn't seem to be gRPC's. Is that supposed to be the second GOAWAY sent by Nginx graceful shutdown? Looks Nginx closed the connection right after sending out the first GOAWAY. Therefore, the unfinished RPC failed with UNAVAILABLE: Connection closed after GOAWAY. HTTP/2 error code: NO_ERROR. Nothing seems wrong to me.

此时发现nginx发送了 RST 包给 grpc-java,不像上面python客户端会先发送FIN包来正常结束当前请求

I am still confused by this. Isn't the TCP FIN sent by Nginx as you described above? Why are you saying grpc-python sends FIN?

Btw, I've seen a grpc-python user reporting the opposite behavior in grpc/grpc#24069

@levinxo
Copy link
Author

levinxo commented May 12, 2021

Hi, I will post in english in the future.

Sending GOAWAY failed: lastStreamId '0', errorCode '2', debugData ''. Forcing shutdown of the connection.

it is netty-codec-http2 package's error:
https://github.com/netty/netty/blob/4.1/codec-http2/src/main/java/io/netty/handler/codec/http2/Http2ConnectionHandler.java#L928

It seems to be netty's exception.

I am still confused by this. Isn't the TCP FIN sent by Nginx as you described above? Why are you saying grpc-python sends FIN?

grpc-python sends FIN after Nginx sent FIN, but grpc-java receive RST by Nginx, has no FIN received before or after RST.

how to turn on the trace log for the Java gRPC client? I want to check if Java client being able to auto reconnect after the nginx reload.

@voidzcy
Copy link
Contributor

voidzcy commented May 14, 2021

One thing I need to mention for Java is that, if the connection has been previously established before receiving GOAWAY, the channel will go into IDLE state instead of eagerly trying to reconnect. It will reconnect if there are new/pending RPCs.

You can enable the ChannelLogger with io.grpc.ChannelLogger.level=FINEST in logging.properties to see connection activities.

@ejona86
Copy link
Member

ejona86 commented Jun 17, 2021

If nginx uses RST instead of FIN, then it might not have finished sending the response. It might have done the write() to the OS, but the OS might never have sent it. That could be why grpc-java thinks the RPC failed: because it did; the response never arrived.

Since the server normally initiates the FIN, that leaves me wondering why it uses RST for grpc-java. What client behavior could change that behavior? I think this mostly needs an investigation of nginx's behavior.

@ejona86
Copy link
Member

ejona86 commented Jun 24, 2021

Closing since it really seams to be more of a question about nginx behavior. If there's something specific the grpc-java implementation is doing "wrong" that triggers the different behavior in nginx, we'd be interested to hear what it is. But right now there doesn't seem to be anything actionable for us to do here.

Closing, but if more info becomes available comment and we can reopen.

@ejona86 ejona86 closed this as completed Jun 24, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants