Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

grpc keepalive: test server-to-client HTTP/2 pings #8645

Closed
gyuho opened this issue Oct 4, 2017 · 14 comments
Closed

grpc keepalive: test server-to-client HTTP/2 pings #8645

gyuho opened this issue Oct 4, 2017 · 14 comments
Assignees
Milestone

Comments

@gyuho
Copy link
Contributor

gyuho commented Oct 4, 2017

Need add tests around #8535.

@xiang90 xiang90 added this to the v3.4.0 milestone Nov 26, 2017
@xiang90
Copy link
Contributor

xiang90 commented Dec 15, 2017

Can you fill in more details here?

maybe @spzala can give this a try.

@gyuho
Copy link
Contributor Author

gyuho commented Dec 15, 2017

Similar to

https://github.com/coreos/etcd/blob/9deaee3ea1b1f0c4119aab865eceff38eb5d5ade/clientv3/integration/black_hole_test.go#L33-L49

By configuring server-side keepalive time parameters in https://godoc.org/google.golang.org/grpc/keepalive#ServerParameters, we want to test (either manually or write integration tests):

  1. server pings client after ServerParameters.Time
  2. client had no activity during ServerParameters.Timeout
  3. server closes the connection to the client

@spzala
Copy link
Member

spzala commented Jan 7, 2018

WIP.

@spzala spzala self-assigned this Jan 7, 2018
@spzala spzala added the WIP label Jan 10, 2018
@gyuho
Copy link
Contributor Author

gyuho commented Jan 12, 2018

@spzala We can test this manually first.

I would try

  1. Set up --grpc-keepalive-min-time, --grpc-keepalive-interval, --grpc-keepalive-timeout to etcd server (ref *: configure server keepalive #8535 and gRPC doc).
  2. Since this is server-to-client HTTP/2 ping, we should disable client-to-server ping and discard incoming packets from server in client-side (iptables, tc).
  3. Server closes the connection to the client (confirm by just looking at the logs maybe?)
  4. Client comes back (blackhole removed) but connection closed so cannot talk to server.

Then translate this to integration tests with wrapper (https://github.com/coreos/etcd/blob/master/clientv3/integration/black_hole_test.go or later #9081).

@spzala
Copy link
Member

spzala commented Jan 12, 2018

@gyuho thanks much!!

@spzala
Copy link
Member

spzala commented Aug 30, 2018

@gyuho hi, I am trying to see the behavior with setting --grpc-keepalive-min-time and --grpc-keepalive-interval, --grpc-keepalive-timeout following steps you suggested above but I guess I am probably not doing it right and not able to see errors like too many ping or even server closing the connection. I have a single node cluster, and I tried testing with using simple go program from another machine (i.e. client pings server every few seconds (set higher than timeout or keep alive interval) I sent you earlier but without any luck of reproducing this behavior. Appreciates any help from you to go forward with it:) Thanks!

@gyuho
Copy link
Contributor Author

gyuho commented Aug 31, 2018

@spzala

probably not doing it right and not able to see errors like too many ping or even server closing the connection.

Have you dropped packets? We want to simulate faulty networks with iptables.

@spzala
Copy link
Member

spzala commented Aug 31, 2018

@gyuho hi, thanks, so I run this on client machine iptables -A INPUT -s <serverip> -j DROP and I see that stopped server message and then when I unblock it, the connection was back and started receiving messages. But no error message or disconnect from server.

@gyuho
Copy link
Contributor Author

gyuho commented Sep 6, 2018

We expect

Server closes the connection to the client (confirm by just looking at the logs maybe?)

You may tune gRPC keepalive timeout in server-side. The disconnect may have been too short for server-side keepalive to kick-in.

@gyuho
Copy link
Contributor Author

gyuho commented Sep 6, 2018

I would also add more debugging lines or adjust log levels in gRPC side. Server may not display all logs.

@spzala
Copy link
Member

spzala commented Oct 18, 2018

@gyuho hi, I am getting back from some vacation and work travel :). I could run tests manually and I think we should try adding two integration test - one for MinTime (i.e. goaway too many pings error) and second for Timeout.
While testing manually, Timeout works as expected to me with actual closing of connection. With MinTime I see couple of things:

  1. I could only see the log messages if I set up env variables export GRPC_GO_LOG_VERBOSITY_LEVEL=2 and export GRPC_GO_LOG_SEVERITY_LEVEL=info on both server and client CLI. I also have to comment this line
    grpclog.SetLoggerV2(lg)
    So to rely on log messages, we need to think more here.
  2. I see following log messages after enabling logging as step one above,
    Server side: ERROR: 2018/10/15 22:35:18 transport: Got too many pings from the client, closing the connection.
    Cline side: INFO: 2018/10/115 22:35:18 Client received GoAway with http2.ErrCodeEnhanceYourCalm.

I will be working on creating the integration test but have another related question, you mentioned to use blackhole - can I use the Blackhole() or something similar on client (clientv3.New(ccfg))?, something similar to

clus.Members[1].Blackhole()
Per what I see is the method is used for cluster members. Thanks!

@gyuho
Copy link
Contributor Author

gyuho commented Oct 18, 2018

@spzala Tests would be great. Thanks a lot!

@spzala
Copy link
Member

spzala commented Oct 18, 2018

@gyuho thanks, and qq can you please help me understand how you are thinking on using blackhole on client side?. Any thoughts would be helpful. Thanks!

spzala added a commit to spzala/etcd that referenced this issue Oct 24, 2018
spzala added a commit to spzala/etcd that referenced this issue Oct 24, 2018
spzala added a commit to spzala/etcd that referenced this issue Oct 25, 2018
spzala added a commit to spzala/etcd that referenced this issue Oct 25, 2018
spzala added a commit to spzala/etcd that referenced this issue Nov 6, 2018
spzala added a commit to spzala/etcd that referenced this issue Nov 6, 2018
spzala added a commit to spzala/etcd that referenced this issue Nov 25, 2018
spzala added a commit to spzala/etcd that referenced this issue Nov 25, 2018
spzala added a commit to spzala/etcd that referenced this issue Nov 29, 2018
spzala added a commit to spzala/etcd that referenced this issue Nov 30, 2018
spzala added a commit to spzala/etcd that referenced this issue Nov 30, 2018
spzala added a commit to spzala/etcd that referenced this issue Nov 30, 2018
spzala added a commit to spzala/etcd that referenced this issue Nov 30, 2018
spzala added a commit to spzala/etcd that referenced this issue Nov 30, 2018
spzala added a commit to spzala/etcd that referenced this issue Nov 30, 2018
spzala added a commit to spzala/etcd that referenced this issue Dec 1, 2018
@gyuho gyuho modified the milestones: etcd-v3.4, etcd-v3.5 Aug 5, 2019
@stale
Copy link

stale bot commented Apr 6, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 6, 2020
@stale stale bot closed this as completed May 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

3 participants