
ClientV3: Add integration test for server timeout #10211

Closed
wants to merge 5 commits into from
1 change: 1 addition & 0 deletions .words
@@ -104,3 +104,4 @@ PermitWithoutStream
__lostleader
ErrConnClosing
unfreed
ServerParameters
61 changes: 61 additions & 0 deletions clientv3/integration/black_hole_test.go
@@ -26,8 +26,69 @@ import (
"go.etcd.io/etcd/integration"
"go.etcd.io/etcd/pkg/testutil"
"google.golang.org/grpc"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"
)

// TestServerGRPCKeepAliveTimeout tests the server-side ServerParameters keepalive: after the
// server sends a keepalive ping, it waits for the configured Timeout, and if no activity is
// seen within that window the connection is closed on the server side. The keepalive takes a
// few extra seconds to take effect, so the time it takes to close the connection is typically
// somewhat longer than the configured Timeout value.
func TestServerGRPCKeepAliveTimeout(t *testing.T) {
Contributor

Also document what this is testing?

Member Author

Yup, that sounds good.

defer testutil.AfterTest(t)

clus := integration.NewClusterV3(t, &integration.ClusterConfig{
Size: 1,
GRPCKeepAliveInterval: 1 * time.Second,
GRPCKeepAliveTimeout: 500 * time.Millisecond,
})
defer clus.Terminate(t)

eps := []string{clus.Members[0].GRPCAddr()}
ccfg := clientv3.Config{
Endpoints: []string{eps[0]},
Contributor

Just Endpoints: []string{clus.Members[0].GRPCAddr()}?

DialTimeout: 12 * time.Second,
}
cli, err := clientv3.New(ccfg)
Contributor

Do we need cli in this test, given that NewClusterV3() already created clients?

Member Author (@spzala, Oct 25, 2018)

Yes, the client is created here. I believe we need this for the desired config. Thanks!

Contributor

Sorry, I was not able to understand. Do you mean we need this for the desired config for the cluster? My understanding is that the keepAlive parameters are configured on the server side, and clus.clients are configured inside NewClusterV3.

Member Author

Yes, that's my understanding - to override the defaults used in NewClusterV3. In this particular case it's just the endpoint, though, and it's more readable this way. Hope it makes sense, but I am open to suggestions. Thanks!

Contributor

I think the endpoints of clus.clients are configured in NewClusterV3:
https://github.com/etcd-io/etcd/blob/master/integration/cluster.go#L740

It is not clear to me what override means. cli and clus.client(1) are two different ClientV3 instances.

Member Author

Agreed, I tried to address your concern in the new commit. Thanks!
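
To make the suggestion above concrete, here is a minimal sketch (not part of this PR) that reuses the cluster's pre-created client; it assumes ClusterV3 exposes that client via Client(i) as other clientv3 integration tests do, and the function name is illustrative:

func testServerKeepAliveWithClusterClient(t *testing.T) {
	clus := integration.NewClusterV3(t, &integration.ClusterConfig{
		Size:                  1,
		GRPCKeepAliveInterval: 1 * time.Second,
		GRPCKeepAliveTimeout:  500 * time.Millisecond,
	})
	defer clus.Terminate(t)

	// Reuse the client NewClusterV3 already created for member 0 instead of
	// building a second clientv3.Config; the keepalive parameters under test
	// are server-side, so any client talking to member 0 exercises them.
	cli := clus.Client(0)
	if _, err := cli.Put(context.TODO(), "foo", "bar"); err != nil {
		t.Fatal(err)
	}
}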

if err != nil {
t.Fatal(err)
}
defer cli.Close()

// give keepalive some time
time.Sleep(4 * time.Second)

if _, err = cli.Put(context.TODO(), "foo", "bar"); err != nil {
t.Fatal(err)
}
// TODO: keepalive sometimes doesn't work on first attempt.
Member Author

@gyuho Hi, somewhat inspired by https://github.com/etcd-io/etcd/blob/master/clientv3/integration/black_hole_test.go#L197, I have added a for loop here. Also, the info log should help show how often the test iterates. Thanks!

Member Author

Hmmm... when I run the test locally it usually passes on the first attempt, and sometimes takes until the third run, but here it's failing again :( (Outside of tests, it also runs fine when I manually drop the client connection.)

Member Author

@gyuho So this time it did pass the build. It seems like 1 CPU is always fine, but 2 and 4 don't pass on the first attempt (on my local env it mostly does pass). Can we keep this solution and watch it (I will be watching) to see the behavior? Thanks!

Member Author

@gyuho I guess we need a better approach. I'm thinking about it, but if you have a suggestion please let me know. Thanks!!

for i := 0; i < 5; i++ {
clus.Members[0].Blackhole()
time.Sleep(10 * time.Second)
// remove blackhole but by now the keepalive ping should have triggered server to
// close server-to-client connection.
clus.Members[0].Unblackhole()
_, err = cli.Put(context.TODO(), "foo1", "bar1")
if err != nil {
ev, ok := status.FromError(err)
if !ok {
t.Fatal(err)
}
if ev.Code() != codes.Unavailable {
t.Fatal(err)
}
break
} else {
t.Logf("info: expected an error, received none.")
Contributor (@gyuho, Dec 5, 2018)

Just t.Log("expected an error from server, but received none")

}
}

if err == nil {
t.Error("rpc error expected")
}
}
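
For reference, a minimal standalone sketch (not part of this PR) of how the server-side keepalive this test exercises maps onto gRPC's keepalive.ServerParameters; the 1s/500ms values mirror GRPCKeepAliveInterval and GRPCKeepAliveTimeout above, and the helper name is illustrative:

package example

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

// newKeepAliveServer builds a gRPC server whose ServerParameters keepalive
// pings an idle client every Time and closes the connection if no ack is
// seen within Timeout, which is the behavior the test above waits on.
func newKeepAliveServer() *grpc.Server {
	return grpc.NewServer(
		grpc.KeepaliveParams(keepalive.ServerParameters{
			Time:    1 * time.Second,        // ping interval (GRPCKeepAliveInterval)
			Timeout: 500 * time.Millisecond, // wait for ping ack before closing (GRPCKeepAliveTimeout)
		}),
	)
}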

// TestBalancerUnderBlackholeKeepAliveWatch tests that when a watch discovers it cannot talk to
// the blackholed endpoint, the client balancer switches to a healthy one.
// TODO: test server-to-client keepalive ping