-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Slow writes panic when run with github.com/golang/protobuf v1.4.0+ #12197
Comments
It is getting harder to avoid pulling in github.com/golang/[email protected]+ version via transitive dependencies, but doing so makes the integration etcd server flake/break under load |
cc @ptabor |
@liggitt |
I added this patch to figure out what was triggering the v2 call: diff --git a/vendor/go.etcd.io/etcd/etcdserver/server.go b/vendor/go.etcd.io/etcd/etcdserver/server.go
index a341625dccb..21b8090030c 100644
--- a/vendor/go.etcd.io/etcd/etcdserver/server.go
+++ b/vendor/go.etcd.io/etcd/etcdserver/server.go
@@ -2196,11 +2196,13 @@ func (s *EtcdServer) applyEntryNormal(e *raftpb.Entry) {
var r pb.Request
rp := &r
pbutil.MustUnmarshal(rp, e.Data)
+ panic("!pbutil.MaybeUnmarshal: " + rp.Method + " " + rp.Path)
s.w.Trigger(r.ID, s.applyV2Request((*RequestV2)(rp)))
return
}
if raftReq.V2 != nil {
req := (*RequestV2)(raftReq.V2)
+ panic("raftReq.V2 != nil: " + req.Method + " " + req.Path)
s.w.Trigger(req.ID, s.applyV2Request(req))
return
}
diff --git a/vendor/go.etcd.io/etcd/etcdserver/v2_server.go b/vendor/go.etcd.io/etcd/etcdserver/v2_server.go
index 9238b2dc580..23c8ef2106f 100644
--- a/vendor/go.etcd.io/etcd/etcdserver/v2_server.go
+++ b/vendor/go.etcd.io/etcd/etcdserver/v2_server.go
@@ -119,6 +119,7 @@ func (a *reqV2HandlerEtcdServer) processRaftRequest(ctx context.Context, r *Requ
}
func (s *EtcdServer) Do(ctx context.Context, r pb.Request) (Response, error) {
+ panic("EtcdServer#Do:" + r.Method + " " + r.Path)
r.ID = s.reqIDGen.Next()
h := &reqV2HandlerEtcdServer{
reqV2HandlerStore: reqV2HandlerStore{
@@ -139,6 +140,7 @@ func (s *EtcdServer) Do(ctx context.Context, r pb.Request) (Response, error) {
// respective operation. Do will block until an action is performed or there is
// an error.
func (r *RequestV2) Handle(ctx context.Context, v2api RequestV2Handler) (Response, error) {
+ panic("RequestV2#Handle:" + r.Method + " " + r.Path)
if r.Method == "GET" && r.Quorum {
r.Method = "QGET"
} I'm seeing this panic:
which looks like it is this code release-3.4 version: Lines 2002 to 2031 in 18dfb9c
master version: Lines 1832 to 1858 in f395f82
|
Sorry I lost track of this issue. And thanks for fixing it! |
@jingyih @liggitt I strongly suspect this issue is caused by incompatibility between In etcd/etcdserver/etcdserverpb/etcdserver.pb.go Line 124 in cfdc296
github.com/golang/protobuf was used for serialisation:
Furthermore, If I change the reproducer @liggitt provided to the following: package main
import (
"fmt"
"github.com/gogo/protobuf/proto"
etcdpb "go.etcd.io/etcd/etcdserver/etcdserverpb"
)
func main() {
request := &etcdpb.Request{}
fmt.Println(proto.CompactTextString(request))
} go.mod
It works well:
|
Etcd was using very old protobuf generator (gogo/protobuf-1.0.0). |
I am seeing this in the following environment:
Should I still be seeing this issue if I'm on 3.3.25, which includes the aforementioned workaround? |
This is definitely consequence of mixing gogo-generated protobuf's with protobuf-1.4 runtime, that is unsupported (and not working). We need to somehow exit that glitch.
From the data I collected so far, significant incremental improvement in google-protobuf implementation was done over time, so the gogo vs. google difference is not that significant. Kubernetes team is going to do some benchmarking there (kubernetes/kubernetes#96564). From quick scan of RAFT code, I think there could be a performance improvement from transering to google-proto-gen, as it would enforce us to change a lot of pass-proto-by-value into pass-proto-by-pointer. Don't know to what degree golang optimizer can deal with that, but there is an opportunity. The real challenge for transition gogo->google comes from coordinating the operation across many projects/libraries:
So every library would need to transfer to 1.3.x-google generated protobuf first (without taking benefits of transferring to 1.4 in foreseeable future) and only when all dependencies are ready on 1.3.x-google, the runtime of 1.4.x can be imported (so e.g. grpc 1.30+) The API generated by gogo vs. Google is slightly different and requites pretty boring pass over code and adding &, * and changing proto.Foo into proto.GetFoo():
Assuming lack of upstream support the exit work would need to be following:
|
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions. |
The protobuf runtime 1.5.1. (instead of 1.4.x) solves the problem. |
When the etcd server integration test library is used with github.com/golang/protobuf v1.4.0+, slow writes hit this path:
etcd/etcdserver/apply_v2.go
Line 112 in 1af6d61
stringifying the request:
etcd/etcdserver/util.go
Line 163 in 1af6d61
which calls to
etcd/etcdserver/etcdserverpb/etcdserver.pb.go
Line 162 in 1af6d61
With github.com/golang/protobuf < 1.4.0, this works fine.
With github.com/golang/protobuf >= 1.4.0, this panics with:
Reproducer:
go.mod:
main.go:
The text was updated successfully, but these errors were encountered: