Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Distributed Tracing using OpenTelemetry #12460

Open
dashpole opened this issue Nov 6, 2020 · 17 comments
Open

Add Distributed Tracing using OpenTelemetry #12460

dashpole opened this issue Nov 6, 2020 · 17 comments
Labels
priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. stage/tracked
Milestone

Comments

@dashpole
Copy link

dashpole commented Nov 6, 2020

Prior issues/PRs related to distributed tracing: #11166, #9242, #5425.

In the kubernetes instrumentation sig, we are adding working to add Distributed Tracing support using OpenTelemetry to the kube-apiserver: kubernetes/enhancements/keps/sig-instrumentation/647-apiserver-tracing. We plan to have Alpha (disabled by default) support for this feature in kubernetes v1.22. Progress can be tracked in kubernetes/enhancements#647. This will generate spans for incoming and outgoing requests, including requests to etcd. It will also propagate the w3c trace context and w3c baggage along with outgoing requests.

This presents an opportunity to make Etcd easier to debug when using kubernetes. Since distributed tracing provides a unified view of a single request across multiple components, a trace including the API Server and Etcd would show both the API call to the APIServer and the resulting Etcd call, providing better context on the call to Etcd. Based on #5425 (comment), there is already interest in distributed tracing, so I won't elaborate further.

Previous attempts at distributed tracing were looking into OpenTracing, OpenCensus, and jaeger formats and libraries. Thankfully, these have since unified behind the OpenTelemetry project, which makes a potential evaluation considerably easier. OpenTelemetry will be GA soon, so now seems like the right time to open a discussion around adding distributed tracing to Etcd.

ccing people who have been involved with or interested in this topic.
@logicalhan @tedsuo @ehashman @brancz @bg451 @bhs @gyuho @jingyih @serathius

@logicalhan
Copy link

/cc @ptabor @wenjiaswe

@hexfusion
Copy link
Contributor

/cc @ironcladlou @retroflexer @marun

@jingyih
Copy link
Contributor

jingyih commented Nov 9, 2020

Today in etcd, tracing is implemented in a similar fashion as in kubernetes (https://github.com/kubernetes/utils/tree/master/trace).

/cc original author for etcd tracing: @YoyinZyc

@stale
Copy link

stale bot commented Mar 9, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Mar 9, 2021
@ptabor ptabor added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed stale labels Mar 9, 2021
@lilic
Copy link
Contributor

lilic commented Apr 22, 2021

Hey, @dashpole 👋 I am always on board when it comes to adding more observability signals into any project and tracing is something that is very useful! With that being said, I have two concerns with adding OpenTelemetry:

  1. The possible resource usage increase, was this resolved already? What would be the resource or performance addition if OpenTel tracing is added?
  2. OpenTelemetry format is not GA yet according to the go client lib https://github.com/open-telemetry/opentelemetry-go#project-status. I know a couple of pieces went GA, do we know when the format goes GA?

Warning: this project is currently in a pre-GA phase. Backwards incompatible changes may be introduced in subsequent minor version releases as we work to track the evolving OpenTelemetry specification and user feedback.

Thank you!

@dashpole
Copy link
Author

dashpole commented Apr 22, 2021

For latency of creating a span, the numbers are currently within 0.01 milliseconds:

$ go test ./... -bench .
goos: linux
goarch: amd64
pkg: go.opentelemetry.io/otel/sdk/trace
cpu: Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
BenchmarkAttributesMapToKeyValue-6     	  237327	      5009 ns/op
BenchmarkStartEndSpan/AlwaysSample-6   	  883227	      1757 ns/op	     848 B/op	       9 allocs/op
BenchmarkStartEndSpan/NeverSample-6    	  738933	      1484 ns/op	     752 B/op	       8 allocs/op
BenchmarkSpanWithAttributes_4/AlwaysSample-6         	  429614	      2916 ns/op	    1584 B/op	      17 allocs/op
BenchmarkSpanWithAttributes_4/NeverSample-6          	  794530	      1721 ns/op	     944 B/op	       9 allocs/op
BenchmarkSpanWithAttributes_8/AlwaysSample-6         	  296242	      4191 ns/op	    2112 B/op	      23 allocs/op
BenchmarkSpanWithAttributes_8/NeverSample-6          	  773558	      1877 ns/op	    1136 B/op	       9 allocs/op
BenchmarkSpanWithAttributes_all/AlwaysSample-6       	  333826	      3792 ns/op	    1936 B/op	      21 allocs/op
BenchmarkSpanWithAttributes_all/NeverSample-6        	  689450	      1785 ns/op	    1072 B/op	       9 allocs/op
BenchmarkSpanWithAttributes_all_2x/AlwaysSample-6    	  183164	      6435 ns/op	    3236 B/op	      32 allocs/op
BenchmarkSpanWithAttributes_all_2x/NeverSample-6     	  612986	      2073 ns/op	    1392 B/op	       9 allocs/op
BenchmarkTraceID_DotString-6                         	 8888558	       145.3 ns/op
BenchmarkSpanID_DotString-6                          	11427596	       111.4 ns/op
PASS
ok  	go.opentelemetry.io/otel/sdk/trace	17.503s

I'll ask about memory usage and get back. I recall there being a small, fixed-size buffer, but i'll confirm.

The first RC is expected to be cut in 2-3 weeks, and 1.0 for the go client is a prerequisite for beta in kubernetes. I'd expect a 1.0 in 2 months or so, to be conservative. But it's OSS, so anything can happen.

@lilic
Copy link
Contributor

lilic commented Apr 27, 2021

@dashpole Thank you for that! I think the value of having one trace context propagated from apiserver to etcd would be amazing for insight!

But code freeze is mid-May for etcd #12330 (comment), so I am wondering if it's possible we even will be able to make that deadline, given the stability of the OpenTel 1.0 go client to be 2 weeks or so. Happy to help out drive this along!

@logicalhan
Copy link

@dashpole Thank you for that! I think the value of having one trace context propagated from apiserver to etcd would be amazing for insight!

I suspect a lot of requests will be served directly from the watch-cache, so we probably shouldn't be expecting all requests to trace back to etcd. Paginated lists will, watches won't and I can't remember if we are serving strongly consistent reads from the cache (there was a KEP proposing it at one point).

@logicalhan
Copy link

I think we should definitely still do it, I'm just trying to temper expectations is all..

@dashpole
Copy link
Author

Following-up from the earlier comment, there isn't a fixed-size buffer. It looks like this is a case where we will just have to try it to find out how much cpu/memory it takes...

The timing would probably be too tight to make it into the next etcd release.

@lilic
Copy link
Contributor

lilic commented Apr 27, 2021

@logicalhan makes sense thank you for that yes!

Following-up from the earlier comment, there isn't a fixed-size buffer. It looks like this is a case where we will just have to try it to find out how much cpu/memory it takes...

@dashpole Thanks! Will try to play around a bit and do a POC locally to see where the cpu/memory is at now, I know it was fairly high in the past, but lets see the current state and I will report back.

@stale
Copy link

stale bot commented Aug 9, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Aug 9, 2021
@stale stale bot closed this as completed Aug 30, 2021
@serathius serathius reopened this Mar 10, 2022
@stale stale bot removed the stale label Mar 10, 2022
@serathius
Copy link
Member

Reopening to track graduation of distributed tracing flags as proposed in #13775

@serathius serathius added this to the etcd-v3.6 milestone Mar 10, 2022
@houseme
Copy link

houseme commented Mar 16, 2022

Can I upgrade to the latest version? open-telemetry/opentelemetry-go#2676

@baryluk
Copy link

baryluk commented Apr 22, 2022

Hi. We had some long apply times in our etcd cluster used by OKD, and enabled tracing to help with debugging this.

I was able to setup it with otelcol and forwarding to tempo and visualizing it in Grafana. I was not able to use jaeger because they do not provide OTEL enabled binaries. (it can be done with otelcol and jaeger exporer plugin from contrib for otelcol, but there are no precompiled binaries for that eitehr).

We checked few traces, and there are few missing elements for it to be really usable.

For RPCs received and sent. Only grpc service and method names are included, but no details about request and response. Only status and uncompressed_sizes info.

Here are some events as logged by otelcol with logging exporter:

Span #282
    Trace ID       : 32a380edb1ccd49e3b4fa8f3fbd329d2
    Parent ID      : 
    ID             : 5d6d8aa231ff85aa
    Name           : etcdserverpb.KV/Range
    Kind           : SPAN_KIND_SERVER
    Start time     : 2022-04-22 13:45:36.11019019 +0000 UTC
    End time       : 2022-04-22 13:45:36.122102025 +0000 UTC
    Status code    : STATUS_CODE_OK
    Status message : 
Attributes:
     -> rpc.system: STRING(grpc)
     -> rpc.service: STRING(etcdserverpb.KV)
     -> rpc.method: STRING(Range)
     -> net.peer.ip: STRING(10.10.0.183)
     -> net.peer.port: STRING(55550)
     -> rpc.grpc.status_code: INT(0)
Events:
SpanEvent #0
     -> Name: message
     -> Timestamp: 2022-04-22 13:45:36.110201481 +0000 UTC
     -> DroppedAttributesCount: 0
     -> Attributes:
         -> message.type: STRING(RECEIVED)
         -> message.id: INT(1)
         -> message.uncompressed_size: INT(56)
SpanEvent #1
     -> Name: message
     -> Timestamp: 2022-04-22 13:45:36.122097256 +0000 UTC
     -> DroppedAttributesCount: 0
     -> Attributes:
         -> message.type: STRING(SENT)
         -> message.id: INT(1)
         -> message.uncompressed_size: INT(19289955)

I believe it would be good to include more grpc data that in spans. For large requests, truncate some big fields if needed (i.e. body content on create / get response / put request, or list of all keys when doing a list response), but other fields should be included (most notably the key in get / put / watch requests).

I checked https://pkg.go.dev/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc and there is no transparent way of doing this automatically as far as I can see, but maybe adding few manual attributes (i.e. key / key range) in etcd start of the method implementation, could do a trick.

It also looks that only grpc public api server is intercepted and instrumented, but grpc clients, and the internal (between replicas) grpc server is not instrumented, again reducing usability, especially for things like request forwarding from non-master to master (leader), and proposals sending (i.e. for Put and Txn) to other cluster members.

@jmhbnz
Copy link
Member

jmhbnz commented Jun 20, 2024

Discussed during sig-etcd triage meeting. This is potentially a candidate for graduation via the new design for etcd server feature flags in future.

We could still add an e2e test, assigned to @vivekpatani
/assign @vivekpatani

@k8s-ci-robot
Copy link

@jmhbnz: GitHub didn't allow me to assign the following users: vivekpatani.

Note that only etcd-io members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

Discussed during sig-etcd triage meeting. This is potentially a candidate for graduation via the new design for etcd server feature flags in future.

We could still add an e2e test, assigned to @vivekpatani
/assign @vivekpatani

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. stage/tracked
Development

No branches or pull requests