-
Notifications
You must be signed in to change notification settings - Fork 3
[Nicky] [Sarama] Added several new metrics to see breakdown of Sarama latency numbers #6
[Nicky] [Sarama] Added several new metrics to see breakdown of Sarama latency numbers #6
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable to me. Have you tried it on staging yet? These metrics are quite high cardinality so you need to add them to the allow list to emit them: https://github.com/DataDog/dd-go/blob/69559df965694b0aa46f3a98650253b56124c265/k8s/nicky/values/datacenter/us1.staging.dog.yaml#L92-L96
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
@@ -1520,6 +1545,42 @@ func (b *Broker) updateRequestLatencyAndInFlightMetrics(requestLatency time.Dura | |||
b.addRequestInFlightMetrics(-1) | |||
} | |||
|
|||
func (b *Broker) updateConnLockWaitTimeMetric(waitTime time.Duration) { | |||
waitTimeInMs := int64(waitTime / time.Millisecond) | |||
b.connLockWaitTime.Update(waitTimeInMs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you can use https://pkg.go.dev/time#Duration.Milliseconds to get ms
connLockWaitTime metrics.Histogram | ||
brokerConnLockWaitTime metrics.Histogram |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like these pairs track the same metric, for my education, why do we need both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iirc, the first one is global to the sarama producer while the second is on a per-broker basis so we can get both global and per-broker metrics. It is modeled after the standard sarama metrics just above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of the metric for broker is changed by appending "-for-broker" at the end of the metric name when we register it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks Remi 👍
Sarama has a relatively high fan-out factors from the initial singleton input channel to per topic channels to per partition channels to per broker channels, at which point it grabs a physical connection to a destination Kafka broker to write the outgoing request to Kafka. This PR added a metric to measure the wait time for getting write access to the underlying connection to Kafka.
Additionally, I added several other metrics to see the components of the total request + response latency that Sarama sees for each request.