Skip to content
This repository has been archived by the owner on Feb 17, 2025. It is now read-only.

[Nicky] [Sarama] Added several new metrics to see breakdown of Sarama latency numbers #6

Merged

Conversation

golam-kawsar
Copy link

@golam-kawsar golam-kawsar commented Jun 8, 2023

Sarama has a relatively high fan-out factors from the initial singleton input channel to per topic channels to per partition channels to per broker channels, at which point it grabs a physical connection to a destination Kafka broker to write the outgoing request to Kafka. This PR added a metric to measure the wait time for getting write access to the underlying connection to Kafka.

Additionally, I added several other metrics to see the components of the total request + response latency that Sarama sees for each request.

@golam-kawsar golam-kawsar requested review from niamster, lliissoonngg and a team June 8, 2023 15:05
@golam-kawsar golam-kawsar changed the title [Nicky] [Sarama] Added a new metric to track connection wait time [Nicky] [Sarama] Added several new metrics to see breakdown of Sarama latency numbers Jun 12, 2023
@golam-kawsar golam-kawsar requested a review from a team June 12, 2023 18:20
Copy link

@remicalixte remicalixte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me. Have you tried it on staging yet? These metrics are quite high cardinality so you need to add them to the allow list to emit them: https://github.com/DataDog/dd-go/blob/69559df965694b0aa46f3a98650253b56124c265/k8s/nicky/values/datacenter/us1.staging.dog.yaml#L92-L96

Copy link

@niamster niamster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@@ -1520,6 +1545,42 @@ func (b *Broker) updateRequestLatencyAndInFlightMetrics(requestLatency time.Dura
b.addRequestInFlightMetrics(-1)
}

func (b *Broker) updateConnLockWaitTimeMetric(waitTime time.Duration) {
waitTimeInMs := int64(waitTime / time.Millisecond)
b.connLockWaitTime.Update(waitTimeInMs)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +55 to +56
connLockWaitTime metrics.Histogram
brokerConnLockWaitTime metrics.Histogram

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like these pairs track the same metric, for my education, why do we need both?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc, the first one is global to the sarama producer while the second is on a per-broker basis so we can get both global and per-broker metrics. It is modeled after the standard sarama metrics just above

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of the metric for broker is changed by appending "-for-broker" at the end of the metric name when we register it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks Remi 👍

@golam-kawsar golam-kawsar merged commit f75a462 into dm/v1.30.1-patched Jun 13, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants