Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gap in Kafka metrics instrumentation #711

Closed
meiao opened this issue Feb 18, 2022 · 6 comments
Closed

Gap in Kafka metrics instrumentation #711

meiao opened this issue Feb 18, 2022 · 6 comments
Assignees
Labels
bug Something isn't working as designed/intended GTSE There is an associated support escalation with this issue.

Comments

@meiao
Copy link
Contributor

meiao commented Feb 18, 2022

Description

The instrumentation module com.newrelic.instrumentation.kafka-clients-metrics-2.0.0 instruments KafkaConsumer#poll(Duration). This method is overloaded (KafkaConsumer#poll(long)) and both call a private method.

This means that if a customer is to call the overloaded method, no metrics will be recorded. So we should instrument the private method instead.

Also, in newer versions of Kafka (3?) the private method has changed its signature, so we will probably have to create a new instrumentation module.

While investigating the GTSE it was believed that the instrumentation module kafka-clients-metrics-2.0.0 would apply to Kafka 3, as it is indicated in the build.gradle file. But actually this module is an increment over the kafka-clients-metrics-0.10.0.0 and only instruments the new methods added in Kafka 2.

We confirmed that the customer from the GTSE is using Kafka 3 and so a new instrumentation module is needed.

Reference

https://newrelic.atlassian.net/browse/GTSE-13566

Aha! Link: https://newrelic.aha.io/features/JAVA-385

@meiao meiao added the bug Something isn't working as designed/intended label Feb 18, 2022
@averkhovtsev-xm
Copy link
Contributor

The title of the ticket is a bit misleading. It is not a gap, none of the producer/consumer metrics are being sent to Newrelic.
the kafka-clients-metrics-0.10.0.0 module is not applied to 3+ version of kafka clients, the kafka-clients-metrics-2.0.0 module instruments only 1 method in KafkaConsumer which is not metrics related.
The reason I'm bringing this up is to increase the priority (and hopefully ETA) of this ticket.

@tbradellis
Copy link
Contributor

tbradellis commented Mar 18, 2022

@meiao The private signature looks like it changed in 2.5.0 to use Timer instead of long.

https://github.com/apache/kafka/blob/ff4dff044a22ef0f3095c6f11c38a1a718ad7d13/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L1217

Instead of creating a new module it looks like I could just instrument both public poll methods.
ConsumerRecords<K, V> poll(Duration timeout)
and
ConsumerRecords<K, V> poll(long timeoutMs)

verifyInstrumentation passes open ended for this from 2.0.0.
passesOnly 'org.apache.kafka:kafka-clients:[2.0.0,)'

What do you think of this approach (PR linked below)? It would save having to create a new module I believe.
#792

@meiao
Copy link
Contributor Author

meiao commented Mar 18, 2022

It is not as simple as this.
We weren't aware that this module is supposed to be used in addition to the 0.10.0.0 instrumentation module.

// we are only weaving this method because it was introduced in kafka-clients 2.0.0 and it's easier than copying a
// whole instrumentation module

Which has instrumentation for the poll(long) method.

@averkhovtsev-xm is right, we do not have instrumentation for 3.0.0. Only what the 2.0.0 provides and that is not enough.
What gave the wrong impression was that the 2.0.0 module had an open verifyInstrumentation, so it was indeed applying to 3.0.0 when it should not, since this is not a complete instrumentation module for Kafka.

We do need to create a new instrumentation module for 3.0.0 since 0.10.0.0 has most of the instrumentation and is not applying to 3. And I would prefer that the 2.0.0 module does not apply to 3.0.0, to avoid confusion like this in the future.

@meiao
Copy link
Contributor Author

meiao commented Mar 18, 2022

We also have to go back and find out what was the gap in version 2.0.0 or if it was just the lack of instrumentation for 3.0.0.

Update: it was indeed not a gap as the customer was using Kafka 3.0.0.

@kford-newrelic kford-newrelic added the GTSE There is an associated support escalation with this issue. label Mar 18, 2022
@tbradellis tbradellis removed their assignment Apr 5, 2022
@meiao meiao self-assigned this Apr 15, 2022
@meiao meiao added the 7.8.0 label May 26, 2022
@fcoutinho-uolinc
Copy link

Any news on this? We've been struggling without metrics after our upgrade.

@meiao
Copy link
Contributor Author

meiao commented Jun 23, 2022

This was delivered in the 7.8.0 agent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working as designed/intended GTSE There is an associated support escalation with this issue.
Projects
Archived in project
Development

No branches or pull requests

5 participants