Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Failed to publish metrics to OTLP receiver" error message contains no actionable context #4693

Closed
climategadgets opened this issue Feb 2, 2024 · 12 comments · Fixed by #4829

Comments

@climategadgets
Copy link
Contributor

Describe the bug
Under some circumstances, a message Failed to publish metrics to OTLP receiver with an exception will be logged, usually observed with java.net.ConnectException: Connection refused as a root cause.

This message does not contain any information that allows to determine where exactly the meter registry is trying to connect, or any other relevant context, for that matter. Since it is possible that the registry is being automatically instantiated by frameworks (Quarkus, Spring, etc.) and that process is usually not logged, the only way to fix the problem is to guess what configuration item needs to be fixed.

Environment

Environment agnostic, here:

To Reproduce
How to reproduce the bug:

Have the framework instantiate the registry with no configuration present, or wrong configuration

Expected behavior
Error message must contain enough information to determine what needs to be changed to fix the problem.

  • Minimum: Connection to $host:$port refused
  • Preferred: Connection to $host:$port refused, check the value of x.y.z configuration keyword

It is understood that "connection refused" may not be the only root cause here. This doesn't change the problem definition - not enough context information to fix the problem.

@shakuzen shakuzen added enhancement A general enhancement registry: otlp OpenTelemetry Protocol (OTLP) registry-related and removed waiting-for-triage labels Feb 6, 2024
@shakuzen shakuzen added this to the 1.13 backlog milestone Feb 6, 2024
@shakuzen shakuzen added the help wanted An issue that a contributor can help us with label Feb 6, 2024
@shakuzen
Copy link
Member

shakuzen commented Feb 6, 2024

Thank you for the detailed issue. It sounds like a good enhancement to me. Would you be up for making a pull request for it?

@climategadgets
Copy link
Contributor Author

Thank you for the detailed issue. It sounds like a good enhancement to me. Would you be up for making a pull request for it?

Let me see what I can do.

@climategadgets
Copy link
Contributor Author

Below is a quick fix; sure thing it needs test cases but OtlpMeterRegistry is pretty closed up (httpSender, the main source of problems, can't be easily injected) so I wanted to make sure that this goes in the direction you guys approve before investing time in tests.

Please advise.

diff --git a/implementations/micrometer-registry-otlp/src/main/java/io/micrometer/registry/otlp/OtlpMeterRegistry.java b/implementations/micrometer-registry-otlp/src/main/java/io/micrometer/registry/otlp/OtlpMeterRegistry.java
index ef3eb90d1..0131c2565 100644
--- a/implementations/micrometer-registry-otlp/src/main/java/io/micrometer/registry/otlp/OtlpMeterRegistry.java
+++ b/implementations/micrometer-registry-otlp/src/main/java/io/micrometer/registry/otlp/OtlpMeterRegistry.java
@@ -174,16 +174,28 @@ public class OtlpMeterRegistry extends PushMeterRegistry {
                 this.config.headers().forEach(httpRequest::withHeader);
                 HttpSender.Response response = httpRequest.send();
                 if (!response.isSuccessful()) {
-                    logger.warn("Failed to publish metrics. Server responded with HTTP status code {} and body {}",
+                    logger.warn("Failed to publish metrics (context: {}). Server responded with HTTP status code {} and body {}",
+                            getConfigurationContext(),
                             response.code(), response.body());
                 }
             }
             catch (Throwable e) {
-                logger.warn("Failed to publish metrics to OTLP receiver", e);
+                logger.warn("Failed to publish metrics to OTLP receiver (context: {})", getConfigurationContext(), e);
             }
         }
     }
 
+    /**
+     * Get the configuration context.
+     *
+     * @return A message containing enough information for the log reader to figure out what configuration details
+     * may have contributed to the failure.
+     */
+    private String getConfigurationContext() {
+        // While other values may contribute to failures, these two are most common
+        return "url=" + config.url() + ", resource-attributes=" + config.resourceAttributes();
+    }
+
     @Override
     protected <T> Gauge newGauge(Meter.Id id, @Nullable T obj, ToDoubleFunction<T> valueFunction) {
         return new DefaultGauge<>(id, obj, valueFunction);

@shakuzen
Copy link
Member

shakuzen commented Feb 7, 2024

Looks generally good to me. Logging the response body at warn level could be a bit verbose, though.

@climategadgets
Copy link
Contributor Author

I thought I'm logging resource attributes, not response body - am I missing anything?

@climategadgets
Copy link
Contributor Author

Meanwhile, another diff. sayHello() should be ideally an abstract method called from either PushMeterRegistry#start() or possibly from even MeterRegistry constructor (config is available there) so that subclasses can override it with their relevant details. Not knowing where the metrics are published is a very frequent complaint in my experience.

--- a/implementations/micrometer-registry-otlp/src/main/java/io/micrometer/registry/otlp/OtlpMeterRegistry.java
+++ b/implementations/micrometer-registry-otlp/src/main/java/io/micrometer/registry/otlp/OtlpMeterRegistry.java
@@ -135,6 +135,17 @@ public class OtlpMeterRegistry extends PushMeterRegistry {
             this.meterPollingService.scheduleAtFixedRate(this::pollMetersToRollover, getInitialDelay(),
                     config.step().toMillis(), TimeUnit.MILLISECONDS);
         }
+
+        if (config.enabled()) {
+            sayHello();
+        }
+    }
+
+    private void sayHello() {
+        logger.info("Publishing metrics for {} to {} with resource attributes {}",
+            getClass().getSimpleName(),
+            config.url(),
+            config.resourceAttributes());
     }

@shakuzen
Copy link
Member

shakuzen commented Feb 8, 2024

I thought I'm logging resource attributes, not response body - am I missing anything?

Sorry, I mixed up the existing code with what you added in the diff. It looks good to me.

For the other diff, let's separate that into another issue so we can track and merge them separately so one doesn't block progress on the other.

@jonatan-ivanov
Copy link
Member

@climategadgets Would you mind creating a PR?
We can apply the patches you sent as comments above but it would be more trackable if you could open a PR.

@climategadgets
Copy link
Contributor Author

@climategadgets Would you mind creating a PR? We can apply the patches you sent as comments above but it would be more trackable if you could open a PR.

Of course, will do.

@shakuzen
Copy link
Member

shakuzen commented Mar 1, 2024

1.13.0-M2 is planned for March 11. It would be nice to include this in that pre-release so users can test it out and give feedback.

@shakuzen shakuzen modified the milestones: 1.13.x, 1.13.0-M2 Mar 1, 2024
@climategadgets
Copy link
Contributor Author

1.13.0-M2 is planned for March 11. It would be nice to include this in that pre-release so users can test it out and give feedback.

Hopefully, tonight.

@climategadgets
Copy link
Contributor Author

For the other diff, let's separate that into another issue so we can track and merge them separately so one doesn't block progress on the other.

#4828

climategadgets added a commit to climategadgets/micrometer that referenced this issue Mar 5, 2024
climategadgets added a commit to climategadgets/micrometer that referenced this issue Mar 5, 2024
climategadgets added a commit to climategadgets/micrometer that referenced this issue Mar 5, 2024
shakuzen pushed a commit that referenced this issue Mar 11, 2024
#4829)

The context added to the log messages contains the config URL and resource attributes. This will make troubleshooting easier.

Closes gh-4693
@shakuzen shakuzen removed this from the 1.13.0-M2 milestone Mar 11, 2024
@shakuzen shakuzen removed the enhancement A general enhancement label Mar 11, 2024
@shakuzen shakuzen removed help wanted An issue that a contributor can help us with registry: otlp OpenTelemetry Protocol (OTLP) registry-related labels Mar 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants