
Handle duplicated TYPE line for prometheus metrics #18813

Closed
crisdarocha opened this issue May 28, 2020 · 12 comments · Fixed by #33865
Labels
Metricbeat Metricbeat Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team

Comments

@crisdarocha

Describe the enhancement:
Opening the issue for enhancement on behalf of a user.

They are collecting MicroProfile Metrics from Payara in JSON format.

- module: openmetrics
  metricsets: ['collector']
  period: 10s
  hosts: ['localhost:8080']

  # This module uses the Prometheus collector metricset, all
  # the options for this metricset are also available here.
  metrics_path: /metrics/
  metrics_filters:
    include: []
    exclude: []

Unfortunately, Payara versions 5.193.1, 5.194 and 5.201 have a bug in their MicroProfile Metrics implementation that makes the output contain repeated TYPE lines:

# TYPE base_gc_total_total counter
# HELP base_gc_total_total Displays the total number of collections that have occurred. This attribute lists -1 if the collection count is undefined for this collector.
base_gc_total_total{name="PS MarkSweep"} 4
...
# TYPE base_gc_total_total counter
# HELP base_gc_total_total Displays the total number of collections that have occurred. This attribute lists -1 if the collection count is undefined for this collector.
base_gc_total_total{name="PS Scavenge"} 34

This violates the Prometheus text exposition format, and Metricbeat fails with an error:

  "error": {
    "message": "unable to decode response from prometheus endpoint: decoding of metric family failed: text format parsing error in line 43: second TYPE line for metric name \"base_gc_total_total\", or TYPE reported after samples"
  },
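For reference, the exposition format allows only one TYPE line per metric name, and that line must come before the first sample with that name. The Go sketch below applies just those two rules to a payload and reports the first offending line with a message mirroring the one quoted above. It is a hypothetical diagnostic (checkTypeLines is not a Metricbeat or Prometheus function) and it simplifies by matching sample names literally, so histogram and summary suffixes are not mapped back to their family.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// checkTypeLines (hypothetical diagnostic) enforces two rules of the Prometheus
// text exposition format: at most one TYPE line per metric name, and the TYPE
// line must precede the first sample with that name. It returns the 1-based
// line number and message of the first violation, or 0 and "" if none is found.
// Simplification: sample names are matched literally, so _bucket/_sum/_count
// series of histograms and summaries are not mapped back to their family.
func checkTypeLines(exposition string) (int, string) {
	typed := map[string]bool{}   // names that already had a TYPE line
	sampled := map[string]bool{} // names that already had a sample
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for n := 1; sc.Scan(); n++ {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		if strings.HasPrefix(line, "#") {
			f := strings.Fields(line)
			if len(f) >= 3 && f[0] == "#" && f[1] == "TYPE" {
				if typed[f[2]] || sampled[f[2]] {
					return n, fmt.Sprintf("second TYPE line for metric name %q, or TYPE reported after samples", f[2])
				}
				typed[f[2]] = true
			}
			continue
		}
		// Sample line: the metric name ends at '{' or at the first space/tab.
		name := line
		if i := strings.IndexAny(line, "{ \t"); i >= 0 {
			name = line[:i]
		}
		sampled[name] = true
	}
	return 0, ""
}

func main() {
	input := "# TYPE base_gc_total_total counter\n" +
		"base_gc_total_total{name=\"PS MarkSweep\"} 4\n" +
		"# TYPE base_gc_total_total counter\n" +
		"base_gc_total_total{name=\"PS Scavenge\"} 34\n"
	if n, msg := checkTypeLines(input); n > 0 {
		fmt.Printf("line %d: %s\n", n, msg)
	}
}
```

On this trimmed input it flags line 3, the repeated TYPE line.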

The request is for Metricbeat to ignore duplicate (identical) TYPE lines, downgrading the error to a warning, and to process the data anyway.
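One way to approximate this request outside Metricbeat would be a pre-processing filter that drops repeated TYPE and HELP lines (logging a warning) before a strict parser sees the payload. The following Go sketch assumes the payload fits in memory and that a repeated TYPE line declaring a different type should still be an error; dropDuplicateMetadata is a hypothetical helper, not part of Metricbeat. Since HELP texts may differ between repeats, it keeps the first HELP line per name and drops the rest.

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"strings"
)

// dropDuplicateMetadata is a hypothetical pre-processing helper (not part of
// Metricbeat). It keeps the first "# TYPE" and "# HELP" line for each metric
// name, drops later repeats with a warning, and returns an error only when a
// repeated TYPE line declares a different type.
func dropDuplicateMetadata(exposition string) (string, error) {
	seen := map[string]string{} // "TYPE name" or "HELP name" -> normalized first line
	var out strings.Builder
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := sc.Text()
		fields := strings.Fields(line)
		if len(fields) >= 3 && fields[0] == "#" && (fields[1] == "TYPE" || fields[1] == "HELP") {
			key := fields[1] + " " + fields[2]
			normalized := strings.Join(fields, " ")
			if first, ok := seen[key]; ok {
				if fields[1] == "TYPE" && first != normalized {
					return "", fmt.Errorf("conflicting TYPE lines for metric name %q: %q vs %q", fields[2], first, normalized)
				}
				log.Printf("warning: dropping repeated %s line for metric name %q", fields[1], fields[2])
				continue
			}
			seen[key] = normalized
		}
		out.WriteString(line)
		out.WriteByte('\n')
	}
	if err := sc.Err(); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	input := "# TYPE base_gc_total_total counter\n" +
		"base_gc_total_total{name=\"PS MarkSweep\"} 4\n" +
		"# TYPE base_gc_total_total counter\n" +
		"base_gc_total_total{name=\"PS Scavenge\"} 34\n"
	cleaned, err := dropDuplicateMetadata(input)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(cleaned)
}
```

The filtered output can still interleave samples of one family with other families; whether that is accepted depends on the parser, so a stricter workaround would also regroup samples by metric name.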

This bug is fixed in Payara 5.202RC1, but the upgrade is complex and lengthy due to the scale of usage.

Describe a specific use case for the enhancement or feature:
Allow users on the affected ("bugged") versions of Payara to keep using Metricbeat.

As per a private discussion with @exekias and @sorantis. Opening the issue to keep a record of demand.

@crisdarocha crisdarocha added the Metricbeat Metricbeat label May 28, 2020
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label May 28, 2020
@andresrc andresrc added the Team:Platforms Label for the Integrations - Platforms team label May 28, 2020
@elasticmachine
Collaborator

Pinging @elastic/integrations-platforms (Team:Platforms)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label May 28, 2020
@ChrsMark ChrsMark added the good first issue Indicates a good issue for first-time contributors label May 28, 2020
@exekias
Contributor

exekias commented May 28, 2020

I think the underlying problem is that we use a different library to parse metrics than Prometheus does, and this seems to cause unexpected behavior when the source data doesn't strictly follow the format.

We may want to investigate a way to use the same code paths that Prometheus uses to collect metrics.

@ChrsMark I'm not sure this one is trivial; what approach did you have in mind?

@ChrsMark
Member

Hmm, yes, it might not be that easy. The code cannot even "unpack" the response, right?
I had in mind that the error occurred after the response was unpacked, so that the parsed data could be processed to fix this kind of issue.

@exekias exekias removed the good first issue Indicates a good issue for first-time contributors label May 28, 2020
@hgruck

hgruck commented Mar 31, 2021

Is there any plan to fix this?

@ChrsMark
Member

Hey, we plan to move to an improved parsing library, which might fix this one too: #24707

@xuoguoto

I too am getting this error on Metricbeat version 7.13.1 (amd64), libbeat 7.13.1 [2d80f6e99f41b65a270d61706fa98d13cfbda18d]:

module/wrapper.go:259 Error fetching data for metricset prometheus.collector: unable to decode response from prometheus endpoint: decoding of metric family failed: text format parsing error in line 45: second TYPE line for metric name "_err_null_node_blackholed_packets", or TYPE reported after samples

@ChrsMark ChrsMark added Team:Integrations Label for the Integrations team and removed Team:Platforms Label for the Integrations - Platforms team labels Jun 11, 2021
@ChrsMark
Member

@xuoguoto, is your case similar to the one in this issue's description? If so, I'm afraid there is no quick fix at the moment, since this violates the Prometheus standard. As mentioned in my previous comment, these kinds of issues might be resolved when/if we finally move to a new parsing library (#24707).

@xuoguoto

@ChrsMark Here is what I see from the exporter when grepping for _err_null_node_blackholed_packets:

# TYPE _err_null_node_blackholed_packets counter
_err_null_node_blackholed_packets{thread="0"} 0
# TYPE _err_null_node_blackholed_packets counter
_err_null_node_blackholed_packets{thread="1"} 250319
# TYPE _err_null_node_blackholed_packets counter
_err_null_node_blackholed_packets{thread="2"} 1
# TYPE _err_null_node_blackholed_packets counter
_err_null_node_blackholed_packets{thread="3"} 140111
# TYPE _err_null_node_blackholed_packets counter
_err_null_node_blackholed_packets{thread="4"} 0
# TYPE _err_null_node_blackholed_packets counter
_err_null_node_blackholed_packets{thread="5"} 0
# TYPE _err_null_node_blackholed_packets counter
_err_null_node_blackholed_packets{thread="6"} 0
# TYPE _err_null_node_blackholed_packets counter
_err_null_node_blackholed_packets{thread="7"} 0
# TYPE _err_null_node_blackholed_packets counter
_err_null_node_blackholed_packets{thread="8"} 0

Is this a problem?

@hamelg

hamelg commented Sep 16, 2021

We hit this issue too, but with a slight variation.

unable to decode response from prometheus endpoint: decoding of metric family failed: text format parsing error in line 58: second TYPE line for metric name "jvm_classes_loaded", or TYPE reported after samples

sh-4.2# curl -s http://10.1.86.129:9779/metrics|cat -n |grep jvm_classes_loaded
   54  # HELP jvm_classes_loaded The number of classes that are currently loaded in the JVM
   55  # TYPE jvm_classes_loaded gauge
   56  jvm_classes_loaded 28959.0
   57  # HELP jvm_classes_loaded_total The total number of classes that have been loaded since the JVM has started execution
   58  # TYPE jvm_classes_loaded_total counter
   59  jvm_classes_loaded_total 29166.0

@peterschrott

@hamelg, I encountered the same issue. Metrics are exposed via the Prometheus JMX Exporter. The odd thing is that Metricbeat behaves differently with different versions of the JMX Exporter.

With JMX Exporter v0.14.0 everything works as expected and metrics are exported; with v0.16.1 I get the following error:

2022-04-05T17:33:07.769+0200	INFO	module/wrapper.go:259	Error fetching data for metricset prometheus.collector: unable to decode response from prometheus endpoint: decoding of metric family failed: text format parsing error in line 4: second TYPE line for metric name "jvm_classes_loaded", or TYPE reported after samples

Output with JMX Exporter v0.14.0:

# HELP jvm_classes_loaded The number of classes that are currently loaded in the JVM
# TYPE jvm_classes_loaded gauge
jvm_classes_loaded 39039.0
# HELP jvm_classes_loaded_total The total number of classes that have been loaded since the JVM has started execution
# TYPE jvm_classes_loaded_total counter
jvm_classes_loaded_total 39481.0

Output with JMX Exporter v0.16.1:

# HELP jvm_classes_loaded The number of classes that are currently loaded in the JVM
# TYPE jvm_classes_loaded gauge
jvm_classes_loaded 18998.0
# HELP jvm_classes_loaded_total The total number of classes that have been loaded since the JVM has started execution
# TYPE jvm_classes_loaded_total counter
jvm_classes_loaded_total 18998.0
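The two Prometheus-text outputs above declare the same metric names, so the difference between exporter versions may lie in which format is served to the scraper rather than in these lines. One explanation consistent with the error message (an assumption, not confirmed in this thread): if the exporter answers with OpenMetrics, a counter's family name omits the _total suffix, so the counter jvm_classes_loaded_total would be declared as family jvm_classes_loaded and collide with the gauge of the same name, which a Prometheus-text parser reports as a second TYPE line. The hypothetical Go helpers below apply that renaming rule to TYPE lines and list the resulting collisions.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// openMetricsFamily returns the family name a metric would get under OpenMetrics
// naming: counter samples end in "_total", but the counter family name does not.
func openMetricsFamily(name, metricType string) string {
	if metricType == "counter" {
		return strings.TrimSuffix(name, "_total")
	}
	return name
}

// findFamilyCollisions (hypothetical diagnostic) reads "# TYPE name type" lines
// and reports family names that would be declared twice after that renaming.
func findFamilyCollisions(exposition string) []string {
	declaredBy := map[string]string{} // family name -> metric name that declared it first
	var collisions []string
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 4 || f[0] != "#" || f[1] != "TYPE" {
			continue
		}
		family := openMetricsFamily(f[2], f[3])
		if first, ok := declaredBy[family]; ok {
			collisions = append(collisions, fmt.Sprintf("%s: %s vs %s", family, first, f[2]))
			continue
		}
		declaredBy[family] = f[2]
	}
	return collisions
}

func main() {
	input := "# TYPE jvm_classes_loaded gauge\n" +
		"jvm_classes_loaded 18998.0\n" +
		"# TYPE jvm_classes_loaded_total counter\n" +
		"jvm_classes_loaded_total 18998.0\n"
	for _, c := range findFamilyCollisions(input) {
		fmt.Println(c)
	}
}
```

Run on the v0.16.1 lines it prints jvm_classes_loaded: jvm_classes_loaded vs jvm_classes_loaded_total. It would flag the v0.14.0 lines the same way, which is why the response headers of each version matter here.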

@ChrsMark ChrsMark added Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team and removed Team:Integrations Label for the Integrations team labels Dec 12, 2022
@ChrsMark
Member

Hey @peterschrott! Could you also share the response headers returned in both cases when you curl the endpoints?

@ChrsMark
Member

ChrsMark commented Dec 13, 2022

A quick heads-up on this.

A Prometheus server can scrape metrics from an endpoint that exposes duplicated TYPE lines; in that case both series are collected without issue. I verified that a Prometheus server handles the case reported in this issue's description.

So for an endpoint exposing the following:

# TYPE base_gc_total_total counter
# HELP base_gc_total_total Displays the total number of collections that have occurred. This attribute lists -1 if the collection count is undefined for this collector.
base_gc_total_total{name="PS MarkSweep"} 4
# TYPE base_gc_total_total counter
# HELP base_gc_total_total Displays the total number of collections that have occurred. This attribute lists -1 if the collection count is undefined for this collector.
base_gc_total_total{name="PS Scavenge"} 34

The Prometheus server collects both series, for example:

base_gc_total_total{instance="containerd:1338", job="duplicate-types", name="PS MarkSweep"}  4
base_gc_total_total{instance="containerd:1338", job="duplicate-types", name="PS Scavenge"} 34
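The outcome described here (both series stored despite the repeated metadata) can be mimicked by a scraper that keys samples by series identity and treats comment lines as optional metadata. The Go sketch below only illustrates that behaviour under simplifying assumptions (no timestamps, no escaped quotes, label order taken verbatim); it is not Prometheus's actual scrape code, and collectSeries is a hypothetical name.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// collectSeries (hypothetical) stores every sample keyed by its series identity,
// i.e. the metric name plus the raw label block, and skips "#" comment lines, so
// repeated TYPE/HELP metadata cannot make the scrape fail.
// Simplifications: no timestamps, no escaped quotes in label values, and label
// order is taken verbatim rather than canonicalized.
func collectSeries(exposition string) (map[string]float64, error) {
	series := map[string]float64{}
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// The value follows the last space; label values may themselves contain spaces.
		i := strings.LastIndexByte(line, ' ')
		if i < 0 {
			return nil, fmt.Errorf("malformed sample line: %q", line)
		}
		v, err := strconv.ParseFloat(line[i+1:], 64)
		if err != nil {
			return nil, fmt.Errorf("bad value in %q: %w", line, err)
		}
		series[strings.TrimSpace(line[:i])] = v
	}
	return series, sc.Err()
}

func main() {
	input := "# TYPE base_gc_total_total counter\n" +
		"base_gc_total_total{name=\"PS MarkSweep\"} 4\n" +
		"# TYPE base_gc_total_total counter\n" +
		"base_gc_total_total{name=\"PS Scavenge\"} 34\n"
	series, err := collectSeries(input)
	if err != nil {
		panic(err)
	}
	keys := make([]string, 0, len(series))
	for k := range series {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, series[k])
	}
}
```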

So in this case, the current Metricbeat module cannot provide the same experience. The library upgrade in #33865 will solve this issue.

As far as the Java client exporters are concerned, I cannot say for sure what the issue was, but I suspect it has to do with https://github.com/prometheus/client_java/releases/tag/parent-0.10.0 or something similar, as reported in #24554. In such cases the response headers need to be checked, and if the endpoint serves OpenMetrics, users are advised to use the openmetrics module introduced with #27269.
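To see which format an endpoint actually serves, the Content-Type response header can be compared: the Prometheus text format is announced as text/plain; version=0.0.4, while OpenMetrics uses application/openmetrics-text. A minimal Go classifier for that header, using only the standard library's mime.ParseMediaType, might look like this. The returned labels are names chosen for this sketch, and treating a bare text/plain as Prometheus text is an assumption.

```go
package main

import (
	"fmt"
	"mime"
)

// expositionFormat classifies a Content-Type header value. The labels
// "openmetrics", "prometheus-text" and "unknown" are names chosen for this sketch.
func expositionFormat(contentType string) string {
	mediaType, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		return "unknown"
	}
	switch {
	case mediaType == "application/openmetrics-text":
		return "openmetrics"
	// Assumption: a text/plain body without a version parameter is Prometheus text.
	case mediaType == "text/plain" && (params["version"] == "" || params["version"] == "0.0.4"):
		return "prometheus-text"
	default:
		return "unknown"
	}
}

func main() {
	for _, ct := range []string{
		"text/plain; version=0.0.4; charset=utf-8",
		"application/openmetrics-text; version=1.0.0; charset=utf-8",
		"application/json",
	} {
		fmt.Printf("%s -> %s\n", ct, expositionFormat(ct))
	}
}
```

Because exporters may negotiate the format from the request's Accept header, the header is best inspected with the same Accept value the scraper sends.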

9 participants