Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Prometheus linter fails for Prometheus metrics reported by Ballerina #39862

Closed
nilushancosta opened this issue Mar 13, 2023 · 6 comments · Fixed by ballerina-platform/module-ballerinax-prometheus#158
Assignees
Labels
Team/Observability Observability in general, Metrics, Tracing & Logging Type/Bug

Comments

@nilushancosta
Copy link

Description

Prometheus has a promtool check metrics commands. This runs a built in linter where we can pass Prometheus metrics over stdin to lint them for consistency and correctness.

When I ran this command for Ballerina Promethus metrics, linting failed.
E.g. Error message

error while linting: text format parsing error in line 17: second HELP line for metric name "response_time_seconds"

Steps to Reproduce

  1. Create a new project by executing bal new command
  2. Add the following program to the main.bal file
import ballerina/http;
import ballerinax/prometheus as _;

service /srvc on new http:Listener(8090) {
    resource function get test() returns record{}|error {
        http:Client httpEp = check new (url = "https://postman-echo.com");
        record {} getResponse = check httpEp->get(path = "/get");
        return getResponse;
    }
}
  1. Add a Config.toml file with the following configuration to enable the Prometheus metrics exporter
[ballerina.observe]
metricsEnabled=true
metricsReporter="prometheus"
  1. Run it by executing bal run and send several requests to `http://localhost:8090/srvc/test
  2. Execute the following to run the linter. If you are using brew you can install the tool through Prometheus (brew install prometheus)
curl http://localhost:9797/metrics | promtool check metrics

Affected Version(s)

Ballerina 2201.4.0 (Swan Lake Update 4)
Language specification 2022R4
Update Tool 1.3.13

OS, DB, other environment details and versions

No response

Related area

-> Observability

Related issue(s) (optional)

No response

Suggested label(s) (optional)

No response

Suggested assignee(s) (optional)

No response

Copy link

This issue is NOT closed with a proper Reason/ label. Make sure to add proper reason label before closing. Please add or leave a comment with the proper reason label now.

      - Reason/EngineeringMistake - The issue occurred due to a mistake made in the past.
      - Reason/Regression - The issue has introduced a regression.
      - Reason/MultipleComponentInteraction - Issue occured due to interactions in multiple components.
      - Reason/Complex - Issue occurred due to complex scenario.
      - Reason/Invalid - Issue is invalid.
      - Reason/Other - None of the above cases.

@keizer619 keizer619 reopened this Jun 14, 2024
@NipunaMadhushan
Copy link
Contributor

This error occurs due to formatting error in the following several metrics.

response_time_seconds_mean
response_time_seconds_max
response_time_seconds_min
response_time_seconds_stdDev

These metrics are positioned under summary metrics for response_time_seconds which contains the headers for TYPE and HELP text only for response_time_seconds metric.

@NipunaMadhushan NipunaMadhushan moved this from Done to On Hold in Ballerina Team Main Board Sep 24, 2024
@NipunaMadhushan
Copy link
Contributor

Noticed that we provide HELP text and TYPE text each and every metric with the same metric name which is not the correct format of metric exposition.

The metric exposition format can be found here.

@NipunaMadhushan
Copy link
Contributor

NipunaMadhushan commented Oct 23, 2024

After correcting the above mentioned formatting issue, I received the following response for the promtool metric check.

requests_total_value counter metrics should have "_total" suffix
response_time_nanoseconds_total_value counter metrics should have "_total" suffix
response_time_nanoseconds_total_value use base unit "seconds" instead of "nanoseconds"
response_time_seconds_max no help text
response_time_seconds_mean no help text
response_time_seconds_min no help text
response_time_seconds_stdDev metric names should be written in 'snake_case' not 'camelCase'
response_time_seconds_stdDev no help text

Usually counter metric name should end with total or sum according to the best practices of metrics naming and labelling.
The best practices for naming metrics and labels can be found here.
The data model of metrics can be found here.

response_time_seconds_max, response_time_seconds_mean, response_time_seconds_min, response_time_seconds_stdDev are included in the summary metric. But summary metric has a format and it is as follows.

  • streaming φ-quantiles (0 ≤ φ ≤ 1) of observed events, exposed as {quantile="<φ>"}
  • the total sum of all observed values, exposed as _sum
  • the count of events that have been observed, exposed as _count

This format should be followed to be identified as a summary metric.

@NipunaMadhushan
Copy link
Contributor

NipunaMadhushan commented Oct 24, 2024

PR will resolve the formatting issues regarding the metric data model.

How ever there are still other formatting issues related to metric and label naming conventions.

  • requests_total_value - counter metrics should have "_total" suffix
  • response_time_seconds_stdDev - metric names should be written in 'snake_case' not 'camelCase'
  • timeWindow - label names should be written in 'snake_case' not 'camelCase'

@NipunaMadhushan NipunaMadhushan moved this from On Hold to PR Sent in Ballerina Team Main Board Oct 24, 2024
@github-project-automation github-project-automation bot moved this from PR Sent to Done in Ballerina Team Main Board Oct 24, 2024
Copy link

This issue is NOT closed with a proper Reason/ label. Make sure to add proper reason label before closing. Please add or leave a comment with the proper reason label now.

      - Reason/EngineeringMistake - The issue occurred due to a mistake made in the past.
      - Reason/Regression - The issue has introduced a regression.
      - Reason/MultipleComponentInteraction - Issue occured due to interactions in multiple components.
      - Reason/Complex - Issue occurred due to complex scenario.
      - Reason/Invalid - Issue is invalid.
      - Reason/Other - None of the above cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team/Observability Observability in general, Metrics, Tracing & Logging Type/Bug
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants