Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a possible_prometheus_urls parameter to the OpenMetrics base check #9573

Merged
merged 12 commits into from
Jun 30, 2021

Conversation

L3n41c
Copy link
Member

@L3n41c L3n41c commented Jun 22, 2021

What does this PR do?

Add a new possible_prometheus_urls parameter to the OpenMetrics base check.
It takes a list of possible OpenMetrics endpoints to try.
The data are then scraped from the first endpoint that worked.

Motivation

The goal of the agent auto-discovery is to ease the agent configuration by enabling and configuring as much checks as we can automatically.
In the context of Kubernetes control plane components, this goal is thwarted by the diversity of Kubernetes versions and distributions that the agent must support.
In some versions of Kubernetes, the endpoint is plain http, on other versions, only the secure https endpoint is available. On some distributions, the port is bound only on localhost whereas on others, the endpoint is accessible on the node IP, etc.
In order to provide “out-of-the-box” monitoring of those kind of components, we need the check to be able to try several “standard” endpoints.

Additional Notes

When a user is explicitly setting a check up, they must keep on using the single-valued prometheus_url.
possible_prometheus_urls is mainly designed to be used in the auto_conf.yaml files in the scope of auto-discovery.

This new parameter is needed by the following checks:

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • PR title must be written as a CHANGELOG entry (see why)
  • Files changes must correspond to the primary purpose of the PR as described in the title (small unrelated changes should have their own PR)
  • PR must have changelog/ and integration/ labels attached

@codecov
Copy link

codecov bot commented Jun 22, 2021

Codecov Report

Merging #9573 (0e515e0) into master (498cc43) will decrease coverage by 0.07%.
The diff coverage is 22.22%.

Flag Coverage Δ
btrfs 82.91% <ø> (ø)
dns_check 94.00% <ø> (-0.45%) ⬇️
eks_fargate 94.05% <ø> (ø)
external_dns 100.00% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Copy link
Contributor

@ahmed-mez ahmed-mez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good thanks!
Two notes:

  • I think the new parameter needs to be documented here.
  • Openmetrics V2 needs to support this at some point, otherwise we couldn't migrate the kube_* checks to the V2. The question is, should we make Openmetrics V2 support prometheus_possible_urls as part of this PR or not?

@L3n41c
Copy link
Member Author

L3n41c commented Jun 22, 2021

@ahmed-mez said:

I think the new parameter needs to be documented here.

If I add the new parameter in this file like in 07ba551, then the new parameter will need to be added in the conf.yaml.example file of all the OpenMetrics based checks: see CI failure here.

I think that, for other, non-auto-configured non-kube checks, it would be better to encourage the users to specify a single URL with the prometheus_url parameter in order to have the most deterministic behaviour as possible.
That’s why I’d rather use prometheus_possible_urls only in the auto_conf.yaml files of the kube checks and wanted to not document it in the conf.yaml.example files of all the other checks.

Copy link
Contributor

@ofek ofek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of this. Would it not make more sense for AD itself to try each URL and create the instance based on the first accessible one?

@clamoriniere
Copy link
Contributor

clamoriniere commented Jun 23, 2021

I'm not a fan of this. Would it not make more sense for AD itself to try each URL and create the instance based on the first accessible one?

@ofek , IMO, The problem will be that the AD will need to know for each integration the possible URL, so it breaks the separation of concern by putting specific check logic into the AD.

@L3n41c
Copy link
Member Author

L3n41c commented Jun 23, 2021

@ofek said:

Would it not make more sense for AD itself to try each URL and create the instance based on the first accessible one?

Could you detail how you would enhance AD to do that ?
So far, AD detects some “triggers” to schedule checks (a container is created with a specific image name or k8s annotation or docker label, etc.). But AD mechanism is check-agnostic. Hence, the AD cannot understand or modify a config check.

I could replace this:

ad_identifiers:
  - kube-scheduler
init_config:
instances:
  - prometheus_possible_urls:
      - https://%%host%%:10259/metrics
      - https://localhost:10259/metrics
      - http://%%host%%:10251/metrics
      - http://localhost:10251/metrics

by that:

ad_identifiers:
  - kube-scheduler
init_config:
instances:
  - prometheus_url: https://%%host%%:10259/metrics
  - prometheus_url: https://localhost:10259/metrics
  - prometheus_url: http://%%host%%:10251/metrics
  - prometheus_url: http://localhost:10251/metrics

But that would schedule 4 checks. Potentially, 2 would be in permanent error and 2 would report duplicated metrics.

And I think it’s best to keep AD check-agnostic.

@ahmed-mez
Copy link
Contributor

not much to add here, but I definitely agree that configuring checks is not the job of Autodiscovery.

ofek
ofek previously approved these changes Jun 25, 2021
@ofek
Copy link
Contributor

ofek commented Jun 25, 2021

I'll implement in v2. FYI I plan to match the v2 name and call it possible_openmetrics_endpoints

@ofek
Copy link
Contributor

ofek commented Jun 25, 2021

Actually, what do all of you think about only adding this feature to v2?

@L3n41c
Copy link
Member Author

L3n41c commented Jun 28, 2021

Actually, what do all of you think about only adding this feature to v2?

In fact, I began trying to migrate the concerned checks to OpenMetricsBaseCheckV2, but I realised that it would require more efforts than just changing the base class.
Not only the internal APIs are a little bit different, but it would also change the parameters used in the conf.yaml file exposed to the users. tls_verify instead of ssl_verify for ex.
So, I thought that migrating to OpenMetricsBaseCheckV2 would deserve a dedicated work and extra-care about backward compatibility for users shipping their own conf.yaml files.
That’s why I finally decided to implement this feature on the V1.

@L3n41c
Copy link
Member Author

L3n41c commented Jun 28, 2021

/azp help

@azure-pipelines
Copy link

Supported commands
  • help:
    • Get descriptions, examples and documentation about supported commands
    • Example: help "command_name"
  • list:
    • List all pipelines for this repository using a comment.
    • Example: "list"
  • run:
    • Run all pipelines or specific pipelines for this repository using a comment. Use this command by itself to trigger all related pipelines, or specify specific pipelines to run.
    • Example: "run" or "run pipeline_name, pipeline_name, pipeline_name"
  • where:
    • Report back the Azure DevOps orgs that are related to this repository and org
    • Example: "where"

See additional documentation.

@L3n41c
Copy link
Member Author

L3n41c commented Jun 28, 2021

/azp list

@L3n41c
Copy link
Member Author

L3n41c commented Jun 28, 2021

/azp run PR All

@DataDog DataDog deleted a comment from azure-pipelines bot Jun 28, 2021
@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@L3n41c
Copy link
Member Author

L3n41c commented Jun 28, 2021

/azp run PR All

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@L3n41c L3n41c requested review from vickenty, ahmed-mez and ofek June 29, 2021 08:35
@ofek
Copy link
Contributor

ofek commented Jun 29, 2021

Config won't have to change, see: #8438

class OpenMetricsCompatibilityScraper(OpenMetricsScraper):

ahmed-mez
ahmed-mez previously approved these changes Jun 30, 2021
vickenty
vickenty previously approved these changes Jun 30, 2021
@L3n41c L3n41c dismissed stale reviews from vickenty and ahmed-mez via 0e515e0 June 30, 2021 16:11
@ofek ofek changed the title Add a prometheus_possible_urls parameter to the OpenMetrics base check Add a possible_prometheus_urls parameter to the OpenMetrics base check Jun 30, 2021
@ofek ofek merged commit 099995a into master Jun 30, 2021
@ofek ofek deleted the lenaic/prometheus_possible_urls branch June 30, 2021 17:05
github-actions bot pushed a commit that referenced this pull request Jun 30, 2021
…eck (#9573)

* Add a `prometheus_possible_urls` parameter to the OpenMetrics base check

* Remove non-ASCII character

* Document the new `prometheus_possible_urls` parameter

* Fix flake8 errors

* Revert "Document the new `prometheus_possible_urls` parameter"

This reverts commit 07ba551.

* Fix formatting

* Fix formatting

* Log an error if no possible url is reachable

* Simplify log

* Rename `prometheus_possible_urls` into `possible_prometheus_urls` 099995a
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants