Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Windows Performance Counters Receiver #1088

Closed
james-bebbington opened this issue Sep 22, 2020 · 6 comments · Fixed by #1506
Closed

Add Windows Performance Counters Receiver #1088

james-bebbington opened this issue Sep 22, 2020 · 6 comments · Fixed by #1506
Labels
enhancement New feature or request

Comments

@james-bebbington
Copy link
Member

james-bebbington commented Sep 22, 2020

Windows Performance Counters Receiver Proposal

This receiver, for Windows only, will capture any system, application, or custom performance counter data from the Windows registry. This receiver will use the PDH interface to capture this data.

Note there are multiple ways to consume counter data from Windows as described here (or see a more detailed explanation here). The primary methods are:

  1. WMI: The highest level interface for accessing "typed" metric data. I don't think this supports accessing custom counters (not 100% sure). Regardless, we want to avoid using this due to performance concerns as documented in detail here: CPU collector blocks every ~17 minutes on call to wmi.Query prometheus-community/windows_exporter#89
  2. PDH: A low level interface for accessing performance counter data. This is a relatively straightforward to use interface that returns performance counter data in the format expected by users.
  3. Registry Counters: The lowest level interface for accessing performance counter data. This returns raw values, often Sums, instead of rates / averages. While this is useful in some cases, this is generally difficult to use, and returns values that will be difficult for users to interpret, so is probably not appropriate for a generic receiver.

==> We will use (2), and port the existing implementation from https://github.com/open-telemetry/opentelemetry-collector/tree/master/receiver/hostmetricsreceiver/internal/windows/pdh as described in open-telemetry/opentelemetry-collector#973.

Proposed Configuration

Option One: Build expressive syntax for generating metrics from Performance Counters.

windowsperformancecountersreceiver:
  metrics:
    # simple gauge metric populated from single counter value
    - name: iis/current_connections
      type: gauge # could consider making gauge the default since this will be the case for 90% of perf counters
      counter: "Web Service (_Total)/Current Connections"

    # counter metric that has two dimensions (one populated from the perf counter instance name & another populated from a set of perf counters)
    - name: iis/request_count
      type: counter
      instance_label: website # optional (if this value is specified, then the counter is expected to return multiple results)
      instance_filter: [ "MyWebsite", "AnotherWebsite", "_Total" ]
      counter_label: http_method
      counters:
        - counter: "Web Service (*)/Total Get Requests"
          label_value: get
        - counter: "Web Service (*)/Total Post Requests"
          label_value: post

Option Two: Build simple syntax for generating metrics from Performance Counters, and extend the Metrics Transform processor instead.

windowsperformancecountersreceiver:
  counters:
    - object: "Web Service"
      instances: [ "_Total" ]
      counters:
        - "Current Connections"
    - object: "Web Service"
      instances: [ "MyWebsite", "AnotherWebsite", "_Total" ]
      counters:
        - "Total Get Requests"
        - "Total Post Requests"

...

metricstransformprocessor:
  transforms:
    # rename "Web Service (*)/Current Connections" -> iis/current_connections
    - metric_name: "Web Service (*)/Current Connections"
      action: update
      new_name: iis/current_connections

    # combine "Web Service (*)/Total * Requests" -> iis/request_count (this is not existing functionality)
    - metric_names:
        - name: "Web Service (*)/Total Get Requests"
          label_value: get
        - name: "Web Service (*)/Total Post Requests"
          label_value: post
      action: combine
      label: http_method
      new_name: iis/request_count
      # change label instance -> website
        - action: update_label
          label: instance
          new_label: website
@jrcamp
Copy link
Contributor

jrcamp commented Sep 23, 2020

Do you think IIS would have its own dedicated receiver in the future (that wraps or leverages this one)? That way once we have docs, etc. it'll show up there? Or were you thinking of just supporting it (and the other performance counter ones) via configuration? I think the best user experience is to have a dedicated receiver for it which doesn't rely user having to add lots of special stuff to their configuration.

@james-bebbington
Copy link
Member Author

james-bebbington commented Sep 23, 2020

I don't have any plans to add a receiver for IIS, etc, specifically. Specifying this in config is fine for us, and super flexible for future uses. If you think that would be useful, it could easily be added later though.

As another alternative, it could be worth investigating how config works so that we can split up config to make this tidier, rather than having one giant config file, e.g.

<config.yaml>
includes:
  - config_self.yaml
  - config_iis.yaml
  - config_sqlserver.yaml

# shared components listed here
<config_self.yaml>
pipelines:
  metrics/self:
    receivers: [ prometheus/self ]
    processors: [ metricstransformprocessor/self ]
    exporters: [ ... ]

receivers:
  prometheus/self:
    # scrape self metrics

processors:
  metricstransformprocessor/self:
    ...
<config_iis.yaml>
pipelines:
  metrics/iis:
    receivers: [ windowsperformancecountersreceiver/iis ]
    processors: [ metricstransformprocessor/iis ]
    exporters: [ ... ]

receivers:
  windowsperformancecountersreceiver/iis:
    ...

processors:
  metricstransformprocessor/iis:
    ...
<config_sqlserver.yaml>
pipelines:
  metrics/sqlserver:
    receivers: [ windowsperformancecountersreceiver/sqlserver ]
    processors: [ metricstransformprocessor/sqlserver ]
    exporters: [ ... ]

receivers:
  windowsperformancecountersreceiver/sqlserver:
    ...

processors:
  metricstransformprocessor/sqlserver:
    ...

I looked into this briefly a while ago. Viper does support loading config from multiple files, but with some limitations (arrays are replaced, not merged). It also doesn't support includes; if the following PR was merged it would allow us to do the suggestion above: spf13/viper#893

@jrcamp
Copy link
Contributor

jrcamp commented Sep 23, 2020

One thing I've been thinking about for a while is a concept of "scoped processors". I think sometimes it's confusing to have to do filtering or renaming in a separate processor section away from the receiver that's collecting it. What if any receiver could configure processors within it? For example (calling option #3):

windowsperformancecounters:
  counters:
    - object: "Web Service"
      instances: [ "_Total" ]
      counters:
        - "Current Connections"
    - object: "Web Service"
      instances: [ "MyWebsite", "AnotherWebsite", "_Total" ]
      counters:
        - "Total Get Requests"
        - "Total Post Requests"

  processors:
    - metricstransform:
        transforms:
          # rename "Web Service (*)/Current Connections" -> iis/current_connections
          - metric_name: "Web Service (*)/Current Connections"
            action: update
            new_name: iis/current_connections

I don't think this is as good in terms of UX as Option #1 but better than Option #2. It would also work well if we were able to split up configs like you showed above. However most people will probably either a) be using a wrapped receiver (which is why I wanted to ask the above question first) or b) pre-canned configs provided to them. Given that I probably slightly prefer #2 or #3 due to being able to reuse metricstransform processor logic. And I prefer #3 because people can copy/include just 1 section of config rather than having to copy a receiver section and a processor section separately.

@james-bebbington
Copy link
Member Author

That is an interesting proposal. Note I updated my example above to include processors as well to show that they can still be described in the same block / file as the related receiver.

Having processors "scoped" to a receiver doesn't seem all that different from having a separate pipeline for each receiver, so I feel that (2) & (3) are quite similar. But being able to separate a giant config across multiple files seems quite useful.

@james-bebbington
Copy link
Member Author

james-bebbington commented Oct 12, 2020

Based on feedback, I went with Option 2. We could extend this to Option 3 in the future if desired, but I will leave that as possible future work.


So the next question is how to best implement the combine operation in the Metrics transform processor? My initial thought was to configure this as described in the issue description, but that config is very specific to this operation.

A better alternative is proposed below:

Step 1: Change the metric_name filter to be a filterset. This will apply to all existing transforms, meaning you can do things like:

# update label keys and values for all system.network metrics
- metric_names: ^system\.network\.
  match_type: regexp
  action: update
  operations:
    # change label interface -> device
    - action: update_label
      label: interface
      new_label: device
    # change direction label values receive -> rx
    - action: update_label
      label: direction
      value_actions:
        # receive -> rx
        - value: receive
          new_value: rx

Step 2: Add a combine action:

# combine "Web Service (*)/Total .* Requests" -> iis/request_count
- metric_names: ^Web Service (*)/Total .* Requests$
  match_type: regexp
  action: combine
  new_name: iis/request_count # mandatory if action = combine (or new)
  combine_labels: # mandatory if action = combine 
    - name: "Web Service (*)/Total Get Requests"
      labels: { http_method: get }
    - name: "Web Service (*)/Total Post Requests"
      label_value: { http_method: post }
    # could add a fallback option here; for now will ignore + report if any matched metrics are not in this list
  operations:
    # change label instance -> website
    - action: update_label
      label: instance
      new_label: website

^ Note combine differs from merge, which is another future transform that might be desired (i.e. merge / aggregate data points across metrics).

tigrannajaryan pushed a commit that referenced this issue Oct 30, 2020
…#1211)

Adds skeleton for Windows Performance Counters Receiver with very basic initial functionality that can read simple counters that do not include any instance into a Gauge metric - see the added README.md for more details.

**Link to tracking Issue:** #1088

Note the scaffolding code makes use of the new `receiverhelper` & `scraperhelper` functions added in the Core PRs referenced below.

**Depends on:** #1175, open-telemetry/opentelemetry-collector#1886, open-telemetry/opentelemetry-collector#1890
@james-bebbington
Copy link
Member Author

james-bebbington commented Nov 3, 2020

A much more simplified variant of the second option is to just use regexp substitution to specify label names, i.e. the above becomes:

# combine "Web Service (*)/Total .* Requests" -> iis/request_count
- metric_names: ^Web Service \(\*\)/Total (?P<http_method>.*) Requests$
  match_type: regexp
  action: combine
  new_name: iis/request_count
  operations:
    # change label instance -> website
    - action: update_label
      label: instance
      new_label: website

I will start with this and we can add more complex alternatives in the future if needed.

@andrewhsu andrewhsu added the enhancement New feature or request label Jan 6, 2021
codeboten pushed a commit that referenced this issue Nov 23, 2022
The failing test was introduced I believe by a change in behaviour noted in this issue: getmoto/moto#3292. The solution is to set a region in the create_bucket call.

Fixes #1088
codeboten pushed a commit that referenced this issue Nov 23, 2022
The failing test was introduced I believe by a change in behaviour noted in this issue: getmoto/moto#3292. The solution is to set a region in the create_bucket call.

Fixes #1088
codeboten pushed a commit that referenced this issue Nov 23, 2022
* Use a shorter timeout for AWS EC2 metadata requests

Fix #1088 

According to the docs, the value for `timeout` is in seconds: https://docs.python.org/3/library/urllib.request.html#urllib.request.urlopen. 1000 seconds seems slow and in some cases can block the startup of the program being instrumented (see #1088 as an example), because the request will hang indefinitely in non-AWS environments. Using a much shorter 1 second timeout seems like a reasonable workaround for this.

* add changelog entry for timeout change

* use 5s timeout for ECS and EKS, update changelog

Co-authored-by: Srikanth Chekuri <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants