-
Notifications
You must be signed in to change notification settings - Fork 710
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleanup disk metrics #5
Cleanup disk metrics #5
Conversation
Also, just re-read the recommendations for writing exporters and about not exposing percentages. Fix coming. |
Looks great. Waiting for @brian-brazil to comment before merging |
I did some more cleanup, primarily cosmetic stuff on the metric names, as well as updating the help texts from |
It also seems that some of the data we get might actually be counters rather than gauges - the ones with CounterType |
Regarding help texts, I think it's a good idea to keep the WMI names in there too (maybe in parenthesis, don't know if there are specific conventions for the help texts in prometheus), to make it meaningful for users to understand what the measure is derived from. From the prometheus docs:
https://prometheus.io/docs/instrumenting/writing_exporters/#help-strings |
Yes, agreed. What do you think about this format?
|
That would be great, also |
bea56fe
to
63c7640
Compare
Done! |
By the way, feel free to rename perf.go to logical_disk.go in this PR |
"AvgDiskBytesPerTransfer", | ||
ReadsPerSec: prometheus.NewDesc( | ||
prometheus.BuildFQName(wmiNamespace, "perf", "reads_per_sec"), | ||
"The rate of read operations on the disk (LogicalDisk.DiskReadsPerSec)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Over what time period?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is not really specified in the documentation, sadly :( Would that make it useless to expose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes it a lot less useful, but if it's all we have it's all we have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docs we need are https://msdn.microsoft.com/en-us/library/ms803973.aspx and then to https://msdn.microsoft.com/en-us/library/ms804018.aspx
This is actually a Counter, as we're pulling from PerfDataRaw
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just found the same thing, but at a different part of msdn (it's a maze of a site...): https://msdn.microsoft.com/en-us/windows/hardware/aa389383(v=vs.71). I'm changing the metrics to counters, now. What would be good new names? reads_total
etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, wmi_perf_logicaldisk_reads_total
.
Is the perf
redundant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the perf part has been dropped (not yet pushed, though). All the WMI classes we'll use are prefixed with perf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add https://msdn.microsoft.com/en-us/library/ms803973.aspx to the PR too for future reference?
Now that we've mostly figured out the type system, we should start a wiki page or something documenting the mapping from the various "Counter" types to Prometheus Metrics. |
AvgDiskBytesPerTransfer_Base uint32 | ||
AvgDiskBytesPerWrite uint64 | ||
AvgDiskBytesPerWrite_Base uint32 | ||
AvgDiskQueueLength uint64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out that this is a Counter - which makes no sense for a queue length,
What it actually is is disk requests queued.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to requests_queued_total
Hm, the rename wasn't very good, now all the comments are hidden away in other diffs :/ |
I'm still perusing the docs, but I have a crazy idea. It looks like it's possible to get to CounterType with WMI (though we'll need to use a different WMI library). Given that and presuming that metric naming is relatively consistent we may be able to fully automate the transformation of metric names and the user would only have to provide a WMI class name. We'll likely still need some hardcoded rules to fix e.g. bytes->megabytes. I also note we cannot completely rely on documentation alone for CounterTypes - several of the ones in this PR changed in Windows 2000 (which means this won't quite work on XP). |
@brian-brazil Interesting idea! We could just have a mapping list that depends on the OS version, for compatibility. Plus some utility functions for standard unit conversions in case they are not fully standardized. Along with the discussion re IIS/SQL server metrics, if those should become separate exporters maybe such automatic transformation and conversion could form the base for a wmi exporter library that can be used to easily build new exporters? |
@brian-brazil I think I've addressed all the comments now, and the numbers look reasonable from what I can see |
The counter types is what changed between versions. What I'd see happening for IIS/SQL is that you'd run the WMI exporter but with different flags just pulling in IIS/SQL metrics. |
|
||
ch <- prometheus.MustNewConstMetric( | ||
c.RequestsQueued, | ||
prometheus.CounterValue, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gauge
Hm, maybe need to rebase this branch. Appveyor is complaining it is unmergable without conflicts now. |
cb3027f
to
fcf5125
Compare
Yep, just pushed :) (Very nice work adding it, btw!) |
Need to squash before merging as well |
@brian-brazil interesting idea, indeed. there is hundreds of WMI sources, but I'm not sure most of them is very usable. Let's discuss that in a separate issue #12 |
👍 I'd suggest squashing commits. |
fcf5125
to
1caa658
Compare
Done! |
Great work! |
i love this diff :) thanks for all the work |
Addresses #4 by:
Because of how the percent metrics are calculated (
percent_metric_name / percent_metric_name_base
), they get "fake precision" when getting close to zero, egwmi_perf_percent_disk_write_time{volume="HarddiskVolume3"} 2.931438995229291e-11
.Possibly we should truncate to zero at some point? @brian-brazil - any recommendations here?