dcgm-exporter missing many metrics after upgrade #143
Comments
After upgrading from 2.0.0-rc12 to 2.1.2 (building from source using the tags in the Git repo), I'm missing these:
The rest appear to be there, but I haven't really compared the values to see if they end up in the same ballpark.
Based on the 2.0.0-rc.12...master comparison, there are some changes related to metrics: filtering zero values and masking others 'based on constant'. It's worth looking into these to see if they're causing the missing metrics. IIRC there were a few that would never display real values...
One of our machines "involuntarily" updated the dcgm exporter docker image and we're now missing some metrics like Here's the full list:
We also gained these:
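For anyone who wants to produce "missing" and "gained" lists like the ones above, here is a minimal sketch. It assumes you have saved the exporter's /metrics output from the old and the new version as text; the sample dumps and the helper names (`metric_names`, `diff_metrics`) are made up for illustration and are not the actual lists from this thread.

```python
import re

# Matches the metric name at the start of a sample line in the Prometheus
# text exposition format, e.g. `DCGM_FI_DEV_GPU_TEMP{gpu="0"} 41`.
_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def metric_names(exposition_text):
    """Return the set of metric names found in a /metrics dump."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and `# HELP` / `# TYPE` comment lines.
        if not line or line.startswith("#"):
            continue
        match = _NAME.match(line)
        if match:
            names.add(match.group(1))
    return names


def diff_metrics(old_text, new_text):
    """Return (missing, gained) metric names, each as a sorted list."""
    old, new = metric_names(old_text), metric_names(new_text)
    return sorted(old - new), sorted(new - old)


if __name__ == "__main__":
    # Hypothetical sample dumps, only to show the shape of the input.
    old_dump = (
        "# TYPE DCGM_FI_DEV_GPU_TEMP gauge\n"
        'DCGM_FI_DEV_GPU_TEMP{gpu="0"} 41\n'
        'DCGM_FI_DEV_POWER_USAGE{gpu="0"} 58.3\n'
    )
    new_dump = (
        'DCGM_FI_DEV_GPU_TEMP{gpu="0"} 42\n'
        'DCGM_FI_DEV_GPU_UTIL{gpu="0"} 7\n'
    )
    missing, gained = diff_metrics(old_dump, new_dump)
    print("missing:", missing)
    print("gained:", gained)
```

This only compares metric names, not values or labels, so it won't tell you whether the surviving metrics report numbers in the same ballpark.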
Thank you for using the dcgm-exporter project and reporting this issue. We are sad to hear your scenarios were negatively affected by our changes. Unfortunately, we deliberately changed the set of metrics that are enabled by default. I'd recommend providing your own .csv configuration file with only those metrics that you need and use. Considering all the above, we changed the default .csv configuration file and kept only a basic set of metrics that would not put unnecessary load on users' systems. We urge you to provide your own .csv configuration files with carefully selected metrics that you need to monitor. We have not deleted the metrics themselves, so you can still get the previous metrics back, ignoring my recommendations about deprecated ones.
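As a rough illustration of the "bring your own .csv" recommendation, here is a minimal sketch that renders a small counters file. It assumes the three-column layout used in the shipped default-counters.csv (DCGM field, Prometheus metric type, help message); the helper name `render_counters_csv`, the output file name, and the particular fields chosen are only examples, not a recommended set. The comma check is my own assumption that an unquoted comma in the help text would be read as an extra column.

```python
def render_counters_csv(metrics):
    """Render (field, metric_type, help_text) tuples as counters-file text."""
    lines = ["# Format: DCGM field, Prometheus metric type, help message"]
    for field, metric_type, help_text in metrics:
        # Assumption: a comma inside the help text would be parsed as an
        # extra column, so reject it instead of writing a broken file.
        if "," in help_text:
            raise ValueError(f"help text for {field} must not contain a comma")
        lines.append(f"{field}, {metric_type}, {help_text}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example selection only; pick the fields you actually monitor.
    selected = [
        ("DCGM_FI_DEV_GPU_UTIL", "gauge", "GPU utilization (in %)."),
        ("DCGM_FI_DEV_GPU_TEMP", "gauge", "GPU temperature (in C)."),
        ("DCGM_FI_DEV_POWER_USAGE", "gauge", "Power draw (in W)."),
    ]
    # Hypothetical output path.
    with open("my-counters.csv", "w") as handle:
        handle.write(render_counters_csv(selected))
```

As far as I know the resulting file is passed to the exporter with its collectors option (`-f`), but please check `dcgm-exporter --help` for the version you run, and mount the file into the container if you deploy via docker.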
@nikkon-dev thanks for the update. For the meantime we re-enabled Effectively the issue we had was one of documentation. We deploy the dcgm-exporter docker image as a systemd service as defined by deepops. It pulls the newest image whenever the service starts. That's the root problem in my book, and we're looking at options for fixing it. From our point of view, the metrics we needed just suddenly disappeared, and we couldn't figure out on our own how to get them back. Looking through the commits, it was version 2.3.0 that disabled
@nikkon-dev what recommendation do you have for people using https://grafana.com/grafana/dashboards/12239 ?
@mattf, thank you for pointing to that Grafana dashboard. I reached out to the author, and we will update the dashboard according to the current set of enabled-by-default metrics. For the future, we want to research whether such dashboards could be autogenerated based on the dcgm-exporter configuration.
@nikkon-dev Do you have a new, updated dashboard?
We updated the dashboard to reflect the current state of the default dcgm-exporter configuration. WBR,
I've updated our dcgm-exporter, deployed directly in docker, to tag 2.0.13-2.1.2-ubuntu20.04, but many metrics are missing.
It only exports 18 metrics, compared with 34 in tag 1.7.2. Is this expected, or is it a bug?
This is the command we use:
The following metrics are missing. I do see them enabled in default-counters.csv though.