-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix missing GPM metrics #122
Conversation
pkg/nvml/gpm.go
Outdated
defer func() { | ||
metricsGet.Metrics = nvmlMetricsGet.Metrics | ||
}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid doing a defer like this and do something more explicit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I can store the incoming ret
.
I was just wondering whether we should copy the metrics that are returned regardless of the ret
value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to remember why we pass a pointer to the GpmMetricsGetType
in here (and fill it out) rather than just returning (GpmMetricsGetType, Return)
. I have a vague memory of needing to do this but I don't know why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember now -- because the API requires some fields to be set as INPUTs and other fields to be set as OUTPUTs in the struct: https://github.com/NVIDIA/go-nvml/blob/main/gen/nvml/nvml.h#L9401-L9409
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a matter of interest, if some fields are used as inputs, using the GpmMetricsGetV
type seems quite akward. I suppose this is generated due to the versioned union though.
pkg/nvml/gpm.go
Outdated
metricsGet.Metrics = nvmlMetricsGet.Metrics | ||
}() | ||
|
||
return nvmlGpmMetricsGetStub(nvmlMetricsGet) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the idea of adding a one-off stub like this. If we are going to do this, we should do it for everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, I added one like this for the topology functions that I also added tests for. I didn't see the value in adding all of them if we weren't going to use them immediately.
Would you rather I remove the tests and the stubs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it's fine then. I think we can add a level of indirection pretty easily though. We just instruct nvidia.yml
to rewrite all generated functions as nvml*Internal
and then do something similar to how I generate the various interfaces
to generate variables for all nvml* = nvml*Internal
by default. Even better would be to make them all function pointers hanging off of the library
type (so we don't have to do the weird update/reset in the tests), but that might be more trouble than it's worth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Let's have a look at this at some point in the (near) future.
pkg/nvml/gpm.go
Outdated
@@ -57,14 +57,23 @@ type GpmMetricsGetVType struct { | |||
func (l *library) GpmMetricsGetV(metricsGet *GpmMetricsGetType) GpmMetricsGetVType { | |||
return GpmMetricsGetVType{metricsGet.convert()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be updated somehow too? Otherwise, how do the assigned metrics make their way out of this call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .convert()
call does also copy the metrics meaning that any metrics in metricsGet
will be in the result of convert()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should change this whole thing:
// nvml.GpmMetricsGet()
type GpmMetricsGetVType struct {
metricsGet *nvmlGpmMetricsGetType
}
func (l *library) GpmMetricsGetV(metricsGet *GpmMetricsGetType) GpmMetricsGetVType {
return GpmMetricsGetVType{metricsGet.convert()}
}
func (metricsGetV GpmMetricsGetVType) V1() Return {
metricsGetV.metricsGet.Version = 1
return nvmlGpmMetricsGet(metricsGetV.metricsGet)
}
func (l *library) GpmMetricsGet(metricsGet *GpmMetricsGetType) Return {
metricsGet.Version = GPM_METRICS_GET_VERSION
return nvmlGpmMetricsGet(metricsGet.convert())
}
to this:
// nvml.GpmMetricsGet()
type GpmMetricsGetVType struct {
metricsGet *GpmMetricsGetType
}
func (l *library) GpmMetricsGetV(metricsGet *GpmMetricsGetType) GpmMetricsGetVType {
return GpmMetricsGetVType{metricsGet}
}
func (metricsGetV GpmMetricsGetVType) V1() Return {
nvmlMetricsGet := metricsGetV.metricsGet.convert()
nvmlMetricsGet.Version = 1
ret := nvmlGpmMetricsGet(&nvmlMetricsGet)
*metricsGetV.metricsGet = nvmlMetricsGet.convert()
return ret
}
func (l *library) GpmMetricsGet(metricsGet *GpmMetricsGetType) Return {
nvmlMetricsGet := metricsGet.convert()
nvmlMetricsGet.Version = GPM_METRICS_GET_VERSION
ret := nvmlGpmMetricsGet(&nvmlMetricsGet)
*metricsGetV.metricsGet = nvmlMetricsGet.convert()
return ret
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see how your most recent changes will work for GpmMetricsGet()
(even though I still think we should use an explicit convert) -- but I still don't understand how you believe they work for GpmMetricsGetV(&metricsGet).V1()
without something like what I have above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right about V1()
. I've updated the revision to add a test for that too and also update the implementation.
Yes, but we need the values set back in the original variable passed in. |
pkg/dl/dl.go
Outdated
@@ -65,6 +65,7 @@ func dlError() error { | |||
} | |||
|
|||
func (dl *DynamicLibrary) Open() error { | |||
fmt.Printf("%+v\n", dl) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Sorry, was editing in the wrong window here.
pkg/nvml/gpm.go
Outdated
return nvmlGpmMetricsGet(metricsGet.convert()) | ||
nvmlMetricsGet := metricsGet.convert() | ||
ret := nvmlGpmMetricsGetStub(nvmlMetricsGet) | ||
metricsGet.Metrics = nvmlMetricsGet.Metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@klueska we create a local instance of an nvml type before the function call and we set the return value explicitly here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, something changed since I made my previous comment. I was still seeing:
func (l *library) GpmMetricsGetV(metricsGet *GpmMetricsGetType) GpmMetricsGetVType {
return GpmMetricsGetVType{metricsGet.convert()}
}
whereby we lose our reference to the original metricsGet
variable as soon as we do metricsGet.convert()
. The current code seems to basically do what I was suggesting now.
The one difference being that you do:
metricsGet.Metrics = nvmlMetricsGet.Metrics
instead of:
*metricsGet = nvmlMetricsGet.convert()
This is equivalent for now -- but your version would need to be updated if any other fields become OUT fields in the future, wheras using convert()
avoids this.
pkg/nvml/gpm.go
Outdated
func gpmMetricsGet(metricsGet *GpmMetricsGetType) Return { | ||
nvmlMetricsGet := metricsGet.convert() | ||
ret := nvmlGpmMetricsGetStub(nvmlMetricsGet) | ||
metricsGet.Metrics = nvmlMetricsGet.Metrics | ||
return ret |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@klueska following your comments, I could update this whole thing to:
func gpmMetricsGet(metricsGet *GpmMetricsGetType) Return { | |
nvmlMetricsGet := metricsGet.convert() | |
ret := nvmlGpmMetricsGetStub(nvmlMetricsGet) | |
metricsGet.Metrics = nvmlMetricsGet.Metrics | |
return ret | |
func gpmMetricsGet(metricsGet *GpmMetricsGetType) Return { | |
nvmlMetricsGet := metricsGet.convert() | |
ret := nvmlGpmMetricsGet(nvmlMetricsGet) | |
*metricsGet = *nvmlMetricsGet.convert() | |
return ret |
do you have a preference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is equivalent for now -- but your version would need to be updated if any other fields become OUT fields in the future, wheras using convert() avoids this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Let's switch to the full assignmet to better handle future fields.
Signed-off-by: Evan Lezar <[email protected]>
Signed-off-by: Evan Lezar <[email protected]>
These changes ensure that the collected GPM metrics are transferred to the original data structure passed to
GpmMetricsGet
.Closes #121