-
Notifications
You must be signed in to change notification settings - Fork 189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix platform power return unit #1468
Conversation
🤖 SeineSailor Here is a concise summary of the pull request changes: Summary: This pull request addresses issues related to platform power handling in the Kepler project, focusing on consistent energy unit handling and correction of power estimation functions. Key Modifications:
Impact: These changes ensure consistent energy unit handling across components and platforms, aligning with the rest of the Kepler project. The updates to power estimation functions and the introduction of a new utility function improve the accuracy and reliability of power handling in the codebase. Observations/Suggestions:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of nits, but overall LGTM
@@ -31,7 +31,7 @@ var ( | |||
var _ = Describe("Test Exponential Predictor Unit", func() { | |||
It("Get Node Platform Power By Exponential Regression", func() { | |||
powers := GetNodePlatformPowerFromDummyServer(dummyExponentialWeightHandler, types.ExponentialTrainer) | |||
Expect(int(powers[0])).Should(BeEquivalentTo(4)) | |||
Expect(int(powers[0]/1000) * 1000).Should(BeEquivalentTo(4000)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't the *1000
and /1000
cancel out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are to ignore the leftover from exponential, logarithm, and logistic function but still keep the unit to millijoule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may make sense to break this out into a test helper function with a comment on what it's doing then.
@@ -31,7 +31,7 @@ var ( | |||
var _ = Describe("Test Logarithmic Predictor Unit", func() { | |||
It("Get Node Platform Power By Logarithmic Regression", func() { | |||
powers := GetNodePlatformPowerFromDummyServer(dummyLogarithmicWeightHandler, types.LogarithmicTrainer) | |||
Expect(int(powers[0])).Should(BeEquivalentTo(2)) | |||
Expect(int(powers[0]/1000) * 1000).Should(BeEquivalentTo(2000)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
@@ -31,7 +31,7 @@ var ( | |||
var _ = Describe("Test Logistic Predictor Unit", func() { | |||
It("Get Node Platform Power By Logistic Regression", func() { | |||
powers := GetNodePlatformPowerFromDummyServer(dummyLogisticWeightHandler, types.LogisticTrainer) | |||
Expect(int(powers[0])).Should(BeEquivalentTo(2)) | |||
Expect(int(powers[0]/1000) * 1000).Should(BeEquivalentTo(2000)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here
pkg/metrics/consts/conts.go
Outdated
@@ -24,7 +24,7 @@ const ( | |||
MetricsNamespace = "kepler" | |||
EnergyMetricNameSuffix = "_joules_total" | |||
UsageMetricNameSuffix = "_total" | |||
MiliJouleToJoule = 1000 | |||
MilliJouleToJoule = 1000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This could be used to convert in either direction depending on the operator used. So it may be better to rename something like JouleMillijouleConversionFactor
🤷 naming is hard.
@@ -104,7 +104,7 @@ func (r *RatioPowerModel) GetPlatformPower(isIdlePower bool) ([]float64, error) | |||
} else { | |||
processPower = r.getPowerByRatio(processIdx, int(PlatformUsageMetric), int(PlatformDynPower), numProcesses) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may make more sense here to fix the return value of r.getPowerByRatio
given the float is immediately converted to uint64 here.
@@ -31,7 +31,7 @@ var ( | |||
var _ = Describe("Test Exponential Predictor Unit", func() { | |||
It("Get Node Platform Power By Exponential Regression", func() { | |||
powers := GetNodePlatformPowerFromDummyServer(dummyExponentialWeightHandler, types.ExponentialTrainer) | |||
Expect(int(powers[0])).Should(BeEquivalentTo(4)) | |||
Expect(int(powers[0]/1000) * 1000).Should(BeEquivalentTo(4000)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may make sense to break this out into a test helper function with a comment on what it's doing then.
pkg/model/utils/utils.go
Outdated
@@ -21,7 +21,7 @@ import ( | |||
) | |||
|
|||
const ( | |||
jouleToMiliJoule = 1000 | |||
jouleToMilliJoule = 1000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this use the same constant as eariler?
@dave-tucker Thank you so much for the review. [edit] add minor go-lint fix: here |
Signed-off-by: Sunyanan Choochotkaew <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
This PR is to fix the issue regarding platform power reported in the slack channel as well as my direct experience in #1466.
I have found the different handling between component power and platform power in the model package.
Power model return power in joule unit in float. However, we handle the power in millijoule in Kepler.
Component power has been converted correctly by GetComponentPower in utils module. On the other hand, platform power is used as-is.
After fix:
As seen in the above validation, there are still more points to investigate or fix.
I have stressed 70% of CPU in a single CPU (it supposes to use 700ms).
The most critical part is the BPF CPU time. As seen above, the total CPU time is much higher than stress time.
However, as seen in the top two panel which are the power meters, the power is significantly increased at the stress time.
Another point need concern is the model that are being used. The default one is 4 cores which produces maximum power at around 200W. In the above result, I changed to use 32 cores model. We need to change default power model to the largest one on the estimator case.
Signed-off-by: Sunyanan Choochotkaew [email protected]