-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[feat] CarbonEmissionMonitor
#148
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great work! I went through files in detail except for the test file.
CarbonEmissionMonitor
@@ -54,7 +62,7 @@ def get_current_carbon_intensity(self) -> float: | |||
pass | |||
|
|||
@abc.abstractmethod | |||
def get_recent_carbon_intensity(self) -> dict[str, float]: | |||
def get_recent_carbon_intensity(self) -> dict[datetime, float]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: When I get a dictionary that maps 2024 December 18th 10:33 PM ET as the key to a certain carbon intensity number, what does that mean? Carbon intensity of the hour that includes 2024 December 18th 10:33 PM ET, i.e., 2024 December 18th 10:00 - 10:59 PM ET? The duration and boundary of the timeframe that corresponds to each carbon intensity value is unclear from this API contract.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe in that example, the duration is from 2024 December 18th 10:00 - 10:59 PM ET.
If I make a fetch for live carbon intensity at 2024 December 18th 10:33 PM ET, the datetime recorded in the response is 2024 December 18th 10:00 ET. This will continue to be the live carbon intensity until the current datetime becomes 2024 December 18th 11:00 ET.
There shouldn't be a situation where 2024 December 18th 10:33 PM ET is a key in the map, the dictionary only includes keys of "whole hours" (2024 December 18th 10:00, 2024 December 18th 11:00, 2024 December 18th 12:00, etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so for ElectricityMaps, a datetime with 10 PM maps to the carbon intensity value of 10 PM to 11 PM. And we're doing all this back and forth because this is ambiguous in the API contract. What if someone implemented a new carbon intensity client that return 11 PM to mean the carbon intensity value for 10 PM to 11 PM? Or, what if another service has a granularity of 30 minutes, and returns dictionaries with keys 11:00 PM, 11:30 PM, 12:00 AM, etc.? Downstream calculations will be wrong inside the polling process will be wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, ok this makes sense. I think what we can do is have each specific carbon intensity class store an attribute with the polling frequency and we can pass that to polling process. As for time ranges, would most simple be having the carbon intensity providers map the beginning of each window with the carbon intensity?
For example, if the granularity of updates for the carbon intensity provider is 30 minutes, carbon intensity provider will provider:
{10:00PM -> 300, 10:30PM->345, 11:00PM->343}, where 300 is the carbon intensity for 10:00-10:30PM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree that making "the datetime key must mark the beginning of the period" part of the contract is the way to go. And for the granularity, I can think of something like this:
@property
@abstractmethod
def update_period(self) -> int:
"""How long each carbon intensity value in the history dict remains current, in seconds."""
return 3600
Or, datetime.timedelta(hours=1)
should also work. That might actually be easier in some places because we already have datetime.datetime
objects around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar question-- should we also set up a similar method that can return how often you need to fetch from the API? ElectricityMaps needs to be fetched every 23 hours because thats how far back they give recent carbon intensity values for but other APIs might be different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps that can be inferred from calling len
on the return value of get_recent_carbon_intensity
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up creating a similar @property
method. I was thinking that this could save 1 API request and allows for the programmer to have finer control over the polling frequency from APIs.
zeus/monitor/carbon.py
Outdated
for dt, measurement_map in energy_measurements[key].items(): | ||
for index, energy in measurement_map.items(): | ||
hardware_type, num_index = index.split("_") | ||
|
||
for hour_key, carbon_intensity in carbon_intensities.items(): | ||
for gpu_index in zeus_monitor.gpu_indices: | ||
# divide by 3600 to convert joules -> Wh | ||
gpu_carbon_emissions[key][gpu_index] += ( | ||
energy_measurements[key][hour_key][f"gpu_{gpu_index}"] | ||
/ 3600 | ||
* carbon_intensity | ||
) | ||
|
||
for cpu_index in zeus_monitor.cpu_indices: | ||
# divide by 3600 to convert joules -> Wh | ||
cpu_carbon_emissions[key][cpu_index] += ( | ||
energy_measurements[key][hour_key][f"cpu_{cpu_index}"] | ||
/ 3600 | ||
* carbon_intensity | ||
) | ||
if hardware_type == "gpu": | ||
gpu_carbon_emissions[key][int(num_index)] += ( | ||
energy / 3.6e6 * carbon_intensities[dt] | ||
) | ||
elif hardware_type == "cpu": | ||
cpu_carbon_emissions[key][int(num_index)] += ( | ||
energy / 3.6e6 * carbon_intensities[dt] | ||
) | ||
elif hardware_type == "dram": | ||
dram_carbon_emissions[key][int(num_index)] += ( | ||
energy / 3.6e6 * carbon_intensities[dt] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, depending on the meaning of the datetime
object part of the CarbonIntensityProvider
API contract, this may or may not be correct. Here, for a datetime
key in energy_measurements
(this datetime
object is from hour_floor
, so it's probably the energy consumed by the previous one hour) is accessed. This is matched with the carbon intensity value of the carbon_intensities
corresponding to the same datetime
object used to access energy_measurements
, meaning that it the carbon intensity value accessed must be for the past hour as well. This makes sense, but it is not apparent from the API contract.
Assuming that I understood things correctly, for the purposes of this PR, I think documenting this assumption in the CarbonIntensityProvider
method is sufficient. But, in an ideal implementation, what the CarbonIntensityProvider
API should provide is a dictionary that maps time ranges to carbon intensity values and the polling process should handle that correspondingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my code, hour_floor is used as the key and denotes the beginning of the windows of energy measurements. At the end of that hour, the ZeusMonitor.end_window()
for each active window is called and the measurement is record with the hour_floor used at the beginning of the window. Example: 4PM is hour_floor and when ZeusMonitor.begin_window()
are called for every active window. At 5PM, ZeusMonitor.end_window()
is called for every active window, but each measurement collected is recorded into energy_measurements
with the key 4PM. Hence, wouldn’t using hour_floor as the key mean each datetime key refers to the energy consumed within that hour and not the energy consumed with the previous hour? (i.e. if datetime key was 4PM in energy_measurements
, energy consumed is measured in the time range from 4PM to 5PM and not from 3PM to 4PM?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, together with how ElectricityMaps return their carbon intensity data, then this would be correct. My point about API contract stands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beautifully done! Thanks again for your contribution.
This pull requests implements the CarbonEmissionMonitor that measures carbon emission. The class features the same methods exposes by ZeusMonitor and is built with a single polling process retrieving energy consumption measurements from ZeusMonitor and carbon intensity measurements from ElectricityMaps.
This pull request also includes tests that test the following scenarios: