-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Factor out generation and curtailment plots #201
Conversation
Do we need a rebase past the merge-point of #198? |
Which parts of this do you think are fairly settled, and which parts are most 'drafty' and you want feedback on design? Or is it a draft mostly because part (b) isn't done yet, but will be soon? |
It is not affected since we don't use time series resample by mean here. Will do that once this is finalized for sure. |
Everything except for the to-dos. |
) | ||
curtailment.rename(lambda x: x + "_curtailment", axis="columns", inplace=True) | ||
|
||
return curtailment.clip(lower=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it do beyond calculate_curtailment_time_series_by_areas_and_resources
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've looked into all these functions in details when I was working on the refactor. I was trying to use the existing functions as much as possible, but in some situations (like here), I chose not to bother. They general functionalities of these two functions are very similar. The differences are mostly in the format of inputs and outputs.
calculate...
takes parameterarea
as a dict with keys specifying area types and returns the data in a nested dict which contains the time series curtailment for each plant by for every combination of resources and areas.get...
takes parameterarea
as a string and takes the advantages ofarea_to_loadzone
function to adaptively identify thearea_type
. The user doesn't need to specify thearea_type
parameter in most of the cases. The only exceptions are when we have same name for different area types, such asTexas
(could be state or interconnect) orNorth Carolina
(could be loadzone or state), then the user could specifyarea_type
parameter to get what he/she means to. Also, it simply returns the aggregated data frame for all curtailable resources in the area, which could feed directly into the plot function.
In particular for this plot, I would like to reduce the overhead of using this function from the user's perspective and in the mean time, separate the data processing from the plot function. Hence, the latter choice better satisfies these requirements.
& (grid.plant["type"].isin(resource_set)) | ||
].index.tolist() | ||
|
||
return plant_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say this function should be named get_plant_id_for_resources_in_areas
to mirror the name of the other functions in this module.
Also, you can use _check_scenario_is_in_analyze_state
and get_plant_id_for_resources_in_loadzones
. The latter checks that the resources are in grid so you are good on the TypeError/ValueError of scenario
and resources
So you should have something like:
_check_scenario_is_in_analyze_state(scenario)
grid = scenario.state.get_grid()
loadzones = area_to_loadzone(grid, area, area_type=area_type)
plant_id = get_plant_id_for_resources_in_loadzones(resources, loadzones, grid)
return plant_id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with the renaming idea and the check on the analyze
state of the input scenario. Regarding the checks on resources and loadzones:
- resources: I was thinking about it when refactoring the code. The user does not necessarily know what resources are available in the given area, say Maine, when using the plot function. The most likely situation would be the user specifies a superset of resources and the resultant figure only shows the ones are available so that people will know which one is not available by checking the figure. I would like to avoid the case that the user specifies all resources at the very beginning and the code keeps popping errors until the resource list is modified correctly to match the available resources in the area.
- loadzones:
area_to_loadzone
also validates the inputs but with less restrictive error checks, say, the scenario is a Western scenario, but the user can still set area to beTexas
orMT
, in which case the plot will show the data within the available loadzones of the scenario only instead of popping up an error, assuming the user is aware of only part of the area is available in WECC and is expecting to see the results in that part without specifying the exact area name, i.e.El Paso
orMontana Western
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the user should be able to get a plot for Texas/Montana/New Mexico in the Western Interconnect. In theses cases the user should explicitly specify the load zones El Paso, Montana Western and New Mexico Western. Otherwise, the results will be misleading both in terms of generation level and resources in the area.I think we had this discussion in the past with @danielolsen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. I still remember our discussion before. With the current implementation, it works in both ways. We simply assume the user is aware of this issue and generate plots accordingly instead of throwing an error to force the user to make input parameters more accurate. Different from calculations, people can figure out issues like this easily from a plot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if we converged here but if you decide to stick with:
plant_id = grid.plant[
(grid.plant["zone_name"].isin(loadzone_set))
& (grid.plant["type"].isin(resource_set))
].index.tolist()
I twould say you have to handle the error in the case (grid.plant["zone_name"].isin(loadzone_set)) & (grid.plant["type"].isin(resource_set))
is empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rouille It will return an empty list in that case and can be properly handled by the rest of the code.
I have noticed that you don't use the check functions dedicated to validate the areas: interconnect, states, state abbreviation and load zones. These are extensively tested. You use the are_to_loadzone from PowerSimData. Would it be possible to conciliate the two? |
See the comments above. |
The |
The |
Since 'interconnect' is a column that is saved in some of the tables of
|
So if a Europe grid with let's say the interconnects are the countries and I called |
The interconnects are not countries. https://en.wikipedia.org/wiki/Synchronous_grid_of_Continental_Europe |
With |
In this case, we can simply set |
It was an example. It could be Western Europe, Southern Europe, Eastern Europe and Northern Europe instead of Eastern, Texas and Western. |
I think it would be nice to have areas that are countries. Then we could simulate the continental Europe grid, but still select just things from e.g. France. But maybe this is outside the scope of this PR, for us to be adding hierarchy levels to areas. It can get weird too because the ordering isn't necessarily fixed. E.g. in the US, we have interconnections smaller than countries, but in Europe it is the opposite. |
One of the reason |
What I see in
so if area is an interconnect in Europe whose name is neither Texas,, Western and Eastern it wont return the load zones in the interconnect and raise the error the |
It would be nice to fix this, rather than design around it forever. |
If it is not tailored for the usa_tamu grid it needs to be moved out of the |
I've gone through all the code so far, the analysis functions are nice and modular and the plotting code is pretty straightforward. I added a few comments where I think things could be a little simpler/clearer, but overall I think this is looking very good. |
b05cd58
to
074dac4
Compare
@rouille All comments are addressed. Tests for new functions are added. This branch is rebased to the latest |
2ac18be
to
5e2f2fb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Thanks a lot.
|
||
|
||
def get_capacity_by_resources(scenario, area, resources, area_type=None): | ||
"""Get total capacity value of certain resources in the specific area of a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rephrase "capacity value" as this might be misinterpreted as something besides the nameplate Pmax capacity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_nameplate_capacity
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the function name is fine, I was talking about the docstring. "capacity" by itself doesn't imply the same thing as "capacity value".
|
||
|
||
def get_storage_capacity(scenario, area, area_type=None): | ||
"""Get total storage capacity value in the specific area of a scenario. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here re "capacity value"
return pg.groupby(grid.plant.loc[plant_id, "type"].values, axis=1).sum() | ||
|
||
|
||
def get_storage_time_series(scenario, area, area_type=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we call this something like get_storage_pg_time_series
, or have another parameter that specifies whether we want storage_pg
or storage_e
? This could easily be in a different PR though, we don't need to hold this one up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in #215.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a big improvement, thanks.
Pull Request Etiquette doc
Purpose
This draft PR in its current form serves as a major effort for analyze_pg refactor (see #323). There are two plots we would like to keep and decouple from
analyze_pg
together with all the corresponding support calculations, such as input data preparation and formatting:a. time series generation stack plot
b. time series curtailment plot
One major purpose of this draft PR is to setup an example/standard of plot function implementation in PostREISE so that the code could be useful in many situations, such as informal analysis, formal report generation and front end redistribution, to reduce duplicated efforts. This could be a long term iterative process.
What the code is doing
Time series generation stack plot and time series curtailment plot are decoupled from
analyze_pg
and implemented independently inplot_generation_ts_stack.py
andplot_curtailment_ts.py.
All support calculations are implemented in different functions and put into proper folders in the current architecture.to-dos:
analyze_pg
officiallyTesting
Integration tests are conducted in the corresponding demo notebooks. The new code generates exactly the same plots as we used in the seams study report.
Where to look
Almost everywhere
plot_generation_ts_stack.py
andplot_curtailment_ts.py
contain the main body of the plot code.Usage Example/Visuals
See
plot_generation_time_series_stack_demo.ipynb
andplot_curtailment_time_series_demo.ipynb
Time estimate
1 hour or more. We would like to dive into details of this PR to come up with a plot function implementation standard, which makes the code useful in multi situations.