Factor out generation and curtailment plots #201

BainanXia · 2021-01-07T02:18:04Z

Purpose

This draft PR in its current form serves as a major effort for analyze_pg refactor (see #323). There are two plots we would like to keep and decouple from analyze_pg together with all the corresponding support calculations, such as input data preparation and formatting:

a. time series generation stack plot
b. time series curtailment plot

One major purpose of this draft PR is to setup an example/standard of plot function implementation in PostREISE so that the code could be useful in many situations, such as informal analysis, formal report generation and front end redistribution, to reduce duplicated efforts. This could be a long term iterative process.

What the code is doing

Time series generation stack plot and time series curtailment plot are decoupled from analyze_pg and implemented independently in plot_generation_ts_stack.py and plot_curtailment_ts.py. All support calculations are implemented in different functions and put into proper folders in the current architecture.

to-dos:

add tests for support functions once finalized.
implement time series curtailment plot in a similar way.
address comments and potentially conduct major refactor to make it compatible with front end tools.
update doc strings.
update readme.
retire analyze_pg officially

Testing

Integration tests are conducted in the corresponding demo notebooks. The new code generates exactly the same plots as we used in the seams study report.

Where to look

Almost everywhere

/postreise/analyze/demand.py
/postreise/analyze/helpers.py
/postreise/analyze/generation/summarize.py
/postreise/analyze/generation/capacity_value.py
/postreise/plot/plot_generation_ts_stack.py
/postreise/plot/plot_curtailment_ts.py
/postreise/plot/demo/plot_generation_time_series_stack_demo.ipynb
/postreise/plot/demo/plot_curtailment_time_series_demo.ipynb

plot_generation_ts_stack.py and plot_curtailment_ts.py contain the main body of the plot code.

Usage Example/Visuals

See plot_generation_time_series_stack_demo.ipynb and plot_curtailment_time_series_demo.ipynb

Time estimate

1 hour or more. We would like to dive into details of this PR to come up with a plot function implementation standard, which makes the code useful in multi situations.

danielolsen · 2021-01-07T02:25:50Z

Do we need a rebase past the merge-point of #198?

danielolsen · 2021-01-07T02:28:52Z

Which parts of this do you think are fairly settled, and which parts are most 'drafty' and you want feedback on design? Or is it a draft mostly because part (b) isn't done yet, but will be soon?

BainanXia · 2021-01-07T02:29:44Z

It is not affected since we don't use time series resample by mean here. Will do that once this is finalized for sure.

BainanXia · 2021-01-07T02:30:26Z

Everything except for the to-dos.

rouille · 2021-01-07T06:02:14Z

postreise/analyze/generation/curtailment.py

+                )
+    curtailment.rename(lambda x: x + "_curtailment", axis="columns", inplace=True)
+
+    return curtailment.clip(lower=0)


What does it do beyond calculate_curtailment_time_series_by_areas_and_resources?

I've looked into all these functions in details when I was working on the refactor. I was trying to use the existing functions as much as possible, but in some situations (like here), I chose not to bother. They general functionalities of these two functions are very similar. The differences are mostly in the format of inputs and outputs.

calculate... takes parameter area as a dict with keys specifying area types and returns the data in a nested dict which contains the time series curtailment for each plant by for every combination of resources and areas.

get... takes parameter area as a string and takes the advantages of area_to_loadzone function to adaptively identify the area_type. The user doesn't need to specify the area_type parameter in most of the cases. The only exceptions are when we have same name for different area types, such as Texas (could be state or interconnect) or North Carolina (could be loadzone or state), then the user could specify area_type parameter to get what he/she means to. Also, it simply returns the aggregated data frame for all curtailable resources in the area, which could feed directly into the plot function.

In particular for this plot, I would like to reduce the overhead of using this function from the user's perspective and in the mean time, separate the data processing from the plot function. Hence, the latter choice better satisfies these requirements.

postreise/plot/plot_generation_ts_stack.py

rouille · 2021-01-07T06:57:49Z

postreise/analyze/helpers.py

+        & (grid.plant["type"].isin(resource_set))
+    ].index.tolist()
+
+    return plant_id


I would say this function should be named get_plant_id_for_resources_in_areas to mirror the name of the other functions in this module.

Also, you can use _check_scenario_is_in_analyze_state and get_plant_id_for_resources_in_loadzones. The latter checks that the resources are in grid so you are good on the TypeError/ValueError of scenario and resources

So you should have something like:

_check_scenario_is_in_analyze_state(scenario) grid = scenario.state.get_grid() loadzones = area_to_loadzone(grid, area, area_type=area_type) plant_id = get_plant_id_for_resources_in_loadzones(resources, loadzones, grid) return plant_id

Agree with the renaming idea and the check on the analyze state of the input scenario. Regarding the checks on resources and loadzones:

resources: I was thinking about it when refactoring the code. The user does not necessarily know what resources are available in the given area, say Maine, when using the plot function. The most likely situation would be the user specifies a superset of resources and the resultant figure only shows the ones are available so that people will know which one is not available by checking the figure. I would like to avoid the case that the user specifies all resources at the very beginning and the code keeps popping errors until the resource list is modified correctly to match the available resources in the area.

loadzones: area_to_loadzone also validates the inputs but with less restrictive error checks, say, the scenario is a Western scenario, but the user can still set area to be Texas or MT, in which case the plot will show the data within the available loadzones of the scenario only instead of popping up an error, assuming the user is aware of only part of the area is available in WECC and is expecting to see the results in that part without specifying the exact area name, i.e. El Paso or Montana Western.

I don't think the user should be able to get a plot for Texas/Montana/New Mexico in the Western Interconnect. In theses cases the user should explicitly specify the load zones El Paso, Montana Western and New Mexico Western. Otherwise, the results will be misleading both in terms of generation level and resources in the area.I think we had this discussion in the past with @danielolsen

Right. I still remember our discussion before. With the current implementation, it works in both ways. We simply assume the user is aware of this issue and generate plots accordingly instead of throwing an error to force the user to make input parameters more accurate. Different from calculations, people can figure out issues like this easily from a plot.

I am not sure if we converged here but if you decide to stick with:

plant_id = grid.plant[ (grid.plant["zone_name"].isin(loadzone_set)) & (grid.plant["type"].isin(resource_set)) ].index.tolist()

I twould say you have to handle the error in the case (grid.plant["zone_name"].isin(loadzone_set)) & (grid.plant["type"].isin(resource_set)) is empty.

@rouille It will return an empty list in that case and can be properly handled by the rest of the code.

rouille · 2021-01-07T07:11:33Z

I have noticed that you don't use the check functions dedicated to validate the areas: interconnect, states, state abbreviation and load zones. These are extensively tested. You use the are_to_loadzone from PowerSimData. Would it be possible to conciliate the two?

BainanXia · 2021-01-07T19:08:27Z

See the comments above.

rouille · 2021-01-07T19:54:19Z

The area_to_loadzone function is tailored for the usa_tamu network. The 3 interconnects are hard coded within the function. The _check_areas_are_in_grid_and_format does assume we are using a US grid.

BainanXia · 2021-01-07T20:38:55Z

The area_to_loadzone is not tailored for the usa_tamu network. The hardcoded 3 interconnects are used only if the area_type is specified as interconnect or area is one of {Western, Eastern}. It can be used in any network with loadzones defined. In particular, if the usa_tamu is used, it can adaptively match states and interconnects as well.

danielolsen · 2021-01-07T20:47:31Z

Since 'interconnect' is a column that is saved in some of the tables of grid.mat files, should we assume that any grid has at least one interconnect? E.g. France would have UCTE or similar as the interconnect for every bus.

Western & Eastern seems like at least a little bit hardcoded... maybe we could look at the unique values of the interconnect column to see what strings should be interpreted as interconnects. Texas gets tricky, but maybe that means we should finally rename the Texas interconnect to ERCOT.

rouille · 2021-01-07T20:49:27Z

So if a Europe grid with let's say the interconnects are the countries and I called are_to_loadzone with are="France" and area_type="interconnect", it is going to work?

danielolsen · 2021-01-07T20:51:15Z

The interconnects are not countries. https://en.wikipedia.org/wiki/Synchronous_grid_of_Continental_Europe

BainanXia · 2021-01-07T20:51:50Z

With area_to_loadzone, if the user set area == "Western" and the network is usa_tamu, it works properly and if the network is France, it work throw an error saying invalid area. This is what we would like to see, right? It is implemented in the way that is compatible with usa_tamu rather than hardcoded things.

BainanXia · 2021-01-07T20:53:19Z

In this case, we can simply set area=='all'.

rouille · 2021-01-07T21:02:48Z

It was an example. It could be Western Europe, Southern Europe, Eastern Europe and Northern Europe instead of Eastern, Texas and Western.

danielolsen · 2021-01-07T21:06:23Z

I think it would be nice to have areas that are countries. Then we could simulate the continental Europe grid, but still select just things from e.g. France. But maybe this is outside the scope of this PR, for us to be adding hierarchy levels to areas.

It can get weird too because the ordering isn't necessarily fixed. E.g. in the US, we have interconnections smaller than countries, but in Europe it is the opposite.

BainanXia · 2021-01-07T21:11:21Z

One of the reason area_to_loadzone has hardcoded interconnects for usa_tamu is when dealing with a USA scenario, grid.interconnect gives ['USA'] instead of ['Texas', 'Western', 'Eastern'], however, USA is not an interconnect. Otherwise the hardcoded part can be replaced by grid.interconnect easily. Or it can be solved if we have a country column in the future.

rouille · 2021-01-07T21:12:03Z

What I see in area_to_loadzone:

elif area in {"Texas", "Western", "Eastern"}:
    loadzone_set = interconnect2loadzone[area]

so if area is an interconnect in Europe whose name is neither Texas,, Western and Eastern it wont return the load zones in the interconnect and raise the error the Invalid area error. So I disagree that this function is not tailored for usa_tamu or at least a US grid.

danielolsen · 2021-01-07T21:13:20Z

It would be nice to fix this, rather than design around it forever.

rouille · 2021-01-07T21:16:04Z

If it is not tailored for the usa_tamu grid it needs to be moved out of the powersimdata.network.usa_tamu.usa_tamu_model module.

postreise/plot/plot_curtailment_ts.py

postreise/plot/plot_generation_ts_stack.py

danielolsen · 2021-01-22T20:09:47Z

I've gone through all the code so far, the analysis functions are nice and modular and the plotting code is pretty straightforward. I added a few comments where I think things could be a little simpler/clearer, but overall I think this is looking very good.

postreise/analyze/generation/capacity_value.py

postreise/analyze/demand.py

BainanXia · 2021-01-28T19:00:01Z

@rouille All comments are addressed. Tests for new functions are added. This branch is rebased to the latest develop and commit history is cleaned up. Will update Readme after we retire analyze_pg completely. Is this a good point to coordinate with the website team and get their suggestions, or this should be handled in a new PR?

postreise/analyze/demand.py

postreise/analyze/generation/summarize.py

rouille

Looks great. Thanks a lot.

danielolsen · 2021-02-02T23:14:44Z

postreise/analyze/generation/capacity_value.py

+
+
+def get_capacity_by_resources(scenario, area, resources, area_type=None):
+    """Get total capacity value of certain resources in the specific area of a


I would rephrase "capacity value" as this might be misinterpreted as something besides the nameplate Pmax capacity

get_nameplate_capacity ?

I think the function name is fine, I was talking about the docstring. "capacity" by itself doesn't imply the same thing as "capacity value".

danielolsen · 2021-02-02T23:15:06Z

postreise/analyze/generation/capacity_value.py

+
+
+def get_storage_capacity(scenario, area, area_type=None):
+    """Get total storage capacity value in the specific area of a scenario.


Same comment here re "capacity value"

danielolsen · 2021-02-02T23:17:40Z

postreise/analyze/generation/summarize.py

+    return pg.groupby(grid.plant.loc[plant_id, "type"].values, axis=1).sum()
+
+
+def get_storage_time_series(scenario, area, area_type=None):


Should we call this something like get_storage_pg_time_series, or have another parameter that specifies whether we want storage_pg or storage_e? This could easily be in a different PR though, we don't need to hold this one up.

Done in #215.

danielolsen

This is a big improvement, thanks.

BainanXia added the visualization Issue related to visualization label Jan 7, 2021

BainanXia requested review from danielolsen, kasparm and rouille January 7, 2021 02:18

BainanXia self-assigned this Jan 7, 2021

BainanXia requested review from dmuldrew, merrielle and jenhagg January 7, 2021 02:19

rouille reviewed Jan 7, 2021

View reviewed changes

postreise/plot/plot_generation_ts_stack.py Outdated Show resolved Hide resolved

rouille reviewed Jan 7, 2021

View reviewed changes