Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Factor out generation and curtailment plots #201

Merged
merged 11 commits into from
Feb 2, 2021

Conversation

BainanXia
Copy link
Collaborator

@BainanXia BainanXia commented Jan 7, 2021

Pull Request Etiquette doc

Purpose

This draft PR in its current form serves as a major effort for analyze_pg refactor (see #323). There are two plots we would like to keep and decouple from analyze_pg together with all the corresponding support calculations, such as input data preparation and formatting:

a. time series generation stack plot
b. time series curtailment plot

One major purpose of this draft PR is to setup an example/standard of plot function implementation in PostREISE so that the code could be useful in many situations, such as informal analysis, formal report generation and front end redistribution, to reduce duplicated efforts. This could be a long term iterative process.

What the code is doing

Time series generation stack plot and time series curtailment plot are decoupled from analyze_pg and implemented independently in plot_generation_ts_stack.py and plot_curtailment_ts.py. All support calculations are implemented in different functions and put into proper folders in the current architecture.

to-dos:

  • add tests for support functions once finalized.
  • implement time series curtailment plot in a similar way.
  • address comments and potentially conduct major refactor to make it compatible with front end tools.
  • update doc strings.
  • update readme.
  • retire analyze_pg officially

Testing

Integration tests are conducted in the corresponding demo notebooks. The new code generates exactly the same plots as we used in the seams study report.

Where to look

Almost everywhere

/postreise/analyze/demand.py
/postreise/analyze/helpers.py
/postreise/analyze/generation/summarize.py
/postreise/analyze/generation/capacity_value.py
/postreise/plot/plot_generation_ts_stack.py
/postreise/plot/plot_curtailment_ts.py
/postreise/plot/demo/plot_generation_time_series_stack_demo.ipynb
/postreise/plot/demo/plot_curtailment_time_series_demo.ipynb

plot_generation_ts_stack.py and plot_curtailment_ts.py contain the main body of the plot code.

Usage Example/Visuals

See plot_generation_time_series_stack_demo.ipynb and plot_curtailment_time_series_demo.ipynb

Time estimate

1 hour or more. We would like to dive into details of this PR to come up with a plot function implementation standard, which makes the code useful in multi situations.

@BainanXia BainanXia added the visualization Issue related to visualization label Jan 7, 2021
@BainanXia BainanXia self-assigned this Jan 7, 2021
@danielolsen
Copy link
Contributor

Do we need a rebase past the merge-point of #198?

@danielolsen
Copy link
Contributor

Which parts of this do you think are fairly settled, and which parts are most 'drafty' and you want feedback on design? Or is it a draft mostly because part (b) isn't done yet, but will be soon?

@BainanXia
Copy link
Collaborator Author

Do we need a rebase past the merge-point of #198?

It is not affected since we don't use time series resample by mean here. Will do that once this is finalized for sure.

@BainanXia
Copy link
Collaborator Author

Which parts of this do you think are fairly settled, and which parts are most 'drafty' and you want feedback on design? Or is it a draft mostly because part (b) isn't done yet, but will be soon?

Everything except for the to-dos.

)
curtailment.rename(lambda x: x + "_curtailment", axis="columns", inplace=True)

return curtailment.clip(lower=0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it do beyond calculate_curtailment_time_series_by_areas_and_resources?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked into all these functions in details when I was working on the refactor. I was trying to use the existing functions as much as possible, but in some situations (like here), I chose not to bother. They general functionalities of these two functions are very similar. The differences are mostly in the format of inputs and outputs.

  • calculate... takes parameter area as a dict with keys specifying area types and returns the data in a nested dict which contains the time series curtailment for each plant by for every combination of resources and areas.
  • get... takes parameter area as a string and takes the advantages of area_to_loadzone function to adaptively identify the area_type. The user doesn't need to specify the area_type parameter in most of the cases. The only exceptions are when we have same name for different area types, such as Texas (could be state or interconnect) or North Carolina (could be loadzone or state), then the user could specify area_type parameter to get what he/she means to. Also, it simply returns the aggregated data frame for all curtailable resources in the area, which could feed directly into the plot function.

In particular for this plot, I would like to reduce the overhead of using this function from the user's perspective and in the mean time, separate the data processing from the plot function. Hence, the latter choice better satisfies these requirements.

& (grid.plant["type"].isin(resource_set))
].index.tolist()

return plant_id
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say this function should be named get_plant_id_for_resources_in_areas to mirror the name of the other functions in this module.

Also, you can use _check_scenario_is_in_analyze_state and get_plant_id_for_resources_in_loadzones. The latter checks that the resources are in grid so you are good on the TypeError/ValueError of scenario and resources

So you should have something like:

_check_scenario_is_in_analyze_state(scenario)
grid = scenario.state.get_grid()
loadzones = area_to_loadzone(grid, area, area_type=area_type)
plant_id = get_plant_id_for_resources_in_loadzones(resources, loadzones, grid)

return plant_id

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with the renaming idea and the check on the analyze state of the input scenario. Regarding the checks on resources and loadzones:

  • resources: I was thinking about it when refactoring the code. The user does not necessarily know what resources are available in the given area, say Maine, when using the plot function. The most likely situation would be the user specifies a superset of resources and the resultant figure only shows the ones are available so that people will know which one is not available by checking the figure. I would like to avoid the case that the user specifies all resources at the very beginning and the code keeps popping errors until the resource list is modified correctly to match the available resources in the area.
  • loadzones: area_to_loadzone also validates the inputs but with less restrictive error checks, say, the scenario is a Western scenario, but the user can still set area to be Texas or MT, in which case the plot will show the data within the available loadzones of the scenario only instead of popping up an error, assuming the user is aware of only part of the area is available in WECC and is expecting to see the results in that part without specifying the exact area name, i.e. El Paso or Montana Western.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the user should be able to get a plot for Texas/Montana/New Mexico in the Western Interconnect. In theses cases the user should explicitly specify the load zones El Paso, Montana Western and New Mexico Western. Otherwise, the results will be misleading both in terms of generation level and resources in the area.I think we had this discussion in the past with @danielolsen

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I still remember our discussion before. With the current implementation, it works in both ways. We simply assume the user is aware of this issue and generate plots accordingly instead of throwing an error to force the user to make input parameters more accurate. Different from calculations, people can figure out issues like this easily from a plot.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if we converged here but if you decide to stick with:

   plant_id = grid.plant[
        (grid.plant["zone_name"].isin(loadzone_set))
        & (grid.plant["type"].isin(resource_set))
    ].index.tolist()

I twould say you have to handle the error in the case (grid.plant["zone_name"].isin(loadzone_set)) & (grid.plant["type"].isin(resource_set)) is empty.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rouille It will return an empty list in that case and can be properly handled by the rest of the code.

@rouille
Copy link
Collaborator

rouille commented Jan 7, 2021

I have noticed that you don't use the check functions dedicated to validate the areas: interconnect, states, state abbreviation and load zones. These are extensively tested. You use the are_to_loadzone from PowerSimData. Would it be possible to conciliate the two?

@BainanXia
Copy link
Collaborator Author

I have noticed that you don't use the check functions dedicated to validate the areas: interconnect, states, state abbreviation and load zones. These are extensively tested. You use the are_to_loadzone from PowerSimData. Would it be possible to conciliate the two?

See the comments above.

@rouille
Copy link
Collaborator

rouille commented Jan 7, 2021

I have noticed that you don't use the check functions dedicated to validate the areas: interconnect, states, state abbreviation and load zones. These are extensively tested. You use the are_to_loadzone from PowerSimData. Would it be possible to conciliate the two?

See the comments above.

The area_to_loadzone function is tailored for the usa_tamu network. The 3 interconnects are hard coded within the function. The _check_areas_are_in_grid_and_format does assume we are using a US grid.

@BainanXia
Copy link
Collaborator Author

I have noticed that you don't use the check functions dedicated to validate the areas: interconnect, states, state abbreviation and load zones. These are extensively tested. You use the are_to_loadzone from PowerSimData. Would it be possible to conciliate the two?

See the comments above.

The area_to_loadzone function is tailored for the usa_tamu network. The 3 interconnects are hard coded within the function. The _check_areas_are_in_grid_and_format does assume we are using a US grid.

The area_to_loadzone is not tailored for the usa_tamu network. The hardcoded 3 interconnects are used only if the area_type is specified as interconnect or area is one of {Western, Eastern}. It can be used in any network with loadzones defined. In particular, if the usa_tamu is used, it can adaptively match states and interconnects as well.

@danielolsen
Copy link
Contributor

I have noticed that you don't use the check functions dedicated to validate the areas: interconnect, states, state abbreviation and load zones. These are extensively tested. You use the are_to_loadzone from PowerSimData. Would it be possible to conciliate the two?

See the comments above.

The area_to_loadzone function is tailored for the usa_tamu network. The 3 interconnects are hard coded within the function. The _check_areas_are_in_grid_and_format does assume we are using a US grid.

The area_to_loadzone is not tailored for the usa_tamu network. The hardcoded 3 interconnects are used only if the area_type is specified as interconnect or area is one of {Western, Eastern}. It can be used in any network with loadzones defined. In particular, if the usa_tamu is used, it can adaptively match states and interconnects as well.

Since 'interconnect' is a column that is saved in some of the tables of grid.mat files, should we assume that any grid has at least one interconnect? E.g. France would have UCTE or similar as the interconnect for every bus.

Western & Eastern seems like at least a little bit hardcoded... maybe we could look at the unique values of the interconnect column to see what strings should be interpreted as interconnects. Texas gets tricky, but maybe that means we should finally rename the Texas interconnect to ERCOT.

@rouille
Copy link
Collaborator

rouille commented Jan 7, 2021

I have noticed that you don't use the check functions dedicated to validate the areas: interconnect, states, state abbreviation and load zones. These are extensively tested. You use the are_to_loadzone from PowerSimData. Would it be possible to conciliate the two?

See the comments above.

The area_to_loadzone function is tailored for the usa_tamu network. The 3 interconnects are hard coded within the function. The _check_areas_are_in_grid_and_format does assume we are using a US grid.

The area_to_loadzone is not tailored for the usa_tamu network. The hardcoded 3 interconnects are used only if the area_type is specified as interconnect or area is one of {Western, Eastern}. It can be used in any network with loadzones defined. In particular, if the usa_tamu is used, it can adaptively match states and interconnects as well.

So if a Europe grid with let's say the interconnects are the countries and I called are_to_loadzone with are="France" and area_type="interconnect", it is going to work?

@danielolsen
Copy link
Contributor

I have noticed that you don't use the check functions dedicated to validate the areas: interconnect, states, state abbreviation and load zones. These are extensively tested. You use the are_to_loadzone from PowerSimData. Would it be possible to conciliate the two?

See the comments above.

The area_to_loadzone function is tailored for the usa_tamu network. The 3 interconnects are hard coded within the function. The _check_areas_are_in_grid_and_format does assume we are using a US grid.

The area_to_loadzone is not tailored for the usa_tamu network. The hardcoded 3 interconnects are used only if the area_type is specified as interconnect or area is one of {Western, Eastern}. It can be used in any network with loadzones defined. In particular, if the usa_tamu is used, it can adaptively match states and interconnects as well.

So if a Europe grid with let's say the interconnects are the countries and I called are_to_loadzone with are="France" and area_type="interconnect", it is going to work?

The interconnects are not countries. https://en.wikipedia.org/wiki/Synchronous_grid_of_Continental_Europe

@BainanXia
Copy link
Collaborator Author

I have noticed that you don't use the check functions dedicated to validate the areas: interconnect, states, state abbreviation and load zones. These are extensively tested. You use the are_to_loadzone from PowerSimData. Would it be possible to conciliate the two?

See the comments above.

The area_to_loadzone function is tailored for the usa_tamu network. The 3 interconnects are hard coded within the function. The _check_areas_are_in_grid_and_format does assume we are using a US grid.

The area_to_loadzone is not tailored for the usa_tamu network. The hardcoded 3 interconnects are used only if the area_type is specified as interconnect or area is one of {Western, Eastern}. It can be used in any network with loadzones defined. In particular, if the usa_tamu is used, it can adaptively match states and interconnects as well.

Since 'interconnect' is a column that is saved in some of the tables of grid.mat files, should we assume that any grid has at least one interconnect? E.g. France would have UCTE or similar as the interconnect for every bus.

Western & Eastern seems like at least a little bit hardcoded... maybe we could look at the unique values of the interconnect column to see what strings should be interpreted as interconnects. Texas gets tricky, but maybe that means we should finally rename the Texas interconnect to ERCOT.

With area_to_loadzone, if the user set area == "Western" and the network is usa_tamu, it works properly and if the network is France, it work throw an error saying invalid area. This is what we would like to see, right? It is implemented in the way that is compatible with usa_tamu rather than hardcoded things.

@BainanXia
Copy link
Collaborator Author

I have noticed that you don't use the check functions dedicated to validate the areas: interconnect, states, state abbreviation and load zones. These are extensively tested. You use the are_to_loadzone from PowerSimData. Would it be possible to conciliate the two?

See the comments above.

The area_to_loadzone function is tailored for the usa_tamu network. The 3 interconnects are hard coded within the function. The _check_areas_are_in_grid_and_format does assume we are using a US grid.

The area_to_loadzone is not tailored for the usa_tamu network. The hardcoded 3 interconnects are used only if the area_type is specified as interconnect or area is one of {Western, Eastern}. It can be used in any network with loadzones defined. In particular, if the usa_tamu is used, it can adaptively match states and interconnects as well.

So if a Europe grid with let's say the interconnects are the countries and I called are_to_loadzone with are="France" and area_type="interconnect", it is going to work?

In this case, we can simply set area=='all'.

@rouille
Copy link
Collaborator

rouille commented Jan 7, 2021

The interconnects are not countries. https://en.wikipedia.org/wiki/Synchronous_grid_of_Continental_Europe

It was an example. It could be Western Europe, Southern Europe, Eastern Europe and Northern Europe instead of Eastern, Texas and Western.

@danielolsen
Copy link
Contributor

danielolsen commented Jan 7, 2021

I think it would be nice to have areas that are countries. Then we could simulate the continental Europe grid, but still select just things from e.g. France. But maybe this is outside the scope of this PR, for us to be adding hierarchy levels to areas.

It can get weird too because the ordering isn't necessarily fixed. E.g. in the US, we have interconnections smaller than countries, but in Europe it is the opposite.

@BainanXia
Copy link
Collaborator Author

The interconnects are not countries. https://en.wikipedia.org/wiki/Synchronous_grid_of_Continental_Europe

It was an example. It could be Western Europe, Southern Europe, Eastern Europe and Northern Europe instead of Eastern, Texas and Western.

One of the reason area_to_loadzone has hardcoded interconnects for usa_tamu is when dealing with a USA scenario, grid.interconnect gives ['USA'] instead of ['Texas', 'Western', 'Eastern'], however, USA is not an interconnect. Otherwise the hardcoded part can be replaced by grid.interconnect easily. Or it can be solved if we have a country column in the future.

@rouille
Copy link
Collaborator

rouille commented Jan 7, 2021

I have noticed that you don't use the check functions dedicated to validate the areas: interconnect, states, state abbreviation and load zones. These are extensively tested. You use the are_to_loadzone from PowerSimData. Would it be possible to conciliate the two?

See the comments above.

The area_to_loadzone function is tailored for the usa_tamu network. The 3 interconnects are hard coded within the function. The _check_areas_are_in_grid_and_format does assume we are using a US grid.

The area_to_loadzone is not tailored for the usa_tamu network. The hardcoded 3 interconnects are used only if the area_type is specified as interconnect or area is one of {Western, Eastern}. It can be used in any network with loadzones defined. In particular, if the usa_tamu is used, it can adaptively match states and interconnects as well.

Since 'interconnect' is a column that is saved in some of the tables of grid.mat files, should we assume that any grid has at least one interconnect? E.g. France would have UCTE or similar as the interconnect for every bus.
Western & Eastern seems like at least a little bit hardcoded... maybe we could look at the unique values of the interconnect column to see what strings should be interpreted as interconnects. Texas gets tricky, but maybe that means we should finally rename the Texas interconnect to ERCOT.

With area_to_loadzone, if the user set area == "Western" and the network is usa_tamu, it works properly and if the network is France, it work throw an error saying invalid area. This is what we would like to see, right? It is implemented in the way that is compatible with usa_tamu rather than hardcoded things.

What I see in area_to_loadzone:

elif area in {"Texas", "Western", "Eastern"}:
    loadzone_set = interconnect2loadzone[area]

so if area is an interconnect in Europe whose name is neither Texas,, Western and Eastern it wont return the load zones in the interconnect and raise the error the Invalid area error. So I disagree that this function is not tailored for usa_tamu or at least a US grid.

@danielolsen
Copy link
Contributor

One of the reason area_to_loadzone has hardcoded interconnects for usa_tamu is when dealing with a USA scenario, grid.interconnect gives ['USA'] instead of ['Texas', 'Western', 'Eastern'], however, USA is not an interconnect.

It would be nice to fix this, rather than design around it forever.

@rouille
Copy link
Collaborator

rouille commented Jan 7, 2021

One of the reason area_to_loadzone has hardcoded interconnects for usa_tamu is when dealing with a USA scenario, grid.interconnect gives ['USA'] instead of ['Texas', 'Western', 'Eastern'], however, USA is not an interconnect.

It would be nice to fix this, rather than design around it forever.

If it is not tailored for the usa_tamu grid it needs to be moved out of the powersimdata.network.usa_tamu.usa_tamu_model module.

@danielolsen
Copy link
Contributor

I've gone through all the code so far, the analysis functions are nice and modular and the plotting code is pretty straightforward. I added a few comments where I think things could be a little simpler/clearer, but overall I think this is looking very good.

@BainanXia
Copy link
Collaborator Author

BainanXia commented Jan 28, 2021

@rouille All comments are addressed. Tests for new functions are added. This branch is rebased to the latest develop and commit history is cleaned up. Will update Readme after we retire analyze_pg completely. Is this a good point to coordinate with the website team and get their suggestions, or this should be handled in a new PR?

postreise/analyze/demand.py Outdated Show resolved Hide resolved
postreise/analyze/demand.py Outdated Show resolved Hide resolved
@BainanXia BainanXia force-pushed the bainan/analyze_pg_refactor branch from 2ac18be to 5e2f2fb Compare February 2, 2021 21:27
Copy link
Collaborator

@rouille rouille left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Thanks a lot.

@BainanXia BainanXia merged commit db2b940 into develop Feb 2, 2021
@BainanXia BainanXia deleted the bainan/analyze_pg_refactor branch February 2, 2021 22:07


def get_capacity_by_resources(scenario, area, resources, area_type=None):
"""Get total capacity value of certain resources in the specific area of a
Copy link
Contributor

@danielolsen danielolsen Feb 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rephrase "capacity value" as this might be misinterpreted as something besides the nameplate Pmax capacity

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_nameplate_capacity ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the function name is fine, I was talking about the docstring. "capacity" by itself doesn't imply the same thing as "capacity value".



def get_storage_capacity(scenario, area, area_type=None):
"""Get total storage capacity value in the specific area of a scenario.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here re "capacity value"

return pg.groupby(grid.plant.loc[plant_id, "type"].values, axis=1).sum()


def get_storage_time_series(scenario, area, area_type=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we call this something like get_storage_pg_time_series, or have another parameter that specifies whether we want storage_pg or storage_e? This could easily be in a different PR though, we don't need to hold this one up.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #215.

Copy link
Contributor

@danielolsen danielolsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a big improvement, thanks.

@ahurli ahurli mentioned this pull request Mar 16, 2021
@rouille rouille changed the title analyze_pg refactor Factor out generation and curtailment plots Mar 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
visualization Issue related to visualization
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants