Integrate part of spot check - simulated and historical data prep and plot by state #129
Conversation
def get_generation(s_info):
def get_value_at(state, resource):
Not sure whether this is necessary here, given it simply calls the get_generation function via an instance of the ScenarioInfo class, which can be put into a one-liner.
to each (state, resource) pair to populate the resulting dataframe. This
function is meant to contain the core logic of summarize_by_state and is
parameterized to simplify testing.
"""
I understand the reason for breaking this up into parts is simplicity of testing. However, maybe it is worth creating a mock ScenarioInfo object, or improving the mock Scenario object with time series attributes so that it can be loaded into a ScenarioInfo object. Since ScenarioInfo is widely used in the spot check integration script, having a mock for it will make future tests much easier.
I was pondering the same thing. I think you're right about simplifying future code/testing, so I'll look into setting up a mock implementation. I'm leaning towards mocking ScenarioInfo itself, so that the tests for spot check logic are totally independent of the ScenarioInfo implementation, which should be tested separately against mock Scenario or State instances. At least, that's my current frame of mind; we'll see how it goes :)
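As a rough illustration of the trade-off, a hand-rolled fake could look something like the sketch below; the method name and signature are assumptions for illustration, not the real ScenarioInfo API.

class FakeScenarioInfo:
    # Hypothetical stand-in for ScenarioInfo that returns canned values.

    def __init__(self, generation):
        # generation: dict keyed by (state, resource), e.g. {("Washington", "wind"): 1234.5}
        self.generation = generation

    def get_generation(self, state, resource):
        # Assumed signature; return 0 for pairs with no canned data.
        return self.generation.get((state, resource), 0)

Whether something like this lives next to the tests or gets promoted into a shared MockScenarioInfo is exactly the question here.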
Use state as a dict key if index is a smaller region (e.g. Texas East),
otherwise use the given index.
"""
for state in ("Texas", "New Mexico", "Montana"):
Thanks for handling this ugly setup :)
Is this tuple 'states that are split across interconnections'? If so, do you think it would fit better as a named global variable instead of being hardcoded here?
I'm not sure, but some examples from the raw data are "Montana Western", "Texas North Central", "Texas Far West" etc. Is that consistent with your description?
Yeah. The TAMU network mostly follows state lines when approximating the borders between interconnections (even though small bits of some plains states stretch over the lines); I believe Montana, Texas, and New Mexico are the only exceptions.
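If it does get promoted, a minimal sketch of the suggestion might look like the following; the constant and helper names are placeholders, not code from this PR.

# States whose zones are split across more than one interconnection in the
# TAMU network (name is a placeholder suggestion).
INTERCONNECT_SPLIT_STATES = ("Texas", "New Mexico", "Montana")

def state_key(index):
    # Collapse sub-state regions such as "Texas East" to their state name;
    # otherwise use the given index (per the docstring above).
    for state in INTERCONNECT_SPLIT_STATES:
        if index.startswith(state):
            return state
    return index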
def summarize_hist_gen(hist_gen_raw, all_resources):
    """
Once we start to finalize this PR, we should come up with better names for these intermediate functions, together with better parameter documentation.
width = 0.3  # width of bars
fontsize = 15
size_inches = (20, 10)
Organizing these plot parameters in a class is clean, and it's something that has been on my mind for a while. However, setting these parameters with hard-coded constants is tricky, given they might not fit all the plots, including future ones we may have; specifically, different bar plots may need an adaptive bar width according to the total number of bars within a plot. Let's think about a better way to handle the general case later.
Sounds good, I figured we'd need to do more here but left it as kind of a reminder.
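For later reference, one possible shape for this is sketched below; the class name and the adaptive-width heuristic are made up for illustration and are not part of this PR.

from dataclasses import dataclass

@dataclass
class BarPlotParams:
    # Hypothetical container for bar-plot styling with an adaptive width.
    fontsize: int = 15
    size_inches: tuple = (20, 10)

    def bar_width(self, num_bars, group_span=0.8):
        # Spread the bars of one x-tick group across group_span of the slot,
        # so plots with more bars automatically get thinner bars.
        return group_span / max(num_bars, 1)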
postreise/plot/plot_sim_vs_hist.py (outdated)
size_inches = (20, 10)

def plot_generation_sim_vs_hist_all(sim_gen, hist_gen, s_info):
Not sure whether we need functions like this in the codebase. I think having a simple loop to call a function with parameters is acceptable in the user's script.
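In other words, something along these lines in the user's notebook; the loop below is purely hypothetical (the resource names and the per-resource plotting function are stand-ins, not this PR's API).

# Hypothetical per-resource loop in the user's script, instead of an
# *_all wrapper in the code base; plot_one_resource is a stand-in name.
for resource in ("wind", "solar", "hydro"):
    plot_one_resource(sim_gen, hist_gen, s_info, resource)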
def calc_max(gen_type):
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
What kind of warnings are we trying to suppress here?
Ah, I see, it's from get_capacity in the ScenarioInfo object. Basically, when we try to get the capacity of a generation type that does not exist in a certain area, it returns 0 with a warning.
Ah, I didn't actually know that. We have suppression in the notebook, so I just switched it to use the context manager since that seemed right.
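For reference, the context-manager pattern keeps the suppression scoped to the wrapped call rather than disabling warnings process-wide. The sketch below uses a toy stand-in for the warn-and-return-0 behavior described above; it is not the real get_capacity.

import warnings

def capacity_or_zero(lookup, area, gen_type):
    # Toy stand-in for the behavior described above: warn and return 0
    # when the generation type does not exist in the area.
    if (area, gen_type) not in lookup:
        warnings.warn(f"no {gen_type} capacity in {area}")
        return 0
    return lookup[(area, gen_type)]

def calc_max_quietly(lookup, areas, gen_type):
    # Only warnings raised inside this block are silenced; the process-wide
    # warning filters are restored when the block exits.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return max(capacity_or_zero(lookup, a, gen_type) for a in areas)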
ax.set_xticks(x)
ax.set_xticklabels(labels, fontsize=fontsize)
ax.legend(fontsize=fontsize)
When I was implementing these functions, I simply wrote down the code off the top of my head. Not sure whether there are cleaner ways to set up these parameters when preparing a plot. If it looks bad to you, feel free to refactor :)
Good to know :) I'm pretty amateur at matplotlib so played it safe for now, but I'll come back to it and see what we can do.
ha="center", | ||
va="bottom", | ||
fontsize=8, | ||
) |
Same here, I simply found an example on Stack Overflow for labeling the bars in a bar chart and used it here.
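For context, the usual Stack Overflow pattern those keyword arguments come from looks roughly like the sketch below (a generic illustration, not the exact code in this PR); newer matplotlib (3.4+) can also do this in one call with Axes.bar_label.

import matplotlib.pyplot as plt

labels = ["wind", "solar", "hydro"]
values = [120, 80, 45]

fig, ax = plt.subplots()
bars = ax.bar(labels, values)
for rect in bars:
    height = rect.get_height()
    # Place the value just above each bar, centered on it.
    ax.text(
        rect.get_x() + rect.get_width() / 2,
        height,
        f"{height:g}",
        ha="center",
        va="bottom",
        fontsize=8,
    )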
interconnect = ["Western"] | ||
s_info = MockScenarioInfo() | ||
s_info.grid.interconnect = interconnect | ||
monkeypatch.setattr(s_info, "get_available_resource", mock_resource) |
I've heard of it but never used it before. I presume monkeypatch is a pytest fixture, which makes it easy to mock function returns on a mock object.
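For reference, monkeypatch is indeed a built-in pytest fixture. A minimal, self-contained example of the same pattern as the test above, using a toy class rather than MockScenarioInfo:

class FakeInfo:
    def get_available_resource(self, interconnect):
        return ["coal", "ng"]

def test_patched_resource(monkeypatch):
    info = FakeInfo()
    # Replace the method on this one instance for the duration of the test;
    # pytest undoes the patch automatically when the test finishes.
    monkeypatch.setattr(info, "get_available_resource", lambda interconnect: ["wind", "solar"])
    assert info.get_available_resource("Western") == ["wind", "solar"]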
I've gone through this again together with the previous PR on MockScenarioInfo, https://github.com/Breakthrough-Energy/PowerSimData/pull/227/files. I think the testing part is much cleaner with MockScenarioInfo. Thanks for the effort. I will approve it once our file system is ready and I can check it with some integration tests. In the meantime, it would be great if we could get a second pair of eyes on this, given it is the first PR moving spot check material into the code base and it sets an example for merging plot features into PostREISE. We will refactor AnalyzePG in the same fashion.
)
colname_map = { |
Could we correct the column names in the input CSV and avoid this dictionary?
Yeah, that'd be cleaner; I'm not sure how the CSV columns are generated, though, or whether it's part of some process. It probably doesn't hurt to have this kind of mapping, but we could move it somewhere more intentional if there are other usages.
Good point, I don't remember where they come from.
I brought this up a long time ago with @danlivengood, who creates these input CSVs. At the time the argument was that "names on the left (keys in colname_map) make more sense for human reading, whereas names on the right (values in colname_map) make more sense for computer reading (what we defined in the grid DataFrame)."
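For context, this is the standard pandas rename pattern the dictionary feeds; the column names below are invented for the sketch and are not the ones in the real CSV.

import pandas as pd

# Hypothetical human-readable headers as they might appear in the CSV.
hist_gen_raw = pd.DataFrame({"Natural Gas": [1.0], "Solar PV": [2.0]})

# Map human-readable names to the identifiers used in the grid DataFrame.
colname_map = {"Natural Gas": "ng", "Solar PV": "solar"}
hist_gen = hist_gen_raw.rename(columns=colname_map)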
Looks good. The spot check script quickly grew into a beast; I'm sorry that you got tasked with cleaning up our mess.
I've seen much worse messes, so I'm glad to do it (it's also informative). One thing I noticed, though, is that the plot fails on Texas, since we have "Texas" in the index for both the state and the interconnect. I'll look at a fix for that before merging.
Good catch; previously it was fine since we were summarizing simulated results via
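One way the "Texas" collision could be disambiguated is sketched below; this is purely illustrative and not the fix that was eventually made.

import pandas as pd

# Hypothetical: tag each row with its level so the state "Texas" and the
# interconnect "Texas" no longer collide in a flat index.
summary = pd.DataFrame(
    {"wind": [10.0, 3.0]},
    index=pd.MultiIndex.from_tuples(
        [("interconnect", "Texas"), ("state", "Texas")], names=["level", "area"]
    ),
)
texas_state_row = summary.loc[("state", "Texas")]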
This is nice. Thanks.
Purpose - Move logic from the spot check notebook into the code base.
What it does - Adds functions to prepare data for analysis and one of the plots, and refactors code where possible for readability and/or to use built-in pandas methods (hopefully idiomatically). There are some unit tests for the data manipulation logic, but we could probably add more.
The main additions, in terms of what I'd expect to be used by the notebook, are the following:
Time to review - 30 mins (rough estimate). For testing, I compared results of the original notebook with these functions, to spot check (pun intended) that we get the same results numerically, visually, etc. The unit tests cover some basic assumptions as well, but I think this is a good point to get user feedback or try it on other scenarios before adding a bunch more test coverage. I'm pretty sure the docstrings can be improved, but I wasn't sure what the best practices are, so I left that for code review.
Questions - Would it be helpful to pair review? I can demo my notebook used for testing if that's useful. And how do we update the notebook script once we want to start using these?