-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create functions that operate on indices of time series #165
Conversation
postreise/analyze/check.py
Outdated
@@ -220,6 +221,24 @@ def _check_date_range(scenario, start, end): | |||
raise ValueError("Must have scenario_start <= start <= end <= scenario_end") | |||
|
|||
|
|||
def _check_date_range_in_time_series(df, start, end): | |||
"""Check if start time and endtime define a valid time range of the given scenario. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the different between this function and the one above: _check_date_range_in_scenario
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it would be better to have two separate functions instead of a combine one that does a bunch of if
/else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that we have inconsistent time range between scenario.info and the corresponding date frame, i.e. scenario.info['start_date']!=df.index[0], scenario.info['end_date']!=df.index[-1]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no reference to a Scenario
instance in _check_date_range_in_time_series
. In principle, the check will for any time series (I realize that I need to raise an error if the indices are not a DatetimeIndex
, i,e, df
is not a time series). However, I believe the input time series will always be input and output data of a scenario retrieved using an Analyze
object (PG, LMP, etc.)
This is perhaps outside of the scope of this PR and more suitable for a followup, but do we want these new functions to be able to operate on pandas.Series objects as well? They also have a resample method: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.resample.html |
Another outside-the-scope comment: so far this is always doing summation during resampling, but we may also want to find the mean instead, which if we resample by month isn't as simple as dividing by a constant, the way it is for days and weeks. |
@danielolsen, when implementing the mean aggregation method for the resampling, I had to make a choice for the weekly frequency. Weeks start on Sunday. The first Sunday of 2016 is January 3rd (I use 2016 as an example but the reasoning works for any year). When we use the Also, I now calculate and use the resolution of the input time series. Before I assumed it was hourly. |
@rouille you are right, it would be funny to have a 2015 timestamp. Perhaps we should clip incomplete weeks from the means as well. |
I will clip both the first and last one then as we do for the |
Done. I have also updated the docstring to explain the strategy regarding to clipping for both aggregation methods |
e9cf980
to
5433c88
Compare
""" | ||
_check_time_series(ts, "time series") | ||
|
||
if pd.infer_freq(ts.index) != "H": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I didn't know about this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I accidentally resampled the time to review from 15 minutes to 15 hours.
Thanks for all the changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the nice test coverage. I haven't looked super carefully at the most recent changes, but seems good enough to merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is good to go as well. Thanks!
Purpose
Write as set of functions that operate on the rows of our time series.
What is the code doing?
There are 3 functions that perform the following operations:
Where to look
The code is in the
postreise.analyze.time
module and the tests are inpostreise.analyze.tests.test_time
. Also I have added a function inpostreise.analyze.check
and the associated tests inpostreise.analyze.tests.test_check
.Time estimate
15 min