Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create functions that operate on indices of time series #165

Merged
merged 7 commits into from
Sep 17, 2020
Merged

Conversation

rouille
Copy link
Collaborator

@rouille rouille commented Sep 11, 2020

Purpose

Write as set of functions that operate on the rows of our time series.

What is the code doing?

There are 3 functions that perform the following operations:

  • slice time series between 2 dates.
  • resample a time series. Daily, weekly and monthly resampling are available. For all the resampling intervals, the left edge of the bin is closed and used for labeling. Incomplete day, week and month are clipped.
  • change the time zone of a time series.

Where to look

The code is in the postreise.analyze.time module and the tests are in postreise.analyze.tests.test_time. Also I have added a function in postreise.analyze.check and the associated tests in postreise.analyze.tests.test_check.

Time estimate

15 min

@@ -220,6 +221,24 @@ def _check_date_range(scenario, start, end):
raise ValueError("Must have scenario_start <= start <= end <= scenario_end")


def _check_date_range_in_time_series(df, start, end):
"""Check if start time and endtime define a valid time range of the given scenario.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the different between this function and the one above: _check_date_range_in_scenario?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it would be better to have two separate functions instead of a combine one that does a bunch of if/else

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that we have inconsistent time range between scenario.info and the corresponding date frame, i.e. scenario.info['start_date']!=df.index[0], scenario.info['end_date']!=df.index[-1]?

Copy link
Collaborator Author

@rouille rouille Sep 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no reference to a Scenario instance in _check_date_range_in_time_series. In principle, the check will for any time series (I realize that I need to raise an error if the indices are not a DatetimeIndex, i,e, df is not a time series). However, I believe the input time series will always be input and output data of a scenario retrieved using an Analyze object (PG, LMP, etc.)

postreise/analyze/check.py Outdated Show resolved Hide resolved
postreise/analyze/time.py Outdated Show resolved Hide resolved
@danielolsen
Copy link
Contributor

This is perhaps outside of the scope of this PR and more suitable for a followup, but do we want these new functions to be able to operate on pandas.Series objects as well? They also have a resample method: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.resample.html

@danielolsen
Copy link
Contributor

Another outside-the-scope comment: so far this is always doing summation during resampling, but we may also want to find the mean instead, which if we resample by month isn't as simple as dividing by a constant, the way it is for days and weeks.

@rouille
Copy link
Collaborator Author

rouille commented Sep 13, 2020

@danielolsen, when implementing the mean aggregation method for the resampling, I had to make a choice for the weekly frequency. Weeks start on Sunday. The first Sunday of 2016 is January 3rd (I use 2016 as an example but the reasoning works for any year). When we use the mean, we don't clip incomplete week and hence the first 2 days of January are used to calculate the value. The problem is that the timestamp for this value is Sunday December 27th 2015. As it will seem odd when printing/plotting the data I enforced to have all weeks of the resampled time series in 2016 (l.56 of the time module). The last week of the year can be incomplete (not for 2016 though) but we won't have this problem. What do you think?

Also, I now calculate and use the resolution of the input time series. Before I assumed it was hourly.

@danielolsen
Copy link
Contributor

@rouille you are right, it would be funny to have a 2015 timestamp. Perhaps we should clip incomplete weeks from the means as well.

@rouille
Copy link
Collaborator Author

rouille commented Sep 14, 2020

@rouille you are right, it would be funny to have a 2015 timestamp. Perhaps we should clip incomplete weeks from the means as well.

I will clip both the first and last one then as we do for the sum.

@rouille
Copy link
Collaborator Author

rouille commented Sep 14, 2020

@rouille you are right, it would be funny to have a 2015 timestamp. Perhaps we should clip incomplete weeks from the means as well.

I will clip both the first and last one then as we do for the sum.

Done. I have also updated the docstring to explain the strategy regarding to clipping for both aggregation methods

@rouille rouille force-pushed the ben/time branch 2 times, most recently from e9cf980 to 5433c88 Compare September 15, 2020 21:19
"""
_check_time_series(ts, "time series")

if pd.infer_freq(ts.index) != "H":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I didn't know about this function.

Copy link
Contributor

@danielolsen danielolsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I accidentally resampled the time to review from 15 minutes to 15 hours.

Thanks for all the changes.

Copy link
Collaborator

@jenhagg jenhagg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the nice test coverage. I haven't looked super carefully at the most recent changes, but seems good enough to merge.

Copy link
Collaborator

@BainanXia BainanXia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good to go as well. Thanks!

@rouille rouille merged commit 775a92e into develop Sep 17, 2020
@rouille rouille deleted the ben/time branch September 17, 2020 03:00
@ahurli ahurli mentioned this pull request Mar 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants