
[DOC] Rework getting started guide and single problem forecasting loaders #2248

Merged
35 commits merged into main from ajb/getting_started on Nov 4, 2024

Conversation

TonyBagnall
Contributor

@TonyBagnall TonyBagnall commented Oct 25, 2024

Fixes #2246, part of #1518.

This has expanded to tidying up the single problem data loaders for forecasting, so it is in two related parts.

datasets._single_problem_loaders

There are seven baked-in forecasting datasets and there were eight loaders.

  1. I have removed load_macroeconomic because it was just a wrapper for the statsmodels loader.
  2. For the five univariate series loaders, I have added a return_array boolean argument that defaults to True. When True the data is loaded as an np.ndarray; when False a pd.Series is returned.
  3. There are two multivariate loaders, uschange and longley. These previously adopted a structure like this:
y, X = load_longley(y_series="Consumption")

Firstly, returning y, X is the opposite order to the collections loaders, and secondly there seems to be no need to split the data in the loader; the user can do that themselves. Changed to:

data = load_longley()

which returns a numpy array with axis == 0, i.e. shape (n_channels, n_timepoints), and

data = load_longley(return_array=False)

which returns a DataFrame with axis == 1 and all the column names set as before.
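
As a rough usage sketch of the reworked loaders (the loader names and the return_array behaviour are taken from the description above; exact signatures may differ in the merged code):

```python
import numpy as np
import pandas as pd

from aeon.datasets import load_airline, load_longley

# Univariate loader: np.ndarray by default, pd.Series with return_array=False
y = load_airline()
assert isinstance(y, np.ndarray)
y = load_airline(return_array=False)
assert isinstance(y, pd.Series)

# Multivariate loader: a single 2D np.ndarray, shape (n_channels, n_timepoints)
data = load_longley()
print(data.shape)

# Or keep the pandas structure with the original channel/column names,
# then split y and X yourself if needed
df = load_longley(return_array=False)
print(type(df))
```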

Read me

Split along series/collection estimators, adding an example for each module, including experimental ones. It makes the guide longer, and maybe we don't want that, but it is a good top-level intro IMO; it will link out for further details.

First version done. This highlighted that there is no anomaly detection notebook (see #1960) and that the transformer notebooks need an overhaul, but that is future work. The main goal is to get things ready for the new forecasting base class.

@TonyBagnall TonyBagnall added the documentation Improvements or additions to documentation label Oct 25, 2024
@aeon-actions-bot
Contributor

aeon-actions-bot bot commented Oct 25, 2024

Thank you for contributing to aeon

I did not find any labels to add that did not already exist. If the content of your PR changes, make sure to update the labels accordingly.

The Checks tab will show the status of our automated tests. You can click on individual test runs in the tab or "Details" in the panel below to see more information if there is a failure.

If our pre-commit code quality check fails, any trivial fixes will automatically be pushed to your PR unless it is a draft.

Don't hesitate to ask questions on the aeon Slack channel if you have any.

PR CI actions

These checkboxes will add labels to enable/disable CI functionality for this PR. This may not take effect immediately, and a new commit may be required to run the new configuration.

  • Run pre-commit checks for all files
  • Run mypy typecheck tests
  • Run all pytest tests and configurations
  • Run all notebook example tests
  • Run numba-disabled codecov tests
  • Stop automatic pre-commit fixes (always disabled for drafts)
  • Disable numba cache loading
  • Push an empty commit to re-run CI checks


@TonyBagnall TonyBagnall changed the title [DOC] Rework getting started guide [DOC] Rework getting started guide and single problem forecasting loaders Oct 29, 2024
@TonyBagnall TonyBagnall mentioned this pull request Oct 29, 2024
@aeon-actions-bot aeon-actions-bot bot added the full examples run Run all examples on a PR label Oct 29, 2024
@TonyBagnall TonyBagnall marked this pull request as ready for review October 29, 2024 17:34
@MatthewMiddlehurst
Member

> we are defaulting, but it is not universal, see anomaly detection.

Where in anomaly detection? I am referring just to inputs, I don't really care what format they want the data internally.

> You can't load into a dataframe with n_channels, n_timepoints and keep the column names. I have left the pandas stuff for legacy reasons really. I think I would rather remove these loaders completely than process to have series in rows in a dataframe.

Can't you just transpose the dataframe? It seems very odd if that removes the indices.

@TonyBagnall
Contributor Author

> > we are defaulting, but it is not universal, see anomaly detection.
>
> Where in anomaly detection? I am referring just to inputs, I don't really care what format they want the data internally.
>
> > You can't load into a dataframe with n_channels, n_timepoints and keep the column names. I have left the pandas stuff for legacy reasons really. I think I would rather remove these loaders completely than process to have series in rows in a dataframe.
>
> Can't you just transpose the dataframe? It seems very odd if that removes the indices.

I guess I can; it seems odd to do so and I thought the examples made it clear, but sure. Need to rewrite the tests that use col_names.

@MatthewMiddlehurst
Member

I remember some previous changes where we tried to remove n_timepoints, n_channels as much as we could. Think it was df-list and the other dataframe one for collections. May have been some other changes removing references to it as well.

Seems better to completely follow one format and leave changing that to the axis stuff, if that's how we want it.

@TonyBagnall
Contributor Author

> > we are defaulting, but it is not universal, see anomaly detection.
>
> Where in anomaly detection? I am referring just to inputs, I don't really care what format they want the data internally.
>
> > You can't load into a dataframe with n_channels, n_timepoints and keep the column names. I have left the pandas stuff for legacy reasons really. I think I would rather remove these loaders completely than process to have series in rows in a dataframe.
>
> Can't you just transpose the dataframe? It seems very odd if that removes the indices.

Yes, but you cannot then extract a series with X["channel_name"]. Anyway, I have done it and removed the test for column names.

@MatthewMiddlehurst
Member

You would just use X.loc["channel_name"], I'm pretty sure.

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html
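
For illustration only (not code from the PR), with channels as rows the channel names sit on the DataFrame index, so a single channel can still be pulled out; the frame below is made up:

```python
import pandas as pd

# Hypothetical (n_channels, n_timepoints) frame with channel names on the index
X = pd.DataFrame(
    [[320.0, 331.0, 339.0], [83.0, 88.5, 88.2]],
    index=["Consumption", "Income"],
)

consumption = X.loc["Consumption"]  # extract one channel as a pd.Series
same_thing = X.T["Consumption"]     # equivalent: transpose, then column access
```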

@TonyBagnall
Contributor Author

> You would just use X.loc["channel_name"], I'm pretty sure.
>
> https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html

It was more that I didn't want to rewrite the notebooks that used plot_series, but I've ditched that now.

Member

@hadifawaz1999 hadifawaz1999 left a comment


Just small questions; if you want to keep them for later I don't mind.

docs/getting_started.md — two outdated review comments, resolved

review-notebook-app bot commented Nov 3, 2024

View / edit / reply to this conversation on ReviewNB

hadifawaz1999 commented on 2024-11-03T12:51:53Z
----------------------------------------------------------------

I would think it's better to add a section per task using the load-any-dataset functions, load_classification, load_regression etc. What do you think?


TonyBagnall commented on 2024-11-03T16:46:53Z
----------------------------------------------------------------

Yes, I agree, but maybe not in this PR? I really only wanted to do getting_started.md, then will work through the notebooks module by module, starting with datasets.


Member

@hadifawaz1999 hadifawaz1999 left a comment


lgtm

@TonyBagnall TonyBagnall merged commit 3992bc7 into main Nov 4, 2024
15 checks passed
@TonyBagnall TonyBagnall deleted the ajb/getting_started branch November 4, 2024 17:34
Labels
documentation (Improvements or additions to documentation), full examples run (Run all examples on a PR)

Successfully merging this pull request may close these issues:

[DOC] update getting started guide

5 participants