StatsForecast models producing NotImplementedError: tiny datasets in 0.4.0 #238

Closed
tbellamey opened this issue Oct 4, 2023 · 5 comments
@tbellamey

What happened + What you expected to happen

Carryover from: #234

I upgraded my hierarchicalforecast installation to 0.4.0. Since then, running my script (no code changes) raises the error below when fitting StatsForecast AutoETS models. I also tried StatsForecast HoltWinters and got the same error.

The error was not raised with the same dataset and script on 0.3.0. Any ideas what changed between 0.3.0 and 0.4.0?


NotImplementedError Traceback (most recent call last)
Cell In[8], line 109
98 #valid_agg_reset = valid_agg.reset_index()
100 model = StatsForecast(models=[
102 AutoETS(season_length=12,model='AAA',alias='AutoETS_AAA')
(...)
107 ],
108 freq='MS', n_jobs=1, verbose=True)
--> 109 model.fit(train_agg)
111 p = model.forecast(h=h_months, fitted=True)
112 p_fitted = model.forecast_fitted_values()

File ~/lib/python3.10/site-packages/statsforecast/core.py:880, in StatsForecast.fit(self, df, sort_df, prediction_intervals)
878 self.prepare_fit(df, sort_df)
879 if self.n_jobs == 1:
--> 880 self.fitted_ = self.ga.fit(models=self.models)
881 else:
882 self.fitted_ = self._fit_parallel()

File ~/lib/python3.10/site-packages/statsforecast/core.py:77, in GroupedArray.fit(self, models)
75 for i_model, model in enumerate(models):
76 new_model = model.new()
---> 77 fm[i, i_model] = new_model.fit(y=y, X=X)
78 return fm

File ~/lib/python3.10/site-packages/statsforecast/models.py:650, in AutoETS.fit(self, y, X)
628 def fit(
629 self,
630 y: np.ndarray,
631 X: Optional[np.ndarray] = None,
632 ):
633 """Fit the Exponential Smoothing model.
634
635 Fit an Exponential Smoothing model to a time series (numpy array) y
(...)
648 Exponential Smoothing fitted model.
649 """
--> 650 self.model_ = ets_f(
651 y, m=self.season_length, model=self.model, damped=self.damped
652 )
653 self.model_["actual_residuals"] = y - self.model_["fitted"]
654 self._store_cs(y=y, X=X)

File ~/lib/python3.10/site-packages/statsforecast/ets.py:1241, in ets_f(y, m, model, damped, alpha, beta, gamma, phi, additive_only, blambda, biasadj, lower, upper, opt_crit, nmse, bounds, ic, restrict, allow_multiplicative_trend, use_initial_values, maxit)
1238 # ses for non-optimized tiny datasets
1239 if n <= npars + 4:
1240 # we need HoltWintersZZ function
-> 1241 raise NotImplementedError("tiny datasets")
1242 # fit model (assuming only one nonseasonal model)
1243 if errortype == "Z":

NotImplementedError: tiny datasets

Versions / Dependencies


dateutil 2.8.2
hierarchicalforecast 0.4.0
matplotlib 3.7.1
numpy 1.23.5
pandas 2.0.2
session_info 1.0.0
statsforecast 1.6.0


IPython 8.14.0
jupyter_client 8.2.0
jupyter_core 5.3.0
notebook 6.5.4

Python 3.10.11 (main, Apr 20 2023, 19:02:41) [GCC 11.2.0]
Linux-4.18.0-372.16.1.0.1.el8_6.x86_64-x86_64-with-glibc2.35

Reproduction script

from statsforecast import StatsForecast
from statsforecast.models import AutoETS

model = StatsForecast(models=[AutoETS(season_length=12, model='AAA', alias='AutoETS_AAA')], freq='MS', n_jobs=1, verbose=True)
model.fit(train_agg)  # train_agg is the output of hierarchicalforecast's aggregate()

The call to model.fit raises NotImplementedError: tiny datasets from statsforecast/ets.py.

The same code runs successfully with hierarchicalforecast 0.3.0.

Issue Severity

High: It blocks me from completing my task.

@tbellamey tbellamey added the bug label Oct 4, 2023
@jmoralez
Member

jmoralez commented Oct 4, 2023

Hey. It's hard to tell without a reproducible example. Are you using aggregate? #189 was fixed in 0.4.0, so in 0.3.0 you may have been getting leading zeros that gave your series extra samples, which no longer happens.

@tbellamey
Author

Hi @jmoralez, I compared the train_agg dataframe (produced by the aggregate function) between 0.3.0 and 0.4.0.

I'm inspecting the result of this line:
train_agg, S_train, tags = aggregate(df_train, spec)

0.3.0
In 0.3.0, the aggregate function fills in 'y' = 0 for 'ds' periods where df_train has null values.
So, for example, if I have a 'ds' range from '2018-01-01' through '2018-12-01' but am missing 'y' values for months '2018-03-01' and '2018-04-01', the aggregate function still populates train_agg at these 'ds' values with 'y' = 0.

This allows the script to fit the StatsForecast AutoETS model and execute reconciliation for train_agg

0.4.0
In 0.4.0, the aggregate function no longer fills in 'y' = 0 for 'ds' periods where df_train has null values.
This seems to break the call to model.fit(train_agg), which ran without error in 0.3.0.

Should I add the 'y' = 0 values back for the missing 'ds' periods to replicate the 0.3.0 behavior for model.fit()? I just want to confirm this is the intended behavior of the aggregate function before I implement a post-hoc fix.

@jmoralez
Member

jmoralez commented Oct 4, 2023

The problem with aggregate was leading zeros: e.g. if one of your series started at 2018-01-01 and another at 2019-01-01, the aggregate function would add all of 2018 as 0 for the second one. Gaps within your series are a different problem, and you should address them first (before running aggregate). You can use the fill_gaps function for that.

@tbellamey
Author

Thanks! The fill_gaps function resolved this issue, and the full script now runs successfully. However, I had to set fill_gaps(df, freq='MS', start='global'), which reintroduces the leading zeros problem you mentioned for late-starting series.

I tried leaving the start param at its default (start='per_serie'), but this still raised the NotImplementedError: tiny datasets.
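
To make the difference between the two start modes concrete, here is a minimal standard-library sketch. It is not the library's fill_gaps implementation: the function names, the toy data, and the choice to fill with 0.0 are all assumptions made for illustration. It only shows why a late-starting series stays short with a per-series start and gets padded with leading zeros with a global start.

```python
from datetime import date


def month_range(start: date, end: date) -> list:
    """Return every month-start date from start to end, inclusive."""
    out = []
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        out.append(date(y, m, 1))
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return out


def fill_monthly_gaps(series: dict, start: str = "per_serie", fill_value: float = 0.0) -> dict:
    """Hypothetical helper: series maps series_id -> {month_start: y}.

    start="per_serie" begins each series at its own first month;
    start="global" begins every series at the earliest month overall.
    Every series ends at the latest month overall.
    """
    global_start = min(min(obs) for obs in series.values())
    global_end = max(max(obs) for obs in series.values())
    filled = {}
    for uid, obs in series.items():
        first = global_start if start == "global" else min(obs)
        filled[uid] = {d: obs.get(d, fill_value) for d in month_range(first, global_end)}
    return filled


# Toy data (made up): "a" has an internal gap, "b" starts late.
series = {
    "a": {date(2018, 1, 1): 5.0, date(2018, 2, 1): 6.0, date(2018, 5, 1): 7.0},
    "b": {date(2018, 4, 1): 1.0, date(2018, 5, 1): 2.0},
}
per_serie = fill_monthly_gaps(series, start="per_serie")
global_ = fill_monthly_gaps(series, start="global")
print(len(per_serie["a"]), len(per_serie["b"]))  # 5 2
print(len(global_["a"]), len(global_["b"]))      # 5 5
```

In both modes the internal gap in "a" is filled, but only the global start lengthens "b", which is why it can push a short series past the minimum-length check at the cost of leading zeros.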

Looking at statsforecast/ets.py where this error is tracing, I believe it may be a problem specific to my dataset:
https://github.com/Nixtla/statsforecast/blob/main/statsforecast/ets.py
n = len(y)
npars = 2  # alpha + l0
if trendtype in ["A", "M"]:
    npars += 2  # beta + b0
if seasontype in ["A", "M"]:
    npars += 2  # gamma + s
if damped is not None:
    npars += damped
# ses for non-optimized tiny datasets
if n <= npars + 4:
    # we need HoltWintersZZ function
    raise NotImplementedError("tiny datasets")

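I have sub-series in the hierarchy with too few data points (without adding in leading zeros). Since I am trying to fit AutoETS(model='AAA') onto all series, the (npars + 4) term is greater than or equal to n = len(y) for those series, which raises the "tiny datasets" error. With model='AAA' and damped left unset, npars = 6, so any series with 10 or fewer observations triggers it.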
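
As a rough way to spot the affected series before fitting, the check from the excerpt above can be mirrored in a small helper. This is a sketch based only on the quoted lines, not the library's own API: ets_param_count, is_tiny, and the series lengths are hypothetical names and made-up values.

```python
from typing import Optional


def ets_param_count(model: str, damped: Optional[bool] = None) -> int:
    """Mirror the parameter count from the quoted ets.py excerpt."""
    _error, trend, season = model  # e.g. "AAA" -> error, trend, season types
    npars = 2  # alpha + l0
    if trend in ("A", "M"):
        npars += 2  # beta + b0
    if season in ("A", "M"):
        npars += 2  # gamma + s
    if damped is not None:
        npars += int(damped)
    return npars


def is_tiny(n_obs: int, model: str, damped: Optional[bool] = None) -> bool:
    """True when the quoted check `n <= npars + 4` would raise."""
    return n_obs <= ets_param_count(model, damped) + 4


# Hypothetical series lengths, e.g. from train_agg.groupby("unique_id").size()
lengths = {"total": 48, "region_a": 24, "region_a/sku_7": 9}
too_short = [uid for uid, n in lengths.items() if is_tiny(n, "AAA")]
print(ets_param_count("AAA"), too_short)  # 6 ['region_a/sku_7']
```

Under these assumptions a series needs at least 11 observations to fit 'AAA', so listing the short series up front shows which ones need a different treatment.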

Therefore, I believe this issue can be closed, since it's specific to a modeling approach vs. a bug in the code. Thanks for your help!

Incidentally, are there any plans to implement a MinTraceSparse(nonnegative=True) method in the future? I can handle negative values post-reconciliation, just curious about the roadmap.

@jmoralez
Member

jmoralez commented Oct 4, 2023

Thanks. Can you please open a new issue requesting the nonnegative sparse MinTrace?

@jmoralez jmoralez closed this as completed Oct 4, 2023