StatsForecast models producing NotImplementedError: tiny datasets in 0.4.0 #238

tbellamey · 2023-10-04T15:06:35Z

What happened + What you expected to happen

Carryover from: #234

I upgraded my hierarchicalforecast installation to 0.4.0. Running my script (no code changed) now raises the error below when fitting StatsForecast AutoETS models. I also tried StatsForecast HoltWinters and received the same error.

This error was not raised with the same dataset and script on 0.3.0. Any ideas what might have changed the behavior between 0.3.0 and 0.4.0?

NotImplementedError Traceback (most recent call last)
Cell In[8], line 109
98 #valid_agg_reset = valid_agg.reset_index()
100 model = StatsForecast(models=[
102 AutoETS(season_length=12,model='AAA',alias='AutoETS_AAA')
(...)
107 ],
108 freq='MS', n_jobs=1, verbose=True)
--> 109 model.fit(train_agg)
111 p = model.forecast(h=h_months, fitted=True)
112 p_fitted = model.forecast_fitted_values()

File ~/lib/python3.10/site-packages/statsforecast/core.py:880, in StatsForecast.fit(self, df, sort_df, prediction_intervals)
878 self.prepare_fit(df, sort_df)
879 if self.n_jobs == 1:
--> 880 self.fitted = self.ga.fit(models=self.models)
881 else:
882 self.fitted = self._fit_parallel()

File ~/lib/python3.10/site-packages/statsforecast/core.py:77, in GroupedArray.fit(self, models)
75 for i_model, model in enumerate(models):
76 new_model = model.new()
---> 77 fm[i, i_model] = new_model.fit(y=y, X=X)
78 return fm

File ~/lib/python3.10/site-packages/statsforecast/models.py:650, in AutoETS.fit(self, y, X)
628 def fit(
629 self,
630 y: np.ndarray,
631 X: Optional[np.ndarray] = None,
632 ):
633 """Fit the Exponential Smoothing model.
634
635 Fit an Exponential Smoothing model to a time series (numpy array) y
(...)
648 Exponential Smoothing fitted model.
649 """
--> 650 self.model_ = ets_f(
651 y, m=self.season_length, model=self.model, damped=self.damped
652 )
653 self.model_["actual_residuals"] = y - self.model_["fitted"]
654 self._store_cs(y=y, X=X)

File ~/lib/python3.10/site-packages/statsforecast/ets.py:1241, in ets_f(y, m, model, damped, alpha, beta, gamma, phi, additive_only, blambda, biasadj, lower, upper, opt_crit, nmse, bounds, ic, restrict, allow_multiplicative_trend, use_initial_values, maxit)
1238 # ses for non-optimized tiny datasets
1239 if n <= npars + 4:
1240 # we need HoltWintersZZ function
-> 1241 raise NotImplementedError("tiny datasets")
1242 # fit model (assuming only one nonseasonal model)
1243 if errortype == "Z":

NotImplementedError: tiny datasets

Versions / Dependencies

dateutil 2.8.2
hierarchicalforecast 0.4.0
matplotlib 3.7.1
numpy 1.23.5
pandas 2.0.2
session_info 1.0.0
statsforecast 1.6.0

IPython 8.14.0
jupyter_client 8.2.0
jupyter_core 5.3.0
notebook 6.5.4

Python 3.10.11 (main, Apr 20 2023, 19:02:41) [GCC 11.2.0]
Linux-4.18.0-372.16.1.0.1.el8_6.x86_64-x86_64-with-glibc2.35

Reproduction script

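from statsforecast import StatsForecast
from statsforecast.models import AutoETS

# train_agg is the dataframe returned by hierarchicalforecast's aggregate()
model = StatsForecast(
    models=[AutoETS(season_length=12, model='AAA', alias='AutoETS_AAA')],
    freq='MS', n_jobs=1, verbose=True,
)
model.fit(train_agg)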

The call to model.fit raises NotImplementedError: tiny datasets from statsforecast/ets.py.

The same code runs successfully on version 0.3.0 instead of 0.4.0.

Issue Severity

High: It blocks me from completing my task.


jmoralez · 2023-10-04T16:18:37Z

Hey. Without an example it's hard to tell. Are you using aggregate? #189 was fixed in 0.4.0, so you were maybe getting leading zeros giving your series some more samples, which is no longer the case.

tbellamey · 2023-10-04T17:26:05Z

Hi @jmoralez, I inspected the train_agg dataframe (produced using the aggregate function) in 0.3.0 vs 0.4.0.

I'm inspecting the result of this line:
train_agg, S_train, tags = aggregate(df_train, spec)

0.3.0
In 0.3.0, the aggregate function fills in 0 values for 'y' in 'ds' periods where df_train has null values.
For example, if I have a 'ds' range from '2018-01-01' through '2018-12-01' but I'm missing 'y' values for months '2018-03-01' and '2018-04-01', the aggregate function still populates train_agg at these 'ds' values with 'y' = 0.

This allows the script to fit the StatsForecast AutoETS model and execute reconciliation for train_agg

0.4.0
In 0.4.0, the aggregate function no longer fills in 0 values for 'y' in 'ds' periods where df_train has null values.
This seems to be breaking the call to model.fit(train_agg), which executed without error in 0.3.0.

Should I aim to add back in the interpolated 'y'=0 values for the missing 'ds' values to replicate the 0.3.0 behavior for model.fit()? Just want to ensure this is the intended behavior for the aggregate function, before I implement a post-hoc fix

jmoralez · 2023-10-04T17:42:35Z

The problem with aggregate was leading zeros, e.g. if one of your series started at 2018-01-01 and another one at 2019-01-01 the aggregate function would then add all of 2018 as 0 for the second one. The fact that you have gaps in your series is a different problem and you should address it first (before running aggregate), you can use the fill_gaps function for that.

tbellamey · 2023-10-04T19:06:15Z

Thanks! The fill_gaps function resolved this issue and the full script now runs successfully. However, I did have to set fill_gaps(df, freq='MS', start='global'), which reintroduces the leading zeros problem you're referencing for late-start series.

I tried leaving the start param at its default (start='per_serie'), but this still generated the NotImplementedError: tiny datasets.
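To make the difference between the two start options concrete, here is a minimal pandas-only sketch. It is not the library's fill_gaps implementation: fill_monthly_gaps is a hypothetical helper, the toy data is made up, and filling new rows with 0 is an assumption chosen to mirror the 0.3.0 behavior described earlier in the thread.

```python
import pandas as pd


def fill_monthly_gaps(df: pd.DataFrame, start: str = "per_serie") -> pd.DataFrame:
    """Hypothetical helper: complete each series on a month-start grid.

    start="per_serie" begins each series at its own first timestamp;
    start="global" begins every series at the earliest timestamp overall.
    Rows added by the reindex get y = 0 (an assumption for illustration).
    """
    global_start, global_end = df["ds"].min(), df["ds"].max()
    parts = []
    for uid, group in df.groupby("unique_id"):
        first = global_start if start == "global" else group["ds"].min()
        full_range = pd.date_range(first, global_end, freq="MS")
        y = group.set_index("ds")["y"].reindex(full_range, fill_value=0.0)
        parts.append(
            pd.DataFrame({"unique_id": uid, "ds": full_range, "y": y.to_numpy()})
        )
    return pd.concat(parts, ignore_index=True)


# Series A covers 2018 but is missing March and April; series B starts in July.
dates_a = [d for d in pd.date_range("2018-01-01", "2018-12-01", freq="MS")
           if d.month not in (3, 4)]
dates_b = list(pd.date_range("2018-07-01", "2018-12-01", freq="MS"))
df = pd.DataFrame({
    "unique_id": ["A"] * len(dates_a) + ["B"] * len(dates_b),
    "ds": dates_a + dates_b,
    "y": 1.0,
})

per_serie = fill_monthly_gaps(df, start="per_serie")
global_start = fill_monthly_gaps(df, start="global")

print(per_serie.groupby("unique_id").size().to_dict())     # {'A': 12, 'B': 6}
print(global_start.groupby("unique_id").size().to_dict())  # {'A': 12, 'B': 12}
```

With start="per_serie" only the interior gaps of A are filled, so B keeps its 6 observations. With start="global" B is padded back to January, which is exactly the leading-zeros effect mentioned above.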

Looking at statsforecast/ets.py where this error is tracing, I believe it may be a problem specific to my dataset:
https://github.com/Nixtla/statsforecast/blob/main/statsforecast/ets.py
n = len(y)
npars = 2  # alpha + l0
if trendtype in ["A", "M"]:
    npars += 2  # beta + b0
if seasontype in ["A", "M"]:
    npars += 2  # gamma + s
if damped is not None:
    npars += damped
# ses for non-optimized tiny datasets
if n <= npars + 4:
    # we need HoltWintersZZ function
    raise NotImplementedError("tiny datasets")

I have sub-series in the hierarchy with too few data points (when leading zeros are not added). Since I am trying to fit AutoETS(model='AAA') on all series, n = len(y) is not greater than npars + 4 for those series, which raises the "tiny datasets" error.

Therefore, I believe this issue can be closed, since it's specific to my modeling approach rather than a bug in the code. Thanks for your help!
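As a rough check of that threshold, the snippet quoted above can be turned into a small calculator. This is a sketch that only mirrors the quoted lines from ets.py; min_series_length is a hypothetical name, and it assumes model='AAA' means trendtype "A" and seasontype "A" with damped left as None.

```python
from typing import Optional


def min_series_length(
    trendtype: str, seasontype: str, damped: Optional[bool] = None
) -> int:
    """Smallest n that avoids the `n <= npars + 4` branch in the quoted code."""
    npars = 2  # alpha + l0
    if trendtype in ("A", "M"):
        npars += 2  # beta + b0
    if seasontype in ("A", "M"):
        npars += 2  # gamma + s
    if damped is not None:
        npars += damped  # True counts as 1, False as 0
    # The error is raised when n <= npars + 4, so n must be at least npars + 5.
    return npars + 5


print(min_series_length("A", "A"))  # 11
```

Under these assumptions, an 'AAA' model needs at least 11 observations per series, so any sub-series shorter than that after gap filling would still hit the error.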

Incidentally, are there any plans to implement a MinTraceSparse(nonnegative=True) method in the future? I can handle negative values post-reconciliation, just curious about the roadmap.

jmoralez · 2023-10-04T20:03:03Z

Thanks. Can you please open a new issue requesting the nonnegative sparse MinTrace?

tbellamey added the bug label Oct 4, 2023

jmoralez closed this as completed Oct 4, 2023
