
[ML] Improvements to forecasting robustness (part 2) #6

Closed
tveasey opened this issue Feb 23, 2018 · 1 comment
tveasey commented Feb 23, 2018

Following on from issue #5. We also need better support, from a forecasting perspective, for at least some common types of change point in time series. In particular, we need to:

  • Improve handling of discontinuities in the time series values. I think the best way to do this is to have an additive piecewise constant function as part of our trend model (see the sketch after this list). This sort of problem is much easier to solve in a global context, i.e. with access to the entire time series, so the challenging part is going to be identifying step changes online in a manner which doesn't overfit. I have some ideas for this which I will need to experiment with.
  • Model the statistical properties of the discontinuities. This will allow us to roll out candidate forecast paths which include predicted discontinuities. In this context, I think the distributions of the step size, the interval between steps and the value of the time series when it steps should be included in the model as a minimum. This should cover common cases such as discontinuities which occur at predictable intervals (e.g. scheduled tasks) or at a particular level of the time series (e.g. garbage collection).
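
To make this more concrete, here is a minimal sketch of what the additive piecewise constant component and the step statistics could look like. All the names here are hypothetical and this is not the existing decomposition code; it is only meant to illustrate the state involved.

```cpp
// A minimal sketch, with hypothetical names, of an additive piecewise
// constant component for the trend model together with the step
// statistics we would need in order to predict future discontinuities.
#include <cstdint>
#include <vector>

struct SStep {
    std::int64_t s_Time; // when the step occurred
    double s_Shift;      // signed size of the step
};

// The sum of all steps seen so far; added to the rest of the trend model.
class CStepComponent {
public:
    void addStep(std::int64_t time, double shift) {
        m_Steps.push_back({time, shift});
    }
    // Contribution of the step component to the trend prediction at 'time'.
    double value(std::int64_t time) const {
        double result = 0.0;
        for (const auto& step : m_Steps) {
            if (step.s_Time <= time) {
                result += step.s_Shift;
            }
        }
        return result;
    }

private:
    std::vector<SStep> m_Steps;
};

// Summary statistics to fit online for forecasting: the distribution of
// the step size, of the interval between steps and of the level of the
// time series at which steps occur.
struct SStepStatistics {
    double s_SizeMean = 0.0, s_SizeVariance = 0.0;
    double s_IntervalMean = 0.0, s_IntervalVariance = 0.0;
    double s_LevelMean = 0.0, s_LevelVariance = 0.0;
};
```

During a forecast roll out we would then sample the interval and size distributions to inject candidate discontinuities into each simulated path.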
tveasey commented Feb 23, 2018

A note on methodology for item 1 above.

The issue we run into is that we currently have to make up our minds immediately as to whether a step has occurred, or else we end up polluting the residual model with values gathered before we've decided there was a step: these will typically be a long way from the values the model expects (until the trend has adjusted to the new level).

The brute force approach to dealing with this would be to create a model for each hypothesis: there was a step change at T(1), T(2), .... In practice, one would very quickly kill off most alternatives, but this still explodes our memory usage. However, I think that for certain types of simple hypothesis, which we also expect to be common, we can do much better from a memory perspective than creating multiple full models. In particular, the idea would be to track just enough state to apply the change corresponding to a hypothesis to the decomposition model, and to copy only the residual distribution model (which is only a small proportion of the model size). Hypotheses for which this is both possible and makes sense are:

  1. Level shifts
  2. +/- 1 hour shifts in time (i.e. daylight savings)
  3. A constant scaling of the periodic components (handling for this case has come up as useful in a variety of customer conversations in the past)

We can delay committing to a hypothesis until we are confident it is true and can also reuse all the machinery we have for model selection for this purpose, so this is actually a fairly small change.
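
To make the bookkeeping concrete, here is a minimal sketch of what tracking these hypotheses could look like: each hypothesis carries only the parameters needed to apply its change to the decomposition's predictions plus its own small copy of the residual model, and hypotheses are scored with an information criterion (BIC is used below purely for illustration). The class and function names are hypothetical, not the actual ml-cpp types.

```cpp
// A minimal sketch, with hypothetical names, of cheap change hypotheses:
// each one stores the parameters needed to apply the change to the
// decomposition's predictions plus its own copy of a small residual
// model, and we only commit once one hypothesis is clearly preferred.
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in for the residual distribution model: a running Gaussian.
struct SResidualModel {
    double s_Count = 0.0, s_Mean = 0.0, s_VarianceSum = 0.0;
    void add(double x) {
        s_Count += 1.0;
        double delta = x - s_Mean;
        s_Mean += delta / s_Count;
        s_VarianceSum += delta * (x - s_Mean);
    }
    double logLikelihood(double x) const {
        double variance = s_Count > 1.0 ? s_VarianceSum / (s_Count - 1.0) : 1.0;
        double z = x - s_Mean;
        return -0.5 * (std::log(6.283185307179586 * variance) + z * z / variance);
    }
};

enum class EChange { E_None, E_LevelShift, E_TimeShift, E_SeasonalScale };

struct SChangeHypothesis {
    EChange s_Type = EChange::E_None;
    double s_Parameter = 0.0;   // shift size, time offset (s) or scale
    SResidualModel s_Residuals; // the only per-hypothesis model state we copy
    double s_LogLikelihood = 0.0;
    double parameters() const { return s_Type == EChange::E_None ? 0.0 : 1.0; }
};

// Apply a hypothesis to the baseline trend plus seasonal prediction.
double predict(const SChangeHypothesis& h,
               double trend,
               double (*seasonal)(double),
               double time) {
    switch (h.s_Type) {
    case EChange::E_LevelShift:    return trend + seasonal(time) + h.s_Parameter;
    case EChange::E_TimeShift:     return trend + seasonal(time + h.s_Parameter);
    case EChange::E_SeasonalScale: return trend + h.s_Parameter * seasonal(time);
    default:                       return trend + seasonal(time);
    }
}

// Update every hypothesis with a new observation and return the index of
// the hypothesis with the lowest BIC, which we would only commit to once
// its advantage over "no change" is decisive.
std::size_t update(std::vector<SChangeHypothesis>& hypotheses,
                   double observed, double trend,
                   double (*seasonal)(double), double time, double n) {
    std::size_t best = 0;
    double bestBic = 0.0;
    for (std::size_t i = 0; i < hypotheses.size(); ++i) {
        auto& h = hypotheses[i];
        double residual = observed - predict(h, trend, seasonal, time);
        h.s_LogLikelihood += h.s_Residuals.logLikelihood(residual);
        h.s_Residuals.add(residual);
        double bic = h.parameters() * std::log(n) - 2.0 * h.s_LogLikelihood;
        if (i == 0 || bic < bestBic) {
            bestBic = bic;
            best = i;
        }
    }
    return best;
}
```

Committing to the winning hypothesis then amounts to applying the shift, time offset or scale to the decomposition in place and discarding the other hypotheses' residual models.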

@sophiec20 sophiec20 changed the title Improvements to forecasting robustness (part 2) [ML] Improvements to forecasting robustness (part 2) Mar 15, 2018
@tveasey tveasey changed the title [ML] Improvements to forecasting robustness (part 2) Improvements to forecasting robustness (part 2) Mar 26, 2018
@sophiec20 sophiec20 added the :ml label Mar 28, 2018
@sophiec20 sophiec20 changed the title Improvements to forecasting robustness (part 2) [ML] Improvements to forecasting robustness (part 2) Mar 28, 2018
tveasey added a commit that referenced this issue May 22, 2018