-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Extend unit testing and make some improvements to deal with changes in seasonal components #91
[ML] Extend unit testing and make some improvements to deal with changes in seasonal components #91
Conversation
@@ -255,6 +258,7 @@ const TSizeVecVec SC_TRANSITION_FUNCTION{ | |||
TSizeVec{SC_NORMAL, SC_NORMAL, SC_NORMAL, SC_NORMAL}}; | |||
|
|||
const std::string VERSION_6_3_TAG("6.3"); | |||
const std::string VERSION_6_4_TAG("6.4"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably want to prevent a job that has written 6.4 state from running on a 6.3 node in a mixed 6.3/6.4 cluster. I guess if this isn't done then a couple of fields won't get initialised. The way to do that is to change line 78 of CAnomalyJob.cc.
docs/CHANGELOG.asciidoc
Outdated
@@ -29,6 +29,7 @@ | |||
|
|||
Improve and use periodic boundary condition for seasonal component modeling ({pull}84[#84]) | |||
Improve robustness w.r.t. outliers of detection and initialisation of seasonal components ({pull}90[#90]) | |||
Extend unit testing and make some improvements to deal with changes in seasonal components ({pull}91[#91]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the release notes need to mention unit test changes - someone trying to find out what's new in ML isn't going to care.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, I'll update.
lib/maths/CSeasonalComponent.cc
Outdated
scale = CTools::truncate(1.002 * scale - 0.001, 0.0, 1.0); | ||
|
||
return -scale * min[0] * CTools::sign(shortPeriodValue); | ||
return bias.signMargin() != 0.0 ? bias.signMargin() : cancellation.signMargin(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you want to treat very small numbers as 0? For example, if bias.signMargin()
was 1e-308 would you still want to use it? (This is only a question - if the answer is yes then fine.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One could, for example, not bother if it is << amplitude. The chances of this being very small, but not identically zero, is actually small. If there is no bias then we expect the differences at the different times in the period to not all have the same sign with a high probability because noise in the underlying values should cause them to randomly land on different sides of the mean. Also, it doesn't have a significant impact to have a small, but non-zero, delta. So on balance I don't think this warrants the extra complication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This turned out to be more involved: testing with variable decay rate I again hit the instability. I've therefore significantly extended the original PR:
I split the extra changes into two commits. The first is a pure refactor to better encapsulate seasonal and calendar components. The second makes the functional changes. Can you take another look @droberts195. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
d716093
to
d6b87d3
Compare
In the course of other testing related to #6, I found evidence of potential instability in the decomposition when the seasonal components in the time series abruptly change. In addition, our lifecycle management of seasonal components was not reliably removing components which were no longer improving predictions. This was related to the way we were trying to preserve components which carry information about periodic variance in the data.
This adds a unit test to exercise these problems and makes some related improvements. As part of these I've implemented a failsafe to remove all seasonal components and start again if instability occurs. The current behaviour on this unit test (red actuals, blue predictions) is as follows:
For reference, this is what was happening before:
This can affect results on metric and count analysis with seasonal signals where seasonality changes significantly from time to time.
Release note: Improves behavior when there are abrupt changes in the seasonal components present in a time series