-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Fail start up if state is missing #4
Conversation
lib/api/CAnomalyJob.cc
Outdated
" - ignoring it as current state version is " << | ||
model::CAnomalyDetector::STATE_VERSION); | ||
|
||
// This counts as successful restoration | ||
return true; | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this one should stay as it is, otherwise we can never ever bump the model state version.
However, there is another problem here. If the job is doing categorization as well as anomaly detection then the stream will be incorrectly positioned if we simply return here, so in the case where the model state version has been bumped but the categorizer state version hasn't the categorizer restoration will fail with a horrible error. (Simply returning dates back to when we used to run completely separate searches against ES to get the different state types.) It needs to read and discard the old model state leaving the stream positioned at the start of the categorizer state. I'm happy to leave this problem for a followup PR, as it's not possible to hit it until we eventually bump the state version, so you could just add a TODO for it.
lib/api/unittest/CAnomalyJobTest.cc
Outdated
fieldConfig.initFromClause(clauses); | ||
model::CAnomalyDetectorModelConfig modelConfig = | ||
model::CAnomalyDetectorModelConfig::defaultConfig(BUCKET_SIZE); | ||
std::stringstream outputStrm; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use ostringstream
as it's only for output.
// This is not an error if no data has been persisted | ||
return true; | ||
LOG_ERROR("Expected persisted state but no state exists"); | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same change is also needed on line 282 of CFieldDataTyper.cc
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -888,8 +888,8 @@ bool CAnomalyJob::restoreState(core::CDataSearcher &restoreSearcher, | |||
|
|||
if (strm->fail()) | |||
{ | |||
// This is not fatal - we just didn't find the given document number | |||
return true; | |||
// This is fatal. If the stream exists and has failed then state is missing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think eof
is a special case of fail
, so in the case where there are no state documents at all this will trigger instead of the eof
check in the method below. So it's probably worth adding LOG_ERROR("Expected persisted state but no state exists");
here too, so we get an error logged in both cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if before anything has been read from the stream so eof
will not be set. Considering bad
is checked above this is very unlikely to fail but it does need a log message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good point. I don't think this can happen for our named pipe restoration, as we'll fail earlier on if the named pipe doesn't exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed a few minor nits but LGTM when they're fixed.
//! | ||
//! DESCRIPTION:\n | ||
//! CDataSearcher that returns an empty stream. | ||
//! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Doxygen comment should go above the class CEmptySearcher
line.
//! | ||
//! DESCRIPTION:\n | ||
//! CDataSearcher that returns an empty stream. | ||
//! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Doxygen comment should go above the class CEmptySearcher
line.
lib/api/CFieldDataTyper.cc
Outdated
@@ -308,7 +309,14 @@ bool CFieldDataTyper::restoreState(core::CDataSearcher &restoreSearcher, | |||
|
|||
bool CFieldDataTyper::acceptRestoreTraverser(core::CStateRestoreTraverser &traverser) | |||
{ | |||
if (traverser.name() == VERSION_TAG) | |||
std::string firstFieldName(traverser.name()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need to copy here - you can just keep a reference:
const std::string &firstFieldName = traverser.name();
* Fail start up if state is missing * Fail categorizer restore if state is missing
* Fail start up if state is missing * Fail categorizer restore if state is missing
* Fail start up if state is missing * Fail categorizer restore if state is missing
Fail restoring job if state input stream is empty.
If the restore state argument is passed to autodetect then a state input stream is created and it is an error if that stream contains no state.