-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Add control message to start background persistence #19
Conversation
@@ -393,6 +393,14 @@ bool CAnomalyJob::handleControlMessage(const std::string &controlMessage) | |||
case 'p': | |||
this->doForecast(controlMessage); | |||
break; | |||
case 'w': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'p' for persistence and 's' for save are already taken so 'w' for write.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -393,6 +393,14 @@ bool CAnomalyJob::handleControlMessage(const std::string &controlMessage) | |||
case 'p': | |||
this->doForecast(controlMessage); | |||
break; | |||
case 'w': | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future we'll probably also want the option to request persistence in the foreground.
This could be indicated using a second letter after the 'w', say 'b' for background and 'f' for foreground.
Obviously all this could be introduced in the future PR where the foreground persistence option is added. However, it would make adding that change smoother (with CI interaction between repos) if the second letter was added now, with a warning message if no second letter was present or was not understood.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I like the idea of persist foreground and background.
I wanted to propose a different change I was going to do in another PR but I'll raise it now. This command is added for when the datafeed has finished lookback, it's possible that if a lookback took more than 3 hours (or a multiple of) then it would have recently persisted. How about an optional duration following the 'b' which would be interpreted as 'start bg persist now if you have not done it in the last N minutes'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. That also answers the question about what to do if a background persist is currently in progress when the message is received - it would be reasonable not to do it ever under these circumstances.
'start bg persist now if you have not done it in the last N minutes'
I would change 'minutes' to 'seconds' in the command, as everything else works in terms of seconds in the C++.
lib/api/CBackgroundPersister.cc
Outdated
@@ -178,6 +178,11 @@ bool CBackgroundPersister::firstProcessorPeriodicPersistFunc(const TFirstProcess | |||
return true; | |||
} | |||
|
|||
bool CBackgroundPersister::startBackgroundPersist(void) | |||
{ | |||
return this->startBackgroundPersist(core::CTimeUtils::now()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is called while a background persist is currently in progress then there are two potential problems:
- It won't actually do another background persist, but the only indication of this will be the error message "Failed to start background persistence" - there will be no extra explanation of the reason why.
- Before it fails it will copy all the models. We know that background persist can lead to a doubling of process memory usage, but this situation would lead to a temporary tripling of process memory.
One way to solve these problems is as follows:
- Add comments to
CBackgroundPersister::startPersist()
andCBackgroundPersister::startBackgroundPersist(core_t::TTime)
saying that they must only be called whenm_Mutex
is locked. I think this is OK for private methods. - Change
m_Mutex
from acore::CFastMutex
to acore::CMutex
, as it's going to be held for longer due to the rest of this proposal. (Or change it to astd::mutex
if you prefer.) - Before the
if (this->isBusy())
check inCBackgroundPersister::startBackgroundPersistIfAppropriate()
, lockm_Mutex
. - At the beginning of
CBackgroundPersister::startBackgroundPersist()
(i.e. the public method), lockm_Mutex
. Then, in between the mutex acquisition andreturn this->startBackgroundPersist(core::CTimeUtils::now());
checkif (this->isBusy())
and log a warning message that clearly states why the requested background persist is not being run.
With (4) there is also the question of whether it's acceptable to ignore a requested background persist if there's already one in progress. If we think it's not acceptable then we need functionality to remember the request and have the background thread run again as soon as it finishes. But it's probably best to discuss that question as a separate issue and defer any changes to a future PR rather than mix it into this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another alternative: using the CConcurrentWrapper, that would allow to queue persistence requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. If m_FirstProcessorPeriodicPersistFunc
is called while isBusy() == true
then it will result in concurrent modification of the m_PersistFuncs
member.
I think it is sufficient to have the isBusy()
check at the start of the public startBackgroundPersist(void)
function and log a message. I think it's acceptable to ignore the persist request if one is currently executing. I've pushed a change with that check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another alternative: using the CConcurrentWrapper, that would allow to queue persistence requests
But I think we need to consider carefully if it's sensible to queue persistence requests. If state is persisted twice in quick succession all that really achieves is to waste CPU.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true, but flooding protection is on a different page just saying it allows to, does not mean we should do it. Same is true for the other way around: A scheduled background persist could kick in right after a on-demand persist. Should not be to hard to check whether persistence makes sense or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A scheduled background persist could kick in right after a on-demand persist
A requested persist resets the periodic persist clock so it won't happen again for another 3+ hours
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is sufficient to have the
isBusy()
check at the start of the publicstartBackgroundPersist(void)
function and log a message.
That's only true because the check for a periodic background persist is triggered in the same thread as control messages are handled in, so they cannot both happen at the same time.
If you're not going to extend the mutex locking then I think the Doxygen comments need to say that only one thread in the program is allowed to call the startBackgroundPersist()
and startBackgroundPersistIfAppropriate()
methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've documented the thread safety restrictions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
As MVP for 6.3 to support Cloud I think this is enough. Further changes can be done later.
* Add control message for back ground persist * Add isBusy check to public startBackgroundPersist function * Document thread safety aspects
* Add control message for back ground persist * Add isBusy check to public startBackgroundPersist function * Document thread safety aspects
* Add control message for back ground persist * Add isBusy check to public startBackgroundPersist function * Document thread safety aspects
Starts background persistence if not currently persisting and resets the clock on the periodic persist.
This is tested through an integration test in the x-pack repo