-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Use off-heap storage for forecasting large models #22
Conversation
Re-factor minimumSeasonalVarianceScale to make it available as method. This is required in order to restore models from a file stream, more precisely it is required as a parameter for maths::CModelParams which is required for maths::SModelRestoreParams
lib/api/CForecastRunner.cc
Outdated
|
||
|
||
boost::filesystem::create_directories(temporaryFolder); | ||
LOG_INFO("Persisting to: " << temporaryFolder.string()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall be DEBUG
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty good on a quick first pass. I'll have another slightly more careful read through on Monday. I left some minor suggestions.
include/api/CForecastRunner.h
Outdated
//! max memory allowed to use for forecast models persisting to disk | ||
static const size_t MAX_FORECAST_MODEL_PERSISTANCE_MEMORY = 524288000; // 500MB | ||
|
||
// Note: This value us lower than on X-pack side to prevent side-effects, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
us |-> is
include/api/CForecastRunner.h
Outdated
@@ -82,6 +83,14 @@ class API_EXPORT CForecastRunner final: private core::CNonCopyable | |||
//! max memory allowed to use for forecast models | |||
static const size_t MAX_FORECAST_MODEL_MEMORY = 20971520; // 20MB | |||
|
|||
//! max memory allowed to use for forecast models persisting to disk | |||
static const size_t MAX_FORECAST_MODEL_PERSISTANCE_MEMORY = 524288000; // 500MB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the rationale for this being so much less than MIN_FORECAST_AVAILABLE_DISK_SPACE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not want to fill the disk no matter how large the models are. So I keep some room.
Note that MAX_FORECAST_MODEL_PERSISTANCE_MEMORY is actually the size in memory, not persistence. I haven't made tests yet, but I assume that writing it as json has some expansion factor. So the size on disk will be higher than the size in memory. I do not think we need to exactly calculate that but only make a worst case assumption.
500MB is totally educated guess, I wonder if we ever have forecast models of that size.
I will add some more code docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are I see, that makes sense, ok cool plus we can always revisit if we think this is too conservative. +1 to a little more documentation on this.
include/core/RestoreMacros.h
Outdated
return false; \ | ||
} \ | ||
target = enumtype(value); \ | ||
restoreSuccess = true; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing \
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I wonder whether we should check to see if we actually restored the enum in this macro. I'm inclined to say we shouldn't and ditch restoreSuccess.
We generally don't bother to check if the field was missing altogether from the state document. Sometimes we can cope with default initialising a new field and it doesn't have to be present in the original state document, so this cuts down usability.
If you want to check what fields are present in the state document you can always fill in an unordered string set in the loop which does the restoring, i.e.
boost::unordered_set<std::string> restored;
do
{
restored.insert(traverser.name());
...
}
while (traverser.next());
if (restored.count(MY_FIELD_I_WANT_TO_BE_PRESENT_TAG) > 0 && ...)
{
return false;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW: I first used a E_UNKNOWN state in the enum but that had to many side-effects on the existing code base. Other parts art not affected as I can initialize them to something invalid and check whether it has been set. So this only applies to enums, while enums are only used here.
For me the non-existence of the field is a bug.
I can use the suggested trick, but this looks more complicated to me than this 1 bool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It definitely is more complicated for this case, but I want this macro to be as widely useable as possible.
Suppose we introduce a new enum member to an existing class, but feel we can happily get away with default initialising it to something. Then we don't want this extra "check a value exists". Now we could always go ahead and just ignore the bool, but that feels wrong.
That is the rationale for this suggestion, make the macro as general and simple as possible and if we need special handling deal with that elsewhere. As it is this feels like odd functionality out to have just this one case check for existence of a field.
Another option would be to have some general functionality which allows us to force that a list field values is actually read in a given loop. Something like registerManditoryFields
on the traverser?
static const std::string FEATURE_TAG; | ||
static const std::string DATA_TYPE_TAG; | ||
static const std::string MODEL_TAG; | ||
static const std::string BY_FIELD_VALUE_TAG; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per my usual comment, can we move to cc? Looks like these (at least FORECAST_MODEL_PERSIST_TAG
) are only referenced in cc, so just defining them in unnamed namespace cuts down code and cleans up class definition.
lib/api/CForecastRunner.cc
Outdated
series.s_MinimumSeasonalVarianceScale, | ||
series.s_ToForecastPersisted)); | ||
// if in memory models are empty check if we can load persisted ones | ||
if (series.s_ToForecast.empty() && modelRestore) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surely modelRestore can't be null if we get here (although see below).
lib/api/CForecastRunner.cc
Outdated
@@ -214,7 +245,25 @@ void CForecastRunner::forecastWorker() | |||
lastStatsUpdate = elapsedTime; | |||
} | |||
} | |||
|
|||
// if in memory models are empty check if we can load persisted ones | |||
if (series.s_ToForecast.empty() && modelRestore) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we move into lambda to avoid code duplication also maybe modelRestore != nullptr
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that would be modelRestore.get() != nullptr
, hmm operator bool is a valid C++ construct on auto pointers. I do not mind, but we should make a note of it and be consistent everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need the .get()
. There's an overload of the equality operator where rhs
is nullptr_t
: http://www.cplusplus.com/reference/memory/unique_ptr/operators/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We seem to generally be moving towards making the condition as explicit as possible, even using == false
verses !
, on grounds that it is easier to see the intention at a glance. Using != nullptr
seems more consistent with that philosophy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with that, lucene enforces this style as well.
lib/api/CForecastRunner.cc
Outdated
@@ -306,14 +362,20 @@ bool CForecastRunner::pushForecastJob(const std::string &controlMessage, | |||
atLeastOneSupportedFunction = atLeastOneSupportedFunction || prerequisites.s_IsSupportedFunction; | |||
totalMemoryUsage += prerequisites.s_MemoryUsageForDetector; | |||
|
|||
if (totalMemoryUsage >= MAX_FORECAST_MODEL_MEMORY) | |||
if (totalMemoryUsage >= MAX_FORECAST_MODEL_MEMORY && forecastJob.s_TemporaryFolder.empty ()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some non-standard spaces between function name and argument lists in a few places, such as here. I guess once we've agreed on auto-formatting tooling this goes away, but would be nice not to introduce anything we won't accept in the meantime.
lib/api/CForecastRunner.cc
Outdated
{ | ||
boost::filesystem::path temporaryFolder (forecastJob.s_TemporaryFolder); | ||
|
||
if (!sufficientAvailableDiskSpace(temporaryFolder)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this->
modelParams.s_DecayRate, | ||
modelParams.s_BucketLength, | ||
modelParams.s_ComponentSize}, | ||
modelParams.distributionRestoreParams(dataType)}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not something you introduced, but it feels like it would be nice to have a method on SModelParams
to get this object. Also you've mixed brackets and braces and indentation levels, which makes this hard to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few minor comments.
There are also some places where you have {
on the same line as a class declaration or if
statement. I guess if we're definitely going to do an automated reformat soon it doesn't matter though...
include/core/RestoreMacros.h
Outdated
return false; \ | ||
} \ | ||
target = enumtype(value); \ | ||
restoreSuccess = true; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line contains tabs.
include/api/CForecastRunner.h
Outdated
// Note: This value us lower than on X-pack side to prevent side-effects, | ||
// if you change this value also change the limit on X-pack side. | ||
//! minimum disk space required for disk persistence | ||
static const size_t MIN_FORECAST_AVAILABLE_DISK_SPACE = 4294967296; // 4GB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in C++11 a number with no size suffix can be int
, long
or long long
, whichever is the smallest it will fit into. But in C++98 it was only int
or long
. Visual Studio 2013 doesn't implement the complete C++11 standard, so this might cause an error when backporting, because long
is still 32 bits on Windows. You could add ull
to the number now or just watch out for a compilation problem on Windows when backporting to 6.x - Visual Studio 2013 might be OK with it as-is.
lib/api/CForecastRunner.cc
Outdated
<< " MB), using disk."); | ||
|
||
|
||
boost::filesystem::create_directories(temporaryFolder); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be wrapped in a try
/catch
in case it throws an unexpected exception.
lib/api/CForecastRunner.cc
Outdated
if (!forecastJob.s_TemporaryFolder.empty()) | ||
{ | ||
boost::filesystem::path temporaryFolder (forecastJob.s_TemporaryFolder); | ||
boost::filesystem::remove_all(temporaryFolder); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be wrapped in try
/ catch
. If it's a best effort thing then we can ignore the exception.
lib/api/CForecastRunner.cc
Outdated
|
||
if (errorCode) | ||
{ | ||
LOG_ERROR("Failed to retrieve disk information for " << path << " error " << errorCode.value()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to include errorCode.message()
in the output as well as the number. Otherwise it will be like the moon landings where users faced with "error 1202" have to contact mission control to find out what's actually gone wrong.
LOG_DEBUG("| CForecastModelPersistTest::testPersistAndRestore |"); | ||
LOG_DEBUG("+----------------------------------------------------+"); | ||
|
||
core_t::TTime bucketLength{1800}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is indented with a tab.
restoredModel->checksum(42)); | ||
|
||
CPPUNIT_ASSERT(!restorer.nextModel(restoredModel, restoredFeature, restoredByFieldValue)); | ||
std::remove(persistedModels.c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does CRestore
only close its file in its destructor? If so, you may find this remove()
fails on Windows because the file is still open. (Linux will let you delete an open file but Windows won't unless it was explicitly opened with share mode delete.) In other tests that have this problem we've put most of the code in an inner block so the appropriate destructor gets called before removing files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with this now although still marked as WIP, but I think this is fine to merge.
superseded by #36 |
This implements the C++ side of forecast persistence. An additional parameter allows the forecast runner to persist models on disk for temporary purposes. Models are loaded back into memory one by one.
For models smaller than the current limit of 20MB nothing changes.
X-Pack part: https://github.com/elastic/x-pack-elasticsearch/pull/4134