Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Use off-heap storage for forecasting large models #22

Closed
wants to merge 7 commits into from
Closed

[ML] Use off-heap storage for forecasting large models #22

wants to merge 7 commits into from

Conversation

hendrikmuhs
Copy link

This implements the C++ side of forecast persistence. An additional parameter allows the forecast runner to persist models on disk for temporary purposes. Models are loaded back into memory one by one.

For models smaller than the current limit of 20MB nothing changes.

X-Pack part: https://github.com/elastic/x-pack-elasticsearch/pull/4134

Hendrik Muhs added 2 commits March 20, 2018 11:56
Re-factor minimumSeasonalVarianceScale to make it available as method.

This is required in order to restore models from a file stream, more precisely it is required as a parameter for maths::CModelParams which is required for maths::SModelRestoreParams


boost::filesystem::create_directories(temporaryFolder);
LOG_INFO("Persisting to: " << temporaryFolder.string());
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall be DEBUG

Copy link
Contributor

@tveasey tveasey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good on a quick first pass. I'll have another slightly more careful read through on Monday. I left some minor suggestions.

//! max memory allowed to use for forecast models persisting to disk
static const size_t MAX_FORECAST_MODEL_PERSISTANCE_MEMORY = 524288000; // 500MB

// Note: This value us lower than on X-pack side to prevent side-effects,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

us |-> is

@@ -82,6 +83,14 @@ class API_EXPORT CForecastRunner final: private core::CNonCopyable
//! max memory allowed to use for forecast models
static const size_t MAX_FORECAST_MODEL_MEMORY = 20971520; // 20MB

//! max memory allowed to use for forecast models persisting to disk
static const size_t MAX_FORECAST_MODEL_PERSISTANCE_MEMORY = 524288000; // 500MB
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the rationale for this being so much less than MIN_FORECAST_AVAILABLE_DISK_SPACE?

Copy link
Author

@hendrikmuhs hendrikmuhs Mar 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not want to fill the disk no matter how large the models are. So I keep some room.

Note that MAX_FORECAST_MODEL_PERSISTANCE_MEMORY is actually the size in memory, not persistence. I haven't made tests yet, but I assume that writing it as json has some expansion factor. So the size on disk will be higher than the size in memory. I do not think we need to exactly calculate that but only make a worst case assumption.

500MB is totally educated guess, I wonder if we ever have forecast models of that size.

I will add some more code docs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are I see, that makes sense, ok cool plus we can always revisit if we think this is too conservative. +1 to a little more documentation on this.

return false; \
} \
target = enumtype(value); \
restoreSuccess = true; \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing \

Copy link
Contributor

@tveasey tveasey Mar 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I wonder whether we should check to see if we actually restored the enum in this macro. I'm inclined to say we shouldn't and ditch restoreSuccess.

We generally don't bother to check if the field was missing altogether from the state document. Sometimes we can cope with default initialising a new field and it doesn't have to be present in the original state document, so this cuts down usability.

If you want to check what fields are present in the state document you can always fill in an unordered string set in the loop which does the restoring, i.e.

boost::unordered_set<std::string> restored;
do
{
    restored.insert(traverser.name());
    ...
}
while (traverser.next());
if (restored.count(MY_FIELD_I_WANT_TO_BE_PRESENT_TAG) > 0 && ...)
{
    return false;
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW: I first used a E_UNKNOWN state in the enum but that had to many side-effects on the existing code base. Other parts art not affected as I can initialize them to something invalid and check whether it has been set. So this only applies to enums, while enums are only used here.

For me the non-existence of the field is a bug.

I can use the suggested trick, but this looks more complicated to me than this 1 bool.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It definitely is more complicated for this case, but I want this macro to be as widely useable as possible.

Suppose we introduce a new enum member to an existing class, but feel we can happily get away with default initialising it to something. Then we don't want this extra "check a value exists". Now we could always go ahead and just ignore the bool, but that feels wrong.

That is the rationale for this suggestion, make the macro as general and simple as possible and if we need special handling deal with that elsewhere. As it is this feels like odd functionality out to have just this one case check for existence of a field.

Another option would be to have some general functionality which allows us to force that a list field values is actually read in a given loop. Something like registerManditoryFields on the traverser?

static const std::string FEATURE_TAG;
static const std::string DATA_TYPE_TAG;
static const std::string MODEL_TAG;
static const std::string BY_FIELD_VALUE_TAG;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per my usual comment, can we move to cc? Looks like these (at least FORECAST_MODEL_PERSIST_TAG) are only referenced in cc, so just defining them in unnamed namespace cuts down code and cleans up class definition.

series.s_MinimumSeasonalVarianceScale,
series.s_ToForecastPersisted));
// if in memory models are empty check if we can load persisted ones
if (series.s_ToForecast.empty() && modelRestore)
Copy link
Contributor

@tveasey tveasey Mar 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surely modelRestore can't be null if we get here (although see below).

@@ -214,7 +245,25 @@ void CForecastRunner::forecastWorker()
lastStatsUpdate = elapsedTime;
}
}

// if in memory models are empty check if we can load persisted ones
if (series.s_ToForecast.empty() && modelRestore)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we move into lambda to avoid code duplication also maybe modelRestore != nullptr.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that would be modelRestore.get() != nullptr, hmm operator bool is a valid C++ construct on auto pointers. I do not mind, but we should make a note of it and be consistent everywhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need the .get(). There's an overload of the equality operator where rhs is nullptr_t: http://www.cplusplus.com/reference/memory/unique_ptr/operators/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We seem to generally be moving towards making the condition as explicit as possible, even using == false verses !, on grounds that it is easier to see the intention at a glance. Using != nullptr seems more consistent with that philosophy.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with that, lucene enforces this style as well.

@@ -306,14 +362,20 @@ bool CForecastRunner::pushForecastJob(const std::string &controlMessage,
atLeastOneSupportedFunction = atLeastOneSupportedFunction || prerequisites.s_IsSupportedFunction;
totalMemoryUsage += prerequisites.s_MemoryUsageForDetector;

if (totalMemoryUsage >= MAX_FORECAST_MODEL_MEMORY)
if (totalMemoryUsage >= MAX_FORECAST_MODEL_MEMORY && forecastJob.s_TemporaryFolder.empty ())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some non-standard spaces between function name and argument lists in a few places, such as here. I guess once we've agreed on auto-formatting tooling this goes away, but would be nice not to introduce anything we won't accept in the meantime.

{
boost::filesystem::path temporaryFolder (forecastJob.s_TemporaryFolder);

if (!sufficientAvailableDiskSpace(temporaryFolder))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this->

modelParams.s_DecayRate,
modelParams.s_BucketLength,
modelParams.s_ComponentSize},
modelParams.distributionRestoreParams(dataType)};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not something you introduced, but it feels like it would be nice to have a method on SModelParams to get this object. Also you've mixed brackets and braces and indentation levels, which makes this hard to read.

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few minor comments.

There are also some places where you have { on the same line as a class declaration or if statement. I guess if we're definitely going to do an automated reformat soon it doesn't matter though...

return false; \
} \
target = enumtype(value); \
restoreSuccess = true; \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line contains tabs.

// Note: This value us lower than on X-pack side to prevent side-effects,
// if you change this value also change the limit on X-pack side.
//! minimum disk space required for disk persistence
static const size_t MIN_FORECAST_AVAILABLE_DISK_SPACE = 4294967296; // 4GB
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in C++11 a number with no size suffix can be int, long or long long, whichever is the smallest it will fit into. But in C++98 it was only int or long. Visual Studio 2013 doesn't implement the complete C++11 standard, so this might cause an error when backporting, because long is still 32 bits on Windows. You could add ull to the number now or just watch out for a compilation problem on Windows when backporting to 6.x - Visual Studio 2013 might be OK with it as-is.

<< " MB), using disk.");


boost::filesystem::create_directories(temporaryFolder);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be wrapped in a try/catch in case it throws an unexpected exception.

if (!forecastJob.s_TemporaryFolder.empty())
{
boost::filesystem::path temporaryFolder (forecastJob.s_TemporaryFolder);
boost::filesystem::remove_all(temporaryFolder);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be wrapped in try / catch. If it's a best effort thing then we can ignore the exception.


if (errorCode)
{
LOG_ERROR("Failed to retrieve disk information for " << path << " error " << errorCode.value());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to include errorCode.message() in the output as well as the number. Otherwise it will be like the moon landings where users faced with "error 1202" have to contact mission control to find out what's actually gone wrong.

LOG_DEBUG("| CForecastModelPersistTest::testPersistAndRestore |");
LOG_DEBUG("+----------------------------------------------------+");

core_t::TTime bucketLength{1800};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is indented with a tab.

restoredModel->checksum(42));

CPPUNIT_ASSERT(!restorer.nextModel(restoredModel, restoredFeature, restoredByFieldValue));
std::remove(persistedModels.c_str());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does CRestore only close its file in its destructor? If so, you may find this remove() fails on Windows because the file is still open. (Linux will let you delete an open file but Windows won't unless it was explicitly opened with share mode delete.) In other tests that have this problem we've put most of the code in an inner block so the appropriate destructor gets called before removing files.

@sophiec20 sophiec20 added :ml and removed :ml labels Mar 28, 2018
Copy link
Contributor

@tveasey tveasey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this now although still marked as WIP, but I think this is fine to merge.

@hendrikmuhs
Copy link
Author

superseded by #36

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants