[ML] Use off-heap storage for forecasting large models #22

hendrikmuhs · 2018-03-23T14:46:38Z

This implements the C++ side of forecast persistence. An additional parameter allows the forecast runner to persist models on disk for temporary purposes. Models are loaded back into memory one by one.

For models smaller than the current limit of 20MB nothing changes.

X-Pack part: https://github.com/elastic/x-pack-elasticsearch/pull/4134

Re-factor minimumSeasonalVarianceScale to make it available as method. This is required in order to restore models from a file stream, more precisely it is required as a parameter for maths::CModelParams which is required for maths::SModelRestoreParams

hendrikmuhs · 2018-03-23T14:47:58Z

lib/api/CForecastRunner.cc

+
+
+        boost::filesystem::create_directories(temporaryFolder);
+        LOG_INFO("Persisting to: " << temporaryFolder.string());


shall be DEBUG

tveasey

This looks pretty good on a quick first pass. I'll have another slightly more careful read through on Monday. I left some minor suggestions.

tveasey · 2018-03-23T14:50:10Z

include/api/CForecastRunner.h

+        //! max memory allowed to use for forecast models persisting to disk
+        static const size_t MAX_FORECAST_MODEL_PERSISTANCE_MEMORY = 524288000; // 500MB
+
+        // Note: This value us lower than on X-pack side to prevent side-effects,


tveasey · 2018-03-23T14:51:08Z

include/api/CForecastRunner.h

@@ -82,6 +83,14 @@ class API_EXPORT CForecastRunner final: private core::CNonCopyable
        //! max memory allowed to use for forecast models
        static const size_t MAX_FORECAST_MODEL_MEMORY = 20971520; // 20MB

+        //! max memory allowed to use for forecast models persisting to disk
+        static const size_t MAX_FORECAST_MODEL_PERSISTANCE_MEMORY = 524288000; // 500MB


What's the rationale for this being so much less than MIN_FORECAST_AVAILABLE_DISK_SPACE?

I do not want to fill the disk no matter how large the models are. So I keep some room.

Note that MAX_FORECAST_MODEL_PERSISTANCE_MEMORY is actually the size in memory, not persistence. I haven't made tests yet, but I assume that writing it as json has some expansion factor. So the size on disk will be higher than the size in memory. I do not think we need to exactly calculate that but only make a worst case assumption.

500MB is totally educated guess, I wonder if we ever have forecast models of that size.

I will add some more code docs.

Are I see, that makes sense, ok cool plus we can always revisit if we think this is too conservative. +1 to a little more documentation on this.

tveasey · 2018-03-23T14:52:24Z

include/core/RestoreMacros.h

+                return false;                                                         \
+            }                                                                         \
+            target = enumtype(value);                                                 \
+            restoreSuccess = true;													  \


Also, I wonder whether we should check to see if we actually restored the enum in this macro. I'm inclined to say we shouldn't and ditch restoreSuccess.

We generally don't bother to check if the field was missing altogether from the state document. Sometimes we can cope with default initialising a new field and it doesn't have to be present in the original state document, so this cuts down usability.

If you want to check what fields are present in the state document you can always fill in an unordered string set in the loop which does the restoring, i.e.

boost::unordered_set<std::string> restored; do { restored.insert(traverser.name()); ... } while (traverser.next()); if (restored.count(MY_FIELD_I_WANT_TO_BE_PRESENT_TAG) > 0 && ...) { return false; }

FWIW: I first used a E_UNKNOWN state in the enum but that had to many side-effects on the existing code base. Other parts art not affected as I can initialize them to something invalid and check whether it has been set. So this only applies to enums, while enums are only used here.

For me the non-existence of the field is a bug.

I can use the suggested trick, but this looks more complicated to me than this 1 bool.

It definitely is more complicated for this case, but I want this macro to be as widely useable as possible.

Suppose we introduce a new enum member to an existing class, but feel we can happily get away with default initialising it to something. Then we don't want this extra "check a value exists". Now we could always go ahead and just ignore the bool, but that feels wrong.

That is the rationale for this suggestion, make the macro as general and simple as possible and if we need special handling deal with that elsewhere. As it is this feels like odd functionality out to have just this one case check for existence of a field.

Another option would be to have some general functionality which allows us to force that a list field values is actually read in a given loop. Something like registerManditoryFields on the traverser?

tveasey · 2018-03-23T15:04:03Z

include/model/CForecastModelPersist.h

+        static const std::string FEATURE_TAG;
+        static const std::string DATA_TYPE_TAG;
+        static const std::string MODEL_TAG;
+        static const std::string BY_FIELD_VALUE_TAG;


As per my usual comment, can we move to cc? Looks like these (at least FORECAST_MODEL_PERSIST_TAG) are only referenced in cc, so just defining them in unnamed namespace cuts down code and cleans up class definition.

tveasey · 2018-03-23T15:06:44Z

lib/api/CForecastRunner.cc

+                                                                            series.s_MinimumSeasonalVarianceScale,
+                                                                            series.s_ToForecastPersisted));
+                    // if in memory models are empty check if we can load persisted ones
+                    if (series.s_ToForecast.empty() && modelRestore)


Surely modelRestore can't be null if we get here (although see below).

tveasey · 2018-03-23T15:09:41Z

lib/api/CForecastRunner.cc

@@ -214,7 +245,25 @@ void CForecastRunner::forecastWorker()
                            lastStatsUpdate = elapsedTime;
                        }
                    }
+
+                    // if in memory models are empty check if we can load persisted ones
+                    if (series.s_ToForecast.empty() && modelRestore)


Could we move into lambda to avoid code duplication also maybe modelRestore != nullptr.

that would be modelRestore.get() != nullptr, hmm operator bool is a valid C++ construct on auto pointers. I do not mind, but we should make a note of it and be consistent everywhere.

You don't need the .get(). There's an overload of the equality operator where rhs is nullptr_t: http://www.cplusplus.com/reference/memory/unique_ptr/operators/

We seem to generally be moving towards making the condition as explicit as possible, even using == false verses !, on grounds that it is easier to see the intention at a glance. Using != nullptr seems more consistent with that philosophy.

I am fine with that, lucene enforces this style as well.

tveasey · 2018-03-23T15:12:48Z

lib/api/CForecastRunner.cc

@@ -306,14 +362,20 @@ bool CForecastRunner::pushForecastJob(const std::string &controlMessage,
        atLeastOneSupportedFunction = atLeastOneSupportedFunction || prerequisites.s_IsSupportedFunction;
        totalMemoryUsage += prerequisites.s_MemoryUsageForDetector;

-        if (totalMemoryUsage >= MAX_FORECAST_MODEL_MEMORY)
+        if (totalMemoryUsage >= MAX_FORECAST_MODEL_MEMORY && forecastJob.s_TemporaryFolder.empty ())


There are some non-standard spaces between function name and argument lists in a few places, such as here. I guess once we've agreed on auto-formatting tooling this goes away, but would be nice not to introduce anything we won't accept in the meantime.

tveasey · 2018-03-23T15:14:00Z

lib/api/CForecastRunner.cc

+    {
+        boost::filesystem::path temporaryFolder (forecastJob.s_TemporaryFolder);
+
+        if (!sufficientAvailableDiskSpace(temporaryFolder))


tveasey · 2018-03-23T15:19:29Z

lib/model/CForecastModelPersist.cc

+                    modelParams.s_DecayRate,
+                    modelParams.s_BucketLength,
+                    modelParams.s_ComponentSize},
+                    modelParams.distributionRestoreParams(dataType)};


Not something you introduced, but it feels like it would be nice to have a method on SModelParams to get this object. Also you've mixed brackets and braces and indentation levels, which makes this hard to read.

droberts195

I left a few minor comments.

There are also some places where you have { on the same line as a class declaration or if statement. I guess if we're definitely going to do an automated reformat soon it doesn't matter though...

droberts195 · 2018-03-23T15:05:39Z

include/core/RestoreMacros.h

+                return false;                                                         \
+            }                                                                         \
+            target = enumtype(value);                                                 \
+            restoreSuccess = true;													  \


This line contains tabs.

droberts195 · 2018-03-23T15:08:41Z

include/api/CForecastRunner.h

+        // Note: This value us lower than on X-pack side to prevent side-effects,
+        // if you change this value also change the limit on X-pack side.
+        //! minimum disk space required for disk persistence
+        static const size_t MIN_FORECAST_AVAILABLE_DISK_SPACE = 4294967296; // 4GB


I think in C++11 a number with no size suffix can be int, long or long long, whichever is the smallest it will fit into. But in C++98 it was only int or long. Visual Studio 2013 doesn't implement the complete C++11 standard, so this might cause an error when backporting, because long is still 32 bits on Windows. You could add ull to the number now or just watch out for a compilation problem on Windows when backporting to 6.x - Visual Studio 2013 might be OK with it as-is.

droberts195 · 2018-03-23T15:31:44Z

lib/api/CForecastRunner.cc

+            << " MB), using disk.");
+
+
+        boost::filesystem::create_directories(temporaryFolder);


I think this should be wrapped in a try/catch in case it throws an unexpected exception.

droberts195 · 2018-03-23T15:35:23Z

lib/api/CForecastRunner.cc

+            if (!forecastJob.s_TemporaryFolder.empty())
+            {
+                boost::filesystem::path temporaryFolder (forecastJob.s_TemporaryFolder);
+                boost::filesystem::remove_all(temporaryFolder);


Should be wrapped in try / catch. If it's a best effort thing then we can ignore the exception.

droberts195 · 2018-03-23T15:41:24Z

lib/api/CForecastRunner.cc

+
+    if (errorCode)
+    {
+        LOG_ERROR("Failed to retrieve disk information for " << path << " error " << errorCode.value());


It would be good to include errorCode.message() in the output as well as the number. Otherwise it will be like the moon landings where users faced with "error 1202" have to contact mission control to find out what's actually gone wrong.

droberts195 · 2018-03-23T15:45:50Z

lib/model/unittest/CForecastModelPersistTest.cc

+    LOG_DEBUG("|  CForecastModelPersistTest::testPersistAndRestore  |");
+    LOG_DEBUG("+----------------------------------------------------+");
+
+	core_t::TTime bucketLength{1800};


This line is indented with a tab.

droberts195 · 2018-03-23T15:50:10Z

lib/model/unittest/CForecastModelPersistTest.cc

+        restoredModel->checksum(42));
+
+    CPPUNIT_ASSERT(!restorer.nextModel(restoredModel, restoredFeature, restoredByFieldValue));
+    std::remove(persistedModels.c_str());


Does CRestore only close its file in its destructor? If so, you may find this remove() fails on Windows because the file is still open. (Linux will let you delete an open file but Windows won't unless it was explicitly opened with share mode delete.) In other tests that have this problem we've put most of the code in an inner block so the appropriate destructor gets called before removing files.

tveasey

I'm happy with this now although still marked as WIP, but I think this is fine to merge.

hendrikmuhs · 2018-04-05T12:23:09Z

superseded by #36

Hendrik Muhs added 2 commits March 20, 2018 11:56

implement forecast temporary persistence to scale out for larger models

bf0f895

hendrikmuhs added v7.0.0 WIP :ml labels Mar 23, 2018

hendrikmuhs commented Mar 23, 2018

View reviewed changes

tveasey reviewed Mar 23, 2018

View reviewed changes

droberts195 reviewed Mar 23, 2018

View reviewed changes

Hendrik Muhs added 5 commits March 26, 2018 14:07

refactor restore macro for enums

61f1ea0

move constants out of header

a2d3f1b

address review comments

0027b86

improve unit test

989695c

fix logic

25edaaf

sophiec20 added :ml and removed :ml labels Mar 28, 2018

tveasey approved these changes Mar 28, 2018

View reviewed changes

hendrikmuhs mentioned this pull request Apr 5, 2018

[ML] Use disk storage for forecasting large models #36

Closed

hendrikmuhs closed this Apr 5, 2018

davidkyle mentioned this pull request Jun 20, 2023

[NLP] Catch exceptions thrown during inference and report as errors #2542

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ML] Use off-heap storage for forecasting large models #22

[ML] Use off-heap storage for forecasting large models #22

hendrikmuhs commented Mar 23, 2018

hendrikmuhs Mar 23, 2018

tveasey left a comment

tveasey Mar 23, 2018

tveasey Mar 23, 2018

hendrikmuhs Mar 23, 2018 •

edited

Loading

tveasey Mar 23, 2018

tveasey Mar 23, 2018

tveasey Mar 23, 2018 •

edited

Loading

hendrikmuhs Mar 23, 2018

tveasey Mar 23, 2018

tveasey Mar 23, 2018

tveasey Mar 23, 2018 •

edited

Loading

tveasey Mar 23, 2018

hendrikmuhs Mar 23, 2018

droberts195 Mar 23, 2018

tveasey Mar 23, 2018

hendrikmuhs Mar 23, 2018

tveasey Mar 23, 2018

tveasey Mar 23, 2018

tveasey Mar 23, 2018

droberts195 left a comment

droberts195 Mar 23, 2018

droberts195 Mar 23, 2018

droberts195 Mar 23, 2018

droberts195 Mar 23, 2018

droberts195 Mar 23, 2018

droberts195 Mar 23, 2018

droberts195 Mar 23, 2018

tveasey left a comment

hendrikmuhs commented Apr 5, 2018



		boost::filesystem::create_directories(temporaryFolder);
		LOG_INFO("Persisting to: " << temporaryFolder.string());

		<< " MB), using disk.");


		boost::filesystem::create_directories(temporaryFolder);

[ML] Use off-heap storage for forecasting large models #22

[ML] Use off-heap storage for forecasting large models #22

Conversation

hendrikmuhs commented Mar 23, 2018

Choose a reason for hiding this comment

tveasey left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hendrikmuhs Mar 23, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tveasey Mar 23, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tveasey Mar 23, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

droberts195 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tveasey left a comment

Choose a reason for hiding this comment

hendrikmuhs commented Apr 5, 2018

hendrikmuhs Mar 23, 2018 •

edited

Loading

tveasey Mar 23, 2018 •

edited

Loading

tveasey Mar 23, 2018 •

edited

Loading