Aggregation to calculate the moving average on a histogram aggregation #10002

polyfractal · 2015-03-05T21:02:17Z

This aggregation will calculate the moving average of sibling metrics in histogram-style data (histogram, date_histogram). Moving averages are useful when time series data is locally stationary and has a mean that changes slowly over time.

Seasonal data may need a different analysis, as well as data that is bimodal, "bursty" or contains frequent extreme values (which are not necessarily outliers).

The movavg aggregation supports several configurable options:

Window Size

The user specifies the window size they wish to calculate a moving average for. E.g. a user may want a 30-day sliding window over a histogram of 90 days total.

Currently, if there is not enough data to "fill" the window, the moving average will be calculated with whatever is available. For example, if a user selects 30-day window, days 1-29 will calculate the moving average with between 1-29 days of data.

We could investigate adding more "edge policies", which determine how to handle gaps at the edge of the moving average

Weighting Type

Currently, the agg supports four types of weighting:

simple: A simple (arithmetic) average. Default.
linear: A linearly weighted average, such that data becomes linearly less important as it gets "older" in the window
single_exp: Single exponentially weighted average (aka EWMA or Brown's Simple Exp Smoothing), such that data becomes exponentially less important as it get's "older".
double_exp: Double exponentially weighted average (aka Holt-Winters). Uses two exponential terms: first smooth data exponentially like single_exp, but then apply second corrective smoothing to account for a trend.

Todo: Expose alpha and beta

Alpha and beta are parameters which control the behavior of single_exp and double_exp.

Alpha: controls how far the single exponential smoothing term lags behind the "turning points" in the mean by 1/alpha periods. Alpha = 1 means the smoothing term has no memory (period of 1), and emulates a random walk. Alpha = 0 means the smoothing term has infinite memory and reports the mean of the data
Beta: Only used in double_exp. Analogous to alpha, but applied to the trend smoothing rather than the data smoothing.

Todo: Investigate metric-weighting

It's sometimes useful to weight a time period not by it's distance from the current time, but rather by some metric that happened in that time interval. E.g. weight by the volume of transactions that happened on that day.

It should be possible to weight based on metrics within the bucket, although it could get complicated if the value is missing.

Sample Request

This will calculate a moving average (sliding window of three days) over the sum of prices in each day:

GET /test/_search?search_type=count
{
   "aggs": {
      "my_date_histo": {
         "date_histogram": {
            "field": "@timestamp",
            "interval": "day"
         },
         "aggs": {
            "the_sum": {
               "sum": {
                  "field": "price"
               }
            },
            "the_movavg": {
               "movavg": {
                  "bucketsPath": "the_sum",
                  "window": 3
               }
            }
         }
      }
   }
}

Sample Response

{
   "took": 3,
   "timed_out": false,
   "aggregations": {
      "my_date_histo": {
         "buckets": [
            {
               "key_as_string": "2014-12-01T00:00:00.000Z",
               "key": 1417392000000,
               "doc_count": 1,
               "the_sum": {
                  "value": 1,
                  "value_as_string": "1.0"
               },
               "the_movavg": {
                  "value": 1
               }
            },
            {
               "key_as_string": "2014-12-02T00:00:00.000Z",
               "key": 1417478400000,
               "doc_count": 1,
               "the_sum": {
                  "value": 2,
                  "value_as_string": "2.0"
               },
               "the_movavg": {
                  "value": 1.5
               }
            },
            {
               "key_as_string": "2014-12-04T00:00:00.000Z",
               "key": 1417651200000,
               "doc_count": 1,
               "the_sum": {
                  "value": 4,
                  "value_as_string": "4.0"
               },
               "the_movavg": {
                  "value": 2.3333333333333335
               }
            },
            {
               "key_as_string": "2014-12-05T00:00:00.000Z",
               "key": 1417737600000,
               "doc_count": 1,
               "the_sum": {
                  "value": 5,
                  "value_as_string": "5.0"
               },
               "the_movavg": {
                  "value": 3.6666666666666665
               }
            },
            {
               "key_as_string": "2014-12-08T00:00:00.000Z",
               "key": 1417996800000,
               "doc_count": 1,
               "the_sum": {
                  "value": 8,
                  "value_as_string": "8.0"
               },
               "the_movavg": {
                  "value": 5.666666666666667
               }
            },
            {
               "key_as_string": "2014-12-09T00:00:00.000Z",
               "key": 1418083200000,
               "doc_count": 1,
               "the_sum": {
                  "value": 9,
                  "value_as_string": "9.0"
               },
               "the_movavg": {
                  "value": 7.333333333333333
               }
            }
         ]
      }
   }
}

The text was updated successfully, but these errors were encountered:

polyfractal · 2015-04-08T14:50:56Z

Added in #10024

Adds a new type of aggregation called 'reducers' which act on the output of aggregations and compute extra information that they add to the aggregation tree. Reducers look much like any other aggregation in the request but have a buckets_path parameter which references the aggregation(s) to use. Internally there are two types of reducer; the first is given the output of its parent aggregation and computes new aggregations to add to the buckets of its parent, and the second (a specialisation of the first) is given a sibling aggregation and outputs an aggregation to be a sibling at the same level as that aggregation. This PR includes the framework for the reducers, the derivative reducer (#9293), the moving average reducer(#10002) and the maximum bucket reducer(#10000). These reducer implementations are not all yet fully complete. Known work left to do (these points will be done once this PR is merged into the master branch): Add x-axis normalisation to the derivative reducer Add lots more JUnit tests for all reducers Contributes to #9876 Closes #10002 Closes #9293 Closes #10000

elnur · 2017-05-21T16:14:55Z

Anything like that for moving maximum?

polyfractal · 2017-06-08T17:41:19Z

@elnur Just opened a PR for this functionality, see #25137

evanceheallyg · 2017-08-28T19:23:13Z

@polyfractal Is there any workaround to specify dynamic window? In our use case, we need to calculate the moving average for dynamic sliding window based on the number of months selected by the user. Ex, if the data set is for 1 year, then window should be 2, 2 years = 24 and so on.. Any thoughts?

polyfractal · 2017-08-29T14:35:08Z

@evanceheallyg You'll have to determine the range of the data up-front at the moment, there's no way to specify the number of partitions rather than the size of each partition.

I'm not sure we'd be able to support that kind of functionality though. Moving average works on discrete buckets. So if your histogram has 10 buckets, but you request "9 partitions", we'd have to put 0.9 buckets into each partition... which isn't doable. The only way it'd work is if the number of partitions is a multiple/divisor of the histogram interval, which starts to get very unintuitive.

I think the best thing to do is just run a pre-aggregation to find the min/max of your dataset and then scale the window size accordingly.

evanceheallyg · 2017-08-31T04:04:52Z

@polyfractal Thanks for the explanation and support. The problem is that we are using "https://github.com/PhaedrusTheGreek/transform_vis" for rendering the moving average in the dashboard and the DASHBOARD_CONTEXT used to filter is automatically fetched from dashboard filters and not able to control the same. Do you foresee any workaround in this situation.

Also, is there any workaround for the edge policies. Since I work in aerospace domain, its mandatory for us to calculate the MA with the previous set of data points, for Ex, if the plotting data set starts with Jan 2017, still we have to sum up the value from Jan 2017 and 11 months back, and move on.

colings86 mentioned this issue Mar 5, 2015

Add ability to perform computations on aggregations #9876

Closed

24 tasks

$@polyfractal$ polyfractal added >feature v2.0.0-beta1 :Analytics/Aggregations Aggregations labels Mar 5, 2015

$@polyfractal$ polyfractal self-assigned this Mar 5, 2015

$@polyfractal$ polyfractal mentioned this issue Mar 6, 2015

Aggregations: Add moving average aggregation #10024

Closed

$@polyfractal$ polyfractal closed this as completed Apr 8, 2015

colings86 mentioned this issue Apr 13, 2015

Pipeline aggregations: Ability to perform computations on aggregations #10568

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Aggregation to calculate the moving average on a histogram aggregation #10002

Aggregation to calculate the moving average on a histogram aggregation #10002

polyfractal commented Mar 5, 2015

polyfractal commented Apr 8, 2015

elnur commented May 21, 2017

polyfractal commented Jun 8, 2017

evanceheallyg commented Aug 28, 2017

polyfractal commented Aug 29, 2017

evanceheallyg commented Aug 31, 2017

Aggregation to calculate the moving average on a histogram aggregation #10002

Aggregation to calculate the moving average on a histogram aggregation #10002

Comments

polyfractal commented Mar 5, 2015

Window Size

Weighting Type

Todo: Expose alpha and beta

Todo: Investigate metric-weighting

Sample Request

Sample Response

polyfractal commented Apr 8, 2015

elnur commented May 21, 2017

polyfractal commented Jun 8, 2017

evanceheallyg commented Aug 28, 2017

polyfractal commented Aug 29, 2017

evanceheallyg commented Aug 31, 2017