[Lens] Add moving average aggregation #61777

Closed
timroes opened this issue Mar 30, 2020 · 16 comments
Labels
enhancement New value added to drive a business result Feature:Lens Project:LensDefault Team:Visualizations Visualization editors, elastic-charts and infrastructure

Comments

@timroes
Contributor

timroes commented Mar 30, 2020

Add a moving average aggregation to Lens. See also #56696 for more discussion.

@timroes timroes added enhancement New value added to drive a business result Team:Visualizations Visualization editors, elastic-charts and infrastructure Feature:Lens labels Mar 30, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-app (Team:KibanaApp)

@wylieconlon
Contributor

Open questions:

  • Do we use exponential smoothing?
  • Should the chart rendering be specialized in any way, such as showing the moving average and underlying data at the same time but with different colors?

@wylieconlon
Contributor

The definition of a moving average for our purposes is any function which smooths sequential values in a date histogram. The main goal of a moving average function is to smooth out temporary fluctuations in the data and highlight longer-term trends.

There are several types of commonly-used moving averages. All of the types require a window size, which indicates how far to look back and look ahead of the current point to determine smoothing. The types are:

Which moving averages should Lens offer?

I think that Lens should offer the complete set of moving average functions because there are good use cases for each, and we shouldn't pick only one. However, because TSVB already offers the complete set of functions, we do have an escape for users who need this kind of smoothing, so it might be acceptable to release a partial list in Lens.

The main factors that will determine which moving average functions are available in Lens are:

  • Technical complexity of implementing these functions
  • Interface complexity

User inputs

There are three ways of allowing users to define the window:

a) Fewest options: This is what Visualize and TSVB offer. Window is defined as a number of buckets before the current value, and the user can't configure the position of the window
b) Window + Shift: This is what Elasticsearch offers in their API. The user can slide the window forward or backwards.
c) Start and end: This is what PostgreSQL offers. Window functions are defined as [start, end] such as [-2, 2].

My opinion is that the "start and end" style is best for users. Here's a comparison table of how the configuration would work for these:

| Description | Configuration A | Configuration B | Configuration C |
| --- | --- | --- | --- |
| Look at the previous 5 intervals before the current, not including the current | Window: 5 | Window: 5, Offset: 0 | Window: [-6, -1] |
| Look at the previous 2 values, current value, and next 2 values | Not possible | Window: 5, Offset: 3 | Window: [-2, 2] |
| Look at the current value and next 5 values | Not possible | Window: 6, Offset: 5 | Window: [0, 5] |
| Look at all values before the current value | Not possible | Not possible | [null, -1] |
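
To make the "start and end" style (configuration C) concrete, here is a minimal sketch of an unweighted average over a window given as inclusive offsets relative to the current bucket. The function name `windowAverage` is hypothetical, and the choice to truncate the window at the edges of the series (rather than return no value) is an assumption, not something decided in this issue.

```typescript
// Hypothetical helper: unweighted average over the inclusive offset window
// [start, end] around each index. A null bound means "unbounded" on that side.
// Assumption: windows are truncated at the edges of the series.
function windowAverage(
  values: number[],
  start: number | null,
  end: number | null
): Array<number | null> {
  return values.map((_, i) => {
    const from = Math.max(0, start === null ? 0 : i + start);
    const to = Math.min(values.length - 1, end === null ? values.length - 1 : i + end);
    if (from > to) return null; // no values fall inside the window
    let sum = 0;
    for (let j = from; j <= to; j++) sum += values[j];
    return sum / (to - from + 1);
  });
}

const sample = [1, 2, 3, 4, 5];
// Previous 2 values, current value, and next 2 values
console.log(windowAverage(sample, -2, 2)); // [2, 2.5, 3, 3.5, 4]
// All values before the current value
console.log(windowAverage(sample, null, -1)); // [null, 1, 1.5, 2, 2.5]
```

With this shape, configuration A becomes a special case where the end offset is fixed at -1 and only the start offset varies.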

There are other parameters that are specifically for the exponential moving average functions. These are typically called alpha, beta, and gamma. Holt-Winters also requires 2 other inputs. I think that the form we use for these cases can be similar to what TSVB offers:

TSVB Holt-Winters config

Finally, like derivatives, there is a gap-skipping option that should be supported by the Moving Average function.
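
As a rough illustration of what a gap policy could mean for a single window, the sketch below averages a window that may contain missing buckets. The policy names `skip` and `insert_zeros` are borrowed from the Elasticsearch pipeline aggregation `gap_policy` setting; the function `averageWithGaps` and its exact behavior are assumptions for illustration only.

```typescript
// Policy names borrowed from Elasticsearch's gap_policy; the handling below
// is an illustrative assumption, not the Lens implementation.
type GapPolicy = 'skip' | 'insert_zeros';

// Average one window in which missing buckets are represented as null.
function averageWithGaps(
  window: Array<number | null>,
  policy: GapPolicy
): number | null {
  const usable: number[] =
    policy === 'skip'
      ? window.filter((v): v is number => v !== null) // ignore missing buckets
      : window.map((v) => (v === null ? 0 : v)); // treat missing buckets as 0
  if (usable.length === 0) return null; // nothing to average
  return usable.reduce((sum, v) => sum + v, 0) / usable.length;
}

console.log(averageWithGaps([2, null, 4], 'skip')); // 3
console.log(averageWithGaps([2, null, 4], 'insert_zeros')); // 2
```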

Form design

Based on the required user inputs above, the minimum options we need are:

  • Field selector
  • Type of moving average
  • Window definition (I prefer defining the window as [start, end])
  • Gap policy

For exponential functions we need to offer all of the extra options as tuneable parameters.

So we could take the TSVB form and make the following changes:

  • Change the window from a single value to a range of values
  • Add gap policy as an option

TSVB simple form

Table examples

I am not including a table example because I think it's clear from the example visualizations.

Visualization best practices for moving average

Moving averages are best suited to a line chart. Often, the moving average is displayed in conjunction with the raw data.

When both series are shown together, we may want to apply styling to the moving average line automatically. For example, we could style the moving average line using one or more of the following options:

  • Curves instead of straight lines
  • Dotted lines instead of solid
  • Thicker line width than the raw data

The raw data line is often styled with reduced emphasis, such as a thinner line or lighter color.

Example showing best practices:

Moving average best practices

Example visualizations for each of the types of moving average

The best way to compare all of the moving average functions is using almost-identical visualizations that can directly compare them. In all of these, the window is set to the 5 values immediately preceding the value, not including the value.

  1. Unweighted average

Screen Shot 2020-09-16 at 4 58 24 PM

  2. Linear weighted average

Screen Shot 2020-09-16 at 4 58 56 PM

  3. Single exponential with 0.7 decay

Screen Shot 2020-09-16 at 4 59 17 PM

  4. Single exponential with 0.3 decay

Screen Shot 2020-09-16 at 4 59 35 PM

  5. Holt-Winters exponential

Screen Shot 2020-09-16 at 6 33 07 PM
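
For readers comparing the first three types, the following sketch shows one common way to compute each over a single window of values ordered oldest to newest. The function names are hypothetical, the window is assumed to be non-empty, and `alpha` is assumed to be the weight given to each newer value; whether that matches the "decay" setting used in the screenshots above is not confirmed here. Holt-Winters is omitted because it needs additional parameters.

```typescript
// Each function takes the values inside one non-empty window, oldest first.

// Unweighted: every value in the window counts equally.
function unweightedAvg(window: number[]): number {
  return window.reduce((sum, v) => sum + v, 0) / window.length;
}

// Linear weighted: weight 1 for the oldest value, rising to n for the newest.
function linearWeightedAvg(window: number[]): number {
  let weighted = 0;
  let totalWeight = 0;
  window.forEach((v, i) => {
    weighted += v * (i + 1);
    totalWeight += i + 1;
  });
  return weighted / totalWeight;
}

// Single exponential: seeded with the oldest value, then each newer value is
// blended in with weight alpha (assumed to be between 0 and 1).
function singleExponentialAvg(window: number[], alpha: number): number {
  let avg = window[0];
  for (let i = 1; i < window.length; i++) {
    avg = alpha * window[i] + (1 - alpha) * avg;
  }
  return avg;
}

console.log(unweightedAvg([1, 2, 3, 4, 5])); // 3
console.log(singleExponentialAvg([2, 4, 8], 0.5)); // 5.5
```

A higher `alpha` tracks recent values more closely, while a lower one produces a smoother, slower-moving line.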

@AlonaNadler

Should the chart rendering be specialized in any way, such as showing the moving average and underlying data at the same time but with different colors?

By default I don't think so. Users can always add another series which shows the underlying data.

Do we use exponential smoothing?

By default Lens uses the simple model.
In the configuration for moving average, there is an advanced option that opens a popup for moving average. We can add the exponential weighted model there. @cchaos we had this in the old design.

@flash1293
Contributor

As discussed offline, in the first iteration we will only implement the simplest form of moving average (averaging over a window of values, treating all of them equally).

We didn't talk about this, but I think it makes sense to expose a window parameter in some way as we can't easily suggest a good value.

@wylieconlon
Contributor

I agree that we should expose options for controlling the window. In my proposal above, I have several examples of different options we can show the user. My preference is to expose a "start and end" offset separately for reasons given above.

@AlonaNadler

Why do we need start and end? I find it confusing. I think the ability to state a simple window that takes the intervals into consideration is good enough. In the future, Lens will have an offset feature, and then users can generically move the line based on the offset they want.
Do you agree?

@wylieconlon
Contributor

I'm open to finding a better way of explaining these options, but I think we need to offer at least 2 options. The current Kibana behavior is using the calculation that's usually used in finance, but not in engineering:

In financial applications a simple moving average (SMA) is the unweighted mean of the previous n data. However, in science and engineering, the mean is normally taken from an equal number of data on either side of a central value.

The current option of just "window size: 5" is not enough. This is why I proposed an input called "start" and "end". Maybe a simpler option is to use a checkbox like:

  • Window size: 5
  • Window position: Before | Surrounding
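
One way to read the simpler "size plus position" option is as a shorthand for start and end offsets. The sketch below shows that mapping; the names `WindowPosition` and `windowOffsets` are hypothetical, and placing the extra value after the current one for even sizes is an arbitrary assumption.

```typescript
// Hypothetical mapping from "size + position" to inclusive [start, end]
// offsets relative to the current value.
type WindowPosition = 'before' | 'surrounding';

function windowOffsets(size: number, position: WindowPosition): [number, number] {
  if (position === 'before') {
    // The previous `size` values, excluding the current one.
    return [-size, -1];
  }
  // Centered on the current value. For even sizes the extra value is taken
  // from after the current one (an arbitrary choice for this sketch).
  const before = Math.floor((size - 1) / 2);
  const after = size - 1 - before;
  return [-before, after];
}

console.log(windowOffsets(5, 'before')); // [-5, -1]
console.log(windowOffsets(5, 'surrounding')); // [-2, 2]
```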

@flash1293
Contributor

At some point we definitely want to provide this in some way together with Holt-Winters etc.

Personally I don’t have a strong opinion on whether we need a window position setting from the start, I know too little about how this feature is used in practice in Kibana. I did a quick issue search and it seems nobody complained about not being able to set the window offset on TSVB so far, so IMHO I think we can live without it in the first iteration.

@wylieconlon
Contributor

wylieconlon commented Sep 21, 2020

@flash1293 Your comment caused me to do some competitive analysis (brief summary below) to see what users might expect, and this has caused me to change my recommendation. Based on this analysis I am comfortable with the current set of features that Kibana offers for moving averages, which is that users choose the size of the window, and the current value of the field is excluded. I also believe it's possible to build a more configurable set of options, but that it might not be a high priority.


The majority of competitors offer a single option which is a window of "last N excluding current value", which I called configuration A above. Only Tableau and Looker allow users the full configuration options. Tableau uses configuration C, and Looker uses configuration B.

The majority of competitors do not include the current value in the moving average, and don't allow it to be included. Tableau includes the current value by default, as does InfluxDB. Only Tableau gives the user a choice about whether to include or exclude the current value.

@AlonaNadler

From a logic perspective, I suggest implementing it the way TSVB does: allow users to configure a simple window that covers the last N intervals. Since it hasn't been raised as an issue, follow TSVB on whether to include or exclude the current value from the moving average.

In the future (post Lens default) we can add more models or even define the window position if we get requests for it

@monfera
Contributor

monfera commented Sep 30, 2020

Moving averages are lagging indicators: the lag spans multiple bins or, with EWMA, in theory all preceding bins. So what's the approach for the beginning bins? Would we query preceding bins not requested by the user (excluded by the user's filter) for the sole purpose of bootstrapping the lagging indicators? Or will it be OK for the user to see differing values for the initial bins (or maybe all bins), just because they go back further with their time filter?

There are other lagging indicators, e.g. differences/deltas; in that case it's a single bin, though (unless there's a gap and the policy is to skip/bridge gaps).

@monfera
Contributor

monfera commented Oct 1, 2020

We touched on it during discussions, and some of #77692 (comment) intersects with this. A recap:

  • bootstrapping: would we query a broader range than the time filter implies? The time filter is mostly the user's way of signaling the period of time they are interested in, rather than a constraint on where the data should come from
  • if we don't do bootstrapping, and the user alters the beginning of the time filter, numbers will jump around
  • a comprehensive spec would be useful, given the above referred note and the various special cases such as temporal bounds, gaps in data, maybe outliers etc.
  • with an (eventual) EWMA, the bootstrap period is, in theory, the entire past; so it's worth assuming ES-side calculation of a single bootstrap value, as bringing in the entire history isn't scalable
  • it would be interesting to mention the types of averages:
    • single-bin averages: some metric in the bin is averaged (not the topic of this issue, but worth linking how that's done)
    • multi-bin averages: eg. average the months into quarters (not windowed!)
    • SMA, windowing averages: supported types; forward, backward and central averaging
    • EWMA; coefficient

Related topics 1: MA is often used for smoothing. There are alternatives that might be better, given a goal, eg. LOESS, radial basis functions etc.

Related topics 2: windowing is not just for averages, as mentioned earlier by folks here too. One reason for avoiding the name "derivation" for the differences is that there are some numerical methods for differentiation, eg. the five-point stencil method

@flash1293 flash1293 added the loe:needs-research This issue requires some research before it can be worked on or estimated label Oct 2, 2020
@AlonaNadler

While moving averages are commonly used in XY charts, this is not always the case.
Are we choosing to exclude moving average from tables and single metrics due to a technical challenge?
If not, I don't think we should restrict using a moving average in tables and single metrics.
The web is full of examples of how it is being used
image
image

@wylieconlon
Contributor

@AlonaNadler We briefly considered the plan you're talking about (only yesterday), because we thought it would meet all of your requirements. Now that you've shared additional requirements we're going back to the previous plan of having a moving average function that will work in all chart types.

@flash1293 flash1293 removed the loe:needs-research This issue requires some research before it can be worked on or estimated label Oct 12, 2020
@mbondyra mbondyra self-assigned this Oct 27, 2020
@mbondyra mbondyra removed their assignment Nov 5, 2020
@flash1293
Contributor

Closed by #84384
