[Lens] Add moving average aggregation #61777
Pinging @elastic/kibana-app (Team:KibanaApp) |
Open questions:
|
The definition of a moving average for our purposes is any function which smooths sequential values in a date histogram. The main goal of a moving average function is to smooth out temporary fluctuations in the data and highlight longer-term trends. There are several commonly used types of moving average. All of the types require a window size, which indicates how far to look back from and ahead of the current point to determine smoothing. The types are:
Which moving averages should Lens offer?
I think that Lens should offer the complete set of moving average functions because there are good use cases for each, and we shouldn't pick only one. However, because TSVB already offers the complete set of functions, we do have an escape for users who need this kind of smoothing, so it might be acceptable to release a partial list in Lens. The main factors that will determine which moving average functions are available in Lens are:
User inputs
There are three ways of allowing users to define the window:
a) Fewest options: This is what Visualize and TSVB offer. Window is defined as a number of buckets before the current value, and the user can't configure the position of the window.
My opinion is that the "start and end" style is best for users. Here's a comparison table of how the configuration would work for these:
There are other parameters that are specific to the exponential moving average functions. These are typically called alpha, beta, and gamma. Holt-Winters also requires 2 other inputs. I think that the form we use for these cases can be similar to what TSVB offers:
Finally, like derivatives, there is a gap-skipping option that should be supported by the Moving Average function.
Form design
Based on the required user inputs above, the minimum options we need are:
For exponential functions we need to offer all of the extra options as tuneable parameters. So we could take the TSVB form and make the following changes:
Table examples
I am not including table examples because I think it's clear from the example visualizations.
Visualization best practices for moving average
Moving averages are best suited to a line chart. Often, the moving average is displayed in conjunction with the raw data. When both series are shown together, we may want to apply styling to the moving average line automatically. For example, we could style the moving average line using one or more of the following options:
The raw data line is often styled with reduced emphasis, such as a thinner line or lighter color. Example showing best practices:
Example visualizations for each of the types of moving average
The best way to compare all of the moving average functions is using almost-identical visualizations that can directly compare them. In all of these, the window is set to the 5 values immediately preceding the value, not including the value.
|
By default I don't think so. Users can always add another series which shows the underlying data.
by default Lens uses the simple model |
As discussed offline, in the first iteration we will only implement the simplest form of moving average (averaging over a window of values, treating all of them equally). We didn't talk about this, but I think it makes sense to expose a window parameter in some way as we can't easily suggest a good value. |
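The simplest form described above (an unweighted average over a window of preceding values, excluding the current one) can be sketched as follows. This is only an illustration: `simpleMovingAverage` is a hypothetical helper, not Lens or Elasticsearch code, and returning `null` for an empty window and averaging over partial windows at the start of the series are assumptions of this sketch.

```typescript
// Hypothetical helper for illustration only; not part of Lens or Elasticsearch.
// Averages the `windowSize` values immediately preceding each point,
// excluding the point itself.
function simpleMovingAverage(
  values: number[],
  windowSize: number
): Array<number | null> {
  if (!Number.isInteger(windowSize) || windowSize < 1) {
    throw new Error('windowSize must be a positive integer');
  }
  return values.map((_, i) => {
    // Assumption: near the start of the series, use however many
    // preceding values exist instead of waiting for a full window.
    const preceding = values.slice(Math.max(0, i - windowSize), i);
    if (preceding.length === 0) {
      // Assumption: no preceding values means no result for this bucket.
      return null;
    }
    const sum = preceding.reduce((acc, v) => acc + v, 0);
    return sum / preceding.length;
  });
}

console.log(simpleMovingAverage([2, 4, 6, 8, 10], 2));
// → [ null, 2, 3, 5, 7 ]
```

Because there is no obviously good default for the window, `windowSize` is left as a required argument here, which mirrors the point about exposing a window parameter.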
I agree that we should expose options for controlling the window. In my proposal above, I have several examples of different options we can show the user. My preference is to expose a "start and end" offset separately for reasons given above. |
why do we need start and end? I find it to be confusing. I think the ability to state a simple window that takes into consideration the intervals is good enough. In the future, Lens will have an offset feature and then users can generically move the line based on the offset they want |
I'm open to finding a better way of explaining these options, but I think we need to offer at least 2 options. The current Kibana behavior is using the calculation that's usually used in finance, but not in engineering:
The current option of just "window size: 5" is not enough. This is why I proposed an input called "start" and "end". Maybe a simpler option is to use a checkbox like:
|
At some point we definitely want to provide this in some way together with Holt-Winters etc. Personally I don’t have a strong opinion on whether we need a window position setting from the start, I know too little about how this feature is used in practice in Kibana. I did a quick issue search and it seems nobody complained about not being able to set the window offset on TSVB so far, so IMHO I think we can live without it in the first iteration. |
@flash1293 Your comment caused me to do some competitive analysis (brief summary below) to see what users might expect, and this has caused me to change my recommendation. Based on this analysis I am comfortable with the current set of features that Kibana offers for moving averages, which is that users choose the size of the window, and the current value of the field is excluded. I also believe it's possible to build a more configurable set of options, but that it might not be a high priority. The majority of competitors offer a single option which is a window of "last N excluding current value", which I called configuration A above. Only Tableau and Looker allow users the full configuration options. Tableau uses configuration C, and Looker uses configuration B. The majority of competitors do not include the current value in the moving average, and don't allow it to be included. Tableau includes the current value by default, as does InfluxDB. Only Tableau gives the user a choice about whether to include or exclude the current value. |
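The practical difference between excluding and including the current value can be sketched with a small example. `windowedAverage` and `WindowOptions` are hypothetical names used only for illustration; they are not APIs of Lens, TSVB, or any of the products mentioned, and the handling of partial windows is an assumption.

```typescript
// Hypothetical option shape for illustration only.
interface WindowOptions {
  size: number; // number of buckets in the window
  includeCurrent: boolean; // whether the current bucket is part of the window
}

function windowedAverage(
  values: number[],
  { size, includeCurrent }: WindowOptions
): Array<number | null> {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error('size must be a positive integer');
  }
  return values.map((_, i) => {
    // The window ends just before the current bucket, or just after it
    // when the current value is included.
    const end = includeCurrent ? i + 1 : i;
    const slice = values.slice(Math.max(0, end - size), end);
    if (slice.length === 0) {
      return null; // assumption: empty window yields no value
    }
    return slice.reduce((a, b) => a + b, 0) / slice.length;
  });
}

const data = [10, 20, 60, 20, 10];
console.log(windowedAverage(data, { size: 2, includeCurrent: false }));
// → [ null, 10, 15, 40, 40 ]
console.log(windowedAverage(data, { size: 2, includeCurrent: true }));
// → [ 10, 15, 40, 40, 15 ]
```

In this sketch the spike at the third bucket first shows up in the smoothed series one bucket later when the current value is excluded, which is the lag users would see with the "last N excluding current value" behavior.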
From a logic perspective, I suggest implementing it the way TSVB does: allow users to configure a simple window which relates to the last N intervals. Since it hasn't been raised as an issue, follow TSVB on whether to include or exclude the current value in the moving average. In the future (post Lens default) we can add more models or even define the window position if we get requests for it |
As moving averages are lagging, and the lag is multiple bins (or, with EWMA, in theory all preceding bins), what's the approach for the beginning bins? Would we query preceding bins not requested by the user (excluded by the user's filter) for the sole purpose of bootstrapping the lagging indicators? Or will it be OK for the user to see differing values for the initial bins (or maybe all bins), just because they go back further with their time filter? There are other lagging indicators, e.g. differences/deltas; in that case it's a single bin, though (unless there's a gap and the policy is to skip/bridge gaps) |
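The bootstrapping concern can be illustrated with a textbook exponentially weighted moving average. This is a minimal sketch, not the Elasticsearch or TSVB implementation: `ewma` is a hypothetical helper, it seeds the average with the first value in the series, and it includes the current value in each step, both of which are assumptions of the sketch.

```typescript
// Hypothetical textbook EWMA for illustration only:
//   s[0] = x[0]
//   s[i] = alpha * x[i] + (1 - alpha) * s[i - 1]
function ewma(values: number[], alpha: number): number[] {
  if (!(alpha > 0 && alpha <= 1)) {
    throw new Error('alpha must be in the range (0, 1]');
  }
  const out: number[] = [];
  let prev = 0;
  values.forEach((v, i) => {
    // Assumption: the first bucket seeds the average.
    prev = i === 0 ? v : alpha * v + (1 - alpha) * prev;
    out.push(prev);
  });
  return out;
}

// Same underlying data, two different time filters.
const wideFilter = [16, 0, 4, 4]; // user goes back four buckets
const narrowFilter = [4, 4]; // user only requests the last two buckets

console.log(ewma(wideFilter, 0.5)); // → [ 16, 8, 6, 5 ]
console.log(ewma(narrowFilter, 0.5)); // → [ 4, 4 ]
```

The last two buckets come out as 6 and 5 with the wider filter but 4 and 4 with the narrower one, so without querying extra preceding bins the displayed values depend on where the time filter starts.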
We touched on it during discussions, and some of #77692 (comment) intersects with this. A recap:
Related topics 1: MA is often used for smoothing. There are alternatives that might be better, given a goal, e.g. LOESS, radial basis functions etc.
Related topics 2: windowing is not just for averages, as mentioned earlier by folks here too. One reason for avoiding the name "derivation" for the differences is that there are some numerical methods for differentiation, e.g. the five-point stencil method |
@AlonaNadler We briefly considered the plan you're talking about (only yesterday), because we thought it would meet all of your requirements. Now that you've shared additional requirements we're going back to the previous plan of having a moving average function that will work in all chart types. |
Closed by #84384 |
Add a moving average aggregation to Lens. See also #56696 for more discussion.