Refactor rollups meta (AKA Rollup V2) #42720

Closed
11 of 21 tasks
polyfractal opened this issue May 30, 2019 · 11 comments
Assignees
Labels
Meta >refactoring stalled :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Top Ask

Comments

@polyfractal
Contributor

polyfractal commented May 30, 2019

This is a meta issue discussing a complete rewrite of the Elasticsearch rollup codebase with the aim to improve the following points:

  • Drop rollup jobs in favour of integrating rollups with ILM. This means that rolling up an index will work similarly to shrinking an index: the rollup runs once indexing is complete, and the action rolls up the entire index in one operation.
  • Make rollup functionality easier to set up and administer from an operational point of view. For example, allow limited editing of an existing rollup configuration (adding new metrics, etc.).
  • Make rollup indices behave much more like regular indices, simplifying querying and management.
  • Improve reliability. Existing rollup jobs are not atomic and sometimes fail midway, leaving incomplete rollup indices. We should make the rollup computation atomic.
  • Improve performance of rollup jobs. Some large-scale use cases run into bottlenecks where the search phase of a rollup is not fast enough (because only a limited number of threads across the cluster are involved).
  • Implement support for pre-aggregated data structures to enable metrics such as cardinality and percentiles (see "[Rollup] Support for data-structure based metrics (Cardinality, Percentiles, etc)", #33214).

Below we outline a high-level plan of changes that will help us achieve the above goals:

@polyfractal polyfractal added Meta :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data 7x labels May 30, 2019
@polyfractal polyfractal self-assigned this May 30, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo

@csoulios csoulios removed the 7x label Nov 28, 2019
@csoulios csoulios changed the title from "Improving Rollup operational ergonomics" to "Rollup rewrite meta" Nov 28, 2019
@csoulios csoulios changed the title from "Rollup rewrite meta" to "Refactor rollups meta" Nov 28, 2019
@csoulios csoulios self-assigned this Nov 28, 2019
@exekias

exekias commented Dec 5, 2019

Hey @polyfractal, I find this concept of a "Grouping Tuple" really interesting for the metrics use case. I think it doesn't only apply to rollups, but can be used in some queries too!

It actually sounds similar to what I did in Beats here: elastic/beats#10293, where we want to have enough information to do downsampling at query time and make sure the results actually make sense. I wonder if we could reuse this grouping tuple field here?

@pcsanwald
Contributor

pcsanwald commented Dec 5, 2019

Great outline! One additional thing that would be good to think about and add to the outline is what the migration path looks like for users currently using rollups (I'm assuming that the new field type would be applied during a rollup ILM action and therefore wouldn't require a full re-index or source index?), and also what breaking changes we are proposing (at least the removal of _rollup_search).

@csoulios @polyfractal tagging y'all in case this got lost in the shuffle :)

@rjernst rjernst added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 4, 2020
csoulios added a commit that referenced this issue May 14, 2020
Following the implementation of the aggregate_metric_double field mapper (#49830), we are implementing the Min, Max, ValueCount, Sum and Average aggregations on aggregate metrics.

The code builds on the excellent work done for #42949 and uses the extensible ValuesSources infrastructure to wire up common metric aggregations on the aggregate_metric_double field type.

This PR is part of the rollups v2 refactoring as described in meta issue #42720
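
As a rough illustration of why these five aggregations can run on pre-aggregated values, the Python sketch below combines several summaries without access to the raw data points. It is not the Elasticsearch implementation: the `AggregateMetric` class and the `combine` and `average` helpers are hypothetical names, and the only assumption carried over is that each value stores a minimum, maximum, sum and value count.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class AggregateMetric:
    """Hypothetical container for one pre-aggregated value.

    The fields mirror the min, max, sum and value_count sub-metrics.
    """
    minimum: float
    maximum: float
    total: float
    value_count: int


def combine(metrics: Iterable[AggregateMetric]) -> AggregateMetric:
    """Merge pre-aggregated values into a single summary."""
    items = list(metrics)
    if not items:
        raise ValueError("need at least one pre-aggregated value")
    return AggregateMetric(
        minimum=min(m.minimum for m in items),          # min of mins
        maximum=max(m.maximum for m in items),          # max of maxes
        total=sum(m.total for m in items),              # sum of sums
        value_count=sum(m.value_count for m in items),  # sum of counts
    )


def average(metric: AggregateMetric) -> float:
    """Average is derived from sum and value count, not stored."""
    return metric.total / metric.value_count
```

Average has to be derived from the combined sum and count because averaging the per-document averages would weight each summary equally regardless of how many raw values it represents.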
@giladgal
Contributor

giladgal commented Jul 2, 2020

Regarding the relation with the Rollup functionality that is already released: Rollup is currently an experimental feature in Elasticsearch. We plan to make Rollup GA when we release the refactored rollup which will make rolled up indices queryable through _search and _async_search and which will make rollup an action in ILM. We are not making Rollup GA before that because we know this change will bring a change in the API. We will make an effort to provide an upgrade path for Rollup indices created with the current Rollup processes to the new refactored Rollup system, which will make these indices queryable through the _search and _async_search API endpoints.

@mbudge

mbudge commented Oct 5, 2020

Will rollups handle new fields arriving in the data?

For example, Winlogbeat adds a new field whenever one appears in the Windows event log.

Would this new field get rolled up automatically or would we have to manually modify the rollup job?

talevy added a commit to talevy/elasticsearch that referenced this issue Dec 14, 2020
This commit moves the ownership of tracking the rollup_index from
the RollupActionConfig to the RollupAction.Request.

This is cleaner since the config should not be concerned with the
source and rollup indices.

relates elastic#42720.
talevy added a commit that referenced this issue Dec 14, 2020
This commit moves the ownership of tracking the rollup_index from
the RollupActionConfig to the RollupAction.Request.

This is cleaner since the config should not be concerned with the
source and rollup indices.

relates #42720.

Co-authored-by: James Rodewig <[email protected]>
talevy added a commit that referenced this issue Jan 29, 2021
This commit introduces a new Rollup ILM Action that allows indices
to be rolled up according to a specific rollup config. The
action also allows the new rolled-up index to be associated with
a different policy than the original/source index.

Relates #42720.

Closes #48003.
talevy added a commit to talevy/elasticsearch that referenced this issue Jan 29, 2021
This commit introduces a new Rollup ILM Action that allows indices
to be rolled up according to a specific rollup config. The
action also allows the new rolled-up index to be associated with
a different policy than the original/source index.

Relates elastic#42720.

Closes elastic#48003.
@polyfractal polyfractal removed their assignment Mar 18, 2021
@ppf2
Member

ppf2 commented Apr 12, 2021

The ability for the rollup API to generate time-based indices that can be managed and retired over time is a common ask from the field. +1 on ILM integration.

@heoehmke

heoehmke commented Feb 8, 2022

Will the new rollup feature also support boolean fields in terms aggregations (as proposed in #49537)? The current rollups only support keyword and numeric fields, and it doesn't seem too complicated to support booleans there as well.

@csoulios
Contributor

The team has decided that there is a great fit between rollups and the new metrics database we are implementing (#74660).

The fact that dimensions and metrics are first class citizens allows us to simplify the rollup configuration. Also, leveraging the index structure in TSDB results in greatly improved performance.

The team views rollups as a fundamental operation of a metrics database, and we are focusing all our efforts on developing such a feature. Hence, I am closing this issue; work on rollups/downsampling will be tracked in #74660.
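
To make the point about dimensions and metrics concrete, here is a minimal Python sketch of downsampling when both are known up front. It is only an illustration under stated assumptions, not how TSDB does it: the `downsample` function, the document layout, and integer epoch-second timestamps in a `timestamp` key are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Key: (dimension values, bucket start). Value: summary of one metric.
BucketKey = Tuple[Tuple[object, ...], int]


def downsample(
    docs: Iterable[dict],
    dimensions: List[str],
    metric: str,
    interval_seconds: int,
) -> Dict[BucketKey, dict]:
    """Summarise one metric per (dimensions, fixed time bucket) group."""
    buckets: Dict[BucketKey, dict] = {}
    for doc in docs:
        ts = doc["timestamp"]  # assumed: integer epoch seconds
        bucket_start = ts - (ts % interval_seconds)
        key = (tuple(doc[name] for name in dimensions), bucket_start)
        value = doc[metric]
        summary = buckets.get(key)
        if summary is None:
            # First value seen for this group initialises the summary.
            buckets[key] = {
                "min": value, "max": value, "sum": value, "value_count": 1,
            }
        else:
            summary["min"] = min(summary["min"], value)
            summary["max"] = max(summary["max"], value)
            summary["sum"] += value
            summary["value_count"] += 1
    return buckets
```

In this sketch the only configuration needed is the bucket interval, since the grouping keys follow from the declared dimensions; that is the simplification the comment above alludes to.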
