Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use ValuesSourceRegistry for MultiValuesSourceAggregations #53194

Open
4 tasks
polyfractal opened this issue Mar 5, 2020 · 4 comments
Open
4 tasks

Use ValuesSourceRegistry for MultiValuesSourceAggregations #53194

polyfractal opened this issue Mar 5, 2020 · 4 comments
Labels
:Analytics/Aggregations Aggregations Meta Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) >tech debt

Comments

@polyfractal
Copy link
Contributor

polyfractal commented Mar 5, 2020

Wire up the MultiValuesSourceAggregationBuilder sub classes to use the values source registry. This should follow a model like Composite, where single source components are loaded into the registry and can be checked out in any combination.

  • Weighted Average
  • T Test
  • Geo Line
  • Multi Terms (not based on MultiValuesSourceAggregationBuilder, but should be doable with the same techniques)

Original Description Below

Today we have three aggregations that use multiple values sources (e.g. multiple fields):

Matrix Stats

Uses the ArrayValuesSourceAggregationBuilder abstraction, which accepts a single array of fields, and all extra parameters (missing, format, script, etc) are presumed to apply to all of the specified fields.

I do not believe there is validation that all fields are the same type, but matrix_stats assumes they are all numerics.

"matrix_stats": {
  "fields": ["poverty", "income"]
}
Weighted Average

Uses the MultiValuesSourceAggregationBuilder abstraction, which conceptually accepts a map of single ValuesSourceAggBuilders (although doesn't actually use VSAB). Each field can define it's own extra params like missing, format, etc, and the overall agg can also have parameters like format.

WeightedAvg also assumes both fields are numerics, but this is not enforced by the builder when fields are resolved

"weighted_avg": {
  "value": {
    "field": "grade"
  },
  "weight": {
    "field": "weight"
  }
}
Top Metrics

Uses AbstractAggregationBuilder directly, because MultiValuesSourceAggregationBuilder had both too much and too little.

@nik9000 can probably elaborate more on why MVSAB wasn't a good fit.

"top_metrics": {
  "metrics": {"field": "value_field"},
  "sort": {"sort_field": "desc"}
}
Bonus round: Composite

Composite is a little different in that it operates totally differently, with it's own set of support classes. But technically it's a multi VS agg as well, and again doesn't share any infra with the others.

So, what to do?

So we basically have three (or four) aggs with three (or four) different abstractions. So we could find some commonality between them all and build something to reduce redundant code, while leaving it flexible for individual circumstances.

Or we just remove the abstractions (ArrayVSB, MultiVSB) and build directly off AbstractAB like top metrics. Right now the extra boilerplate that Array/Multi introduce is not warranted given that they are not reused anywhere.

A final option is to continue in a holding pattern until we have a better idea if more multi-field aggs are coming (e.g. if we add 2d clustering, that would need two fields and provide another example)

We've decided to ignore these aggs in the VS Refactoring for the time being, but we should probably do something about them sooner than later.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@rjernst rjernst added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 4, 2020
@wylieconlon
Copy link

We just ran into this issue in Lens and it prevents our adoption of top_metrics. It seems like these aggs are silently ignoring multi-value sources, which is worse than throwing an error in my opinion. I think we should take the 8.0 major as an opportunity to make a breaking change to multi value fields. Specifically, we should either:

  • Allow multi-value fields in all the aggs affect OR
  • Throw an error when using a multi-value field in a single-value agg

This would be much easier if clients had metadata about multi-value fields, which is related to #58523 and #64077.

@wchaparro
Copy link
Member

@wylieconlon we missed this one since Zach has left. cc @elastic/es-analytics-geo
The issue you are reporting is different from this issues. This issue is about aggs that operate on more than one field, you are taking about fields with multiple values.

Could you please open an issue for us? Thanks!

@not-napoleon
Copy link
Member

We discussed this at the meeting on 2022-01-05. Summary:

  • We don't see a big problem with the abstractions as they stand
  • We would like to get multi values source aggregations onto the values source registry, but it isn't an urgent need to do so
  • Top Metrics & Composite already use the registry (added since the time this issue was created, probably)
  • We think aggregations using MultiValuesSourceAggregationBuilder could follow the same model used by Composite & Top Metrics - namely, registering values source specific components that are combined into a generic aggregator.
  • ArrayValuesSource & it's one aggregation MatrixStats may need a different approach, but we're comfortable leaving that out for now.

I'm going to update the title and description of this issue to reflect the current thinking, as well as remove Team Discuss. I also opened #82281 to address the unrelated concerns around Top Metrics over multi value fields.

@not-napoleon not-napoleon changed the title What to do about Multi ValuesSource aggregations? Use ValuesSourceRegistry for MultiValuesSourceAggregations Jan 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations Meta Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) >tech debt
Projects
None yet
Development

No branches or pull requests

6 participants