-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement top_metrics agg #51155
Implement top_metrics agg #51155
Conversation
The `top_metrics` agg is kind of like `top_hits` but it only works on doc values so it *should* be faster. At this point it is fairly limited in that it only supports a single, numeric sort and a single, numeric metric. And it only fetches the "very topest" document worth of metric. We plan to support returning a configurable number of top metrics, requesting more than one metric and more than one sort. And, eventually, non-numeric sorts and metrics. The trick is doing those things fairly efficiently. Co-Authored by: Zachary Tong <[email protected]>
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations) |
Some notes for reviewers:
|
.../src/test/java/org/elasticsearch/xpack/analytics/topmetrics/InternalTopMetricsWireTests.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments :)
I have mixed opinions about the Lucene sort stuff after seeing it in action. Getting the parsing and associated comparator logic for free is indeed nice. But we are sorta misusing the overall API a bit I think? I'm a little worried we'll hit unanticipated side effects, and the not-entirely intuitive API that it forces on us. Maybe @jpountz has an opinion?
Otherwise I think the other big issue is date formatting and storing the sort value as a double
, which I think will cause issues due to floating point errors.
Depending on if/when we add support for multiple sorts or metrics, we're going to have to add wire compat versioning, but I think the JSON itself is forwards compatible so that isn't a huge concern (other than being a bit of a pain). So ++ for the JSON, nice work 👍
docs/reference/aggregations/metrics/top-metrics-aggregation.asciidoc
Outdated
Show resolved
Hide resolved
...src/main/java/org/elasticsearch/xpack/analytics/topmetrics/TopMetricsAggregationBuilder.java
Show resolved
Hide resolved
...src/main/java/org/elasticsearch/xpack/analytics/topmetrics/TopMetricsAggregationBuilder.java
Outdated
Show resolved
Hide resolved
...src/main/java/org/elasticsearch/xpack/analytics/topmetrics/TopMetricsAggregationBuilder.java
Outdated
Show resolved
Hide resolved
...src/main/java/org/elasticsearch/xpack/analytics/topmetrics/TopMetricsAggregationBuilder.java
Outdated
Show resolved
Hide resolved
.../src/test/java/org/elasticsearch/xpack/analytics/topmetrics/InternalTopMetricsWireTests.java
Outdated
Show resolved
Hide resolved
...cs/src/test/java/org/elasticsearch/xpack/analytics/topmetrics/TopMetricsAggregatorTests.java
Outdated
Show resolved
Hide resolved
...src/main/java/org/elasticsearch/xpack/analytics/topmetrics/TopMetricsAggregationBuilder.java
Outdated
Show resolved
Hide resolved
...analytics/src/main/java/org/elasticsearch/xpack/analytics/topmetrics/InternalTopMetrics.java
Outdated
Show resolved
Hide resolved
...alytics/src/main/java/org/elasticsearch/xpack/analytics/topmetrics/TopMetricsAggregator.java
Outdated
Show resolved
Hide resolved
I'm pretty sure it isn't the right API to be honest. I wonder if we'd be better off adding something to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@polyfractal do you think the sort key formatting thing is a blocker for getting the initial implementation of this in? I'd be happy to do that part now if you think so. I'd planned to do that as maybe a second or third followup because I hadn't realized that it was that important.
...src/main/java/org/elasticsearch/xpack/analytics/topmetrics/TopMetricsAggregationBuilder.java
Show resolved
Hide resolved
...src/main/java/org/elasticsearch/xpack/analytics/topmetrics/TopMetricsAggregationBuilder.java
Outdated
Show resolved
Hide resolved
...src/main/java/org/elasticsearch/xpack/analytics/topmetrics/TopMetricsAggregationBuilder.java
Outdated
Show resolved
Hide resolved
I have something! Not, like, finished. But something on its way. Tomorrow probably. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't have time for an in-depth review but it looks good to me.
docs/reference/aggregations/metrics/top-metrics-aggregation.asciidoc
Outdated
Show resolved
Hide resolved
docs/reference/aggregations/metrics/top-metrics-aggregation.asciidoc
Outdated
Show resolved
Hide resolved
...in/java/org/elasticsearch/index/fielddata/fieldcomparator/BytesRefFieldComparatorSource.java
Outdated
Show resolved
Hide resolved
The `top_metrics` agg is kind of like `top_hits` but it only works on doc values so it *should* be faster. At this point it is fairly limited in that it only supports a single, numeric sort and a single, numeric metric. And it only fetches the "very topest" document worth of metric. We plan to support returning a configurable number of top metrics, requesting more than one metric and more than one sort. And, eventually, non-numeric sorts and metrics. The trick is doing those things fairly efficiently. Co-Authored by: Zachary Tong <[email protected]>
The `top_metrics` agg is kind of like `top_hits` but it only works on doc values so it *should* be faster. At this point it is fairly limited in that it only supports a single, numeric sort and a single, numeric metric. And it only fetches the "very topest" document worth of metric. We plan to support returning a configurable number of top metrics, requesting more than one metric and more than one sort. And, eventually, non-numeric sorts and metrics. The trick is doing those things fairly efficiently. Co-Authored by: Zachary Tong <[email protected]>
The
top_metrics
agg is kind of liketop_hits
but it only works ondoc values so it should be faster.
At this point it is fairly limited in that it only supports a single,
numeric sort and a single, numeric metric. And it only fetches the "very
topest" document worth of metric. We plan to support returning a
configurable number of top metrics, requesting more than one metric and
more than one sort. And, eventually, non-numeric sorts and metrics. The
trick is doing those things fairly efficiently.
Closes #48069
Co-Authored by: Zachary Tong [email protected]