Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement top_metrics agg #51155

Merged
merged 71 commits into from
Feb 14, 2020
Merged

Implement top_metrics agg #51155

merged 71 commits into from
Feb 14, 2020

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Jan 17, 2020

The top_metrics agg is kind of like top_hits but it only works on
doc values so it should be faster.

At this point it is fairly limited in that it only supports a single,
numeric sort and a single, numeric metric. And it only fetches the "very
topest" document worth of metric. We plan to support returning a
configurable number of top metrics, requesting more than one metric and
more than one sort. And, eventually, non-numeric sorts and metrics. The
trick is doing those things fairly efficiently.

Closes #48069

Co-Authored by: Zachary Tong [email protected]

The `top_metrics` agg is kind of like `top_hits` but it only works on
doc values so it *should* be faster.

At this point it is fairly limited in that it only supports a single,
numeric sort and a single, numeric metric. And it only fetches the "very
topest" document worth of metric. We plan to support returning a
configurable number of top metrics, requesting more than one metric and
more than one sort. And, eventually, non-numeric sorts and metrics. The
trick is doing those things fairly efficiently.

Co-Authored by: Zachary Tong <[email protected]>
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@nik9000 nik9000 added >enhancement and removed WIP labels Jan 17, 2020
@nik9000
Copy link
Member Author

nik9000 commented Jan 17, 2020

Some notes for reviewers:

  • I've talked to @giladgal about the use of the experimental label here. We're ok with it so long as it is a temporary thing and we track it.
  • I don't have any yml integration tests. At this point everything is covered by the tests that the docs make.
  • I don't have any support in the high level rest client. I don't believe we've settled on a nice pattern for the aggs in the analytics plugin for that.
  • I do my best to reuse bits of Lucene around the sort. Maybe I work too hard to be honest. But it keeps this version small and it something we can look at again later.

Copy link
Contributor

@polyfractal polyfractal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments :)

I have mixed opinions about the Lucene sort stuff after seeing it in action. Getting the parsing and associated comparator logic for free is indeed nice. But we are sorta misusing the overall API a bit I think? I'm a little worried we'll hit unanticipated side effects, and the not-entirely intuitive API that it forces on us. Maybe @jpountz has an opinion?

Otherwise I think the other big issue is date formatting and storing the sort value as a double, which I think will cause issues due to floating point errors.

Depending on if/when we add support for multiple sorts or metrics, we're going to have to add wire compat versioning, but I think the JSON itself is forwards compatible so that isn't a huge concern (other than being a bit of a pain). So ++ for the JSON, nice work 👍

@nik9000
Copy link
Member Author

nik9000 commented Jan 21, 2020

I'm a little worried we'll hit unanticipated side effects, and the not-entirely intuitive API that it forces on us. Maybe @jpountz has an opinion?

I'm pretty sure it isn't the right API to be honest. I wonder if we'd be better off adding something to SortBuilder to return something more like what we want.

Copy link
Member Author

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@polyfractal do you think the sort key formatting thing is a blocker for getting the initial implementation of this in? I'd be happy to do that part now if you think so. I'd planned to do that as maybe a second or third followup because I hadn't realized that it was that important.

@nik9000
Copy link
Member Author

nik9000 commented Jan 21, 2020

I'm pretty sure it isn't the right API to be honest. I wonder if we'd be better off adding something to SortBuilder to return something more like what we want.

I have something! Not, like, finished. But something on its way. Tomorrow probably.

@nik9000
Copy link
Member Author

nik9000 commented Feb 10, 2020

@jpountz, could you have another look at this when you get a chance?

@jpountz, would you like to have another look at this one? I'm going to fix the merge conflict.

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't have time for an in-depth review but it looks good to me.

@nik9000 nik9000 merged commit 5b22666 into elastic:master Feb 14, 2020
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Feb 14, 2020
The `top_metrics` agg is kind of like `top_hits` but it only works on
doc values so it *should* be faster.

At this point it is fairly limited in that it only supports a single,
numeric sort and a single, numeric metric. And it only fetches the "very
topest" document worth of metric. We plan to support returning a
configurable number of top metrics, requesting more than one metric and
more than one sort. And, eventually, non-numeric sorts and metrics. The
trick is doing those things fairly efficiently.

Co-Authored by: Zachary Tong <[email protected]>
nik9000 added a commit that referenced this pull request Feb 14, 2020
The `top_metrics` agg is kind of like `top_hits` but it only works on
doc values so it *should* be faster.

At this point it is fairly limited in that it only supports a single,
numeric sort and a single, numeric metric. And it only fetches the "very
topest" document worth of metric. We plan to support returning a
configurable number of top metrics, requesting more than one metric and
more than one sort. And, eventually, non-numeric sorts and metrics. The
trick is doing those things fairly efficiently.

Co-Authored by: Zachary Tong <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Metric Selector aggregation
6 participants