Unable to sort by result of bucket script aggregation #32153

Closed
ydzhu opened this issue Jul 18, 2018 · 5 comments

@ydzhu

ydzhu commented Jul 18, 2018

I can compute a value with a bucket_script aggregation, but I can't sort by that value. For example:

{
  "from": 0,
  "size": 0,
  "sort": [],
  "aggs": {
    "api_terms": {
      "terms": {
        "field": "name",
        "order": {
          "avg_time": "desc"
        }
      },
      "aggs": {
        "sum_duration": {
          "sum": {
            "field": "duration"
          }
        },
        "sum_count": {
          "sum": {
            "field": "count"
          }
        },
        "avg_time": {
          "bucket_script": {
            "buckets_path": {
              "duration": "sum_duration",
              "count": "sum_count"
            },
            "script": "params.duration / params.count"
          }
        }
      }
    }
  }
}

I want to sort by "avg_time", which is calculated by the bucket_script, so I added an order to the terms aggregation ("order": {"avg_time": "desc"}). But this causes an error: "Invalid aggregator order path [avg_time]. Unknown aggregation [avg_time]".
Even more puzzling, if I order by sum_count instead of avg_time, I get the correct result.

@colings86 colings86 added the :Analytics/Aggregations Aggregations label Jul 18, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@jtibshirani
Contributor

jtibshirani commented Jul 31, 2018

Hi @ydzhu, in a terms aggregation it's unfortunately not possible to order by the result of a pipeline aggregation like bucket_script (see the note here: https://www.elastic.co/guide/en/elasticsearch/reference/6.3/search-aggregations-bucket-terms-aggregation.html#search-aggregations-bucket-terms-aggregation-order).

It's not good that we're throwing a 503, and the error message is also misleading. We're tracking the error code issue in #20003.

I don't see an easy workaround for your request, but tagging @polyfractal just in case he has an idea.

@polyfractal
Contributor

@jtibshirani is correct: you can't sort by pipeline aggs, because they are executed only after the regular aggs have finished.
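A toy Python model of that ordering constraint may help. It is a simplified single-node sketch, not Elasticsearch's implementation: the function `run_terms`, the three-phase split and the `REGULAR_SUBAGGS` set are illustrative assumptions, the document fields `name`, `duration` and `count` are taken from the request above, and the error text only mimics the message reported in this issue.

```python
from collections import defaultdict

# Toy model (assumption): sub-aggregations that exist while term buckets are built.
REGULAR_SUBAGGS = {"sum_duration", "sum_count"}


def run_terms(docs, order_key, size):
    """Toy terms agg with two sum sub-aggs and one bucket_script (avg_time)."""
    # Phase 1: regular (metric) sub-aggregations are accumulated per term.
    buckets = defaultdict(lambda: {"sum_duration": 0.0, "sum_count": 0.0})
    for doc in docs:
        bucket = buckets[doc["name"]]
        bucket["sum_duration"] += doc["duration"]
        bucket["sum_count"] += doc["count"]

    # Phase 2: the terms "order" is applied (descending, as in the request)
    # and the list is cut to `size`. Only phase-1 values exist at this point,
    # so the name of a pipeline aggregation cannot be resolved.
    if order_key not in REGULAR_SUBAGGS:
        raise ValueError(
            f"Invalid aggregator order path [{order_key}]. "
            f"Unknown aggregation [{order_key}]"
        )
    top = sorted(buckets.items(), key=lambda kv: kv[1][order_key], reverse=True)[:size]

    # Phase 3: the pipeline aggregation (bucket_script) runs on the kept buckets.
    for _, bucket in top:
        bucket["avg_time"] = bucket["sum_duration"] / bucket["sum_count"]
    return top
```

In this model, `run_terms(docs, "sum_count", 2)` succeeds, mirroring the report that ordering by sum_count works, while `run_terms(docs, "avg_time", 2)` raises because `avg_time` is only computed in the last phase.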

If the terms agg contains all the entries you care about, you can still do pipeline sorting using the bucket_sort pipeline agg: https://www.elastic.co/guide/en/elasticsearch/reference/6.x/search-aggregations-pipeline-bucket-sort-aggregation.html

The caveat is that this sorts the final list of buckets, not the buckets while the aggregations are being calculated. So it only sorts the list returned by the terms agg: if a term isn't in that list, it won't show up in the sorted result. That's in contrast to ordering the terms agg itself, which can change which terms end up in the list.

It'd look something like this (untested):

{
  "from": 0,
  "size": 0,
  "sort": [],
  "aggs": {
    "api_terms": {
      "terms": {
        "field": "name"
      },
      "aggs": {
        "sum_duration": {
          "sum": {
            "field": "duration"
          }
        },
        "sum_count": {
          "sum": {
            "field": "count"
          }
        },
        "avg_time": {
          "bucket_script": {
            "buckets_path": {
              "duration": "sum_duration",
              "count": "sum_count"
            },
            "script": "params.duration / params.count"
          }
        },
        "final_sort": {
          "bucket_sort": {
            "sort": [
              {"avg_time": {"order": "desc"}}
            ]
          }
        }
      }
    }
  }
}
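The bucket_sort caveat can be made concrete with a small toy model in Python. This is not how Elasticsearch implements it: the functions `bucket_sort_pipeline` and `global_order_by_avg` and the sample documents are hypothetical, and the terms agg is assumed to keep its default doc-count order with a small `size`.

```python
from collections import defaultdict


def bucket_sort_pipeline(docs, size):
    """Toy model: terms (default doc-count order, `size`) + bucket_script + bucket_sort."""
    stats = defaultdict(lambda: {"doc_count": 0, "sum_duration": 0.0, "sum_count": 0.0})
    for doc in docs:
        s = stats[doc["name"]]
        s["doc_count"] += 1
        s["sum_duration"] += doc["duration"]
        s["sum_count"] += doc["count"]

    # The terms agg keeps its top `size` buckets by its own order (doc count here).
    kept = sorted(stats.items(), key=lambda kv: kv[1]["doc_count"], reverse=True)[:size]

    # bucket_script: avg_time is computed only for the buckets that survived.
    for _, s in kept:
        s["avg_time"] = s["sum_duration"] / s["sum_count"]

    # bucket_sort: reorders that final list; it cannot bring back dropped terms.
    kept.sort(key=lambda kv: kv[1]["avg_time"], reverse=True)
    return [name for name, _ in kept]


def global_order_by_avg(docs, size):
    """What ordering the terms agg itself by avg_time would mean (not supported)."""
    sums = defaultdict(lambda: [0.0, 0.0])  # term -> [sum_duration, sum_count]
    for doc in docs:
        sums[doc["name"]][0] += doc["duration"]
        sums[doc["name"]][1] += doc["count"]
    ranked = sorted(sums, key=lambda name: sums[name][0] / sums[name][1], reverse=True)
    return ranked[:size]
```

With five docs for term "a" (avg 10), three for "b" (avg 20) and one for "c" (avg 100), `bucket_sort_pipeline(docs, 2)` returns `["b", "a"]`, while `global_order_by_avg(docs, 2)` returns `["c", "b"]`: "c" has the highest average but was dropped by the terms agg before bucket_sort ever saw it.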




@jtibshirani
Contributor

jtibshirani commented Jul 31, 2018

I opened #32522 in hopes of clarifying the error message. I'll close this out once that goes in.

@gkozyryatskyy

So is there any other way to sort buckets by a metric calculated from sub-aggregation results?
Using the example in this issue: sort the terms by sum_duration / sum_count.
