Sorting based on parent/child relationship #2917

Closed
kul opened this issue Apr 19, 2013 · 78 comments
Labels
>feature help wanted adoptme high hanging fruit :Search/Search Search-related issues that do not fall into other categories

Comments

@kul
Contributor

kul commented Apr 19, 2013

Currently there is no way to sort documents based on a parent/child relation, e.g.
sorting a doc based on a child doc field, or the opposite.

@martijnvg
Member

@kul At the moment this isn't possible. This feature will be added in the near future.

For now you can use @clintongormley's excellent workaround: http://stackoverflow.com/questions/14504180/elasticsearch-sorting-parents-through-child-values/14519947#14519947

This workaround allows you to sort on child values by using a custom_score query as the child query.

@kul
Contributor Author

kul commented Apr 19, 2013

Oh wow! If I can specify a query, it means limitless possibilities for sorting using nested has_child/has_parent clauses.

Thanks

@GrantGochnauer

Really looking forward to this feature - we use parent/child relationships extensively and right now have to copy children values on parent object to sort on them. Will give the work-around a try but hopefully we'll see this in the .90 series too ;) Thank you!

@GrantGochnauer

Is it true that the work-around requires you to leverage nested mappings instead of a true parent/child relationship for the sorting to work? Thanks!

@P-Hill

P-Hill commented May 15, 2013

On 5/14/2013 10:52 AM, Grant Gochnauer wrote:

Is it true that the work-around requires you to leverage nested
mappings instead of a true parent/child relationship for the sorting
to work? Thanks!

http://www.elasticsearch.org/guide/reference/query-dsl/has-child-query/
"The has_child also has scoring support from version 0.20.2. The
supported score types are max, sum, avg or none"

Without having seen Clinton's post, but having discussed it a bit on the
list, what I was looking for was the youngest child, so I used "max" to
good effect to find parents with the youngest child ("newest parents").
Originally I was playing with top_children, but has_child was what I
really needed. The field that becomes my score is the date of a child file.

The problem is that a score is a float, so you can have round-off problems
when you convert a 64-bit date long into a 32-bit float. This round-off
can lose seconds, more often milliseconds. Since my "children" are
actually file instances, I couldn't come up with any brilliant formula
to stay away from the round-off, because two files can have dates very
close together that are not resolvable in the digits of a float.

If the score were a double I would be able to use ~16 (base 10) digits of
accuracy to better effect and rarely have round-off of dates, so I hope
someone changes how a score is stored in the entire Elasticsearch and
Lucene infrastructure from a float to a double :) That is an easy
change, isn't it? :)

-Paul
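
A minimal sketch of the round-off described above, assuming child dates are stored as epoch milliseconds and the score is a 32-bit float. The helper `to_float32` and the sample timestamps are illustrative only and are not part of any Elasticsearch API.

```python
import struct


def to_float32(value: int) -> float:
    """Round-trip a number through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


# Two hypothetical child timestamps (epoch milliseconds, May 2013), one second apart.
older = 1_368_500_000_000
newer = older + 1_000

# As 64-bit doubles the two dates stay distinct...
print(float(older) == float(newer))            # False
# ...but as 32-bit floats they collapse to the same value.
print(to_float32(older) == to_float32(newer))  # True
```

Near this magnitude, adjacent float32 values are 2**17 = 131072 apart, so dates that are a couple of minutes apart or closer can end up with identical scores.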

@GrantGochnauer

Thanks for the reply P-Hill... We are developing an API that allows for an arbitrary sort on child fields which are different depending on who is leveraging our API. In other words, without built in support for sorting on child document fields, we aren't able to use the custom score very well.

Thanks

@P-Hill

P-Hill commented May 16, 2013

+2
My example is just one case where a pretty simple thing like a datetime
doesn't actually work to send through as a score. I'm glad this is coming.
If somehow a result set of parents has a field from a matched child
field, it would seem this could lead to other requested features like
returning the one max/min/avg value or even a list of matching values
(forget sorting them). Since there seemed to be many requests for
various things related to knowledge about the actual child matches of a
parent, this should be a useful API.

On 5/15/2013 6:24 AM, Grant Gochnauer wrote:

We are developing an API that allows for an arbitrary sort on child
fields which are different depending on who is leveraging our API.

@vickenstein

+1
Does anyone know of a way to fetch the _parent doc, rather than just the _parent uid, using a script field?
e.g.
"script" : "_source._parent['somefield'].value"
Thanks! If this is possible, sorting by parent/child could be achieved even if it is not optimized.

@serj-p

serj-p commented Mar 18, 2014

This is a very important feature, since it is computationally expensive to update thousands of documents when all you need is to update one field in a big document and then sort by that field. For example, contacts have a property like "last contacted" that changes very frequently, while the rest of the contact does not. The update API doesn't solve my case, since enabling _source would increase my index size a lot.

@amerov

amerov commented Jul 11, 2014

+1

@travisbell

+1

Hope to see this natively supported in ES!

@machinelearner

Though its memory footprint and compute cost are somewhat high, this will be one of the most used features if it comes out in ES. Eagerly looking forward to it!

@parhammmm

+10

@stephanebastian

+1 IMHO, it's definitely one of the top missing features, along with:

@pauleil

pauleil commented Apr 15, 2015

Are there any plans to support this feature in the foreseeable future?

@martijnvg
Member

Once the refactoring in #8134 is in, this is planned to be added. As with the current refactoring, sorting by a child or parent field should first be added to the new Lucene query-time join.

@pauleil

pauleil commented May 29, 2015

This is very exciting. Is there a plan to integrate this in an upcoming release, now that #8134 is resolved?

@vinusebastian

@kul @martijnvg @clintongormley How do I do the workaround mentioned in http://stackoverflow.com/questions/14504180/elasticsearch-sorting-by-nested-documents-values/14519947#14519947 for a parent child relationship? In my script field what should replace "doc['locations.order'].value" to refer to the child document's field?

Thanks in advance

@archie-sh

would really like to see this implemented

@clintongormley
Contributor

@martijnvg With #8134 in, do you see a way forward for implementing this?

@martijnvg
Member

@clintongormley Yes, I do see a way this can be implemented: similar to how the join is implemented, but instead of aggregating child scores per parent, the sorting would aggregate sort values.
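
The idea in this comment can be illustrated outside Elasticsearch. The sketch below is plain Python and makes no claim about how Lucene's join is written; the function name `sort_parents_by_child_value` and the sample data are hypothetical. It groups child sort values by parent, reduces them with an aggregation such as max, and orders the parents by the result.

```python
from collections import defaultdict


def sort_parents_by_child_value(children, mode=max, reverse=True):
    """Order parent ids by an aggregate of their children's sort values.

    `children` is an iterable of (parent_id, sort_value) pairs.
    """
    per_parent = defaultdict(list)
    for parent_id, value in children:
        per_parent[parent_id].append(value)
    # Aggregate sort values per parent (instead of aggregating scores).
    aggregated = {pid: mode(values) for pid, values in per_parent.items()}
    return sorted(aggregated, key=aggregated.get, reverse=reverse)


children = [("p1", 3), ("p2", 10), ("p1", 7), ("p3", 1)]
print(sort_parents_by_child_value(children))  # ['p2', 'p1', 'p3']
```

Because the aggregated value is kept in its original type rather than squeezed into a score, this approach would not be subject to the float precision limits raised earlier in the thread.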

@danipl

danipl commented Aug 14, 2016

+1

For now, in my situation (sorting by parent.field), I'm working with function_score.

GET /index/child/_search
{
  "query": {
    "has_parent": {
      "parent_type": "parent",
      "score_mode": "score", 
      "query": {
        "function_score": {
          "script_score": {
            "script": "_score * doc['id'].value"
          }
        }
      }
    }
  }
} 

@jpountz
Contributor

jpountz commented Aug 24, 2016

I don't think we can realistically make it work so I am closing it as a won't fix. When this is needed, a possible workaround is to fold sort values into the score as shown by @danipl in the above message.

@jpountz jpountz closed this as completed Aug 24, 2016
@rpedela

rpedela commented Aug 24, 2016

@jpountz What has changed? I thought @martijnvg said he saw a way forward?

@jpountz
Contributor

jpountz commented Aug 25, 2016

This is something that could be implemented, but it would require a lot of specialization depending on the types of the fields that are being sorted, and I don't think that would be sustainable in the long term. Moreover, I am concerned about making features that do not scale more appealing (parent/child needs to perform a linear scan in the 2nd phase of its execution in the general case).

@martijnvg
Member

@rpedela I did, and still do, see a way this could be implemented. Implementing this feature requires writing quite a lot of code that would only be used when has_child/has_parent queries are combined with sorting by a field, and not when sorting by _score or when other queries are used. Over time I have seen that the workaround provided here is sufficient for most people wanting to sort based on a field in a child or parent document. I think closing this issue as won't fix is justified, since adding this is far from trivial and most of the time the workaround is good enough.

@rpedela

rpedela commented Aug 25, 2016

Fair enough. Could small but complete examples (sort by parent and sort by child) be added to the docs showing how to sort using the workaround? I think without official documentation this will keep coming up based on the amount of interest.

@martijnvg
Member

Could small but complete examples (sort by parent and sort by child) be added to the docs showing how to sort using the workaround?

Yes!
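
As a rough illustration of the sort-by-child direction (the sort-by-parent direction is the has_parent query posted earlier in the thread), here is a sketch that builds a request body which folds a numeric child field into the score. It is not taken from the official docs: the helper name, the child type `comment`, and the field `votes` are hypothetical, and the exact script syntax and accepted `score_mode` values depend on the Elasticsearch version in use.

```python
import json


def sort_parents_by_child_field(child_type, field, score_mode="max"):
    """Build a has_child query whose score is a numeric child field."""
    return {
        "query": {
            "has_child": {
                "type": child_type,
                # How the child scores are combined into the parent score.
                "score_mode": score_mode,
                "query": {
                    "function_score": {
                        "query": {"match_all": {}},
                        # Use the field value itself as the child score.
                        "script_score": {"script": f"doc['{field}'].value"},
                        "boost_mode": "replace",
                    }
                },
            }
        }
    }


body = sort_parents_by_child_field("comment", "votes")
print(json.dumps(body, indent=2))
```

Parents come back in default score order, which is descending by the aggregated child value. The field should be numeric and non-negative, and its values remain subject to the precision of _score.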

@sop3k

sop3k commented Sep 16, 2016

Is there any possibility of using this workaround to sort on a text field?

@dimfeld
Contributor

dimfeld commented Sep 17, 2016

@sop3k I can't think of any normal way to make it work on a string field, but I came up with a method which may or may not work for you depending on your application and tooling.

At index time, convert the string you want to sort into a 63-bit number and index it alongside the string field as a long. Assuming a really naive 8-bit encoding, you could get 7 characters worth of sorting accuracy, or you could pack the bits to fit more if you know the range of values you'll be storing. Then when you want to sort on the string field, sort on this number instead using the function_score technique described here.

I haven't actually tried the above, but it seems likely to work if you're staying in the ASCII range of characters. Of course, this would not work well with UTF-8, and would probably not provide enough accuracy to be useful with UTF-16.
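
A minimal sketch of the naive 8-bit encoding described above, assuming ASCII-only input. The function name `sort_key` is hypothetical. It packs the first 7 characters into a big-endian integer, which fits in 56 bits and therefore in a signed 64-bit long, and preserves lexicographic order for those 7 characters.

```python
def sort_key(text: str, width: int = 7) -> int:
    """Pack the first `width` ASCII characters into an order-preserving integer."""
    # encode() raises UnicodeEncodeError on non-ASCII input, matching the
    # ASCII-only assumption stated above.
    data = text.encode("ascii")[:width].ljust(width, b"\x00")
    return int.from_bytes(data, "big")


words = ["pear", "apple", "fig", "apricot", "figs"]
print(sorted(words, key=sort_key))  # ['apple', 'apricot', 'fig', 'figs', 'pear']
```

Strings that share their first 7 characters get the same key. Also note that if the number is then folded into _score, the 32-bit float precision limit discussed earlier in the thread applies, so far fewer than 7 characters of ordering would survive.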

@sop3k

sop3k commented Sep 17, 2016

Thanks @dimfeld for the comment. I was thinking about the same solution, but as you mention, it is very limited with UTF-8 and UTF-16.

@lonre

lonre commented Nov 8, 2016

@martijnvg

Hi, any updates for the workaround?

@martijnvg
Member

@ayushsangani
Contributor

@martijnvg Does the workaround above work for String fields?

@boesing

boesing commented Aug 17, 2017

+1

@tomoktan

+1

@zhouchong90

+1

@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Parent/Child labels Feb 14, 2018
@wangyankuku

+1

@s6jain

s6jain commented Jun 26, 2018

+1

@RomanVashchegin

Also need workaround for strings

@shushantan

shushantan commented Mar 12, 2020

+1

@song-william

song-william commented Nov 9, 2021

I haven't been able to get this workaround to work well with a number field that can have a wide range of negative, positive, and missing values. This is due to the limited precision of _score, the requirement that _score > 0, and lack of support of _last/_first for missing values.

@cmk1523

cmk1523 commented Jul 13, 2022

+1 for this feature to be added

@Chan-Ro

Chan-Ro commented Oct 12, 2022

+1 need this feature for string sorting

@lazofl

lazofl commented Jul 20, 2023

+1

@getsolaris
Contributor

+1

@kiransunkari

+1
