Add inner hits to nested and parent/child queries #8153

Merged
merged 1 commit into from
Dec 2, 2014

Conversation

martijnvg
Member

Inner hits make it possible to embed the nested inner objects, child documents, or parent document that contributed to the match of a returned search hit. Without inner hits, these would stay hidden.

Example search request w/ nested query:

curl -XGET "http://localhost:9200/stack/question/_search" -d'
{
  "query": {
    "nested": {
      "path": "comments",
      "query": {
        "match": {
          "comments.text": "elasticsearch"
        }
      }
    }
  },
  "_source": {
    "include": "title"
  },
  "inner_hits": {
    "comments": {
      "path": {
        "comments": {
          "_source": {
            "include": "text"
          },
          "query": {
            "match": {
              "comments.text": "elasticsearch"
            }
          }
        }
      }
    }
  }
}'

Example response:

{
   "took": 16,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 2.1920953,
      "hits": [
         {
            "_index": "stack",
            "_type": "question",
            "_id": "485316",
            "_score": 2.1920953,
            "_source": {
               "title": "timestamp +5 hours logstash"
            },
            "inner_hits": {
               "comments": {
                  "hits": {
                     "total": 1,
                     "max_score": 2.1920953,
                     "hits": [
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "485316",
                           "_nested": {
                              "field": "comments",
                              "offset": 2
                           },
                           "_score": 2.1920953,
                           "_source": {
                              "text": "So my timezone is EST which shows in the log, but the agent stamps 2013-03-06T17:03:56.934Z before it sends to elasticsearch."
                           }
                        }
                     ]
                  }
               }
            }
         },
         {
            "_index": "stack",
            "_type": "question",
            "_id": "107518",
            "_score": 1.8563976,
            "_source": {
               "title": "Web based file search in the lan?"
            },
            "inner_hits": {
               "comments": {
                  "hits": {
                     "total": 1,
                     "max_score": 1.8563976,
                     "hits": [
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "107518",
                           "_nested": {
                              "field": "comments",
                              "offset": 2
                           },
                           "_score": 1.8563976,
                           "_source": {
                              "text": "Did you ever settle on something? I am looking for something similar. Does Solr work? that seems to be the most common suggestion. I tried elasticsearch but didn't know how to add a network path as an index to it."
                           }
                        }
                     ]
                  }
               }
            }
         },
         {
            "_index": "stack",
            "_type": "question",
            "_id": "419532",
            "_score": 1.2994784,
            "_source": {
               "title": "How to silently buffer tunnel traffic on network split, and automatically renew the connection when possible?"
            },
            "inner_hits": {
               "comments": {
                  "hits": {
                     "total": 1,
                     "max_score": 1.2994784,
                     "hits": [
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "419532",
                           "_nested": {
                              "field": "comments",
                              "offset": 4
                           },
                           "_score": 1.2994784,
                           "_source": {
                              "text": "Let me elaborate. I'm using ElasticSearch replication, when network disconnects, it puts all \"cluster\" into \"yellow\" state, means writes are postponed. I want to use \"localhost\" as replica nodes' host setting in config, both in China and in Germany. Under this there will be daemons acting as tcp proxy buffer and handling disconnections nicely. That's the outline of the idea, however maybe it would better to just leave it to ElasticSearch's protocol and handle yellow state in my app."
                           }
                        }
                     ]
                  }
               }
            }
         }
      ]
   }
}
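
Reading the inner hits back out of a response shaped like the one above can be sketched as follows. This is a minimal illustration, assuming the response has already been parsed into a Python dict; `collect_inner_hits` is a hypothetical helper, not part of any Elasticsearch client, and the sample data is a trimmed copy of the first hit with the comment text shortened.

```python
def collect_inner_hits(response, name):
    """Yield (parent_id, nested_offset, source) for each inner hit stored under `name`."""
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(name, {})
        for inner_hit in inner.get("hits", {}).get("hits", []):
            # `_nested` may be absent (for example on parent/child inner hits),
            # in which case the offset is reported as None.
            offset = inner_hit.get("_nested", {}).get("offset")
            yield hit["_id"], offset, inner_hit.get("_source", {})


# Trimmed copy of the first hit from the example response (text shortened).
sample = {
    "hits": {
        "hits": [
            {
                "_id": "485316",
                "_source": {"title": "timestamp +5 hours logstash"},
                "inner_hits": {
                    "comments": {
                        "hits": {
                            "total": 1,
                            "hits": [
                                {
                                    "_id": "485316",
                                    "_nested": {"field": "comments", "offset": 2},
                                    "_source": {"text": "(comment text shortened)"},
                                }
                            ],
                        }
                    }
                },
            }
        ]
    }
}

rows = list(collect_inner_hits(sample, "comments"))
print(rows)
```

The `offset` value identifies which element of the `comments` array matched, so it can be used to map an inner hit back to its position in the parent document's `_source`.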

For documentation and more examples, see the inner-hits.asciidoc file.

Closes #3022, #3152

documents or child document are returned based on matches in parent documents. In the nested case documents are returned
based on matches in nested inner objects.

In both cases the actual matches in the different scopes what caused a document to be returned is hidden. In many cases
Contributor

s/what/that/ ?

==== Options

Inner hits support the following options:
* `path` - Defines the nested scope where hits will be collected from.
Member

I think it needs a line break here otherwise the documentation is not rendered correctly.

Contributor

In fact, better to lay this out as:

[horizontal]
`path`::  Defines the nested scope where hits will be collected from.
`type`:: Defines the parent or child type scope where hits will be collected from.

etc

@mathieu007

This is a nice feature, but I was wondering how the inner documents are returned. Are they returned by stream-reading the source, by a deserialization process, or is there some kind of index behind the inner docs?

The main reason I am asking is that I have some very large nested documents, and I don't know whether returning inner docs is a wise solution in terms of performance.

Thank you

@martijnvg
Member Author

@mathieu007 It deserialises the _source and retrieves the relevant inner json objects from the _source. The deserialisation is done only once per top level hit, so that shouldn't affect performance too much.

Also, if you are concerned about this, you can enable stored fields inside nested inner objects in the mapping. That way the _source isn't touched at all, and the field values for each returned inner hit can simply be fetched from disk. If you go down this path you may not need _source at all in ES, and you can then disable it.

@martijnvg
Member Author

@jpountz @dadoonet @clintongormley I updated the PR with the docs feedback.

@clintongormley
Contributor

@martijnvg are we sold on the idea of specifying inner hits within the has_child and nested queries/filters themselves? If so, is there ever a use case for accepting inner_hits at the top level of the search request?

If we go for specifying within the query/filter, then I'd probably add a section on inner hits to the has_child docs, and another section to the nested docs.

@martijnvg
Member Author

@clintongormley Yes, specifying inner_hits at the query/filter level is a good idea and I'm sold on that. I think we should also keep the flexibility of defining inner hits at the top level. This allows an entirely different query to be specified that runs in the nested scope. Multiple levels of inner_hits definitions can also be defined in a top-level inner hits section.
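
The query-level form discussed here can be sketched as below. This is only an illustration: the request body is built as a Python dict, `nested_with_inner_hits` is a hypothetical helper, and the `comments` path and field are borrowed from the example in the PR description. The placement of `inner_hits` inside the `nested` query follows the syntax used later in this thread.

```python
import json


def nested_with_inner_hits(path, field, text, name=None):
    """Build a search body whose nested query carries its own inner_hits definition."""
    inner_hits = {}
    if name is not None:
        # An explicit name keeps several inner_hits definitions apart in the response.
        inner_hits["name"] = name
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"match": {field: text}},
                # An empty object asks for inner hits with no options set.
                "inner_hits": inner_hits,
            }
        }
    }


body = nested_with_inner_hits("comments", "comments.text", "elasticsearch")
print(json.dumps(body, indent=2))
```

Compared with the top-level form, nothing has to be repeated: the inner hits reuse the path and query of the `nested` clause they sit in.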

@noter

noter commented Nov 27, 2014

Is it possible to do global sorting based on a field of the inner hits docs?

@martijnvg
Member Author

@noter

noter commented Nov 27, 2014

Yes, that's clear, but I was thinking about parent/child docs.

@martijnvg
Member Author

Ok, sorting by fields in a child / parent document still needs to be implemented.

The best workaround is to convert the value you'd like to sort by into a numeric value, then wrap a function_score query that refers to that field in a has_child or has_parent query.
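
A rough sketch of that workaround, with the request body built as a Python dict. The child type `answer` and the numeric field `votes` are made-up names, `sort_parents_by_child_field` is a hypothetical helper, and the use of `score_mode`, `field_value_factor` and `boost_mode` is an assumption about the 1.x query DSL that should be checked against the reference for the version in use.

```python
import json


def sort_parents_by_child_field(child_type, numeric_field, score_mode="max"):
    """Build a has_child query whose score is taken from a numeric child field."""
    return {
        "query": {
            "has_child": {
                "type": child_type,
                # How the scores of matching children are combined per parent.
                "score_mode": score_mode,
                "query": {
                    "function_score": {
                        "query": {"match_all": {}},
                        # Use the field value itself as the child's score.
                        "field_value_factor": {"field": numeric_field},
                        "boost_mode": "replace",
                    }
                },
            }
        }
    }


body = sort_parents_by_child_field("answer", "votes")
print(json.dumps(body, indent=2))
```

Because hits are ordered by `_score` when no explicit sort is given, the parents should come back ordered by the highest `votes` value among their children.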

@martijnvg martijnvg force-pushed the feature/the_inner_hits branch 2 times, most recently from 4ff7cf0 to 590930b Compare November 27, 2014 21:24
@martijnvg martijnvg force-pushed the feature/the_inner_hits branch from 590930b to 09238e3 Compare November 27, 2014 21:30
@martijnvg
Member Author

I updated the PR to be in sync with the recent changes in master.

@martijnvg martijnvg force-pushed the feature/the_inner_hits branch from 09238e3 to 86ebcf5 Compare November 28, 2014 16:03
@mathieu007

@martijnvg Just an idea: it would be great if the returned data could be the inner hits plus all their parents,

something like that:

{
    "id": "2",
    "title": "My root node",
    "nestedKey": {
        "id": "3",
        "title": "Nested data level 1",
        "nestedKey2": {
            "id": "4",
            "title": "Nested data level 2",
            "nestedKey3": {
                "id": "5",
                "title": "My matching nested document"
            }
        }
    }
}

@portante

portante commented Dec 4, 2014

@martijnvg, thanks. Has this work been slated for a particular release yet?

@adrianocrestani

@portante , it has v1.5 label on it :)

@pierrre

pierrre commented Dec 8, 2014

Can you access inner hits from scripts? (sort or script fields)

@aminakhan85

Hi, has the inner hits feature been implemented in the ES Java API?

@martijnvg
Member Author

@aminakhan85 Yes. From version 1.5 on, the nested, has_child and has_parent queries/filters have a new method named innerHit(...) that allows one to define inner hits. You can just pass a QueryInnerHitBuilder instance with no options set.

@aminakhan85

@martijnvg Thanks for the info. Right now I'm using version 1.4. I'll upgrade to 1.5 and check it out. Thanks

@aminakhan85

Hello, it looks like an official 1.5 release is not available yet. When is it expected? The latest I could find is 1.4.2.

@martijnvg
Member Author

@aminakhan85 There is no date set on a release yet, but I expect it to be released in the coming months.

@aamirl

aamirl commented Jan 21, 2015

Hi there, if my understanding of this new feature is correct:

If a document with four objects in a nested field is queried and only objects 1 and 3 match, this will allow us to exclude objects 2 and 4 from the results and pull back only 1 and 3 as part of the 'inner_objects' field.

Does that sound correct?

@martijnvg
Member Author

(I think you meant 1 and 3 instead of 1 and 2 being returned?)
In your example, inner_hits will return objects 1 and 3, since those inner objects are the matching ones.

On 21 January 2015 at 15:17, Aamir Latif wrote:

Hi there, if my understanding of this new feature is correct, this will
solve the following issue:

For example, let's say that a document has four objects in a nested field.
Querying on the required filters results in only matching objects 1 and 3,
but when we get the results via _source, we will pull back the entire
document along with all four objects 1,2,3,4.

This new feature will allow us to exclude objects 2 and 4 from the results
and pull back only 1 and 2 as part of the 'inner_objects' field. Does that
sound correct?


Kind regards,

Martijn van Groningen

@aamirl

aamirl commented Jan 21, 2015

Indeed yes, that's exactly what I meant - I updated my edited answer to that but you ended up responding to the email! Thanks a lot 👍 That's a great feature to have.

@tstibbs
Contributor

tstibbs commented Feb 16, 2015

Just been trying this out (built from the head of the 1.x branch), looks great, really useful.

Quick bug report (don't want to raise a new issue for this as the feature isn't released yet, but will raise one if you think I should): if using a 'nested' type and you specify your child document as an object rather than an array, you get a classcast. i.e. your doc looks like this

{
  "nested_field": {
    "make": "ford"
  }
}

When searching you will get a classcast like this:

java.lang.ClassCastException: java.util.LinkedHashMap cannot be cast to java.util.List
at org.elasticsearch.search.fetch.FetchPhase.createNestedSearchHit(FetchPhase.java:294)
at org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:178)
at org.elasticsearch.search.fetch.innerhits.InnerHitsFetchSubPhase.hitExecute(InnerHitsFetchSubPhase.java:96)
at org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:190)
at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:501)

You can fix it by giving the child as an array, but I think that either the nested hits stuff should support this single child, or it should throw an error when you try to index a document using that syntax.
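
The array workaround can also be applied on the client before indexing. A minimal sketch, assuming documents are plain Python dicts; `as_nested_list` is a hypothetical helper and not part of any Elasticsearch client:

```python
def as_nested_list(doc, field):
    """Return a copy of `doc` in which a single nested object is wrapped in a list."""
    value = doc.get(field)
    if value is None or isinstance(value, list):
        # Field missing or already an array: nothing to change.
        return doc
    fixed = dict(doc)  # shallow copy, so the caller's document is left untouched
    fixed[field] = [value]
    return fixed


doc = {"nested_field": {"make": "ford"}}
print(as_nested_list(doc, "nested_field"))
```

This only sidesteps the exception for newly indexed documents; documents already indexed with the single-object form would need to be reindexed.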

I'm also interested to know what the rules are for constructing the query that goes in the 'inner_hits' section?

@martijnvg
Member Author

@tstibbs Great to hear you're trying it out! This looks like a bug, so can you open an issue for it?

All options defined here can be set on an inner_hits element on a query:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/search-request-inner-hits.html#_options

Beyond that, it is also possible to define a top-level inner_hits section in the body of the search request. As a standalone element it also accepts a query element and an inner_hits element for nesting purposes: http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/search-request-inner-hits.html#top-level-inner-hits

Top level inner_hits are useful if one requires inner hits outside of the scope of the query.

@serj-p

serj-p commented Mar 10, 2015

The much-anticipated feature

@clintongormley clintongormley added release highlight :Search/Search Search-related issues that do not fall into other categories labels Mar 19, 2015
@clintongormley clintongormley changed the title Core: Add inner hits Add inner hits to nested and parent/child queries Mar 19, 2015
@im-denisenko im-denisenko mentioned this pull request Apr 6, 2015
16 tasks
@martijnvg martijnvg deleted the feature/the_inner_hits branch May 18, 2015 23:28
@balaakurakula

I have an issue with usage of inner_hits. Version 1.7.1
My structure is parent: style and child: product. I tried this query:

{
  "query": {
    "filtered": {
      "filter": {
        "or": [
          {
            "and": [
              {
                "term": {
                  "name": "embellished"
                }
              },
              {
                "has_child": {
                  "inner_hits": {
                    "size": 100
                  },
                  "query": {
                    "filtered": {
                      "filter": {
                        "term": {
                          "color": "black"
                        }
                      }
                    }
                  },
                  "type": "product"
                }
              }
            ]
          },
          {
            "and": [
              {
                "term": {
                  "name": "cutout"
                }
              },
              {
                "has_child": {
                  "inner_hits": {
                    "size": 100
                  },
                  "query": {
                    "filtered": {
                      "filter": {
                        "term": {
                          "color": "white"
                        }
                      }
                    }
                  },
                  "type": "product"
                }
              }
            ]
          }
        ]
      }
    }
  }
}

This gives me inner_hits for the styles that result from the second part of the "or" (name: cutout, color: white), but not for the styles that result from the first part (name: embellished, color: black). All the inner hit counts for the first part are 0.

If I reverse the first and second parts of the "or", there are inner_hits for (name: embellished, color: black) but not for (name: cutout, color: white). I am not sure why this happens. Is my query wrong?

@clintongormley
Contributor

@bkuraku both your inner hits are using product as the name, so one overwrites the other. Just specify distinct names, eg:

GET _search
{
  "query": {
    "filtered": {
      "filter": {
        "or": [
          {
            "and": [
              {
                "term": {
                  "name": "embellished"
                }
              },
              {
                "has_child": {
                  "inner_hits": {
                    "size": 100,
                    "name": "embellished"
                  },
                  "query": {
                    "filtered": {
                      "filter": {
                        "term": {
                          "color": "black"
                        }
                      }
                    }
                  },
                  "type": "product"
                }
              }
            ]
          },
          {
            "and": [
              {
                "term": {
                  "name": "cutout"
                }
              },
              {
                "has_child": {
                  "inner_hits": {
                    "size": 100,
                    "name": "cutout"
                  },
                  "query": {
                    "filtered": {
                      "filter": {
                        "term": {
                          "color": "white"
                        }
                      }
                    }
                  },
                  "type": "product"
                }
              }
            ]
          }
        ]
      }
    }
  }
}

@balaakurakula

@clintongormley Thanks. That worked like a charm!!

@balaakurakula

V 1.7.1
I have style (parent), product (child) and vendors (nested under product). When I try to query styles to get both product inner hits and vendors inner hits, it seems to return only one of them:

{
  "query": {
    "filtered": {
      "filter": {
        "or": [
          {
            "and": [
              {
                "has_child": {
                  "filter": {
                    "or": [
                      {
                        "term": {
                          "color": "green"
                        }
                      },
                      {
                        "nested": {
                          "path": "vendors",
                          "filter": {
                            "terms": {
                              "vendors.vendorId": [
                                1046,
                                1288,
                                1280
                              ]
                            }
                          },
                          "inner_hits": {
                            "size": 100,
                            "name": "vendors1"
                          }
                        }
                      }
                    ]
                  },
                  "inner_hits": {
                    "name": "products1",
                    "size": 100
                  },
                  "type": "product"
                }
              }
            ]
          }
        ]
      }
    }
  }
}

When I do this I only get inner hits for products1; the vendors1 hits are always 0. Some styles match only because of the vendorId attribute, and even in that case the inner hits are in products1 and not in vendors1. Is this expected behavior?

@clintongormley
Contributor

@bkuraku Providing just a query without the related mapping and document makes it pretty hard to debug what you're doing. Given that your previous issue was resolved by changing the document structure, I'd suggest asking the full question in the forum first: http://discuss.elastic.co/

Then, if the result of the discussion in the forum indicates that this is a bug, have a look at existing open issues, eg:

If you don't find anything related, then open a new issue with a full recreation

Labels
>feature release highlight :Search/Search Search-related issues that do not fall into other categories v1.5.0 v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Return matching nested inner objects per hit