Feature Request: The ability to "join" parent and children #761

merrellb · 2011-03-08T23:47:25Z

There are many times I would like both the parent and children of a record. Currently, finding the children of a query's results (even for a has_child query) requires an individual GET for each returned record.

- The simplest solution may be to enhance the has_child query, which already specifies parent and children types, allowing the actual children to be returned along with the parents.
- Enhance the query DSL to allow the children/parents of any query results to be joined and returned. Perhaps even allowing additional filtering.
- Add a join API call.

till · 2011-05-05T22:34:30Z

subscribe

bryangreen · 2011-06-08T22:26:02Z

+1 this would be great

mente · 2011-08-17T11:01:01Z

+1

abh · 2012-01-30T08:48:47Z

I ran into this, too. Nested documents are a bit too closely tied (specifically that you always get all the nested documents back and not just the matching one(s)) and with parent/child documents I can't get both the matching lower level and the upper level back, either – unless I am missing something.

gjb83 · 2012-02-03T19:43:55Z

+1

hlian · 2012-02-09T16:50:17Z

Lucene 3.6 will support a join query: https://issues.apache.org/jira/browse/LUCENE-3602

kevingessner · 2012-02-09T16:51:32Z

Lucene 3.6 added query-time joining: https://issues.apache.org/jira/browse/LUCENE-3602

What's the timeline for ES using Lucene 3.6, @kimchy?

kimchy · 2012-02-12T17:09:08Z

The join query is not really relevant here. Parent child support is similar to the join aspect; it's a matter of returning a different data set than what is provided now. Note, there will never be a cross-shard join in elasticsearch, so any join will happen within a shard, which the parent-child support does now.

kevingessner · 2012-02-13T15:27:37Z

@kimchy Sure, makes sense. I don't actually need full join support -- I really need something more like #792 or #1017, to be able to query the parent's field from a search on the child type.

dhardy92 · 2012-03-12T08:50:19Z

[+1]

Vineeth-Mohan · 2012-06-25T19:49:59Z

+1

Vineeth-Mohan · 2012-06-25T19:51:35Z

+1

nickhoffman · 2012-07-04T14:16:07Z

This would be incredibly useful.

ghost · 2012-08-08T01:37:42Z

Any update on this? Would love to have this rather than having to use separate requests to get the children.

gjb83 · 2012-09-28T12:56:01Z

+1

keir · 2013-06-10T08:53:54Z

+1 this would be great to have.

mvallebr · 2014-02-20T02:42:30Z

+1

isabel12 · 2014-05-19T02:38:00Z

+1

chaitanya24 · 2014-06-17T12:28:00Z

+1

vedharish · 2014-07-02T07:25:57Z

+1

clintongormley · 2014-07-02T10:04:20Z

So what would the response actually look like? Don't forget that parents and children are separate documents. Presumably you'd want children grouped with parents somehow? A parent may have millions of matching children - how many of those do we return?

The top_hits aggregation #6124 isn't a good solution for this as you would have to aggregate on parent_id, of which there may be millions in the resultset.

By far the most efficient way of doing this is in two queries:

1. retrieve the top 10 parents matching the query
2. use an msearch to find (eg) the top 10 children for each parent id

While this requires two steps, it gives you all the flexibility you need which would otherwise have to be provided by adding new structures to the query dsl and to the response.
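
For anyone wanting to try the two-step approach, here is a minimal sketch in Python of how the second step's msearch body might be built. It only constructs the newline-delimited request body and does not send anything. The index, type names and ids (`my_index`, `my_parent`, `my_child`, `p1`, `p2`) are placeholders, the helper `build_children_msearch` is made up for illustration, and selecting children via `has_parent` wrapped around an `ids` query is one possible choice, assuming a 1.x-style query DSL.

```python
import json


def build_children_msearch(index, child_type, parent_type, parent_ids, size=10):
    """Build an _msearch request body with one child search per parent id."""
    lines = []
    for parent_id in parent_ids:
        # Header line: which index and type this individual search targets.
        lines.append(json.dumps({"index": index, "type": child_type}))
        # Body line: top `size` children whose parent has exactly this id.
        lines.append(json.dumps({
            "size": size,
            "query": {
                "has_parent": {
                    "parent_type": parent_type,
                    "query": {"ids": {"values": [parent_id]}},
                }
            },
        }))
    # The msearch body is newline-delimited JSON and must end with a newline.
    return "\n".join(lines) + "\n"


# Step 1 (not shown) would return the ids of the top parents; placeholders here.
top_parent_ids = ["p1", "p2"]
body = build_children_msearch("my_index", "my_child", "my_parent", top_parent_ids)
print(body)
```

The responses come back in the same order as the searches, so the client can pair each parent from step 1 with its own list of children.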

Anybody want to flesh out this feature request a bit more?

clintongormley · 2014-08-08T09:43:55Z

No further feedback. Closing

jason-mccloskey · 2014-08-20T17:08:20Z

Oh, no! This is the exact feature that will help complete my elasticsearch implementation. Let me give a hypothetical use case for this feature that is analogous to what I need to do in my implementation. Please forgive any mistakes, as I am fairly new to elasticsearch and brand new to commenting on issues in GitHub.

Use Case: I want to be able to populate a grid of events at parks in a given city, and allow filtering based upon whether the event is at a "safe" park.

Mappings

We want three types here in a grandparent/parent/child relation.

City

curl -XPUT 'http://localhost:9200/parkinfo/city/_mapping' -d '{ 
    "city" : {
        "_id" : { "path" : "cityName" },
        "properties" : {
            "cityName" : { "type" : "string" },
            "state" : { "type" : "string" }
        }
    }
}'

Park

curl -XPUT 'http://localhost:9200/parkinfo/park/_mapping' -d '{ 
    "park" : {
        "_parent":{
            "type" :  "city"
        },
        "_id" : { "path" : "parkName" },
        "properties" : {
            "parkName" : { "type" : "string" },
            "address" : { "type" : "string" }
        }
    }
}'

Park Event

curl -XPUT 'http://localhost:9200/parkinfo/park_event/_mapping' -d '{   
    "park_event" : {
        "_parent":{
            "type" :  "park"
        },
        "properties" : {
            "eventName" : { "type" : "string" },
            "eventType" : { "type" : "string" },
            "time" : { "type" : "date" }
        }
    }
}'

Data

Let's now consider the data that we'd like to put in this index:

Cities

curl -XPUT 'http://localhost:9200/parkinfo/city/SanDiego?routing=SanDiego' -d '{
        "cityName" : "SanDiego",
        "state" : "California"
}'
curl -XPUT 'http://localhost:9200/parkinfo/city/LosAngeles?routing=LosAngeles' -d '{
        "cityName" : "LosAngeles",
        "state" : "California"
}'

Parks in San Diego

curl -XPUT 'http://localhost:9200/parkinfo/park/Balboa?parent=SanDiego&routing=SanDiego' -d '{
        "parkName" : "Balboa",
        "address" : "1549 El Prado"
}'
curl -XPUT 'http://localhost:9200/parkinfo/park/Glen?parent=SanDiego&routing=SanDiego' -d '{
        "parkName" : "Glen",
        "address" : "2149 Orinda Dr"
}'
curl -XPUT 'http://localhost:9200/parkinfo/park/KateSessions?parent=SanDiego&routing=SanDiego' -d '{
        "parkName" : "KateSessions",
        "address" : "5115 Soledad Rd"
}'

Parks in Los Angeles

curl -XPUT 'http://localhost:9200/parkinfo/park/48thSt?parent=LosAngeles&routing=LosAngeles' -d '{
        "parkName" : "48thSt",
        "address" : "4800 South Hoover"
}'
curl -XPUT 'http://localhost:9200/parkinfo/park/Alma?parent=LosAngeles&routing=LosAngeles' -d '{
        "parkName" : "Alma",
        "address" : "21st and Meyler"
}'
curl -XPUT 'http://localhost:9200/parkinfo/park/Canal?parent=LosAngeles&routing=LosAngeles' -d '{
        "parkName" : "Canal",
        "address" : "200 Linnie Canal and Venice"
}'

Events in Parks in San Diego

curl -XPUT 'http://localhost:9200/parkinfo/park_event/1?parent=Balboa&routing=SanDiego' -d '{
        "eventName" : "Scary Stuff",
        "eventType" : "crime",
        "time" : "2014-08-15T22:58:00"
}'
curl -XPUT 'http://localhost:9200/parkinfo/park_event/2?parent=Balboa&routing=SanDiego' -d '{
        "eventName" : "Bocce Ball Summer 2014",
        "eventType" : "tournament",
        "time" : "2014-08-25T12:00:00"
}'
curl -XPUT 'http://localhost:9200/parkinfo/park_event/3?parent=Glen&routing=SanDiego' -d '{
        "eventName" : "Basketball Summer 2014",
        "eventType" : "tournament",
        "time" : "2014-08-23T12:00:00"
}'

Events in Parks in Los Angeles

curl -XPUT 'http://localhost:9200/parkinfo/park_event/4?parent=48thSt&routing=LosAngeles' -d '{
        "eventName" : "More Scary Stuff",
        "eventType" : "crime",
        "time" : "2014-08-15T22:58:00"
}'
curl -XPUT 'http://localhost:9200/parkinfo/park_event/5?parent=Alma&routing=LosAngeles' -d '{
        "eventName" : "Really Scary Stuff",
        "eventType" : "crime",
        "time" : "2014-06-25T23:14:00"
}'
curl -XPUT 'http://localhost:9200/parkinfo/park_event/6?parent=Canal&routing=LosAngeles' -d '{
        "eventName" : "Weight Lifting Summer 2014",
        "eventType" : "tournament",
        "time" : "2014-08-23T12:00:00"
}'

Filtering Stories/Requirements

As a user I want to be able to display only events that will occur in the next X days in a grid
The grid shall have the columns: city, state, park name, address, event name, event type, time
As a user I want to be able to filter for events at safe parks
A park will be determined safe if it has no crime event in the past 4 weeks and there have not been crimes at 2 parks in its city in the past 3 months

Filtering Implementation

For these 2 requirements, we need a filter to keep only safe parks and then a query to display events in the next X days and join together the data from the 3 generations

Safe Park Filter

This filter must do two things: exclude a park if its parent city is considered unsafe, and exclude a park if it is itself unsafe. The preference would be to be able to do this with a single query. Currently I would expect to have to query cities and save the terms, then use a terms lookup filter.

I see return options as being: none, all, matching, #. None would be the default for all children.

"filter" : {
    "bool" : {
        "type" : "park",
        "must" : {
            "bool" : {
                "type" : "city",
                "return" : "none",
                "must_not" : {
                    "has_child" : {
                        "type" : "park",
                        "min_children": 2,
                        "return" : "all",
                        "filter" : {
                            "must" : {
                                "has_child" : {
                                    "type" : "park_event",
                                    "filter" : {
                                        "bool" : {
                                            "must" : {
                                                "term" : {
                                                    "eventType" : "crime"
                                                    },
                                                "range" : {
                                                    "time" : {
                                                        "gte" : "2014-05-20",
                                                        "lte" : "now"
                                                    }
                                                }
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        },
        "must_not" : {
            "has_child" : {
                "type" : "park_event",
                "filter" : {
                    "bool" : {
                        "must" : {
                            "term" : {
                                "eventType" : "crime"
                                },
                            "range" : {
                                "time" : {
                                    "gte" : "2014-07-23",   
                                    "lte" : "now"
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

I don't believe that you can list a "type" in a bool filter, but I felt it made things much more clear by including it. It also may be required for that sort of functionality.

Returning the data over the next 5 days

As this is to be put into a grid, we would want the data to be denormalized, and perhaps sortable so that from and size can be used. If denormalized is false, it would be an array-based return, for the case where people aren't trying to display in a grid.

{
  "denormalized" : "true",
  "filtered": {
        "type" : "park",
        "return" : "matching",
        "query": {
            "has_parent" : {
                "return" : "matching"
            },
            "has_child" : {
                "return" : "matching",
                "range" : {
                    "time" : {
                        "gte" : "now",
                        "lte" : "2014-08-25"
                    }
                }
            }
        },
        "filter": "SafeParkFilter"
    }
}
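
To make the difference between the two return shapes concrete, here is a rough Python sketch. It takes an array-based (nested) result and flattens it into the denormalized rows a grid would need. This is illustration only: the nested keys `parks` and `events`, the row keys, and the `flatten` helper are all assumptions, and the sample ignores the safe park filter since it is only about the shape of the data.

```python
def flatten(city):
    """Turn one nested city -> parks -> events record into flat grid rows."""
    rows = []
    for park in city["parks"]:
        for event in park["events"]:
            # Each row repeats the city and park fields next to the event fields.
            rows.append({
                "city": city["cityName"],
                "state": city["state"],
                "park": park["parkName"],
                "address": park["address"],
                "event": event["eventName"],
                "type": event["eventType"],
                "time": event["time"],
            })
    return rows


# Array-based shape (what "denormalized": "false" might return), built from the sample data.
nested = {
    "cityName": "SanDiego",
    "state": "California",
    "parks": [
        {
            "parkName": "Balboa",
            "address": "1549 El Prado",
            "events": [
                {"eventName": "Scary Stuff", "eventType": "crime",
                 "time": "2014-08-15T22:58:00"},
                {"eventName": "Bocce Ball Summer 2014", "eventType": "tournament",
                 "time": "2014-08-25T12:00:00"},
            ],
        },
        {
            "parkName": "Glen",
            "address": "2149 Orinda Dr",
            "events": [
                {"eventName": "Basketball Summer 2014", "eventType": "tournament",
                 "time": "2014-08-23T12:00:00"},
            ],
        },
    ],
}

rows = flatten(nested)
for row in rows:
    print(row)
```

With flat rows, sorting and paging with from and size apply per event rather than per city.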

Expected Results

Safe Parks

No parks in Los Angeles should be considered safe because there were multiple parks in LA with crime events in the past 3 months. Balboa Park should also be considered unsafe because of the crime event in the past 4 weeks. This leaves the safe parks as:
Glen
KateSessions
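
As a quick sanity check of the rule against the sample data, here is a small in-memory sketch in Python. It does not touch Elasticsearch at all. It assumes "now" is 2014-08-20 and approximates 3 months as 92 days, so that the cut-offs line up with the 2014-07-23 and 2014-05-20 dates used in the filter. The names `PARK_CITY`, `CRIMES` and `safe_parks` are made up for illustration.

```python
from datetime import datetime, timedelta

# Park -> city, copied from the sample data.
PARK_CITY = {
    "Balboa": "SanDiego", "Glen": "SanDiego", "KateSessions": "SanDiego",
    "48thSt": "LosAngeles", "Alma": "LosAngeles", "Canal": "LosAngeles",
}

# (park, time) for every sample event whose eventType is "crime".
CRIMES = [
    ("Balboa", datetime(2014, 8, 15, 22, 58)),
    ("48thSt", datetime(2014, 8, 15, 22, 58)),
    ("Alma", datetime(2014, 6, 25, 23, 14)),
]


def safe_parks(now, park_window=timedelta(weeks=4), city_window=timedelta(days=92)):
    """Return the names of the parks that pass both parts of the safe park rule."""
    # Part 1: parks with a crime of their own inside the park window.
    recent_crime_parks = {p for p, t in CRIMES if now - park_window <= t <= now}

    # Part 2: cities where at least 2 distinct parks had a crime inside the city window.
    crime_parks_by_city = {}
    for park, time in CRIMES:
        if now - city_window <= time <= now:
            crime_parks_by_city.setdefault(PARK_CITY[park], set()).add(park)
    unsafe_cities = {c for c, parks in crime_parks_by_city.items() if len(parks) >= 2}

    return sorted(
        park for park, city in PARK_CITY.items()
        if park not in recent_crime_parks and city not in unsafe_cities
    )


print(safe_parks(datetime(2014, 8, 20)))  # ['Glen', 'KateSessions']
```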

Events at Safe Parks

Given that only Glen Park and Kate Sessions Park are safe parks in this scenario, we should only be returning events from those parks which will be held in the next 5 days

City	State	Park Name	Address	Event Name	Event Type	Time
SanDiego	California	Glen	2149 Orinda Dr	Basketball Summer 2014	tournament	2014-08-23T12:00:00
SanDiego	California	KateSessions	5115 Soledad Rd	Bocce Ball Summer 2014	tournament	2014-08-25T12:00:00

Please let me know if any of this is unclear or doesn't make sense. This is also likely more than the original request, but this feature set would be very powerful and is the gap between what I have currently implemented on my project and the toolset I need to finish.

clintongormley · 2014-08-22T13:17:57Z

Hi @ILMN-jmccloskey

Thanks for the detailed example. It feels very much like you are trying to use Elasticsearch as a relational DB, which isn't the best way to use it. I would definitely avoid using grandparent-parent-child relationships as it is very costly, both with joins and the data required to maintain the relationship.

Think about how many times a crime is committed, then how many times your query will run. A much better approach would be to denormalize your data and to update it when you have new events. You want your results to be parks, so you should store all the info you need inside the single park document, including the crimes inside that park and the number of crimes for a particular time period in the city where the park is located.
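
As a rough illustration of what such a denormalized park document might look like, here is a small Python sketch that folds the city fields and crime data into one park document. The field names `crimes` and `cityCrimeCount`, the helper `build_park_doc`, the 92-day window and the "now" of 2014-08-22 are all assumptions made for the example, not an existing mapping.

```python
from datetime import datetime, timedelta


def build_park_doc(park, city, events, city_crime_times, now, window=timedelta(days=92)):
    """Fold city info and crime data into a single park document."""
    in_window = [t for t in city_crime_times if now - window <= t <= now]
    return {
        "parkName": park["parkName"],
        "address": park["address"],
        "cityName": city["cityName"],
        "state": city["state"],
        # Crimes recorded at this park (hypothetical field name).
        "crimes": [e for e in events if e["eventType"] == "crime"],
        # Crimes anywhere in the park's city within the window (hypothetical field name).
        "cityCrimeCount": len(in_window),
    }


doc = build_park_doc(
    park={"parkName": "Balboa", "address": "1549 El Prado"},
    city={"cityName": "SanDiego", "state": "California"},
    events=[
        {"eventName": "Scary Stuff", "eventType": "crime",
         "time": "2014-08-15T22:58:00"},
        {"eventName": "Bocce Ball Summer 2014", "eventType": "tournament",
         "time": "2014-08-25T12:00:00"},
    ],
    city_crime_times=[datetime(2014, 8, 15, 22, 58)],
    now=datetime(2014, 8, 22),
)
print(doc)
```

The document would be rebuilt and reindexed whenever a new event arrives, which moves the cost from query time to the much rarer write.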

I suggest reading about the various techniques and tradeoffs here: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/modeling-your-data.html

jason-mccloskey · 2014-08-22T16:20:34Z

Hi @clintongormley

Thanks for responding. I fully agree with you that the example, as given, doesn't lend itself to normalization. I was trying to be brief (ha!) in the data for the example. Imagine that the city, park and park_event all have anywhere from 10 to 50 fields, which should be able to be updated independently from each other and you are creating many events per month. This isn't the actual index I am trying to create, only an example for illustrative purposes.

I also am not sure how to fill the requirements of a safe park (2 crimes committed at parks in a particular city within the last 3 months) without the parent/child relationship. It seems to me that this is why the parent/child relationship and min_children were created. Assuming that the example was altered to lend itself to normalization using elasticsearch, are there things that need to be added to demonstrate the value of returning data across documents or perhaps clear up implementation details?

Thank you for the link. I will read further to make sure my actual mapping strategy is appropriate for my implementation. Even outside of the number of fields in a type and the ability to update those types independently, it seems to me that I won't be able to look for parks that have some events and not others, or cities that have some events and not others, without a parent/child relationship (**updated: I could if I used multiple queries). Do you have any suggestions in that regard?

asanderson · 2014-08-22T17:00:44Z

FWIW, we aggregate data into Elasticsearch from many different disparate sources including unstructured, semi-structured (e.g. XML, RESTful services, etc.), and structured (e.g. relational database records), so our basic schema includes master parent documents (e.g. entities, relationships, etc.), and each of them can have many detail child documents each of which can have dozens and dozens of fields.

We do not want to update the master documents, since most of our data ingest pattern is just adding additional details. The performance is more than acceptable.

Yes, everyone says not to use Elasticsearch (or Solr) as a relational database replacement, but for data that is primarily write-once/read-many, it is more than an adequate solution as we've proved with Solr and now Elasticsearch.

However, without a simple parent/child join capability baked into Elasticsearch, it means that every Elasticsearch client must do it the hard way, and pull unnecessary data across the network.

Just my $0.02.

kunklejr · 2014-09-05T13:40:26Z

My situation is similar to @asanderson's. I have a parent document that has one or more child documents containing all the data. They generally don't change but are added all the time. It would be incredibly valuable to search the child documents and get results back in terms of the parent document AND also return the data contained in the children along with the parent.

clintongormley · 2014-11-04T07:51:30Z

Closing in favour of #8153

ofavre mentioned this issue Mar 1, 2012

Access to child/parent fields of a document within a script #1017

Closed

mhoffman mentioned this issue Feb 9, 2014

Feature Request: pre-select terms in TermVector request #3924

Closed

clintongormley added the feedback_needed label Jul 8, 2014

clintongormley mentioned this issue Jul 8, 2014

Parent object in fields result #1891

Closed

clintongormley closed this as completed Aug 8, 2014

clintongormley reopened this Aug 22, 2014

clintongormley assigned martijnvg Aug 22, 2014

clintongormley removed the feedback_needed label Aug 22, 2014

stephanebastian mentioned this issue Sep 24, 2014

Sorting based on parent/child relationship #2917

Closed

clintongormley closed this as completed Nov 4, 2014
