Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Expose DistanceFeatureQuery for geo, date and date_nanos types

Closes #33382
1 parent bd04b4f, commit a87b139
Showing 13 changed files with 783 additions and 5 deletions.
177 changes: 177 additions & 0 deletions
docs/reference/query-dsl/distance-feature-query.asciidoc
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,177 @@ | ||
[[query-dsl-distance-feature-query]] | ||
=== Distance Feature Query | ||
|
||
The `distance_feature` query is a specialized query that only works | ||
on <<date, `date`>>, <<date_nanos, `date_nanos`>> or <<geo-point,`geo_point`>> | ||
fields. Its goal is to boost documents' scores based on proximity | ||
to some given origin. For example, use this query if you want to | ||
give more weight to documents with dates closer to a certain date, | ||
or to documents with locations closer to a certain location. | ||
|
||
This query is called the `distance_feature` query because it dynamically | ||
calculates distances between the given origin and documents' field values, | ||
and uses these distances as features to boost the documents' scores. | ||
|
||
The `distance_feature` query is typically used on its own to find the nearest | ||
neighbors to a given point, or put in a `should` clause of a | ||
<<query-dsl-bool-query,`bool`>> query so that its score is added to the score | ||
of the query. | ||
|
||
Compared to using <<query-dsl-function-score-query,`function_score`>> or other | ||
ways to modify the score, this query has the benefit of being able to | ||
efficiently skip non-competitive hits when | ||
<<search-uri-request,`track_total_hits`>> is not set to `true`. | ||
|
||
==== Syntax of distance_feature query | ||
|
||
The `distance_feature` query has the following syntax: | ||
[source,js] | ||
-------------------------------------------------- | ||
"distance_feature": { | ||
    "field": <field>, | ||
    "origin": <origin>, | ||
    "pivot": <pivot>, | ||
    "boost" : <boost> | ||
} | ||
-------------------------------------------------- | ||
// NOTCONSOLE | ||
|
||
[horizontal] | ||
`field`:: | ||
Required parameter. Defines the name of the field on which to calculate | ||
distances. Must be a field of the type `date`, `date_nanos` or `geo_point`, | ||
and must be indexed (`"index": true`, which is the default) and have | ||
<<doc-values, doc values>> (`"doc_values": true`, which is the default). | ||
|
||
`origin`:: | ||
Required parameter. Defines a point of origin used for calculating | ||
distances. Must be a date for `date` and `date_nanos` fields, | ||
and a geo-point for `geo_point` fields. Date math (for example `now-1h`) is | ||
supported for a date origin. | ||
|
||
`pivot`:: | ||
Required parameter. Defines the distance from origin at which the computed | ||
score equals half of the `boost` parameter. Must be | ||
a `number + date unit` ("1h", "10d",...) for `date` and `date_nanos` fields, | ||
and a `number + geo unit` ("1km", "12m",...) for `geo_point` fields. | ||
|
||
`boost`:: | ||
Optional parameter with a default value of `1`. Defines the factor by which | ||
to multiply the score. Must be a non-negative float number. | ||
|
||
|
||
The `distance_feature` query computes a document's score as follows: | ||
|
||
`score = boost * pivot / (pivot + distance)` | ||
|
||
where `distance` is the absolute difference between the origin and | ||
a document's field value. | ||
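As a rough illustration of the formula above, the following Python sketch evaluates it for a few distances. The function name `distance_feature_score` and the input checks (such as requiring a positive pivot) are assumptions made for this example; they are not taken from the Elasticsearch implementation.

```python
def distance_feature_score(distance: float, pivot: float, boost: float = 1.0) -> float:
    """Evaluate score = boost * pivot / (pivot + distance).

    `distance` and `pivot` must use the same unit (for example milliseconds
    or meters). The validation below is an assumption for this sketch.
    """
    if pivot <= 0:
        raise ValueError("pivot must be positive")
    if boost < 0:
        raise ValueError("boost must be non-negative")
    if distance < 0:
        raise ValueError("distance is an absolute difference and cannot be negative")
    return boost * pivot / (pivot + distance)


# A document exactly at the origin receives the full boost.
at_origin = distance_feature_score(distance=0, pivot=7)

# A document exactly one pivot away receives half of the boost.
at_pivot = distance_feature_score(distance=7, pivot=7)

# Doubling the boost doubles the score at the same distance.
at_pivot_boosted = distance_feature_score(distance=7, pivot=7, boost=2.0)
```

At `distance == pivot` the result is half of `boost`, which matches the description of the `pivot` parameter.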
|
||
==== Example using distance_feature query | ||
|
||
Let's look at an example. We index several documents containing | ||
information about sales items, such as name, production date, | ||
and location. | ||
|
||
[source,js] | ||
-------------------------------------------------- | ||
PUT items | ||
{ | ||
  "mappings": { | ||
    "properties": { | ||
      "name": { | ||
        "type": "keyword" | ||
      }, | ||
      "production_date": { | ||
        "type": "date" | ||
      }, | ||
      "location": { | ||
        "type": "geo_point" | ||
      } | ||
    } | ||
  } | ||
} | ||
PUT items/_doc/1 | ||
{ | ||
  "name" : "chocolate", | ||
  "production_date": "2018-02-01", | ||
  "location": [-71.34, 41.12] | ||
} | ||
PUT items/_doc/2 | ||
{ | ||
  "name" : "chocolate", | ||
  "production_date": "2018-01-01", | ||
  "location": [-71.3, 41.15] | ||
} | ||
PUT items/_doc/3 | ||
{ | ||
  "name" : "chocolate", | ||
  "production_date": "2017-12-01", | ||
  "location": [-71.3, 41.12] | ||
} | ||
POST items/_refresh | ||
-------------------------------------------------- | ||
// CONSOLE | ||
|
||
We look for all chocolate items, but we also want chocolates | ||
that are produced recently (closer to the date `now`) | ||
to be ranked higher. | ||
|
||
[source,js] | ||
-------------------------------------------------- | ||
GET items/_search | ||
{ | ||
  "query": { | ||
    "bool": { | ||
      "must": { | ||
        "match": { | ||
          "name": "chocolate" | ||
        } | ||
      }, | ||
      "should": { | ||
        "distance_feature": { | ||
          "field": "production_date", | ||
          "pivot": "7d", | ||
          "origin": "now" | ||
        } | ||
      } | ||
    } | ||
  } | ||
} | ||
-------------------------------------------------- | ||
// CONSOLE | ||
// TEST[continued] | ||
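To make the effect of the `should` clause concrete, here is a small Python sketch that computes only the `distance_feature` contribution for the three sample documents. Because `now` depends on when the query runs, the sketch assumes a hypothetical fixed origin of 2018-02-08; the helper name `date_score` is also made up for this example. The score of the `match` clause, which the `bool` query adds on top, is not modelled.

```python
from datetime import datetime, timezone

# production_date values of the three sample documents, keyed by document id
docs = {"1": "2018-02-01", "2": "2018-01-01", "3": "2017-12-01"}

# Assumption: a fixed stand-in for "now", chosen only for this illustration
origin = datetime(2018, 2, 8, tzinfo=timezone.utc)

# "pivot": "7d" expressed in milliseconds
pivot_ms = 7 * 24 * 60 * 60 * 1000


def date_score(date_str: str, boost: float = 1.0) -> float:
    """Return boost * pivot / (pivot + distance) for a yyyy-MM-dd date."""
    value = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    distance_ms = abs((origin - value).total_seconds()) * 1000
    return boost * pivot_ms / (pivot_ms + distance_ms)


# Document ids ordered from most recent to least recent production date
date_ranking = sorted(docs, key=lambda doc_id: date_score(docs[doc_id]), reverse=True)
```

With this assumed origin, document 1 lies exactly one pivot (7 days) away and therefore contributes half of the default boost, while the older documents contribute less.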
|
||
We look for all chocolate items, but we also want chocolates | ||
that are produced locally (closer to our geo origin) | ||
to come first in the result list. | ||
|
||
[source,js] | ||
-------------------------------------------------- | ||
GET items/_search | ||
{ | ||
  "query": { | ||
    "bool": { | ||
      "must": { | ||
        "match": { | ||
          "name": "chocolate" | ||
        } | ||
      }, | ||
      "should": { | ||
        "distance_feature": { | ||
          "field": "location", | ||
          "pivot": "1000m", | ||
          "origin": [-71.3, 41.15] | ||
        } | ||
      } | ||
    } | ||
  } | ||
} | ||
-------------------------------------------------- | ||
// CONSOLE | ||
// TEST[continued] |
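The geo example can be approximated in the same way. The Python sketch below estimates the `distance_feature` contribution for the three sample locations using the haversine great-circle distance. The choice of haversine, the Earth radius constant, and the helper names `haversine_m` and `geo_score` are assumptions for illustration; the exact distance computation inside Elasticsearch may differ slightly. Note that `geo_point` arrays are written as `[lon, lat]`.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # assumed mean Earth radius in meters


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two (lon, lat) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# location values of the three sample documents as (lon, lat), keyed by document id
docs = {"1": (-71.34, 41.12), "2": (-71.3, 41.15), "3": (-71.3, 41.12)}

origin = (-71.3, 41.15)  # "origin": [-71.3, 41.15]
pivot_m = 1000.0         # "pivot": "1000m"


def geo_score(point: tuple, boost: float = 1.0) -> float:
    """Return boost * pivot / (pivot + distance) for a (lon, lat) point."""
    distance_m = haversine_m(*origin, *point)
    return boost * pivot_m / (pivot_m + distance_m)


# Document ids ordered from nearest to farthest from the origin
geo_ranking = sorted(docs, key=lambda doc_id: geo_score(docs[doc_id]), reverse=True)
```

Document 2 sits exactly at the origin, so its distance is zero and it receives the full boost. Documents 3 and 1 are each several kilometers away, well beyond the 1000 m pivot, so their contributions fall below half of the boost.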
87 changes: 87 additions & 0 deletions
rest-api-spec/src/main/resources/rest-api-spec/test/search/250_distance_feature.yml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,87 @@ | ||
setup: | ||
  - skip: | ||
      version: " - 7.9.99" #TODO adjust to 7.0.99 after merging to 7.x | ||
      reason: "Implemented in 7.1" | ||
|
||
  - do: | ||
      indices.create: | ||
        index: index1 | ||
        body: | ||
          settings: | ||
            number_of_replicas: 0 | ||
          mappings: | ||
            properties: | ||
              my_date: | ||
                type: date | ||
              my_date_nanos: | ||
                type: date_nanos | ||
              my_geo: | ||
                type: geo_point | ||
|
||
  - do: | ||
      bulk: | ||
        refresh: true | ||
        body: | ||
          - '{ "index" : { "_index" : "index1", "_id" : "1" } }' | ||
          - '{ "my_date": "2018-02-01T10:00:00Z", "my_date_nanos": "2018-02-01T00:00:00.223456789Z", "my_geo": [-71.34, 41.13] }' | ||
          - '{ "index" : { "_index" : "index1", "_id" : "2" } }' | ||
          - '{ "my_date": "2018-02-01T11:00:00Z", "my_date_nanos": "2018-02-01T00:00:00.123456789Z", "my_geo": [-71.34, 41.14] }' | ||
          - '{ "index" : { "_index" : "index1", "_id" : "3" } }' | ||
          - '{ "my_date": "2018-02-01T09:00:00Z", "my_date_nanos": "2018-02-01T00:00:00.323456789Z", "my_geo": [-71.34, 41.12] }' | ||
|
||
--- | ||
"test distance_feature query on date type": | ||
|
||
  - do: | ||
      search: | ||
        rest_total_hits_as_int: true | ||
        index: index1 | ||
        body: | ||
          query: | ||
            distance_feature: | ||
              field: my_date | ||
              pivot: 1h | ||
              origin: 2018-02-01T08:00:30Z | ||
|
||
  - length: { hits.hits: 3 } | ||
  - match: { hits.hits.0._id: "3" } | ||
  - match: { hits.hits.1._id: "1" } | ||
  - match: { hits.hits.2._id: "2" } | ||
|
||
--- | ||
"test distance_feature query on date_nanos type": | ||
|
||
  - do: | ||
      search: | ||
        rest_total_hits_as_int: true | ||
        index: index1 | ||
        body: | ||
          query: | ||
            distance_feature: | ||
              field: my_date_nanos | ||
              pivot: 100000000nanos | ||
              origin: 2018-02-01T00:00:00.323456789Z | ||
|
||
  - length: { hits.hits: 3 } | ||
  - match: { hits.hits.0._id: "3" } | ||
  - match: { hits.hits.1._id: "1" } | ||
  - match: { hits.hits.2._id: "2" } | ||
|
||
--- | ||
"test distance_feature query on geo_point type": | ||
|
||
  - do: | ||
      search: | ||
        rest_total_hits_as_int: true | ||
        index: index1 | ||
        body: | ||
          query: | ||
            distance_feature: | ||
              field: my_geo | ||
              pivot: 1km | ||
              origin: [-71.35, 41.12] | ||
|
||
  - length: { hits.hits: 3 } | ||
  - match: { hits.hits.0._id: "3" } | ||
  - match: { hits.hits.1._id: "1" } | ||
  - match: { hits.hits.2._id: "2" } |
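The hit order asserted by the `date` and `date_nanos` tests above can be sanity-checked by hand. The Python sketch below recomputes the ranking from the bulk-indexed values using integer nanoseconds; the helpers `to_epoch_nanos` and `rank` are hypothetical names for this illustration and only handle the timestamp shape used in these tests.

```python
from datetime import datetime, timezone


def to_epoch_nanos(ts: str) -> int:
    """Parse 'YYYY-MM-DDTHH:MM:SS[.fraction]Z' into nanoseconds since the epoch."""
    seconds_part, _, fraction = ts.rstrip("Z").partition(".")
    dt = datetime.strptime(seconds_part, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    nanos = int(fraction.ljust(9, "0")) if fraction else 0
    return int(dt.timestamp()) * 1_000_000_000 + nanos


def rank(docs: dict, origin: str, pivot_nanos: int) -> list:
    """Order document ids by pivot / (pivot + distance), highest score first."""
    origin_ns = to_epoch_nanos(origin)

    def score(ts: str) -> float:
        distance = abs(to_epoch_nanos(ts) - origin_ns)
        return pivot_nanos / (pivot_nanos + distance)

    return sorted(docs, key=lambda doc_id: score(docs[doc_id]), reverse=True)


# my_date values from the bulk request, pivot 1h, origin 2018-02-01T08:00:30Z
my_date = {
    "1": "2018-02-01T10:00:00Z",
    "2": "2018-02-01T11:00:00Z",
    "3": "2018-02-01T09:00:00Z",
}
date_order = rank(my_date, "2018-02-01T08:00:30Z", 3600 * 1_000_000_000)

# my_date_nanos values from the bulk request, pivot 100000000nanos
my_date_nanos = {
    "1": "2018-02-01T00:00:00.223456789Z",
    "2": "2018-02-01T00:00:00.123456789Z",
    "3": "2018-02-01T00:00:00.323456789Z",
}
nanos_order = rank(my_date_nanos, "2018-02-01T00:00:00.323456789Z", 100_000_000)
```

Both rankings come out as documents 3, 1, 2, which agrees with the `match` assertions on `hits.hits` in the two tests.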