Expose proximity boosting (#39385)
Expose DistanceFeatureQuery for geo, date and date_nanos types

Closes #33382
mayya-sharipova authored Mar 19, 2019
1 parent bd04b4f commit a87b139
Showing 13 changed files with 783 additions and 5 deletions.
177 changes: 177 additions & 0 deletions docs/reference/query-dsl/distance-feature-query.asciidoc
@@ -0,0 +1,177 @@
[[query-dsl-distance-feature-query]]
=== Distance Feature Query

The `distance_feature` query is a specialized query that only works
on <<date, `date`>>, <<date_nanos, `date_nanos`>> or <<geo-point,`geo_point`>>
fields. Its goal is to boost documents' scores based on proximity
to some given origin. For example, use this query if you want to
give more weight to documents with dates closer to a certain date,
or to documents with locations closer to a certain location.

This query is called the `distance_feature` query because it dynamically
calculates distances between the given origin and documents' field values,
and uses these distances as features to boost the documents' scores.

The `distance_feature` query is typically used on its own to find the nearest
neighbors to a given point, or placed in a `should` clause of a
<<query-dsl-bool-query,`bool`>> query so that its score is added to the score
of the query.

Compared to using <<query-dsl-function-score-query,`function_score`>> or other
ways to modify the score, this query has the benefit of being able to
efficiently skip non-competitive hits when
<<search-uri-request,`track_total_hits`>> is not set to `true`.

==== Syntax of distance_feature query

The `distance_feature` query has the following syntax:
[source,js]
--------------------------------------------------
"distance_feature": {
  "field": <field>,
  "origin": <origin>,
  "pivot": <pivot>,
  "boost": <boost>
}
--------------------------------------------------
// NOTCONSOLE

[horizontal]
`field`::
Required parameter. Defines the name of the field on which to calculate
distances. Must be a field of the type `date`, `date_nanos` or `geo_point`,
and must be indexed (`"index": true`, which is the default) and have
<<doc-values, doc values>> (`"doc_values": true`, which is the default).

`origin`::
Required parameter. Defines a point of origin used for calculating
distances. Must be a date for date and date_nanos fields,
and a geo-point for geo_point fields. Date math (for example `now-1h`) is
supported for a date origin.

`pivot`::
Required parameter. Defines the distance from the origin at which the computed
score equals half of the `boost` parameter. Must be
a `number + date unit` ("1h", "10d",...) for `date` and `date_nanos` fields,
and a `number + geo unit` ("1km", "12m",...) for `geo_point` fields.

`boost`::
Optional parameter with a default value of `1`. Defines the factor by which
to multiply the score. Must be a non-negative float number.


The `distance_feature` query computes a document's score as follows:

`score = boost * pivot / (pivot + distance)`

where `distance` is the absolute difference between the origin and
a document's field value.
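
As a rough illustration of the formula above, the following Python sketch
evaluates the score for plain numeric distances. The function name
`distance_feature_score` and its argument order are hypothetical and exist only
for this example; they are not part of Elasticsearch. `distance` and `pivot`
are assumed to be expressed in the same unit.

```python
def distance_feature_score(distance, pivot, boost=1.0):
    """Illustrative only: score = boost * pivot / (pivot + distance)."""
    if pivot <= 0:
        raise ValueError("pivot must be positive")
    if distance < 0:
        raise ValueError("distance is an absolute difference and cannot be negative")
    return boost * pivot / (pivot + distance)


# A document exactly at the origin receives the full boost.
at_origin = distance_feature_score(distance=0, pivot=7)

# A document exactly one pivot away receives half of the boost.
at_pivot = distance_feature_score(distance=7, pivot=7)

# The score keeps decreasing with distance but never reaches zero.
far_away = distance_feature_score(distance=21, pivot=7, boost=2.0)

print(at_origin, at_pivot, far_away)
```

The score decays hyperbolically rather than exponentially, so distant
documents still receive a small positive contribution.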

==== Example using distance_feature query

Let's look at an example. We index several documents containing
information about sales items, such as name, production date,
and location.

[source,js]
--------------------------------------------------
PUT items
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "production_date": {
        "type": "date"
      },
      "location": {
        "type": "geo_point"
      }
    }
  }
}

PUT items/_doc/1
{
  "name" : "chocolate",
  "production_date": "2018-02-01",
  "location": [-71.34, 41.12]
}

PUT items/_doc/2
{
  "name" : "chocolate",
  "production_date": "2018-01-01",
  "location": [-71.3, 41.15]
}

PUT items/_doc/3
{
  "name" : "chocolate",
  "production_date": "2017-12-01",
  "location": [-71.3, 41.12]
}

POST items/_refresh
--------------------------------------------------
// CONSOLE

We look for all chocolate items, but we also want chocolates
that are produced recently (closer to the date `now`)
to be ranked higher.

[source,js]
--------------------------------------------------
GET items/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "name": "chocolate"
        }
      },
      "should": {
        "distance_feature": {
          "field": "production_date",
          "pivot": "7d",
          "origin": "now"
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
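
To make the effect of the date query above concrete, here is a small Python
sketch that applies `score = boost * pivot / (pivot + distance)` to the three
sample documents. It is only an approximation of what the `should` clause
contributes: the value of `now` is fixed to `2018-02-08` as an assumption for
this example, the helper names `utc_date` and `date_score` are hypothetical,
and the contribution of the `match` clause is ignored.

```python
from datetime import datetime, timedelta, timezone


def utc_date(text):
    """Parse a yyyy-MM-dd string as midnight UTC."""
    return datetime.strptime(text, "%Y-%m-%d").replace(tzinfo=timezone.utc)


def date_score(value, origin, pivot, boost=1.0):
    """Illustrative only: distance is the absolute time difference."""
    distance = abs(origin - value)
    return boost * (pivot / (pivot + distance))


now = utc_date("2018-02-08")   # assumed value of `now` for this example
pivot = timedelta(days=7)      # corresponds to "pivot": "7d"

production_dates = {"1": "2018-02-01", "2": "2018-01-01", "3": "2017-12-01"}
scores = {
    doc_id: date_score(utc_date(text), now, pivot)
    for doc_id, text in production_dates.items()
}

# Most recently produced items come first.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

With this assumed `now`, document 1 is exactly one pivot (7 days) away from
the origin, so its contribution is half of the default boost of `1`.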

We can also look for all chocolate items, but want chocolates
that are produced locally (closer to our geo origin)
to come first in the result list.

[source,js]
--------------------------------------------------
GET items/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "name": "chocolate"
        }
      },
      "should": {
        "distance_feature": {
          "field": "location",
          "pivot": "1000m",
          "origin": [-71.3, 41.15]
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
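
The geo query above can be approximated outside Elasticsearch with the
following Python sketch. It assumes that the distance is a great-circle
(haversine) distance on a sphere with a mean Earth radius of 6371008.8 m; the
distance computation used internally may differ in detail. The helper names
`haversine_m` and `geo_score` are hypothetical. Note that geo-points given as
arrays are in `[lon, lat]` order.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (lon, lat) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def geo_score(point, origin, pivot_m, boost=1.0):
    """Illustrative only: score = boost * pivot / (pivot + distance)."""
    distance = haversine_m(*point, *origin)
    return boost * pivot_m / (pivot_m + distance)


# Sample documents as (lon, lat), matching the indexed `location` arrays.
locations = {"1": (-71.34, 41.12), "2": (-71.3, 41.15), "3": (-71.3, 41.12)}
origin = (-71.3, 41.15)

scores = {
    doc_id: geo_score(point, origin, pivot_m=1000.0)  # "pivot": "1000m"
    for doc_id, point in locations.items()
}

# Closest items come first.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # -> ['2', '3', '1']
```

Document 2 sits exactly at the origin and therefore receives the full boost,
while documents 3 and 1 are several pivots away and contribute much less.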
8 changes: 8 additions & 0 deletions docs/reference/query-dsl/special-queries.asciidoc
@@ -28,6 +28,12 @@ the specified document.
A query that computes scores based on the values of numeric features and is
able to efficiently skip non-competitive hits.

<<query-dsl-distance-feature-query,`distance_feature` query>>::

A query that computes scores based on the dynamically computed distances
between the origin and documents' date, date_nanos and geo_point fields.
It is able to efficiently skip non-competitive hits.

<<query-dsl-wrapper-query,`wrapper` query>>::

A query that accepts other queries as json or yaml string.
@@ -42,4 +48,6 @@ include::percolate-query.asciidoc[]

include::rank-feature-query.asciidoc[]

include::distance-feature-query.asciidoc[]

include::wrapper-query.asciidoc[]
@@ -440,6 +440,7 @@ public enum ValueType {
        OBJECT_OR_LONG(START_OBJECT, VALUE_NUMBER),
        OBJECT_ARRAY_BOOLEAN_OR_STRING(START_OBJECT, START_ARRAY, VALUE_BOOLEAN, VALUE_STRING),
        OBJECT_ARRAY_OR_STRING(START_OBJECT, START_ARRAY, VALUE_STRING),
        OBJECT_ARRAY_STRING_OR_NUMBER(START_OBJECT, START_ARRAY, VALUE_STRING, VALUE_NUMBER),
        VALUE(VALUE_BOOLEAN, VALUE_NULL, VALUE_EMBEDDED_OBJECT, VALUE_NUMBER, VALUE_STRING),
        VALUE_OBJECT_ARRAY(VALUE_BOOLEAN, VALUE_NULL, VALUE_EMBEDDED_OBJECT, VALUE_NUMBER, VALUE_STRING, START_OBJECT, START_ARRAY),
        VALUE_ARRAY(VALUE_BOOLEAN, VALUE_NULL, VALUE_NUMBER, VALUE_STRING, START_ARRAY);
@@ -0,0 +1,87 @@
setup:
  - skip:
      version: " - 7.9.99" #TODO adjust to 7.0.99 after merging to 7.x
      reason: "Implemented in 7.1"

  - do:
      indices.create:
        index: index1
        body:
          settings:
            number_of_replicas: 0
          mappings:
            properties:
              my_date:
                type: date
              my_date_nanos:
                type: date_nanos
              my_geo:
                type: geo_point

  - do:
      bulk:
        refresh: true
        body:
          - '{ "index" : { "_index" : "index1", "_id" : "1" } }'
          - '{ "my_date": "2018-02-01T10:00:00Z", "my_date_nanos": "2018-02-01T00:00:00.223456789Z", "my_geo": [-71.34, 41.13] }'
          - '{ "index" : { "_index" : "index1", "_id" : "2" } }'
          - '{ "my_date": "2018-02-01T11:00:00Z", "my_date_nanos": "2018-02-01T00:00:00.123456789Z", "my_geo": [-71.34, 41.14] }'
          - '{ "index" : { "_index" : "index1", "_id" : "3" } }'
          - '{ "my_date": "2018-02-01T09:00:00Z", "my_date_nanos": "2018-02-01T00:00:00.323456789Z", "my_geo": [-71.34, 41.12] }'

---
"test distance_feature query on date type":

  - do:
      search:
        rest_total_hits_as_int: true
        index: index1
        body:
          query:
            distance_feature:
              field: my_date
              pivot: 1h
              origin: 2018-02-01T08:00:30Z

  - length: { hits.hits: 3 }
  - match: { hits.hits.0._id: "3" }
  - match: { hits.hits.1._id: "1" }
  - match: { hits.hits.2._id: "2" }

---
"test distance_feature query on date_nanos type":

  - do:
      search:
        rest_total_hits_as_int: true
        index: index1
        body:
          query:
            distance_feature:
              field: my_date_nanos
              pivot: 100000000nanos
              origin: 2018-02-01T00:00:00.323456789Z

  - length: { hits.hits: 3 }
  - match: { hits.hits.0._id: "3" }
  - match: { hits.hits.1._id: "1" }
  - match: { hits.hits.2._id: "2" }

---
"test distance_feature query on geo_point type":

  - do:
      search:
        rest_total_hits_as_int: true
        index: index1
        body:
          query:
            distance_feature:
              field: my_geo
              pivot: 1km
              origin: [-71.35, 41.12]

  - length: { hits.hits: 3 }
  - match: { hits.hits.0._id: "3" }
  - match: { hits.hits.1._id: "1" }
  - match: { hits.hits.2._id: "2" }
21 changes: 21 additions & 0 deletions server/src/main/java/org/elasticsearch/common/geo/GeoUtils.java
@@ -545,6 +545,27 @@ private static GeoPoint parseGeoHash(GeoPoint point, String geohash, EffectivePo
        }
    }

    /**
     * Parse a {@link GeoPoint} from a string. The string must have one of the following forms:
     *
     * <ul>
     *     <li>Latitude, Longitude form: <pre>&quot;<i>&lt;latitude&gt;</i>,<i>&lt;longitude&gt;</i>&quot;</pre></li>
     *     <li>Geohash form: <pre>&quot;<i>&lt;geohash&gt;</i>&quot;</pre></li>
     * </ul>
     *
     * @param val a String to parse the value from
     * @return new parsed {@link GeoPoint}
     */
    public static GeoPoint parseFromString(String val) {
        GeoPoint point = new GeoPoint();
        boolean ignoreZValue = false;
        if (val.contains(",")) {
            return point.resetFromString(val, ignoreZValue);
        } else {
            return parseGeoHash(point, val, EffectivePoint.BOTTOM_LEFT);
        }
    }

    /**
     * Parse a precision that can be expressed as an integer or a distance measure like "1km", "10m".
     *
@@ -308,6 +308,10 @@ public DateFormatter dateTimeFormatter() {
            return dateTimeFormatter;
        }

        public Resolution resolution() {
            return resolution;
        }

        void setDateTimeFormatter(DateFormatter formatter) {
            checkIfFrozen();
            this.dateTimeFormatter = formatter;