forked from elastic/elasticsearch
Add the ability to set the number of hits to track accurately
In Lucene 8, searches can skip non-competitive hits if the total hit count is not requested. It is also possible to track the number of hits up to a certain threshold. This trade-off speeds up searches while still providing a lower bound on the total hit count. This change adds the ability to set this threshold directly in the `track_total_hits` search option. A boolean value (`true`, `false`) indicates whether the total hit count should be tracked in the response. When set to an integer, this option computes a lower bound of the total hits while preserving the ability to skip non-competitive hits once enough hits have been collected. To ensure that the result is interpreted correctly, this commit also adds a new section to the search response that indicates the number of tracked hits and whether the value is a lower bound (`gte`) or the exact count (`eq`):

```
GET /_search
{
    "track_total_hits": 100,
    "query": {
        "term": {
            "title": "fast"
        }
    }
}
```

... will return:

```
{
    "_shards": ...
    "hits" : {
        "total" : -1,
        "tracked_total": {
            "value": 100,
            "relation": "gte"
        },
        "max_score" : 0.42,
        "hits" : []
    }
}
```

Relates elastic#33028
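A client reading these responses has to tell apart three shapes of the `hits` section. The following is a minimal Python sketch, not part of this commit, that turns an already parsed `hits` object into a readable label. The function name `describe_total` and the label strings are hypothetical; the sketch assumes only the response layout described in this commit message (`total` is `-1` when the exact count is not computed, and `tracked_total` appears when an integer threshold is used).

```python
from typing import Any, Dict


def describe_total(hits: Dict[str, Any]) -> str:
    """Summarize the hit count of a parsed `hits` section (hypothetical helper).

    Assumes the layout described in this commit:
    - `tracked_total` is present only when `track_total_hits` is an integer;
    - `total` is -1 whenever the exact count was not computed.
    """
    tracked = hits.get("tracked_total")
    if tracked is not None:
        value = tracked["value"]
        relation = tracked["relation"]
        if relation == "eq":
            # The threshold was not exceeded, so the count is exact.
            return f"{value} hits"
        if relation == "gte":
            # The threshold was reached: `value` is only a lower bound.
            return f"at least {value} hits"
        raise ValueError(f"unexpected relation: {relation!r}")
    total = hits["total"]
    if total == -1:
        # track_total_hits was false: the count was not tracked at all.
        return "unknown number of hits"
    return f"{total} hits"


# The example response above yields "at least 100 hits".
print(describe_total({"total": -1,
                      "tracked_total": {"value": 100, "relation": "gte"},
                      "max_score": 0.42, "hits": []}))
```

Checking `tracked_total` before `total` matters: with an integer threshold `total` is also `-1`, so looking at `total` first would wrongly report the count as unknown.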
25 changed files with 515 additions and 109 deletions.
docs/reference/search/request/track-total-hits.asciidoc: 127 additions, 0 deletions
@@ -0,0 +1,127 @@
[[search-request-track-total-hits]]
=== Track total hits

The `track_total_hits` parameter allows you to configure the number of hits to
count accurately.
When set to `true`, the search response will contain the total number of hits
that match the query:

[source,js]
--------------------------------------------------
GET /_search
{
    "track_total_hits": true,
    "query" : {
        "match_all" : {}
    }
}
--------------------------------------------------
// CONSOLE

\... returns:

[source,js]
--------------------------------------------------
{
    "_shards": ...
    "hits" : {
        "total" : 2048, <1>
        "max_score" : 1.0,
        "hits" : []
    }
}
--------------------------------------------------
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
// TESTRESPONSE[s/"total": 2048/"total": $body.hits.total/]

<1> The total number of hits that match the query.

If you don't need to track the total number of hits, you can set this option
to `false`. In such a case the total number of hits is unknown and the search
can efficiently skip non-competitive hits if the query is sorted by relevancy:

[source,js]
--------------------------------------------------
GET /_search
{
    "track_total_hits": false,
    "query": {
        "term": {
            "title": "fast"
        }
    }
}
--------------------------------------------------
// CONSOLE

\... returns:

[source,js]
--------------------------------------------------
{
    "_shards": ...
    "hits" : {
        "total" : -1, <1>
        "max_score" : 0.42,
        "hits" : []
    }
}
--------------------------------------------------
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
// TESTRESPONSE[s/"max_score": 0\.42/"max_score": $body.hits.max_score/]

<1> The total number of hits is unknown.

The total hit count can't be computed accurately without visiting all matches,
which is costly for queries that match lots of documents. Given that it is
often enough to have a lower bound on the number of hits, such as
"there are more than 1000 hits", it is also possible to set `track_total_hits`
to an integer that represents the number of hits to count accurately. When this
option is set to a number, the search response will contain a new section called
`tracked_total` that contains the number of tracked hits (`tracked_total.value`)
and a relation (`tracked_total.relation`) that indicates whether the `value` is
accurate (`eq`) or a lower bound of the total hit count (`gte`):

[source,js]
--------------------------------------------------
GET /_search
{
    "track_total_hits": 100,
    "query": {
        "term": {
            "title": "fast"
        }
    }
}
--------------------------------------------------
// CONSOLE

\... returns:

[source,js]
--------------------------------------------------
{
    "_shards": ...
    "hits" : {
        "total" : -1, <1>
        "tracked_total": { <2>
            "value": 100,
            "relation": "gte"
        },
        "max_score" : 0.42,
        "hits" : []
    }
}
--------------------------------------------------
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
// TESTRESPONSE[s/"max_score": 0\.42/"max_score": $body.hits.max_score/]
// TESTRESPONSE[s/"value": 100/"value": $body.hits.tracked_total.value/]
// TESTRESPONSE[s/"relation": "gte"/"relation": "$body.hits.tracked_total.relation"/]

<1> The total number of hits is unknown.
<2> There are at least (`gte`) 100 documents that match the query.

Search can also skip non-competitive hits if the query is sorted by
relevancy, but the optimization kicks in only after collecting at least
`track_total_hits` documents. This is a good trade-off to speed up searches
if you don't need the accurate number of hits beyond a certain threshold.
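The effect of an integer threshold on counting can be illustrated with a small model. This is a simplified Python sketch, not Lucene's collector: it walks an iterable of matching document IDs, counts exactly until one match beyond the threshold is seen, and then stops and reports a lower bound. The real Lucene 8 collector keeps collecting competitive hits for the top documents and only skips non-competitive ones, so the `visited` figure here overstates the savings. The name `count_hits`, its `(value, relation, visited)` return value, and reporting `eq` when the match count exactly equals the threshold are all assumptions made for illustration.

```python
from typing import Iterable, Tuple


def count_hits(matches: Iterable[int], threshold: int) -> Tuple[int, str, int]:
    """Count matches up to `threshold` (simplified, hypothetical model).

    Returns (value, relation, visited), where `visited` is how many matches
    were pulled from the iterable before counting stopped.
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    count = 0
    visited = 0
    for _doc_id in matches:
        visited += 1
        if count == threshold:
            # A match beyond the threshold exists: the exact total is not
            # needed, so report the threshold as a lower bound and stop.
            return threshold, "gte", visited
        count += 1
    # All matches were consumed without exceeding the threshold: exact count.
    return count, "eq", visited


# 1,000,000 matches with a threshold of 100: only 101 matches are counted.
print(count_hits(range(1_000_000), 100))
```

The model reproduces the relation semantics of the documentation: below the threshold the value is exact (`eq`), and once more matches than the threshold exist the value is capped and flagged as a lower bound (`gte`).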
rest-api-spec/src/main/resources/rest-api-spec/test/search/230_track_total_hits.yml: 90 additions, 0 deletions
@@ -0,0 +1,90 @@
---
"Track total hits":

  - skip:
      version: " - 6.99.99"
      reason: track_total_hits was introduced in 7.0.0

  - do:
      search:
        index: test_1
        track_total_hits: false

  - match: { hits.total: -1 }
  - is_false: "hits.tracked_total"

  - do:
      search:
        index: test_1
        track_total_hits: true

  - match: { hits.total: 0 }
  - is_false: "hits.tracked_total"

  - do:
      search:
        index: test_1
        track_total_hits: 10

  - match: { hits.total: -1 }
  - match: { hits.tracked_total.value: 0 }
  - match: { hits.tracked_total.relation: "eq" }

  - do:
      index:
        index: test_1
        id: 1
        body: {}

  - do:
      index:
        index: test_1
        id: 2
        body: {}

  - do:
      index:
        index: test_1
        id: 3
        body: {}

  - do:
      index:
        index: test_1
        id: 4
        body: {}

  - do:
      indices.refresh: {}

  - do:
      search:
        index: test_1

  - match: { hits.total: 4 }

  - do:
      search:
        index: test_1
        track_total_hits: false

  - match: { hits.total: -1 }
  - is_false: "hits.tracked_total"

  - do:
      search:
        index: test_1
        track_total_hits: 10

  - match: { hits.total: -1 }
  - match: { hits.tracked_total.value: 4 }
  - match: { hits.tracked_total.relation: "eq" }

  - do:
      search:
        index: test_1
        track_total_hits: 3

  - match: { hits.total: -1 }
  - match: { hits.tracked_total.value: 3 }
  - match: { hits.tracked_total.relation: "gte" }
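The expectations in this REST test can be restated as a small self-contained Python model of the `hits` section for each value of `track_total_hits`. This is a hypothetical sketch that only mirrors the assertions in the YAML file; it does not call Elasticsearch. The name `hits_section` is illustrative, `None` stands for "option not set", and reporting `eq` with the exact count whenever the count does not exceed the threshold is an assumption consistent with the integer cases the test covers.

```python
from typing import Any, Dict, Optional, Union


def hits_section(match_count: int,
                 track_total_hits: Optional[Union[bool, int]] = None) -> Dict[str, Any]:
    """Model the `total` / `tracked_total` fields asserted in 230_track_total_hits.yml."""
    # `bool` is a subclass of `int` in Python, so booleans must be handled
    # before the integer-threshold branch.
    if track_total_hits is None or track_total_hits is True:
        # Option absent or `true`: exact total, no `tracked_total` section.
        return {"total": match_count}
    if track_total_hits is False:
        # `false`: the total is not tracked and is reported as -1.
        return {"total": -1}
    if not isinstance(track_total_hits, int):
        raise TypeError("track_total_hits must be a bool or an int")
    # Integer threshold: `total` is -1 and `tracked_total` carries the count.
    if match_count > track_total_hits:
        tracked = {"value": track_total_hits, "relation": "gte"}
    else:
        tracked = {"value": match_count, "relation": "eq"}
    return {"total": -1, "tracked_total": tracked}


# (match_count, track_total_hits, expected section), in the order of the YAML test.
CASES = [
    (0, False, {"total": -1}),
    (0, True, {"total": 0}),
    (0, 10, {"total": -1, "tracked_total": {"value": 0, "relation": "eq"}}),
    (4, None, {"total": 4}),
    (4, False, {"total": -1}),
    (4, 10, {"total": -1, "tracked_total": {"value": 4, "relation": "eq"}}),
    (4, 3, {"total": -1, "tracked_total": {"value": 3, "relation": "gte"}}),
]

for count, option, expected in CASES:
    assert hits_section(count, option) == expected, (count, option)
```

An absent `tracked_total` key plays the role of the test's `is_false: "hits.tracked_total"` checks.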