Rework TextQuery query component
Note that this only has an effect on ranking at present if the
"debug=new_weighting" query parameter is passed.

TextQuery now builds a boolean query as follows:

Required matches
----------------

The "must clause" searches the `all_searchable_text` field (instead of
the `_all` field).  Its score is the higher of the weights obtained by
searching for the query with and without synonym expansion, plus a small
bonus if the document matches both ways.

This means that we get back all documents which match any of the
searchable text fields (which are all copied into
`all_searchable_text`), whether with or without synonym expansion.

An exact match will typically match both with and without expansion, so
will get the small bonus score.
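
As a rough illustration of that scoring rule, here is a minimal sketch of
how a dis_max query with a tie breaker combines its sub-query scores.  The
helper name `dis_max_score` and the scores are made up for illustration;
only the `tie_breaker` value of 0.1 comes from this commit.

```ruby
# Hypothetical helper (not part of this commit): combine sub-query scores
# the way a dis_max query does.  The best score counts in full; every
# other score is scaled by tie_breaker and added on.
def dis_max_score(scores, tie_breaker: 0.0)
  return 0.0 if scores.empty?

  best = scores.max
  best + tie_breaker * (scores.sum - best)
end

# Made-up scores for the plain and the synonym-expanded search.
plain_score   = 2.0
synonym_score = 1.5

# Document matching both ways: best score plus a small bonus.
puts dis_max_score([plain_score, synonym_score], tie_breaker: 0.1)

# Document matching only one way: just that score, no bonus.
puts dis_max_score([plain_score], tie_breaker: 0.1)
```

With these numbers the document matching both ways scores slightly above
2.0, while the document matching one way scores exactly 2.0.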

Optional matches
----------------

The "should clause", which is used to boost the weight of things which
are already going to be returned, gives a score which is the sum of the
scores from searching for the text in several different ways.  Each of
these ways is tried against several specific fields, and only the
highest scoring field match contributes to the sum.

The specific matches performed are:

 - just searching for the words in a single field.  This means that if
   all (or most) of the words are found in a single field the document
   gets a higher weight.
 - searching for the full query as a phrase in a single field.  This
   doesn't exclude there being more words in the field as well as those
   in the query, but means that if the exact sequence of words in the
   query appears in the document, the document is going to be ranked
   quite a bit more highly.
 - searching for all the words in the query occurring in a single field.
   This is similar to the phrase search, but doesn't require the words
   to occur together in the right order.
 - searching for the words in a single field after doing synonym
   expansion.

The results of these matches per field are boosted using a custom value
for each field.
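
To make the shape of one such group concrete, here is a simplified sketch
of the phrase group.  The boosts are the ones listed in `MATCH_FIELDS` in
this commit; `phrase_match` and `phrase_group` are hypothetical stand-ins
for the commit's `match_query` and `dis_max_query` helpers, and they leave
out `minimum_should_match` and `operator`.

```ruby
require "json"

# Simplified stand-in for match_query: a phrase match on one field,
# carrying that field's boost.
def phrase_match(field_name, query, boost)
  { match: { field_name => { type: :phrase, boost: boost, query: query } } }
end

# One dis_max group: the same phrase match tried against every field.
def phrase_group(query, field_boosts)
  queries = field_boosts.map { |field, boost| phrase_match(field, query, boost) }
  { dis_max: { queries: queries } }
end

# Boost values as listed in MATCH_FIELDS.
field_boosts = {
  "title" => 5,
  "acronym" => 5,
  "description" => 2,
  "indexable_content" => 1,
}

# "vehicle tax" is an arbitrary example search term.
puts JSON.pretty_generate(phrase_group("vehicle tax", field_boosts))
```

The other groups would differ only in the options passed for each field
(for example `operator: :and`, or the `.synonym` variant of the field).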

Using only the highest score from each of these matches across the
fields has several benefits, but the main one here is that it allows us
to perform this match on as many fields as we like without risking one
measure (e.g. the search for the words) overwhelming the others.  See
https://www.elastic.co/guide/en/elasticsearch/reference/1.x/query-dsl-dis-max-query.html
for more details.
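
A small sketch of why this matters, using made-up per-field scores: if the
per-field scores for one measure were summed, that measure's contribution
would grow with every extra field searched, whereas taking the best field
keeps it bounded.  Both helper names below are hypothetical.

```ruby
# Contribution of one measure if per-field scores were simply added up.
def summed_contribution(field_scores)
  field_scores.values.sum
end

# Contribution of one measure when only the best field counts
# (dis_max with no tie breaker).
def best_field_contribution(field_scores)
  field_scores.values.max || 0.0
end

# Made-up word-match scores for the same document, searched across
# two fields and then across four.
few_fields  = { "title" => 1.0, "description" => 1.0 }
many_fields = few_fields.merge("indexable_content" => 1.0, "acronym" => 1.0)

puts summed_contribution(few_fields)       # 2.0
puts summed_contribution(many_fields)      # 4.0, grows with the field count
puts best_field_contribution(few_fields)   # 1.0
puts best_field_contribution(many_fields)  # 1.0, unchanged
```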
Richard Boulton committed May 28, 2015
1 parent a04cf1d commit f55006f
Showing 2 changed files with 75 additions and 52 deletions.
119 changes: 71 additions & 48 deletions lib/query_components/text_query.rb
@@ -1,12 +1,10 @@
module QueryComponents
class TextQuery < BaseComponent
DEFAULT_QUERY_ANALYZER = "query_with_old_synonyms"
DEFAULT_QUERY_ANALYZER_WITHOUT_SYNONYMS = 'default'

# TODO: The `score` here doesn't actually do anything.
# Fields that we want to do a field-specific match for, together with a
# boost value used for that match.
MATCH_FIELDS = {
"title" => 5,
"acronym" => 5, # Ensure that organisations rank brilliantly for their acronym
"acronym" => 5,
"description" => 2,
"indexable_content" => 1,
}
@@ -47,69 +45,94 @@ def payload
}
end

private
private

def must_conditions
[query_string_query]
[all_searchable_text_query]
end

def all_searchable_text_query
# Return the highest weight obtained by searching for the text when
# analyzed in different ways (with a small bonus if it matches in
# multiple of these ways).
queries = []
queries << match_query(:all_searchable_text, search_term)
queries << match_query(:"all_searchable_text.synonym", search_term) unless debug[:disable_synonyms]
dis_max_query(queries, tie_breaker: 0.1)
end

def should_conditions
exact_field_boosts + [ exact_match_boost, shingle_token_filter_boost ]
groups = []
groups << field_boosts_words
groups << field_boosts_phrase
groups << field_boosts_all_terms
groups << field_boosts_synonyms unless debug[:disable_synonyms]

groups.map { |queries|
dis_max_query(queries)
}
end

def query_string_query
{
match: {
_all: {
query: escape(search_term),
analyzer: query_analyzer,
minimum_should_match: MINIMUM_SHOULD_MATCH,
}
}
def field_boosts_words
# Return the highest weight found by looking for a word-based match in
# individual fields
MATCH_FIELDS.map { |field_name, boost|
match_query(field_name, search_term, boost: boost)
}
end

def exact_field_boosts
MATCH_FIELDS.map do |field_name, _|
{
match_phrase: {
field_name => {
query: escape(search_term),
analyzer: query_analyzer,
}
}
}
end
def field_boosts_phrase
# Return the highest weight found by looking for a phrase match in
# individual fields
MATCH_FIELDS.map { |field_name, boost|
match_query(field_name, search_term, type: :phrase, boost: boost)
}
end

def exact_match_boost
{
multi_match: {
query: escape(search_term),
operator: "and",
fields: MATCH_FIELDS.keys,
analyzer: query_analyzer
}
def field_boosts_all_terms
# Return the highest weight found by looking for a match of all terms in
# individual fields
MATCH_FIELDS.map { |field_name, boost|
match_query(field_name, search_term, type: :boolean, operator: :and, boost: boost)
}
end

def shingle_token_filter_boost
{
multi_match: {
query: escape(search_term),
operator: "or",
fields: MATCH_FIELDS.keys,
analyzer: "shingled_query_analyzer"
}
def field_boosts_synonyms
# Return the highest weight found by looking for a synonym-expanded word
# match in individual fields
MATCH_FIELDS.map { |field_name, boost|
match_query("#{field_name}.synonym", search_term, boost: boost)
}
end

def query_analyzer
if debug[:disable_synonyms]
DEFAULT_QUERY_ANALYZER_WITHOUT_SYNONYMS
def dis_max_query(queries, tie_breaker: 0.0, boost: 1.0)
# Calculates a score by running all the queries, and taking the maximum.
# Adds in the scores for the other queries multiplied by `tie_breaker`.
if queries.size == 1
queries.first
else
DEFAULT_QUERY_ANALYZER
{
dis_max: {
queries: queries,
tie_breaker: tie_breaker,
boost: boost,
}
}
end
end

def match_query(field_name, query, type: :boolean, boost: 1.0, operator: :or)
{
match: {
field_name => {
type: type,
boost: boost,
query: query,
minimum_should_match: MINIMUM_SHOULD_MATCH,
operator: operator,
}
}
}
end
end
end
8 changes: 4 additions & 4 deletions test/unit/query_components/text_query_test.rb
@@ -3,20 +3,20 @@

class TextQueryTest < ShouldaUnitTestCase
context "search with debug disabling use of synonyms" do
should "use the query_with_old_synonyms analyzer" do
should "use the all_searchable_text.synonym field" do
builder = QueryComponents::TextQuery.new(search_query_params)

query = builder.payload

assert_match(/query_with_old_synonyms/, query.to_s)
assert_match(/all_searchable_text.synonym/, query.to_s)
end

should "not use the query_with_old_synonyms analyzer" do
should "not use the all_searchable_text.synonym field" do
builder = QueryComponents::TextQuery.new(search_query_params(debug: { disable_synonyms: true }))

query = builder.payload

refute_match(/query_with_old_synonyms/, query.to_s)
refute_match(/all_searchable_text.synonym/, query.to_s)
end
end
end
