Note that at present this only affects ranking if the "debug=new_weighting" query parameter is passed.

TextQuery now builds a boolean query as follows:

Required matches
----------------

The "must clause" contains a search across the `all_searchable_text` field (instead of the `_all` field). It returns the highest weight obtained by searching for the query either without synonyms or with synonyms, plus a small bonus if it matches both ways.

This means that we get back all documents which match any of the searchable text fields (which are all copied into `all_searchable_text`), whether with or without synonym expansion. An exact match will typically match both with and without expansion, so it will get the small bonus score.

Optional matches
----------------

The "should clause", which is used to boost the weight of documents that are already going to be returned, gives a score which is the sum of searching for the text in several different ways. Each of these ways is tried against several specific fields, and the score of the highest-scoring field is the contribution used.

The specific matches performed are:

- searching for the words in a single field. If all (or most) of the words are found in a single field, the document gets a higher weight.
- searching for the full query as a phrase in a single field. This doesn't exclude the field containing other words besides those in the query, but if the words of the query appear together in the document in the same order, the document will be ranked considerably higher.
- searching for all the words in the query occurring in a single field. This is similar to the phrase search, but doesn't require the words to be adjacent or in the same order.
- searching for the words in a single field after doing synonym expansion.

The results of these matches are boosted per field, using a custom boost value for each field.
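The query shape described above can be sketched as an Elasticsearch 1.x query body built in Python. This is a minimal sketch, not the actual TextQuery code: the field names and boosts in `FIELD_BOOSTS`, the analyzer names `plain_text` and `with_synonyms`, and the tie_breaker of 0.1 are all hypothetical stand-ins, since the commit does not give them. It also assumes that synonym expansion is selected by passing a synonym-aware analyzer to the `match` query's `analyzer` parameter.

```python
import json

# Hypothetical fields and per-field boosts; the real list is not in the commit.
FIELD_BOOSTS = {
    "title": 5.0,
    "description": 2.0,
    "indexable_content": 1.0,
}

# Hypothetical analyzer names: one without and one with synonym expansion.
PLAIN_ANALYZER = "plain_text"
SYNONYM_ANALYZER = "with_synonyms"

# Hypothetical tie_breaker providing the "small bonus" for matching both ways.
MUST_TIE_BREAKER = 0.1


def must_clause(query):
    """Search all_searchable_text with and without synonyms; keep the best score."""
    return {
        "dis_max": {
            "queries": [
                {"match": {"all_searchable_text": {"query": query, "analyzer": PLAIN_ANALYZER}}},
                {"match": {"all_searchable_text": {"query": query, "analyzer": SYNONYM_ANALYZER}}},
            ],
            "tie_breaker": MUST_TIE_BREAKER,
        }
    }


def best_field(make_query):
    """Run one kind of match against every field; only the best field counts."""
    return {"dis_max": {"queries": [make_query(f, b) for f, b in FIELD_BOOSTS.items()]}}


def should_clauses(query):
    """One dis_max per kind of match; bool sums the scores of those that match."""
    return [
        # The words, searched in a single field.
        best_field(lambda f, b: {"match": {f: {"query": query, "boost": b}}}),
        # The full query as a phrase in a single field.
        best_field(lambda f, b: {"match_phrase": {f: {"query": query, "boost": b}}}),
        # All the words in a single field, in any order.
        best_field(lambda f, b: {"match": {f: {"query": query, "operator": "and", "boost": b}}}),
        # The words in a single field after synonym expansion.
        best_field(lambda f, b: {"match": {f: {"query": query, "analyzer": SYNONYM_ANALYZER, "boost": b}}}),
    ]


def text_query(query):
    # With a must clause present, should clauses are optional: they only
    # add to the score of documents the must clause already returns.
    return {"query": {"bool": {"must": must_clause(query), "should": should_clauses(query)}}}


if __name__ == "__main__":
    print(json.dumps(text_query("driving licence"), indent=2))
```

Wrapping each kind of match in its own `dis_max` keeps the per-field "best only" rule separate from the `bool` query's summing across kinds of match, which mirrors the two levels described in the commit.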
Using only the highest score from each of these matches across the fields has several benefits, but the main one here is that it allows us to perform each match on as many fields as we like without risking one measure (e.g. the search for the words) overwhelming the others. See https://www.elastic.co/guide/en/elasticsearch/reference/1.x/query-dsl-dis-max-query.html for more details.
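To make the "highest score only" rule concrete, here is a small worked sketch of how a `dis_max` query combines its sub-query scores, following the rule in the linked documentation: the best matching sub-query score, plus `tie_breaker` times the sum of the other matching scores. The function name `dis_max_score` and all the scores below are illustrative inventions, not real Elasticsearch output.

```python
def dis_max_score(scores, tie_breaker=0.0):
    """Combine the scores of the matching sub-queries of a dis_max query."""
    if not scores:
        return None  # no sub-query matched, so the document does not match
    best = max(scores)
    # Other matching sub-queries contribute only via the tie_breaker.
    return best + tie_breaker * (sum(scores) - best)


# A word match hitting three fields: summing would give 4.5, dis_max keeps 2.0,
# so adding more fields cannot inflate this one measure.
print(dis_max_score([2.0, 1.5, 1.0]))

# The must clause matching both with and without synonyms: 3.0 plus a
# 0.1 * 3.0 bonus, i.e. roughly 3.3.
print(dis_max_score([3.0, 3.0], tie_breaker=0.1))
```

With the default tie_breaker of 0 only the best field counts, which is what lets the should clause be run over many fields; a small non-zero tie_breaker is one way to get the "matches both ways" bonus in the must clause.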