-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Function Score: Add optional weight parameter per function #7137
Conversation
++ to deprecating |
@@ -69,6 +74,9 @@ First, each document is scored by the defined functions. The parameter | |||
`max`:: maximum score is used | |||
`min`:: minimum score is used | |||
|
|||
Because scores can be on different scales (for example, between 0 and 1 for decay functions but arbitrary for `random_score`), the score of each function can be adjusted with a user defined `weight` (added[1.4.0]). The `weight` can be defined per function in the `functions` array (example above) and is multiplied with the score computed by the respective function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The arbitrary scale for random_score
might be fixed soon so maybe we should rather give another example?
You added the weight as a common property of all functions, but I'm wondering if that wouldn't make more sense to keep that out of ScoreFunction and to apply the weight on top of the function. For example, maybe we could have a WeightFunction that would wrap a weight (double) and another function, and when a weight that is != 1 would be provided, we would wrap the function in a WeightFunction instance? |
Thanks for the review! Implemented all suggestions. |
|
||
@Override | ||
public String toString() { | ||
return "weight[" + getWeight() + "]"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it mention the wrapped function as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The explanation contains the wrapped function already, example below. Is this what you meant?
In the example the string describing the decay function misses a [
, will fix in a different commit.
{
"explain": true,
"query": {
"function_score": {
"query": {
"match": {
"text": "foo"
}
},
"functions": [
{
"linear": {
"number": {
"origin": "1",
"scale": "5"
}
},
"weight": 10
}
]
}
}
}
result:
"_explanation": {
"value": 1.726047,
"description": "function score, product of:",
"details": [
{
"value": 0.19178301,
"description": "weight(text:foo in 0) [PerFieldSimilarity], result of:",
"details": [
{
"value": 0.19178301,
"description": "fieldWeight in 0, product of:",
"details": [
{
"value": 1,
"description": "tf(freq=1.0), with freq of:",
"details": [
{
"value": 1,
"description": "termFreq=1.0"
}
]
},
{
"value": 0.30685282,
"description": "idf(docFreq=1, maxDocs=1)"
},
{
"value": 0.625,
"description": "fieldNorm(doc=0)"
}
]
}
]
},
{
"value": 9,
"description": "Math.min of",
"details": [
{
"value": 9,
"description": "product of:",
"details": [
{
"value": 0.9,
"description": "Function for field number:",
"details": [
{
"value": 0.9,
"description": "max(0.0, ((10.0 - MINMath.max(Math.abs(2.0(=doc value) - 1.0(=origin))) - 0.0(=offset), 0)])/10.0)"
}
]
},
{
"value": 10,
"description": "weight"
}
]
},
{
"value": 3.4028235e+38,
"description": "maxBoost"
}
]
},
{
"value": 1,
"description": "queryBoost"
}
]
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forget it, has nothing to do with the explanation at all...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this is an artifact left over from copying BoostScoreFunction.The method is actually not needed at all. I will remove it.
@brwe Left some more comments |
Implemented all. However, while I took a closer look at the score explanation I found several issues #7257 #7248 #7245 . I would like to wait until these are merged because then I can write a proper test for explain. The current one is ugly: https://github.com/elasticsearch/elasticsearch/pull/7137/files#diff-f10a23353b65e87c30eb7c037e718f1fR97 |
d2b19c8
to
30b0c17
Compare
Rebased on master and fixed the explain test now. Ready for the next review. |
@@ -9,7 +9,7 @@ the score on a filtered set of documents. | |||
`function_score` provides the same functionality that | |||
`custom_boost_factor`, `custom_score` and | |||
`custom_filters_score` provided | |||
but furthermore adds futher scoring functionality such as | |||
but furthermore adds further scoring functionality such as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was odd wording to begin with. Can we change to:
"but with additional capabilities such as"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that's better
Thanks for the review! Addressed all comments and ready for round four. |
… even if disabled Settings that are not default for _size, _index and _timestamp were only build in toXContent if these fields were actually enabled. _timestamp, _index and _size can be dynamically enabled or disabled. Therfore the settings must be kept, even if the field is disabled. (Dynamic enabling/disabling was intended, see TimestampFieldMapper.merge(..) and SizeMappingTests#testThatDisablingWorksWhenMerging but actually never worked, see below). To avoid that _timestamp is overwritten by a default mapping this commit also adds a check to mapping merging if the type is already in the mapping. In this case the default is not applied anymore. (see SimpleTimestampTests#testThatUpdatingMappingShouldNotRemoveTimestampConfiguration) As a side effect, this fixes - overwriting of paramters from the _source field by default mappings (see DefaultSourceMappingTests). - dynamic enabling and disabling of _timestamp and _size () (see SimpleTimestampTests#testThatTimestampCanBeSwitchedOnAndOff and SizeMappingIntegrationTests#testThatTimestampCanBeSwitchedOnAndOff ) Tests: Enable UpdateMappingOnClusterTests#test_doc_valuesInvalidMappingOnUpdate again The missing settings in the mapping for _timestamp, _index and _size caused a the failure: When creating a mapping which has settings other than default and the field disabled, still empty field mappings were built from the type mappers. When creating such a mapping, the mapping source on master and the rest of the cluster can be out of sync for some time: 1. Master creates the index with source _timestamp:{_store:true} mapper classes are in a correct state but source is _timestamp:{} 2. Nodes update mapping and refresh source which then completely misses _timestamp 3. After a while source is refreshed again also on master and the _timestamp:{} vanishes there also. The test UpdateMappingOnCusterTests#test_doc_valuesInvalidMappingOnUpdate failed because the cluster state was sampled from master between 1. and 3. because the randomized testing injected a default mapping with disabled _size and _timestamp fields that have settings which are not default. The test TimestampMappingTests#testThatDisablingFieldMapperDoesNotReturnAnyUselessInfo must be removed because it actualy expected the timestamp to remove parameters when it was disabled. closes elastic#7137
|
||
@Override | ||
public String getName() { | ||
throw new UnsupportedOperationException("weight factor is never supposed to be parsed"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is part of the client API, I think it needs to be implemented?
Just left one minor comment, other than that LGTM |
… even if disabled Settings that are not default for _size, _index and _timestamp were only build in toXContent if these fields were actually enabled. _timestamp, _index and _size can be dynamically enabled or disabled. Therfore the settings must be kept, even if the field is disabled. (Dynamic enabling/disabling was intended, see TimestampFieldMapper.merge(..) and SizeMappingTests#testThatDisablingWorksWhenMerging but actually never worked, see below). To avoid that _timestamp is overwritten by a default mapping this commit also adds a check to mapping merging if the type is already in the mapping. In this case the default is not applied anymore. (see SimpleTimestampTests#testThatUpdatingMappingShouldNotRemoveTimestampConfiguration) As a side effect, this fixes - overwriting of paramters from the _source field by default mappings (see DefaultSourceMappingTests). - dynamic enabling and disabling of _timestamp and _size () (see SimpleTimestampTests#testThatTimestampCanBeSwitchedOnAndOff and SizeMappingIntegrationTests#testThatTimestampCanBeSwitchedOnAndOff ) Tests: Enable UpdateMappingOnClusterTests#test_doc_valuesInvalidMappingOnUpdate again The missing settings in the mapping for _timestamp, _index and _size caused a the failure: When creating a mapping which has settings other than default and the field disabled, still empty field mappings were built from the type mappers. When creating such a mapping, the mapping source on master and the rest of the cluster can be out of sync for some time: 1. Master creates the index with source _timestamp:{_store:true} mapper classes are in a correct state but source is _timestamp:{} 2. Nodes update mapping and refresh source which then completely misses _timestamp 3. After a while source is refreshed again also on master and the _timestamp:{} vanishes there also. The test UpdateMappingOnCusterTests#test_doc_valuesInvalidMappingOnUpdate failed because the cluster state was sampled from master between 1. and 3. because the randomized testing injected a default mapping with disabled _size and _timestamp fields that have settings which are not default. The test TimestampMappingTests#testThatDisablingFieldMapperDoesNotReturnAnyUselessInfo must be removed because it actualy expected the timestamp to remove parameters when it was disabled. closes #7137
… even if disabled Settings that are not default for _size, _index and _timestamp were only build in toXContent if these fields were actually enabled. _timestamp, _index and _size can be dynamically enabled or disabled. Therfore the settings must be kept, even if the field is disabled. (Dynamic enabling/disabling was intended, see TimestampFieldMapper.merge(..) and SizeMappingTests#testThatDisablingWorksWhenMerging but actually never worked, see below). To avoid that _timestamp is overwritten by a default mapping this commit also adds a check to mapping merging if the type is already in the mapping. In this case the default is not applied anymore. (see SimpleTimestampTests#testThatUpdatingMappingShouldNotRemoveTimestampConfiguration) As a side effect, this fixes - overwriting of paramters from the _source field by default mappings (see DefaultSourceMappingTests). - dynamic enabling and disabling of _timestamp and _size () (see SimpleTimestampTests#testThatTimestampCanBeSwitchedOnAndOff and SizeMappingIntegrationTests#testThatTimestampCanBeSwitchedOnAndOff ) Tests: Enable UpdateMappingOnClusterTests#test_doc_valuesInvalidMappingOnUpdate again The missing settings in the mapping for _timestamp, _index and _size caused a the failure: When creating a mapping which has settings other than default and the field disabled, still empty field mappings were built from the type mappers. When creating such a mapping, the mapping source on master and the rest of the cluster can be out of sync for some time: 1. Master creates the index with source _timestamp:{_store:true} mapper classes are in a correct state but source is _timestamp:{} 2. Nodes update mapping and refresh source which then completely misses _timestamp 3. After a while source is refreshed again also on master and the _timestamp:{} vanishes there also. The test UpdateMappingOnCusterTests#test_doc_valuesInvalidMappingOnUpdate failed because the cluster state was sampled from master between 1. and 3. because the randomized testing injected a default mapping with disabled _size and _timestamp fields that have settings which are not default. The test TimestampMappingTests#testThatDisablingFieldMapperDoesNotReturnAnyUselessInfo must be removed because it actualy expected the timestamp to remove parameters when it was disabled. closes #7137
… even if disabled Settings that are not default for _size, _index and _timestamp were only build in toXContent if these fields were actually enabled. _timestamp, _index and _size can be dynamically enabled or disabled. Therfore the settings must be kept, even if the field is disabled. (Dynamic enabling/disabling was intended, see TimestampFieldMapper.merge(..) and SizeMappingTests#testThatDisablingWorksWhenMerging but actually never worked, see below). To avoid that _timestamp is overwritten by a default mapping this commit also adds a check to mapping merging if the type is already in the mapping. In this case the default is not applied anymore. (see SimpleTimestampTests#testThatUpdatingMappingShouldNotRemoveTimestampConfiguration) As a side effect, this fixes - overwriting of paramters from the _source field by default mappings (see DefaultSourceMappingTests). - dynamic enabling and disabling of _timestamp and _size () (see SimpleTimestampTests#testThatTimestampCanBeSwitchedOnAndOff and SizeMappingIntegrationTests#testThatTimestampCanBeSwitchedOnAndOff ) Tests: Enable UpdateMappingOnClusterTests#test_doc_valuesInvalidMappingOnUpdate again The missing settings in the mapping for _timestamp, _index and _size caused a the failure: When creating a mapping which has settings other than default and the field disabled, still empty field mappings were built from the type mappers. When creating such a mapping, the mapping source on master and the rest of the cluster can be out of sync for some time: 1. Master creates the index with source _timestamp:{_store:true} mapper classes are in a correct state but source is _timestamp:{} 2. Nodes update mapping and refresh source which then completely misses _timestamp 3. After a while source is refreshed again also on master and the _timestamp:{} vanishes there also. The test UpdateMappingOnCusterTests#test_doc_valuesInvalidMappingOnUpdate failed because the cluster state was sampled from master between 1. and 3. because the randomized testing injected a default mapping with disabled _size and _timestamp fields that have settings which are not default. The test TimestampMappingTests#testThatDisablingFieldMapperDoesNotReturnAnyUselessInfo must be removed because it actualy expected the timestamp to remove parameters when it was disabled. closes #7137
autsch...wrong issue number in commit message for 9750375 |
Weights can be defined per function like this: ``` "function_score": { "functions": [ { "filter": {}, "FUNCTION": {}, "weight": number } ... ``` If `weight` is given without `FUNCTION` then `weight` behaves like `boost_factor`. This commit deprecates `boost_factor`. The following is valid: ``` POST testidx/_search { "query": { "function_score": { "weight": 2 } } } POST testidx/_search { "query": { "function_score": { "functions": [ { "weight": 2 }, ... ] } } } POST testidx/_search { "query": { "function_score": { "functions": [ { "FUNCTION": {}, "weight": 2 }, ... ] } } } POST testidx/_search { "query": { "function_score": { "functions": [ { "filter": {}, "weight": 2 }, ... ] } } } POST testidx/_search { "query": { "function_score": { "functions": [ { "filter": {}, "FUNCTION": {}, "weight": 2 }, ... ] } } } ``` The following is not valid: ``` POST testidx/_search { "query": { "function_score": { "weight": 2, "FUNCTION(including boost_factor)": 2 } } } POST testidx/_search { "query": { "function_score": { "functions": [ { "weight": 2, "boost_factor": 2 } ] } } } ```` closes #6955 closes #7137
… even if disabled Settings that are not default for _size, _index and _timestamp were only build in toXContent if these fields were actually enabled. _timestamp, _index and _size can be dynamically enabled or disabled. Therfore the settings must be kept, even if the field is disabled. (Dynamic enabling/disabling was intended, see TimestampFieldMapper.merge(..) and SizeMappingTests#testThatDisablingWorksWhenMerging but actually never worked, see below). To avoid that _timestamp is overwritten by a default mapping this commit also adds a check to mapping merging if the type is already in the mapping. In this case the default is not applied anymore. (see SimpleTimestampTests#testThatUpdatingMappingShouldNotRemoveTimestampConfiguration) As a side effect, this fixes - overwriting of paramters from the _source field by default mappings (see DefaultSourceMappingTests). - dynamic enabling and disabling of _timestamp and _size () (see SimpleTimestampTests#testThatTimestampCanBeSwitchedOnAndOff and SizeMappingIntegrationTests#testThatTimestampCanBeSwitchedOnAndOff ) Tests: Enable UpdateMappingOnClusterTests#test_doc_valuesInvalidMappingOnUpdate again The missing settings in the mapping for _timestamp, _index and _size caused a the failure: When creating a mapping which has settings other than default and the field disabled, still empty field mappings were built from the type mappers. When creating such a mapping, the mapping source on master and the rest of the cluster can be out of sync for some time: 1. Master creates the index with source _timestamp:{_store:true} mapper classes are in a correct state but source is _timestamp:{} 2. Nodes update mapping and refresh source which then completely misses _timestamp 3. After a while source is refreshed again also on master and the _timestamp:{} vanishes there also. The test UpdateMappingOnCusterTests#test_doc_valuesInvalidMappingOnUpdate failed because the cluster state was sampled from master between 1. and 3. because the randomized testing injected a default mapping with disabled _size and _timestamp fields that have settings which are not default. The test TimestampMappingTests#testThatDisablingFieldMapperDoesNotReturnAnyUselessInfo must be removed because it actualy expected the timestamp to remove parameters when it was disabled. closes #7137
Weights can be defined per function like this: ``` "function_score": { "functions": [ { "filter": {}, "FUNCTION": {}, "weight": number } ... ``` If `weight` is given without `FUNCTION` then `weight` behaves like `boost_factor`. This commit deprecates `boost_factor`. The following is valid: ``` POST testidx/_search { "query": { "function_score": { "weight": 2 } } } POST testidx/_search { "query": { "function_score": { "functions": [ { "weight": 2 }, ... ] } } } POST testidx/_search { "query": { "function_score": { "functions": [ { "FUNCTION": {}, "weight": 2 }, ... ] } } } POST testidx/_search { "query": { "function_score": { "functions": [ { "filter": {}, "weight": 2 }, ... ] } } } POST testidx/_search { "query": { "function_score": { "functions": [ { "filter": {}, "FUNCTION": {}, "weight": 2 }, ... ] } } } ``` The following is not valid: ``` POST testidx/_search { "query": { "function_score": { "weight": 2, "FUNCTION(including boost_factor)": 2 } } } POST testidx/_search { "query": { "function_score": { "functions": [ { "weight": 2, "boost_factor": 2 } ] } } } ```` closes #6955 closes #7137
… even if disabled Settings that are not default for _size, _index and _timestamp were only build in toXContent if these fields were actually enabled. _timestamp, _index and _size can be dynamically enabled or disabled. Therfore the settings must be kept, even if the field is disabled. (Dynamic enabling/disabling was intended, see TimestampFieldMapper.merge(..) and SizeMappingTests#testThatDisablingWorksWhenMerging but actually never worked, see below). To avoid that _timestamp is overwritten by a default mapping this commit also adds a check to mapping merging if the type is already in the mapping. In this case the default is not applied anymore. (see SimpleTimestampTests#testThatUpdatingMappingShouldNotRemoveTimestampConfiguration) As a side effect, this fixes - overwriting of paramters from the _source field by default mappings (see DefaultSourceMappingTests). - dynamic enabling and disabling of _timestamp and _size () (see SimpleTimestampTests#testThatTimestampCanBeSwitchedOnAndOff and SizeMappingIntegrationTests#testThatTimestampCanBeSwitchedOnAndOff ) Tests: Enable UpdateMappingOnClusterTests#test_doc_valuesInvalidMappingOnUpdate again The missing settings in the mapping for _timestamp, _index and _size caused a the failure: When creating a mapping which has settings other than default and the field disabled, still empty field mappings were built from the type mappers. When creating such a mapping, the mapping source on master and the rest of the cluster can be out of sync for some time: 1. Master creates the index with source _timestamp:{_store:true} mapper classes are in a correct state but source is _timestamp:{} 2. Nodes update mapping and refresh source which then completely misses _timestamp 3. After a while source is refreshed again also on master and the _timestamp:{} vanishes there also. The test UpdateMappingOnCusterTests#test_doc_valuesInvalidMappingOnUpdate failed because the cluster state was sampled from master between 1. and 3. because the randomized testing injected a default mapping with disabled _size and _timestamp fields that have settings which are not default. The test TimestampMappingTests#testThatDisablingFieldMapperDoesNotReturnAnyUselessInfo must be removed because it actualy expected the timestamp to remove parameters when it was disabled. closes elastic#7137
… even if disabled Settings that are not default for _size, _index and _timestamp were only build in toXContent if these fields were actually enabled. _timestamp, _index and _size can be dynamically enabled or disabled. Therfore the settings must be kept, even if the field is disabled. (Dynamic enabling/disabling was intended, see TimestampFieldMapper.merge(..) and SizeMappingTests#testThatDisablingWorksWhenMerging but actually never worked, see below). To avoid that _timestamp is overwritten by a default mapping this commit also adds a check to mapping merging if the type is already in the mapping. In this case the default is not applied anymore. (see SimpleTimestampTests#testThatUpdatingMappingShouldNotRemoveTimestampConfiguration) As a side effect, this fixes - overwriting of paramters from the _source field by default mappings (see DefaultSourceMappingTests). - dynamic enabling and disabling of _timestamp and _size () (see SimpleTimestampTests#testThatTimestampCanBeSwitchedOnAndOff and SizeMappingIntegrationTests#testThatTimestampCanBeSwitchedOnAndOff ) Tests: Enable UpdateMappingOnClusterTests#test_doc_valuesInvalidMappingOnUpdate again The missing settings in the mapping for _timestamp, _index and _size caused a the failure: When creating a mapping which has settings other than default and the field disabled, still empty field mappings were built from the type mappers. When creating such a mapping, the mapping source on master and the rest of the cluster can be out of sync for some time: 1. Master creates the index with source _timestamp:{_store:true} mapper classes are in a correct state but source is _timestamp:{} 2. Nodes update mapping and refresh source which then completely misses _timestamp 3. After a while source is refreshed again also on master and the _timestamp:{} vanishes there also. The test UpdateMappingOnCusterTests#test_doc_valuesInvalidMappingOnUpdate failed because the cluster state was sampled from master between 1. and 3. because the randomized testing injected a default mapping with disabled _size and _timestamp fields that have settings which are not default. The test TimestampMappingTests#testThatDisablingFieldMapperDoesNotReturnAnyUselessInfo must be removed because it actualy expected the timestamp to remove parameters when it was disabled. closes elastic#7137
Weights can be defined per function like this:
If
weight
is given withoutFUNCTION
thenweight
behaves likeboost_factor
.This commit deprecates
boost_factor
.The following is valid:
The following is not valid:
closes #6955