Add term frequency functions to score script query #4988
Signed-off-by: Fanit Kolchina <[email protected]>
@@ -328,5 +328,52 @@ GET blogs/_search
```
{% include copy-curl.html %}

### Term frequency functions

Term frequency functions expose term-level statistics in a script score context. You can use these statistics to implement custom information retrieval and ranking algorithms like query-time multiplicative or additive score boosting by popularity. To apply a term frequency function, call one of the following Painless methods:
There's a script context (https://github.com/noCharger/OpenSearch/blob/main/server/src/main/java/org/opensearch/script/ScriptContext.java), and each script has its own context instance (https://github.com/noCharger/OpenSearch/blob/main/server/src/main/java/org/opensearch/script/ScoreScript.java#L287). I would suggest saying "expose them in the score script source."
Term frequency functions expose term-level statistics in a script score context. You can use these statistics to implement custom information retrieval and ranking algorithms like query-time multiplicative or additive score boosting by popularity. To apply a term frequency function, call one of the following Painless methods:

- `int termFreq(String <field-name>, String <term>)`: Retrieves the term frequency within a field for a specific term.
- `float tf(String <field-name>, String <term>)`: Calculates the term frequency/inverse document frequency (TF/IDF) for a specific term within a field.
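For readers following the thread, here is a minimal sketch of how `termFreq` might be called from a `script_score` query. It is written in Python only to build and print the request body; the helper name `term_freq_query` and the `title`/`ai` values are illustrative assumptions, and the body has not been run against a cluster.

```python
import json


def term_freq_query(field: str, term: str) -> dict:
    """Build a script_score request body that scores documents by termFreq.

    Hypothetical helper for illustration; only the termFreq(field, term)
    signature comes from the diff under review.
    """
    return {
        "query": {
            "script_score": {
                # Restrict scoring to documents that match the term at all.
                "query": {"match": {field: term}},
                "script": {
                    # Painless source: the term frequency becomes the score.
                    "source": "termFreq(params.field, params.term)",
                    "params": {"field": field, "term": term},
                },
            }
        }
    }


if __name__ == "__main__":
    # The printed JSON could be sent as the body of GET blogs/_search.
    print(json.dumps(term_freq_query("title", "ai"), indent=2))
```

Passing the field and term through `params` rather than inlining them keeps the script source constant across requests.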
Using `tf` requires some setup. Without it, you will get errors like this:
{
  "error": {
    "root_cause": [
      {
        "type": "unsupported_operation_exception",
        "reason": "unsupported_operation_exception: requires a TFIDFSimilarity (such as ClassicSimilarity)"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "test",
        "node": "sQkgvgQWRT2pfI4leTC8lg",
        "reason": {
          "type": "script_exception",
          "reason": "runtime error",
          "script_stack": [
            "org.opensearch.ExceptionsHelper.convertToOpenSearchException(ExceptionsHelper.java:86)",
            "org.opensearch.script.ScoreScriptUtils$TF.tf(ScoreScriptUtils.java:113)",
            "tf(params.field, params.term)",
            "                       ^---- HERE"
          ],
          "script": "tf(params.field, params.term)",
          "lang": "painless",
          "position": {
            "offset": 23,
            "start": 0,
            "end": 29
          },
          "caused_by": {
            "type": "exception",
            "reason": "java.lang.UnsupportedOperationException: requires a TFIDFSimilarity (such as ClassicSimilarity)",
            "caused_by": {
              "type": "unsupported_operation_exception",
              "reason": "unsupported_operation_exception: requires a TFIDFSimilarity (such as ClassicSimilarity)"
            }
          }
        }
      }
    ],
    "caused_by": {
      "type": "unsupported_operation_exception",
      "reason": "unsupported_operation_exception: requires a TFIDFSimilarity (such as ClassicSimilarity)"
    }
  },
  "status": 400
}
Here are some suggestions:
- Set up the similarity model:
You need to set the similarity model of the field you're working with (for example, ClassicSimilarity or TFIDFSimilarity). Make sure the model is supported by your current OpenSearch version.
To do this, you can update the mappings of your index:
PUT /test/_mapping
{
  "properties": {
    "<field_name>": {
      "type": "text",
      "similarity": "classic"
    }
  }
}
- Reindex your data for existing indices.
When I tried to set the similarity to "classic", I got an error that suggests using "BM25" as the similarity instead:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "The [classic] similarity may not be used anymore. Please use the [BM25] similarity or build a custom [scripted] similarity instead."
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "The [classic] similarity may not be used anymore. Please use the [BM25] similarity or build a custom [scripted] similarity instead."
  },
  "status": 400
}
Good catch. Would you create an issue in the OpenSearch repo? I can do it too.
I can create an issue.
Done. The OpenSearch issue is opensearch-project/OpenSearch#9958
"params": {
  "fields": ["title", "description"],
  "term": "ai",
  "multiplier": 2,
  "default_value": 1
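The script that consumes these `params` is not visible in this hunk. As a rough sketch only, the following Python builds one plausible request body: the Painless source sums `termFreq` over `params.fields`, multiplies the total by `params.multiplier`, and falls back to `params.default_value` when the term is absent. The helper name `boosted_term_freq_query` and the Painless logic are assumptions, not the script from the PR, and the body has not been run against a cluster.

```python
import json

# Hypothetical Painless source; only termFreq(field, term) is taken from the diff.
PAINLESS_SOURCE = (
    "int total = 0; "
    "for (def f : params.fields) { total += termFreq(f, params.term); } "
    "return total > 0 ? total * params.multiplier : params.default_value;"
)


def boosted_term_freq_query(fields, term, multiplier=2, default_value=1) -> dict:
    """Build a script_score body that boosts by summed term frequency."""
    return {
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": PAINLESS_SOURCE,
                    "params": {
                        "fields": list(fields),
                        "term": term,
                        "multiplier": multiplier,
                        "default_value": default_value,
                    },
                },
            }
        }
    }


if __name__ == "__main__":
    body = boosted_term_freq_query(["title", "description"], "ai")
    print(json.dumps(body, indent=2))
```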
Nitpick: it would be more consistent to use the same document schema for all examples on this page.
Looks good. Just one comment from me.
Co-authored-by: Chris Moore <[email protected]> Signed-off-by: kolchfa-aws <[email protected]>
@kolchfa-aws Just a couple of commas. Otherwise, LGTM!
Co-authored-by: Nathan Bower <[email protected]> Signed-off-by: kolchfa-aws <[email protected]>
Let's remove the docs for `tf()`. See opensearch-project/OpenSearch#9995.
Hi team, is there anything I can do to get this PR merged?
@noCharger The PR is in the "Done but ready to merge" status. This means we'll merge it by the release date. So you can mark it done on your side.
…t#4988)
* Add term frequency functions to score script query
Signed-off-by: Fanit Kolchina <[email protected]>
* Add tf setup and rewording
Signed-off-by: Fanit Kolchina <[email protected]>
* typo fix
Signed-off-by: Fanit Kolchina <[email protected]>
* changed heading level
Signed-off-by: Fanit Kolchina <[email protected]>
* Update _query-dsl/specialized/script-score.md
Co-authored-by: Chris Moore <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
* Apply suggestions from code review
Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
* Remove tf from documentation
Signed-off-by: Fanit Kolchina <[email protected]>
---------
Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
Co-authored-by: Chris Moore <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
Fixes #4858
Checklist
For more information on following Developer Certificate of Origin and signing off your commits, please check here.