Add term frequency functions to score script query #4988

Merged 8 commits on Sep 22, 2023

kolchfa-aws
Collaborator

Fixes #4858

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@kolchfa-aws kolchfa-aws self-assigned this Sep 7, 2023
@kolchfa-aws kolchfa-aws added v2.10.0 release-notes PR: Include this PR in the automated release notes labels Sep 7, 2023
@@ -328,5 +328,52 @@ GET blogs/_search
```
{% include copy-curl.html %}

### Term frequency functions

Term frequency functions expose term-level statistics in a script score context. You can use these statistics to implement custom information retrieval and ranking algorithms like query-time multiplicative or additive score boosting by popularity. To apply a term frequency function, call one of the following Painless methods:

- `int termFreq(String <field-name>, String <term>)`: Retrieves the term frequency within a field for a specific term.
- `float tf(String <field-name>, String <term>)`: Calculates the term frequency/inverse document frequency (TF/IDF) for a specific term within a field.
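As a rough illustration of how `termFreq` might be called from a script score query, the following Python sketch builds a search request body. The field name `title` and the term `ai` are placeholders borrowed from the params discussed later in this PR, the `match_all` inner query is an assumption, and nothing is sent to a cluster. Passing the field and term through `params` mirrors the `params.field` and `params.term` names that appear in the error trace in this thread.

```python
import json


def build_term_freq_query(field: str, term: str) -> dict:
    """Build a script_score search body that scores documents by termFreq.

    The field and term travel as script params so the Painless source
    string stays the same across calls.
    """
    return {
        "query": {
            "script_score": {
                # Assumption: score every document; use a narrower query if needed.
                "query": {"match_all": {}},
                "script": {
                    "source": "termFreq(params.field, params.term)",
                    "params": {"field": field, "term": term},
                },
            }
        }
    }


body = build_term_freq_query("title", "ai")
print(json.dumps(body, indent=2))
```

The printed JSON could be used as the body of a `_search` request against an index that contains the chosen field.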
Contributor

Using `tf` requires some setup. Without it, you will get errors like this:

{
  "error": {
    "root_cause": [
      {
        "type": "unsupported_operation_exception",
        "reason": "unsupported_operation_exception: requires a TFIDFSimilarity (such as ClassicSimilarity)"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "test",
        "node": "sQkgvgQWRT2pfI4leTC8lg",
        "reason": {
          "type": "script_exception",
          "reason": "runtime error",
          "script_stack": [
            "org.opensearch.ExceptionsHelper.convertToOpenSearchException(ExceptionsHelper.java:86)",
            "org.opensearch.script.ScoreScriptUtils$TF.tf(ScoreScriptUtils.java:113)",
            "tf(params.field, params.term)",
            "                       ^---- HERE"
          ],
          "script": "tf(params.field, params.term)",
          "lang": "painless",
          "position": {
            "offset": 23,
            "start": 0,
            "end": 29
          },
          "caused_by": {
            "type": "exception",
            "reason": "java.lang.UnsupportedOperationException: requires a TFIDFSimilarity (such as ClassicSimilarity)",
            "caused_by": {
              "type": "unsupported_operation_exception",
              "reason": "unsupported_operation_exception: requires a TFIDFSimilarity (such as ClassicSimilarity)"
            }
          }
        }
      }
    ],
    "caused_by": {
      "type": "unsupported_operation_exception",
      "reason": "unsupported_operation_exception: requires a TFIDFSimilarity (such as ClassicSimilarity)"
    }
  },
  "status": 400
}

Here are some suggestions:

  1. Set up the similarity model:
    You need to set the similarity model of the field you're working with (for example, ClassicSimilarity or TFIDFSimilarity). Make sure the model is supported in your current OpenSearch version.

To do this, you can update the mappings of your index:

PUT /test/_mapping
{
  "properties": {
    "<field_name>": {
      "type": "text",
      "similarity": "classic"
    }
  }
}
  2. Reindex your data for existing indices.
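For intuition about what a TFIDFSimilarity such as ClassicSimilarity computes, here is a small Python sketch of the term frequency and inverse document frequency factors as Lucene's ClassicSimilarity defines them. It is a worked example only, not OpenSearch code. The function names and sample numbers are illustrative assumptions, and the sketch leaves out length normalization and boosts.

```python
import math


def classic_tf(freq: int) -> float:
    """Term frequency factor in the style of ClassicSimilarity: sqrt(freq)."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, doc_count: int) -> float:
    """Inverse document frequency: ln((doc_count + 1) / (doc_freq + 1)) + 1."""
    return math.log((doc_count + 1) / (doc_freq + 1)) + 1.0


# Illustrative numbers only: a term that occurs 4 times in a field
# and appears in 9 of 99 documents.
print(classic_tf(4))
print(classic_idf(9, 99))
```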

Collaborator Author

When I tried to set the similarity to "classic", I got an error that suggests using "BM25" as the similarity:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "The [classic] similarity may not be used anymore. Please use the [BM25] similarity or build a custom [scripted] similarity instead."
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "The [classic] similarity may not be used anymore. Please use the [BM25] similarity or build a custom [scripted] similarity instead."
  },
  "status": 400
}

Contributor

Good catch. Would you create an issue in the OpenSearch repo? I can do it too.

Collaborator Author

I can create an issue

Collaborator Author

Done. The OpenSearch issue is opensearch-project/OpenSearch#9958

Comment on lines +364 to +368
"params": {
"fields": ["title", "description"],
"term": "ai",
"multiplier": 2,
"default_value": 1
Contributor

Nitpick: it would be more consistent to use the same document schema for all examples on this page.
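To make the role of these params concrete, here is a hedged Python sketch of one way a script could combine them. It does not call OpenSearch. `term_freq` is a hypothetical stand-in that counts whitespace-separated tokens, whereas the real `termFreq` reads statistics from the analyzed index. The scoring rule (frequency in the first listed field that contains the term, times the multiplier, otherwise the default value) is an assumption and not necessarily the example under review.

```python
def term_freq(text: str, term: str) -> int:
    """Hypothetical stand-in for termFreq: count lowercased whitespace tokens."""
    return text.lower().split().count(term.lower())


def score(doc: dict, params: dict) -> float:
    """Score a document from the first listed field that contains the term."""
    for field in params["fields"]:
        freq = term_freq(doc.get(field, ""), params["term"])
        if freq > 0:
            return float(freq * params["multiplier"])
    # No listed field contains the term, so fall back to the default value.
    return float(params["default_value"])


params = {
    "fields": ["title", "description"],
    "term": "ai",
    "multiplier": 2,
    "default_value": 1,
}
doc = {"title": "AI for search", "description": "How AI and more AI change ranking"}
print(score(doc, params))
```

In this sketch, the order of `fields` decides which field wins, and `default_value` keeps documents without the term from scoring zero.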

Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: Fanit Kolchina <[email protected]>
@kolchfa-aws kolchfa-aws added 4 - Doc review PR: Doc review in progress and removed 3 - Tech review PR: Tech review in progress labels Sep 11, 2023
Contributor

@cwillum left a comment

Looks good. Just one comment from me.

_query-dsl/specialized/script-score.md (outdated, resolved)
Co-authored-by: Chris Moore <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
@kolchfa-aws kolchfa-aws added 5 - Editorial review PR: Editorial review in progress and removed 4 - Doc review PR: Doc review in progress labels Sep 11, 2023
Collaborator

@natebower left a comment

@kolchfa-aws Just a couple of commas. Otherwise, LGTM!

_query-dsl/specialized/script-score.md (outdated, resolved)
_query-dsl/specialized/script-score.md (outdated, resolved)
Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
@kolchfa-aws kolchfa-aws added 6 - Done but waiting to merge PR: The work is done and ready to merge 3 - Done Issue is done/complete and removed 5 - Editorial review PR: Editorial review in progress labels Sep 11, 2023
@noCharger
Contributor

let's remove the docs for tf(). opensearch-project/OpenSearch#9995

Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: Fanit Kolchina <[email protected]>
@noCharger
Contributor

Hi team, is there anything I can do to get this PR merged?

@kolchfa-aws
Collaborator Author

@noCharger The PR is in the "Done but waiting to merge" status. This means we'll merge it by the release date, so you can mark it done on your side.

@hdhalter hdhalter removed the 3 - Done Issue is done/complete label Sep 13, 2023
@kolchfa-aws kolchfa-aws merged commit 59ea279 into main Sep 22, 2023
harshavamsi pushed a commit to harshavamsi/documentation-website that referenced this pull request Oct 31, 2023
…t#4988)

* Add term frequency functions to score script query

Signed-off-by: Fanit Kolchina <[email protected]>

* Add tf setup and rewording

Signed-off-by: Fanit Kolchina <[email protected]>

* typo fix

Signed-off-by: Fanit Kolchina <[email protected]>

* changed heading level

Signed-off-by: Fanit Kolchina <[email protected]>

* Update _query-dsl/specialized/script-score.md

Co-authored-by: Chris Moore <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>

* Remove tf from documentation

Signed-off-by: Fanit Kolchina <[email protected]>

---------

Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
Co-authored-by: Chris Moore <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
vagimeli pushed a commit that referenced this pull request Dec 21, 2023
@Naarcha-AWS Naarcha-AWS deleted the term-freq branch March 28, 2024 23:22
Labels
6 - Done but waiting to merge PR: The work is done and ready to merge release-notes PR: Include this PR in the automated release notes v2.10.0
Development

Successfully merging this pull request may close these issues.

[DOC] Document new doc/term frequency functions in Painless score scripts
5 participants