Fixed document source and score field mismatch in sorted hybrid queries #1043

@martin-gaievski martin-gaievski commented Dec 24, 2024

Fixed document source and score field mismatch in sorted hybrid queries.
The returned search hits are identified with the help of a min heap, which the sorting functionality uses to get the top X docs. We keep track of the heap leaf element and update it while collecting doc ids. The current code uses only one element for this, but a hybrid query needs a separate element for each sub-query. With a single element, updates are incorrectly propagated to the results of other sub-queries, which causes several kinds of inconsistency: the doc id, the score fields, and the score can all be incorrect in the final search result.

In this PR I'm changing the min heap leaf element from a single object to an array of objects, one for each sub-query.
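
To illustrate the failure mode, here is a minimal Python sketch, not the plugin's actual Java code. The names `Entry`, `SortedHybridCollector`, `per_sub_query_bottom`, and `replay` are hypothetical, and the in-place update of the tracked element is an assumption about how the collector reuses heap entries. Each sub-query has its own bounded min heap; the only difference between the two modes is whether the tracked element lives in one shared slot or in an array with one slot per sub-query.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Entry:
    # Hypothetical heap entry; ordering uses the sort value only.
    sort_value: float
    doc_id: int = field(compare=False)


class SortedHybridCollector:
    """Toy collector: one bounded min heap of size k per sub-query."""

    def __init__(self, num_sub_queries, k, per_sub_query_bottom=True):
        self.k = k
        self.heaps = [[] for _ in range(num_sub_queries)]
        self.per_sub_query_bottom = per_sub_query_bottom
        # Fixed behaviour: one tracked element per sub-query.
        # Old behaviour: a single tracked element shared by all sub-queries.
        self.bottoms = [None] * (num_sub_queries if per_sub_query_bottom else 1)

    def collect(self, sub_query, doc_id, sort_value):
        heap = self.heaps[sub_query]
        slot = sub_query if self.per_sub_query_bottom else 0
        if len(heap) < self.k:
            heapq.heappush(heap, Entry(sort_value, doc_id))
            if len(heap) == self.k:
                self.bottoms[slot] = heap[0]
            return
        bottom = self.bottoms[slot]
        if sort_value > bottom.sort_value:
            # Reuse the tracked element in place, then restore heap order.
            bottom.sort_value = sort_value
            bottom.doc_id = doc_id
            heapq.heapify(heap)
            self.bottoms[slot] = heap[0]

    def top_docs(self, sub_query):
        hits = [(e.doc_id, e.sort_value) for e in self.heaps[sub_query]]
        return sorted(hits, key=lambda hit: -hit[1])


def replay(per_sub_query_bottom):
    # Same collection order in both modes; doc 5 belongs to sub-query 0 only.
    collector = SortedHybridCollector(2, k=2, per_sub_query_bottom=per_sub_query_bottom)
    collector.collect(0, 1, 10.0)
    collector.collect(0, 2, 20.0)
    collector.collect(1, 3, 5.0)
    collector.collect(1, 4, 6.0)
    collector.collect(0, 5, 30.0)
    return collector
```

With the shared slot, `replay(False).top_docs(1)` contains doc 5, which was only ever collected for sub-query 0, because the tracked element still pointed into the other sub-query's heap when it was overwritten. With one slot per sub-query, `replay(True)` keeps doc 5 in sub-query 0 and leaves sub-query 1 untouched.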

Tested on the data set referenced in the issue and got a correct response in which all fields have consistent values in the _source and sort sections:

{
    "took": 455,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 397,
            "relation": "eq"
        },
        "max_score": 0.8442519,
        "hits": [
            {
                "_index": "templates_prod",
                "_id": "bk2m5np8h467r92kmalxdcvft",
                "_score": null,
                "_source": {
                    "trendingScore": 303.125,
                    "name": "Summer Breeze Design"
                },
                "sort": [
                    303.125
                ]
            },
            {
                "_index": "templates_prod",
                "_id": "bk2m6w28201xn8m3vb6mmdebn",
                "_score": null,
                "_source": {
                    "trendingScore": 303.125,
                    "name": "Winterfrost"
                },
                "sort": [
                    303.125
                ]
            },
            {
                "_index": "templates_prod",
                "_id": "ak2nhbsz900lwxe0xcpir0duc",
                "_score": null,
                "_source": {
                    "trendingScore": 19,
                    "name": "Sunshine Days - Nature Greeting"
                },
                "sort": [
                    19.0
                ]
            },
            {
                "_index": "templates_prod",
                "_id": "ak2n8y4h900csue0x7ij8rwuj",
                "_score": null,
                "_source": {
                    "trendingScore": 13,
                    "name": "Mountain Vista Wedding Suite - Save the Date"
                },
                "sort": [
                    13.0
                ]
            },
            {
                "_index": "templates_prod",
                "_id": "ak2nau2a900e7ue0x2h3nheb9",
                "_score": null,
                "_source": {
                    "trendingScore": 10,
                    "name": "Ocean Waves Thank You"
                },
                "sort": [
                    10.0
                ]
            },
            {
                "_index": "templates_prod",
                "_id": "ak305ic900fr3xe0xjuiin2nt",
                "_score": null,
                "_source": {
                    "trendingScore": 10,
                    "name": "Midnight Dreams - Elegant Celebration"
                },
                "sort": [
                    10.0
                ]
            },
            {
                "_index": "templates_prod",
                "_id": "ak2n2r1t007au00xsn82pvg1",
                "_score": null,
                "_source": {
                    "trendingScore": 6,
                    "name": "Forest Pine - Rustic Wedding Invitation"
                },
                "sort": [
                    6.0
                ]
            },
            {
                "_index": "templates_prod",
                "_id": "ak2n0pxe0060u00xxz0afuaa",
                "_score": null,
                "_source": {
                    "trendingScore": 3.0,
                    "name": "Modern Romance - Classic Black Wedding Invitation"
                },
                "sort": [
                    3.0
                ]
            }
        ]
    }
}
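
A response like the one above can be checked mechanically. The following Python sketch is an illustration only: the helper name `mismatched_hits` is hypothetical, and it assumes the request sorted on the single numeric field `trendingScore`, so that `sort[0]` should equal the value in `_source`. The sample data reuses two hits from the response above.

```python
def mismatched_hits(response, sort_field):
    """Return the _id of every hit whose first sort value differs from _source."""
    bad_ids = []
    for hit in response["hits"]["hits"]:
        source_value = hit["_source"].get(sort_field)
        sort_value = hit["sort"][0]
        # Compare as floats so that 19 in _source matches 19.0 in sort.
        if source_value is None or float(source_value) != float(sort_value):
            bad_ids.append(hit["_id"])
    return bad_ids


sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "bk2m5np8h467r92kmalxdcvft",
                "_source": {"trendingScore": 303.125, "name": "Summer Breeze Design"},
                "sort": [303.125],
            },
            {
                "_id": "ak2nhbsz900lwxe0xcpir0duc",
                "_source": {"trendingScore": 19, "name": "Sunshine Days - Nature Greeting"},
                "sort": [19.0],
            },
        ]
    }
}

# An empty list means every hit is consistent.
print(mismatched_hits(sample_response, "trendingScore"))
```

Before the fix, a check of this kind would be expected to flag hits whose `sort` value came from a different document than their `_source`.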

Related Issues

#1044

Check List

  • New functionality includes testing.
  • [ ] New functionality has been documented.
  • [ ] API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • [ ] Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@martin-gaievski martin-gaievski force-pushed the fixed_mismatch_in_doc_source_and_score_fields branch from 093f199 to 3559f12 Compare December 25, 2024 00:53
@martin-gaievski martin-gaievski marked this pull request as ready for review December 25, 2024 00:55
@martin-gaievski martin-gaievski added the backport 2.x (adds an automatic workflow to backport the PR to the 2.x branch) and Bug Fixes (changes to a system or product designed to handle a programming bug/glitch) labels Dec 25, 2024
…is enabled in hybrid query

Signed-off-by: Martin Gaievski <[email protected]>
Fixed mismatch between document source and score fields when sorting is enabled in hybrid query

Signed-off-by: Martin Gaievski <[email protected]>
@martin-gaievski martin-gaievski force-pushed the fixed_mismatch_in_doc_source_and_score_fields branch from c46ae50 to f3f177c Compare January 3, 2025 21:27
@vibrantvarun

LGTM Thanks.

@martin-gaievski martin-gaievski merged commit 030e3f4 into opensearch-project:main Jan 3, 2025
39 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jan 3, 2025
…es (#1043)

* Fixed mismatch between document source and score fields when sorting is enabled in hybrid query

Signed-off-by: Martin Gaievski <[email protected]>
(cherry picked from commit 030e3f4)
martin-gaievski added a commit to martin-gaievski/neural-search that referenced this pull request Jan 3, 2025
…es (opensearch-project#1043)

* Fixed mismatch between document source and score fields when sorting is enabled in hybrid query

Signed-off-by: Martin Gaievski <[email protected]>
(cherry picked from commit 030e3f4)
Signed-off-by: Martin Gaievski <[email protected]>
martin-gaievski added a commit that referenced this pull request Jan 3, 2025
…es (#1043) (#1057)

* Fixed mismatch between document source and score fields when sorting is enabled in hybrid query


(cherry picked from commit 030e3f4)

Signed-off-by: Martin Gaievski <[email protected]>
Co-authored-by: Martin Gaievski <[email protected]>
heemin32 pushed a commit to heemin32/neural-search that referenced this pull request Jan 9, 2025
…es (opensearch-project#1043) (opensearch-project#1057)

* Fixed mismatch between document source and score fields when sorting is enabled in hybrid query


(cherry picked from commit 030e3f4)

Signed-off-by: Martin Gaievski <[email protected]>
Co-authored-by: Martin Gaievski <[email protected]>
martin-gaievski added a commit that referenced this pull request Jan 10, 2025
…es (#1043)

* Fixed mismatch between document source and score fields when sorting is enabled in hybrid query

Signed-off-by: Martin Gaievski <[email protected]>