Adding draft for optimizing hybrid search blog post. #3503

Conversation

wrigleyDan
Contributor

Description

This PR adds a blog post draft as proposed in #3454 and as suggested by @pajuric.

Issues Resolved

#3454

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the BSD-3-Clause License.

@pajuric

pajuric commented Dec 17, 2024

@wrigleyDan - Thanks, Dan. I am awaiting feedback from Stavros, and then will submit for final editorial review by our team editor.

@pajuric pajuric self-assigned this Dec 17, 2024
@wrigleyDan
Contributor Author

@pajuric Thanks!
Stavros did leave one comment on the Google Doc I shared earlier, so maybe there's no need to wait for feedback because it is already there. However, I don't know if there is more to come. That one comment is addressed in this PR.
I just want to make sure that we're not waiting unnecessarily.

@pajuric

pajuric commented Dec 20, 2024

@wrigleyDan - I added Stavros' final feedback in the form of comments. If you could incorporate those and let me know, we'll get this through reviews on Monday.

@wrigleyDan
Contributor Author

@pajuric Thanks for adding the comments. They should now all be integrated, together with most of the reviewdog feedback I got from the automatic checks (the items I considered applicable).

Technically I'm out until Jan 7, but I'm keeping an eye on incoming GitHub emails so that we can get this done sooner rather than later. Let me know if there's anything else that needs to be done, and I'll try to see to it as soon as possible.

Thanks!

Collaborator

@natebower natebower left a comment

@wrigleyDan Editorial review complete. Please see my comments and changes and let me know if you have any questions. Thanks!

Cc: @pajuric

_posts/2024-12-xx-hybrid-search-optimization.md

The currently planned next steps include replicating the approach with a dataset that has higher judgment coverage and covers a different domain to see its generalizability.

Optimizing hybrid search typically is not the first step in search result quality optimization. Optimizing lexical search results first is especially important as the lexical search query is part of the hybrid search query. Bayesian optimization is an efficient technique to efficiently identify the best set of fields and field weights, sometimes also referred to as learning to boost.
Collaborator

Suggested change
Optimizing hybrid search typically is not the first step in search result quality optimization. Optimizing lexical search results first is especially important as the lexical search query is part of the hybrid search query. Bayesian optimization is an efficient technique to efficiently identify the best set of fields and field weights, sometimes also referred to as learning to boost.
Optimizing hybrid search is not typically the first step in search result quality optimization. Optimizing lexical search results first is especially important because the lexical search query is part of the hybrid search query. Bayesian optimization is an efficient technique for efficiently identifying the best set of fields and field weights, sometimes also referred to as "learning to boost."


The straightforward approach of trying out 66 different combinations can be created more elegantly by applying a technique like Bayesian optimization as well. In particular for large search indexes and a large amount of queries we expect this to result in a performance improvement.
Collaborator

Suggested change
The straightforward approach of trying out 66 different combinations can be created more elegantly by applying a technique like Bayesian optimization as well. In particular for large search indexes and a large amount of queries we expect this to result in a performance improvement.
The straightforward approach of trying out 66 different combinations can be performed more elegantly by applying a technique like Bayesian optimization as well. In particular, we expect this to result in a performance improvement for large search indexes and large numbers of queries.
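For readers of this thread, a minimal sketch of what an exhaustive search over 66 configurations could look like. The breakdown used here (2 normalization techniques × 3 combination techniques × 11 lexical weights) and all names in the code are assumptions for illustration; the excerpt under review does not spell out how the 66 combinations are formed.

```python
from itertools import product

# Assumed search space (not stated in the reviewed excerpt):
# 2 normalization techniques x 3 combination techniques x 11 weights = 66.
NORMALIZATIONS = ["min_max", "l2"]
COMBINATIONS = ["arithmetic_mean", "geometric_mean", "harmonic_mean"]
LEXICAL_WEIGHTS = [round(i * 0.1, 1) for i in range(11)]  # 0.0, 0.1, ..., 1.0


def build_grid():
    """Enumerate every hybrid search configuration in the assumed space."""
    grid = []
    for norm, comb, lexical_weight in product(
        NORMALIZATIONS, COMBINATIONS, LEXICAL_WEIGHTS
    ):
        # The neural weight is the complement of the lexical weight.
        neural_weight = round(1.0 - lexical_weight, 1)
        grid.append(
            {
                "normalization": norm,
                "combination": comb,
                "weights": [lexical_weight, neural_weight],
            }
        )
    return grid


grid = build_grid()
print(len(grid))
```

Each entry would then be evaluated against the judgment list. A technique such as Bayesian optimization would instead pick the next configuration to try based on earlier results, rather than visiting all entries.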


Reciprocal rank fusion is another way of combining lexical search and neural search, currently under active development:
Collaborator

Suggested change
Reciprocal rank fusion is another way of combining lexical search and neural search, currently under active development:
Reciprocal rank fusion, currently under active development, is another way of combining lexical search and neural search:

* [https://github.com/opensearch-project/neural-search/issues/865](https://github.com/opensearch-project/neural-search/issues/865)
* [https://github.com/opensearch-project/neural-search/issues/659](https://github.com/opensearch-project/neural-search/issues/659)
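
As a rough illustration of the idea behind reciprocal rank fusion, here is a minimal sketch of the general technique. It is not the OpenSearch implementation tracked in the issues above; the function name, the document IDs, and the constant `k=60` (a commonly used default) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. The constant k is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a lexical and a neural query.
lexical_results = ["doc_a", "doc_b", "doc_c"]
neural_results = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([lexical_results, neural_results])
print(fused)
```

Because only ranks are used, this approach needs no score normalization, which is the main difference from weighting normalized lexical and neural scores.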

We also plan to include this technique, as well to identify the best way of running hybrid search dynamically per query.
Collaborator

Suggested change
We also plan to include this technique, as well to identify the best way of running hybrid search dynamically per query.
We also plan to include this technique and to identify the best way of running hybrid search dynamically per query.

wrigleyDan and others added 3 commits December 23, 2024 17:07
Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Daniel Wrigley <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Daniel Wrigley <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Daniel Wrigley <[email protected]>
wrigleyDan and others added 4 commits December 23, 2024 17:08
Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Daniel Wrigley <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Daniel Wrigley <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Daniel Wrigley <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Daniel Wrigley <[email protected]>
title: "Optimizing hybrid search in OpenSearch"
authors:
- dwrigley
date: 2024-12-xx
@pajuric pajuric Dec 23, 2024

Update publish date to 2024-12-30

@pajuric

pajuric commented Dec 23, 2024

@nateynateynate - Please publish this on 12/30. Please let me know if you need a second maintainer to help push it. I can grab someone.

Updates as per the latest review

Signed-off-by: Daniel Wrigley <[email protected]>
change date, last change from editorial review

Signed-off-by: Daniel Wrigley <[email protected]>
add feedback link to OpenSearch forum

Signed-off-by: Daniel Wrigley <[email protected]>
@wrigleyDan
Contributor Author

@natebower Thanks for the review, I included the suggestions to my best knowledge. Let me know if I missed any.

@natebower
Collaborator

> @natebower Thanks for the review, I included the suggestions to my best knowledge. Let me know if I missed any.

@wrigleyDan A couple last suggestions on rewrites resulting from my suggestions. Otherwise, it looks like all of my comments were addressed, so should be LGTM from my end 😄

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Daniel Wrigley <[email protected]>
@wrigleyDan
Contributor Author

> @wrigleyDan A couple last suggestions on rewrites resulting from my suggestions. Otherwise, it looks like all of my comments were addressed, so should be LGTM from my end 😄

Done, thanks again and Happy Holidays :)

@pajuric

pajuric commented Dec 23, 2024

Thanks, Dan. I have this scheduled to publish on 12/30. Happy Holidays!

Member

@krisfreedain krisfreedain left a comment

quick change requested for the filename @wrigleyDan

_posts/2024-12-xx-hybrid-search-optimization.md
Member

@krisfreedain krisfreedain left a comment

thanks @wrigleyDan

@krisfreedain krisfreedain merged commit b233d34 into opensearch-project:main Dec 30, 2024
5 checks passed

4 participants