[v2] Refactor retrieval #1750

Open · Samoed wants to merge 10 commits into base: v2.0.0

Samoed (Collaborator) commented Jan 10, 2025

Checklist

  • Run tests locally to make sure nothing is broken using make test.

  • Run the formatter to format the code using make lint.

  • Removed DRESModel

  • Removed local file handling from HFDataLoader; only loading from an HF repo remains

@Samoed Samoed mentioned this pull request Jan 10, 2025
2 tasks
# Conflicts:
#	mteb/abstasks/AbsTaskRetrieval.py
#	mteb/models/salesforce_models.py
Samoed (Collaborator, Author) commented Jan 17, 2025

@KennethEnevoldsen @orionw Could you review this PR, please? I'll run some tasks to compare the results.

Samoed (Collaborator, Author) commented Jan 17, 2025

Results for intfloat/multilingual-e5-small

| Task | main | PR |
|---|---|---|
| SCIDOCS | 0.13896 | 0.13896 |
| SciDocsRR | 0.78256 | 0.782559 |
| SciFact | 0.677 | 0.677 |
| AskUbuntuDupQuestions | 0.56424 | 0.56424 |

orionw (Contributor) left a comment

Do you mind also checking the instruction tasks just to verify? InstructIR is pretty fast for InstructionRetrieval and then something like FollowIR or mFollowIR for reranking instructions.

Looks good though, thanks for a great PR!

mteb/models/salesforce_models.py (outdated, resolved)
@@ -501,49 +512,6 @@ def convert_conv_history_to_query(
return convert_conv_history_to_query(conversations) # type: ignore


class DRESModel:

Glad to finally have this removed!

Samoed (Collaborator, Author) commented Jan 17, 2025

Results for intfloat/multilingual-e5-small

| Task | main | PR |
|---|---|---|
| Core17InstructionRetrieval | -0.002604 | -0.002598 |
| InstructIR | only in v2 | 0.342919 |
| mFollowIRCrossLingual (mFollowIRCrossLingualInstructionRetrieval) (eng-rus) | -0.032008 | -0.031988 |
| mFollowIR (mFollowIRInstructionRetrieval) (rus) | -0.031852 | -0.031852 |
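
For anyone who wants to check the main vs PR numbers mechanically rather than by eye, here is a minimal sketch. It only uses the scores posted in this thread; the helper name `find_mismatches` and the `1e-4` tolerance are assumptions for illustration, not something mteb defines.

```python
import math

# Scores copied from the result tables in this thread: task -> (main, PR).
# InstructIR is left out because it has no score on main ("only in v2").
SCORES = {
    "SCIDOCS": (0.13896, 0.13896),
    "SciDocsRR": (0.78256, 0.782559),
    "SciFact": (0.677, 0.677),
    "AskUbuntuDupQuestions": (0.56424, 0.56424),
    "Core17InstructionRetrieval": (-0.002604, -0.002598),
    "mFollowIRCrossLingual (eng-rus)": (-0.032008, -0.031988),
    "mFollowIR (rus)": (-0.031852, -0.031852),
}

# Assumed tolerance for "same result"; not a project-defined threshold.
ABS_TOL = 1e-4


def find_mismatches(scores, abs_tol=ABS_TOL):
    """Return the tasks whose main and PR scores differ by more than abs_tol."""
    return {
        task: (main, pr)
        for task, (main, pr) in scores.items()
        if not math.isclose(main, pr, rel_tol=0.0, abs_tol=abs_tol)
    }


if __name__ == "__main__":
    for task, (main, pr) in SCORES.items():
        print(f"{task:35s} main={main:+.6f} pr={pr:+.6f} diff={pr - main:+.2e}")
    print("outside tolerance:", find_mismatches(SCORES) or "none")
```

With this tolerance no task is flagged. Tightening it to `1e-5` flags only the mFollowIRCrossLingual (eng-rus) row, whose scores differ by about 2e-5.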

# Conflicts:
#	mteb/models/salesforce_models.py
@Samoed Samoed force-pushed the refactor_retrieval branch from 170b5e9 to e1f6fbf Compare January 25, 2025 10:56
@Samoed Samoed requested a review from isaac-chung January 25, 2025 20:04