Errors with new MS MARCO v2.1 and BEIR regressions #2480

Closed
lintool opened this issue Apr 30, 2024 · 6 comments

@lintool
Member

lintool commented Apr 30, 2024

Running:

java -cp `ls target/*-fatjar.jar` io.anserini.reproduce.RunMsMarco -v 2

Getting some errors:

# Running condition "bm25-segmented": BM25 v2.1 Segmented Corpus (k1=0.9, b=0.4) 

  - topic_key: msmarco-v2-doc-dev

    Running retrieval command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar io.anserini.search.SearchCollection -threads 16 -index msmarco-v2.1-doc-segmented -topics msmarco-v2-doc-dev -output runs/run.msmarco-v2.1-doc.bm25-segmented.msmarco-v2-doc-dev.txt -hits 1000 -bm25 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000
    Run successfully completed!

     MRR@10: 0.0000 [FAIL] expected 0.1973

  - topic_key: msmarco-v2-doc-dev2

    Running retrieval command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar io.anserini.search.SearchCollection -threads 16 -index msmarco-v2.1-doc-segmented -topics msmarco-v2-doc-dev2 -output runs/run.msmarco-v2.1-doc.bm25-segmented.msmarco-v2-doc-dev2.txt -hits 1000 -bm25 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000
    Run successfully completed!

     MRR@10: 0.0000 [FAIL] expected 0.2000

  - topic_key: trec2021-dl

    Running retrieval command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar io.anserini.search.SearchCollection -threads 16 -index msmarco-v2.1-doc-segmented -topics trec2021-dl -output runs/run.msmarco-v2.1-doc.bm25-segmented.trec2021-dl.txt -hits 1000 -bm25 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000
    Run successfully completed!

        MAP: 0.0000 [FAIL] expected 0.2609

     MRR@10: 0.0000 [FAIL] expected 0.9026

    nDCG@10: 0.0000 [FAIL] expected 0.5778

      R@100: 0.0000 [FAIL] expected 0.3811

       R@1K: 0.0000 [FAIL] expected 0.7115

  - topic_key: trec2022-dl

    Running retrieval command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar io.anserini.search.SearchCollection -threads 16 -index msmarco-v2.1-doc-segmented -topics trec2022-dl -output runs/run.msmarco-v2.1-doc.bm25-segmented.trec2022-dl.txt -hits 1000 -bm25 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000
    Run successfully completed!

        MAP: 0.0000 [FAIL] expected 0.1079

     MRR@10: 0.0000 [FAIL] expected 0.7213

    nDCG@10: 0.0000 [FAIL] expected 0.3576

      R@100: 0.0000 [FAIL] expected 0.2330

       R@1K: 0.0000 [FAIL] expected 0.4790

  - topic_key: trec2023-dl

    Running retrieval command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar io.anserini.search.SearchCollection -threads 16 -index msmarco-v2.1-doc-segmented -topics trec2023-dl -output runs/run.msmarco-v2.1-doc.bm25-segmented.trec2023-dl.txt -hits 1000 -bm25 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000
    Run successfully completed!

        MAP: 0.0000 [FAIL] expected 0.1391

     MRR@10: 0.0000 [FAIL] expected 0.6519

    nDCG@10: 0.0000 [FAIL] expected 0.3356

      R@100: 0.0000 [FAIL] expected 0.3049

       R@1K: 0.0000 [FAIL] expected 0.5852

@wu-ming233 can you please take a look?
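
For readers unfamiliar with the `[OK]` / `[FAIL]` lines above, the check can be pictured roughly as follows. This is a minimal Python sketch, not Anserini's actual implementation: the function names `parse_metric` and `check_metric` and the tolerance are hypothetical, and it assumes the usual three-column `trec_eval` summary output (measure, `all`, value).

```python
def parse_metric(trec_eval_output: str, measure: str) -> float:
    """Return the summary value for `measure` from trec_eval-style output.

    Assumes whitespace-separated lines of the form: <measure> all <value>.
    """
    for line in trec_eval_output.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0] == measure and fields[1] == "all":
            return float(fields[2])
    raise ValueError(f"measure {measure!r} not found in output")


def check_metric(label: str, actual: float, expected: float, tol: float = 5e-5) -> str:
    """Format one result line, flagging values that differ from the expected score."""
    if abs(actual - expected) < tol:
        return f"{label}: {actual:.4f} [OK]"
    return f"{label}: {actual:.4f} [FAIL] expected {expected:.4f}"


# Sample line in trec_eval's summary format (the value is taken from the thread).
sample = "map                   \tall\t0.2281\n"
print(check_metric("MAP", parse_metric(sample, "map"), 0.2281))
print(check_metric("MRR@10", 0.0, 0.1973))
```

A check of this kind only compares numbers, so it cannot tell apart a retrieval step that produced a poor run from one that produced no usable run at all.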

@wu-ming233
Member

I cannot reproduce the issue locally at the moment; I will try again after clearing the cache.

@lintool
Member Author

lintool commented Apr 30, 2024

Similarly, getting:

# Running condition "Dp": bge-base-en-v1.5 cached queries 

  - topic_key: trec-covid

    Running retrieval command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar io.anserini.search.SearchCollection -threads 16 -index beir-v1.0.0-trec-covid.bge-base-en-v1.5 -topics beir-trec-covid.bge-base-en-v1.5 -output runs/run.beir.Dp.trec-covid.txt -threads 16 -efSearch 1000 -removeQuery
    Run successfully completed!

Evaluation command failed for metric: nDCG@10

  - topic_key: bioasq

    Running retrieval command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar io.anserini.search.SearchCollection -threads 16 -index beir-v1.0.0-bioasq.bge-base-en-v1.5 -topics beir-bioasq.bge-base-en-v1.5 -output runs/run.beir.Dp.bioasq.txt -threads 16 -efSearch 1000 -removeQuery
    Run successfully completed!

Evaluation command failed for metric: nDCG@10

...

@lintool changed the title from "MS MARCO v2.1 regressions" to "Errors with new MS MARCO v2.1 and BEIR regressions" on Apr 30, 2024
@lintool
Member Author

lintool commented Apr 30, 2024

More debugging trace:

# Running condition "Dp": bge-base-en-v1.5 cached queries 

  - topic_key: trec-covid

    Running retrieval command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar io.anserini.search.SearchCollection -threads 16 -index beir-v1.0.0-trec-covid.bge-base-en-v1.5 -topics beir-trec-covid.bge-base-en-v1.5 -output runs/run.beir.Dp.trec-covid.txt -threads 16 -efSearch 1000 -removeQuery
    Run successfully completed!

    Running evaluation command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar trec_eval -c -m ndcg_cut.10 beir-v1.0.0-trec-covid.test runs/run.beir.Dp.trec-covid.txt
Evaluation command failed for metric: nDCG@10

The issue is here:

% java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar io.anserini.search.SearchCollection -threads 16 -index beir-v1.0.0-trec-covid.bge-base-en-v1.5 -topics beir-trec-covid.bge-base-en-v1.5 -output runs/run.beir.Dp.trec-covid.txt -threads 16 -efSearch 1000 -removeQuery
Error: "-efSearch" is not a valid option. For help, use "-options" to print out information about options.

@wu-ming233 can you please fix?
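
Note that the trace above prints "Run successfully completed!" even though the retrieval command rejects `-efSearch`. One way a runner could catch this earlier is sketched below in Python. It is an illustration only, not how Anserini's regression runner works: `run_step` is a hypothetical helper, and treating a nonzero exit code or an output line starting with `Error:` as failure is an assumption about how the tool signals problems, which the thread does not confirm.

```python
import subprocess
import sys


def run_step(command: list[str]) -> tuple[bool, str]:
    """Run one command to completion and report whether it looks successful.

    Success here means: exit code 0 and no output line starting with "Error:".
    Both conditions are assumptions about how the tool signals failure.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    output = result.stdout + result.stderr
    has_error_line = any(line.startswith("Error:") for line in output.splitlines())
    return (result.returncode == 0 and not has_error_line), output


# Stand-in for a retrieval command that rejects an option but still exits with 0.
fake_retrieval = [
    sys.executable, "-c",
    "print('Error: \"-efSearch\" is not a valid option.')",
]
ok, output = run_step(fake_retrieval)
print("Run successfully completed!" if ok else "Retrieval failed:\n" + output)
```

With a check like this, the evaluation step would be skipped and the rejected option reported directly, instead of surfacing later as "Evaluation command failed".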

@lintool
Member Author

lintool commented Apr 30, 2024

Okay, this is weird. Adding debugging information and commenting out parts of the YAML:

# Running condition "bm25": BM25 v2.1 (k1=0.9, b=0.4) 

  - topic_key: trec2021-dl

    Running retrieval command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar io.anserini.search.SearchCollection -threads 16 -index msmarco-v2.1-doc -topics trec2021-dl -output runs/run.msmarco-v2.1-doc.bm25.trec2021-dl.txt -hits 1000 -bm25
    Run successfully completed!

    Running evaluation command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar trec_eval -c -M 100 -m map dl21-doc-msmarco-v2.1 runs/run.msmarco-v2.1-doc.bm25.trec2021-dl.txt
        MAP: 0.2281 [OK]

    Running evaluation command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar trec_eval -c -M 100 -m recip_rank dl21-doc-msmarco-v2.1 runs/run.msmarco-v2.1-doc.bm25.trec2021-dl.txt
     MRR@10: 0.8466 [OK]

    Running evaluation command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar trec_eval -c -m ndcg_cut.10 dl21-doc-msmarco-v2.1 runs/run.msmarco-v2.1-doc.bm25.trec2021-dl.txt
    nDCG@10: 0.5183 [OK]

    Running evaluation command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar trec_eval -c -m recall.100 dl21-doc-msmarco-v2.1 runs/run.msmarco-v2.1-doc.bm25.trec2021-dl.txt
      R@100: 0.3502 [OK]

    Running evaluation command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar trec_eval -c -m recall.1000 dl21-doc-msmarco-v2.1 runs/run.msmarco-v2.1-doc.bm25.trec2021-dl.txt
       R@1K: 0.6915 [OK]

# Running condition "bm25-segmented": BM25 v2.1 Segmented Corpus (k1=0.9, b=0.4) 

  - topic_key: trec2021-dl

    Running retrieval command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar io.anserini.search.SearchCollection -threads 16 -index msmarco-v2.1-doc-segmented -topics trec2021-dl -output runs/run.msmarco-v2.1-doc.bm25-segmented.trec2021-dl.txt -hits 1000 -bm25 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000
    Run successfully completed!

    Running evaluation command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar trec_eval -c -M 100 -m map dl21-doc-msmarco-v2.1 runs/run.msmarco-v2.1-doc.bm25-segmented.trec2021-dl.txt
        MAP: 0.0000 [FAIL] expected 0.2609

    Running evaluation command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar trec_eval -c -M 100 -m recip_rank dl21-doc-msmarco-v2.1 runs/run.msmarco-v2.1-doc.bm25-segmented.trec2021-dl.txt
     MRR@10: 0.0000 [FAIL] expected 0.9026

    Running evaluation command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar trec_eval -c -m ndcg_cut.10 dl21-doc-msmarco-v2.1 runs/run.msmarco-v2.1-doc.bm25-segmented.trec2021-dl.txt
    nDCG@10: 0.0000 [FAIL] expected 0.5778

    Running evaluation command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar trec_eval -c -m recall.100 dl21-doc-msmarco-v2.1 runs/run.msmarco-v2.1-doc.bm25-segmented.trec2021-dl.txt
      R@100: 0.0000 [FAIL] expected 0.3811

    Running evaluation command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar trec_eval -c -m recall.1000 dl21-doc-msmarco-v2.1 runs/run.msmarco-v2.1-doc.bm25-segmented.trec2021-dl.txt
       R@1K: 0.0000 [FAIL] expected 0.7115

But when I copy and paste the commands and run them separately, they seem to work fine... 🤷‍♂️
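
When every metric comes back as exactly 0.0000, as in the segmented condition above, one general diagnostic (not something this thread establishes as the cause) is to check whether the run file is empty or whether its document IDs fail to match the qrels at all. The Python sketch below assumes the standard TREC formats, `qid Q0 docid rank score tag` for runs and `qid 0 docid rel` for qrels. The helper names and all sample IDs are made up for illustration.

```python
def read_column(lines, index: int) -> set[str]:
    """Collect one whitespace-separated column from non-empty lines."""
    return {line.split()[index] for line in lines if line.strip()}


def docid_overlap(run_lines, qrels_lines) -> tuple[int, int]:
    """Return (distinct docids in run, how many of them also appear in the qrels)."""
    run_docids = read_column(run_lines, 2)      # run: qid Q0 docid rank score tag
    qrels_docids = read_column(qrels_lines, 2)  # qrels: qid 0 docid rel
    return len(run_docids), len(run_docids & qrels_docids)


# Illustrative data only: the second run keeps a "#<segment>" suffix on each docid.
qrels = ["q1 0 doc_a 1", "q1 0 doc_b 0"]
run_doc_level = ["q1 Q0 doc_a 1 9.5 demo", "q1 Q0 doc_c 2 8.1 demo"]
run_segment_level = ["q1 Q0 doc_a#0 1 9.5 demo", "q1 Q0 doc_c#3 2 8.1 demo"]

print(docid_overlap(run_doc_level, qrels))      # (2, 1)
print(docid_overlap(run_segment_level, qrels))  # (2, 0)
print(docid_overlap([], qrels))                 # (0, 0)
```

To apply this to real files, open file handles can be passed in place of the lists, since the functions only iterate over lines. A zero overlap or an empty run would each produce all-zero scores.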

@wu-ming233
Member

Fixed the typo that caused this evaluation command to fail for bge-base-en-v1.5 cached queries:

% java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar io.anserini.search.SearchCollection -threads 16 -index beir-v1.0.0-trec-covid.bge-base-en-v1.5 -topics beir-trec-covid.bge-base-en-v1.5 -output runs/run.beir.Dp.trec-covid.txt -threads 16 -efSearch 1000 -removeQuery
Error: "-efSearch" is not a valid option. For help, use "-options" to print out information about options.

Still looking into the issue where the evaluation commands report a metric of 0 and fail the checks. I still cannot reproduce it consistently; I currently suspect it might have something to do with how the user downloads the indexes. I will keep investigating.

Sorry that I am taking some time with this fix :( My local machine takes a very long time to run the regressions. If this is urgent, I will look for more powerful compute.

@lintool
Member Author

lintool commented Apr 30, 2024

> Fixed the typo that caused this evaluation command to fail for bge-base-en-v1.5 cached queries:
>
> % java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar io.anserini.search.SearchCollection -threads 16 -index beir-v1.0.0-trec-covid.bge-base-en-v1.5 -topics beir-trec-covid.bge-base-en-v1.5 -output runs/run.beir.Dp.trec-covid.txt -threads 16 -efSearch 1000 -removeQuery
> Error: "-efSearch" is not a valid option. For help, use "-options" to print out information about options.

Thanks!

> Still looking into the issue where the evaluation commands report a metric of 0 and fail the checks. I still cannot reproduce it consistently; I currently suspect it might have something to do with how the user downloads the indexes. I will keep investigating.

I don't think it's downloading... perhaps some type of process management issue from Java? Because when I run the commands myself, it seems to work fine. Maybe some underlying race condition?

> Sorry that I am taking some time with this fix :( My local machine takes a very long time to run the regressions. If this is urgent, I will look for more powerful compute.

No worries, this isn't absolutely critical to the operation of the toolkit... (yet!)
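
The race-condition hypothesis above is not confirmed anywhere in this thread, but the general failure mode it describes is easy to picture: a parent process that reads a child's output file before the child has finished sees an empty or missing file. The Python sketch below illustrates only that pattern. It is not Anserini code; `WRITER` and `run_file_size` are hypothetical stand-ins for a retrieval process and for the step that consumes the run file.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for a retrieval process: it pauses, then writes a one-line run file.
WRITER = (
    "import sys, time; time.sleep(0.5); "
    "open(sys.argv[1], 'w').write('q1 Q0 doc_a 1 9.5 demo\\n')"
)


def run_file_size(path: str, wait: bool) -> int:
    """Start the writer and return the run file's size as seen by the parent.

    With wait=False the size is read immediately, which is the racy pattern.
    """
    process = subprocess.Popen([sys.executable, "-c", WRITER, path])
    if wait:
        process.wait()
    size = os.path.getsize(path) if os.path.exists(path) else 0
    process.wait()  # always reap the child before returning
    return size


with tempfile.TemporaryDirectory() as tmp:
    print("without waiting:", run_file_size(os.path.join(tmp, "a.txt"), wait=False))
    print("after waiting:  ", run_file_size(os.path.join(tmp, "b.txt"), wait=True))
```

Without waiting, the parent will typically see a size of 0 because the child has not written anything yet; after waiting, the file has content. An evaluation step that ran against the first state would score an empty run, which is consistent with all-zero metrics, though other explanations remain possible.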


2 participants