Errors with new MS MARCO v2.1 and BEIR regressions #2480

lintool · 2024-04-30T02:37:17Z

Running:

java -cp `ls target/*-fatjar.jar` io.anserini.reproduce.RunMsMarco -v 2

Getting some errors:

# Running condition "bm25-segmented": BM25 v2.1 Segmented Corpus (k1=0.9, b=0.4) 

  - topic_key: msmarco-v2-doc-dev

    Running retrieval command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar io.anserini.search.SearchCollection -threads 16 -index msmarco-v2.1-doc-segmented -topics msmarco-v2-doc-dev -output runs/run.msmarco-v2.1-doc.bm25-segmented.msmarco-v2-doc-dev.txt -hits 1000 -bm25 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000
    Run successfully completed!

     MRR@10: 0.0000 [FAIL] expected 0.1973

  - topic_key: msmarco-v2-doc-dev2

    Running retrieval command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar io.anserini.search.SearchCollection -threads 16 -index msmarco-v2.1-doc-segmented -topics msmarco-v2-doc-dev2 -output runs/run.msmarco-v2.1-doc.bm25-segmented.msmarco-v2-doc-dev2.txt -hits 1000 -bm25 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000
    Run successfully completed!

     MRR@10: 0.0000 [FAIL] expected 0.2000

  - topic_key: trec2021-dl

    Running retrieval command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar io.anserini.search.SearchCollection -threads 16 -index msmarco-v2.1-doc-segmented -topics trec2021-dl -output runs/run.msmarco-v2.1-doc.bm25-segmented.trec2021-dl.txt -hits 1000 -bm25 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000
    Run successfully completed!

        MAP: 0.0000 [FAIL] expected 0.2609

     MRR@10: 0.0000 [FAIL] expected 0.9026

    nDCG@10: 0.0000 [FAIL] expected 0.5778

      R@100: 0.0000 [FAIL] expected 0.3811

       R@1K: 0.0000 [FAIL] expected 0.7115

  - topic_key: trec2022-dl

    Running retrieval command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar io.anserini.search.SearchCollection -threads 16 -index msmarco-v2.1-doc-segmented -topics trec2022-dl -output runs/run.msmarco-v2.1-doc.bm25-segmented.trec2022-dl.txt -hits 1000 -bm25 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000
    Run successfully completed!

        MAP: 0.0000 [FAIL] expected 0.1079

     MRR@10: 0.0000 [FAIL] expected 0.7213

    nDCG@10: 0.0000 [FAIL] expected 0.3576

      R@100: 0.0000 [FAIL] expected 0.2330

       R@1K: 0.0000 [FAIL] expected 0.4790

  - topic_key: trec2023-dl

    Running retrieval command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar io.anserini.search.SearchCollection -threads 16 -index msmarco-v2.1-doc-segmented -topics trec2023-dl -output runs/run.msmarco-v2.1-doc.bm25-segmented.trec2023-dl.txt -hits 1000 -bm25 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000
    Run successfully completed!

        MAP: 0.0000 [FAIL] expected 0.1391

     MRR@10: 0.0000 [FAIL] expected 0.6519

    nDCG@10: 0.0000 [FAIL] expected 0.3356

      R@100: 0.0000 [FAIL] expected 0.3049

       R@1K: 0.0000 [FAIL] expected 0.5852

@wu-ming233 can you please take a look?

wu-ming233 · 2024-04-30T03:26:41Z

I cannot reproduce the issue locally at the moment. I will try again after clearing the cache.

lintool · 2024-04-30T11:12:03Z

Similarly, getting:

# Running condition "Dp": bge-base-en-v1.5 cached queries 

  - topic_key: trec-covid

    Running retrieval command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar io.anserini.search.SearchCollection -threads 16 -index beir-v1.0.0-trec-covid.bge-base-en-v1.5 -topics beir-trec-covid.bge-base-en-v1.5 -output runs/run.beir.Dp.trec-covid.txt -threads 16 -efSearch 1000 -removeQuery
    Run successfully completed!

Evaluation command failed for metric: nDCG@10

  - topic_key: bioasq

    Running retrieval command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar io.anserini.search.SearchCollection -threads 16 -index beir-v1.0.0-bioasq.bge-base-en-v1.5 -topics beir-bioasq.bge-base-en-v1.5 -output runs/run.beir.Dp.bioasq.txt -threads 16 -efSearch 1000 -removeQuery
    Run successfully completed!

Evaluation command failed for metric: nDCG@10

...

lintool · 2024-04-30T11:22:48Z

More debugging trace:

# Running condition "Dp": bge-base-en-v1.5 cached queries 

  - topic_key: trec-covid

    Running retrieval command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar io.anserini.search.SearchCollection -threads 16 -index beir-v1.0.0-trec-covid.bge-base-en-v1.5 -topics beir-trec-covid.bge-base-en-v1.5 -output runs/run.beir.Dp.trec-covid.txt -threads 16 -efSearch 1000 -removeQuery
    Run successfully completed!

    Running evaluation command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar trec_eval -c -m ndcg_cut.10 beir-v1.0.0-trec-covid.test runs/run.beir.Dp.trec-covid.txt
Evaluation command failed for metric: nDCG@10

The issue is here:

% java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar io.anserini.search.SearchCollection -threads 16 -index beir-v1.0.0-trec-covid.bge-base-en-v1.5 -topics beir-trec-covid.bge-base-en-v1.5 -output runs/run.beir.Dp.trec-covid.txt -threads 16 -efSearch 1000 -removeQuery
Error: "-efSearch" is not a valid option. For help, use "-options" to print out information about options.

@wu-ming233 can you please fix?
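
The trace above shows "Run successfully completed!" being reported even though the retrieval command itself stops with an option error. One way a runner could guard against this is sketched below. This is a minimal illustration and not RunMsMarco's actual code: the class `RunGuard` and the method `runAndVerify` are hypothetical names, and it assumes commands are launched as subprocesses. Because a tool may print an error and still exit with status 0 (an assumption, not verified here for `SearchCollection`), the sketch also checks that the run file exists and is non-empty.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper for illustration; not part of Anserini.
class RunGuard {
  /**
   * Runs a command and reports success only if it exits with status 0
   * AND leaves a non-empty regular file at runFile.
   */
  static boolean runAndVerify(List<String> command, Path runFile) {
    try {
      Process process = new ProcessBuilder(command)
          .inheritIO()  // pass the tool's own messages through to the console
          .start();
      int exitCode = process.waitFor();  // block until the command has finished
      if (exitCode != 0) {
        return false;
      }
      // Assumption: an error may be printed while the exit status is still 0,
      // so the output file is checked as well.
      return Files.isRegularFile(runFile) && Files.size(runFile) > 0;
    } catch (IOException e) {
      return false;  // the command could not be started or the file could not be read
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();  // preserve the interrupt for the caller
      return false;
    }
  }

  public static void main(String[] args) throws IOException {
    Path runFile = Files.createTempFile("run", ".txt");  // starts out empty
    // Stand-in for a retrieval command that fails without writing any results.
    boolean ok = runAndVerify(List.of("sh", "-c", "exit 2"), runFile);
    System.out.println(ok ? "Run successfully completed!" : "Retrieval command failed");
    Files.deleteIfExists(runFile);
  }
}
```

With a guard like this, the evaluation step would be skipped (or flagged) whenever retrieval did not produce a usable run file, rather than failing later with "Evaluation command failed".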

lintool · 2024-04-30T11:36:14Z

Okay, this is weird. After adding debugging information and commenting out parts of the YAML:

# Running condition "bm25": BM25 v2.1 (k1=0.9, b=0.4) 

  - topic_key: trec2021-dl

    Running retrieval command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar io.anserini.search.SearchCollection -threads 16 -index msmarco-v2.1-doc -topics trec2021-dl -output runs/run.msmarco-v2.1-doc.bm25.trec2021-dl.txt -hits 1000 -bm25
    Run successfully completed!

    Running evaluation command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar trec_eval -c -M 100 -m map dl21-doc-msmarco-v2.1 runs/run.msmarco-v2.1-doc.bm25.trec2021-dl.txt
        MAP: 0.2281 [OK]

    Running evaluation command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar trec_eval -c -M 100 -m recip_rank dl21-doc-msmarco-v2.1 runs/run.msmarco-v2.1-doc.bm25.trec2021-dl.txt
     MRR@10: 0.8466 [OK]

    Running evaluation command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar trec_eval -c -m ndcg_cut.10 dl21-doc-msmarco-v2.1 runs/run.msmarco-v2.1-doc.bm25.trec2021-dl.txt
    nDCG@10: 0.5183 [OK]

    Running evaluation command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar trec_eval -c -m recall.100 dl21-doc-msmarco-v2.1 runs/run.msmarco-v2.1-doc.bm25.trec2021-dl.txt
      R@100: 0.3502 [OK]

    Running evaluation command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar trec_eval -c -m recall.1000 dl21-doc-msmarco-v2.1 runs/run.msmarco-v2.1-doc.bm25.trec2021-dl.txt
       R@1K: 0.6915 [OK]

# Running condition "bm25-segmented": BM25 v2.1 Segmented Corpus (k1=0.9, b=0.4) 

  - topic_key: trec2021-dl

    Running retrieval command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar io.anserini.search.SearchCollection -threads 16 -index msmarco-v2.1-doc-segmented -topics trec2021-dl -output runs/run.msmarco-v2.1-doc.bm25-segmented.trec2021-dl.txt -hits 1000 -bm25 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000
    Run successfully completed!

    Running evaluation command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar trec_eval -c -M 100 -m map dl21-doc-msmarco-v2.1 runs/run.msmarco-v2.1-doc.bm25-segmented.trec2021-dl.txt
        MAP: 0.0000 [FAIL] expected 0.2609

    Running evaluation command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar trec_eval -c -M 100 -m recip_rank dl21-doc-msmarco-v2.1 runs/run.msmarco-v2.1-doc.bm25-segmented.trec2021-dl.txt
     MRR@10: 0.0000 [FAIL] expected 0.9026

    Running evaluation command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar trec_eval -c -m ndcg_cut.10 dl21-doc-msmarco-v2.1 runs/run.msmarco-v2.1-doc.bm25-segmented.trec2021-dl.txt
    nDCG@10: 0.0000 [FAIL] expected 0.5778

    Running evaluation command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar trec_eval -c -m recall.100 dl21-doc-msmarco-v2.1 runs/run.msmarco-v2.1-doc.bm25-segmented.trec2021-dl.txt
      R@100: 0.0000 [FAIL] expected 0.3811

    Running evaluation command: java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar trec_eval -c -m recall.1000 dl21-doc-msmarco-v2.1 runs/run.msmarco-v2.1-doc.bm25-segmented.trec2021-dl.txt
       R@1K: 0.0000 [FAIL] expected 0.7115

But when I copy/paste the commands and run them separately, they seem to work fine... 🤷‍♂️

wu-ming233 · 2024-04-30T17:32:20Z

Fixed the typo that caused this evaluation command to fail for bge-base-en-v1.5 cached queries:

% java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar io.anserini.search.SearchCollection -threads 16 -index beir-v1.0.0-trec-covid.bge-base-en-v1.5 -topics beir-trec-covid.bge-base-en-v1.5 -output runs/run.beir.Dp.trec-covid.txt -threads 16 -efSearch 1000 -removeQuery
Error: "-efSearch" is not a valid option. For help, use "-options" to print out information about options.

Still looking into the issue where the evaluation commands report a metric of 0 and fail the checks. I still cannot reproduce the issue consistently; I currently suspect it might have something to do with the user downloading the indexes. I will keep investigating.

Sorry that I am taking some time with this fix :( My local machine takes a very long time to run the regressions. If this is urgent, I will look for more powerful compute.

lintool · 2024-04-30T17:36:17Z

> Fixed the typo that caused this evaluation command to fail for bge-base-en-v1.5 cached queries:
>
> % java -cp /Users/jimmylin/workspace/anserini/target/anserini-0.36.1-SNAPSHOT-fatjar.jar io.anserini.search.SearchCollection -threads 16 -index beir-v1.0.0-trec-covid.bge-base-en-v1.5 -topics beir-trec-covid.bge-base-en-v1.5 -output runs/run.beir.Dp.trec-covid.txt -threads 16 -efSearch 1000 -removeQuery
> Error: "-efSearch" is not a valid option. For help, use "-options" to print out information about options.

Thanks!

> Still looking into the issue where the evaluation commands report a metric of 0 and fail the checks. I still cannot reproduce the issue consistently; I currently suspect it might have something to do with the user downloading the indexes. I will keep investigating.

I don't think it's downloading... perhaps some type of process management issue from Java? Because when I run the commands myself, it seems to work fine. Maybe some underlying race condition?

> Sorry that I am taking some time with this fix :( My local machine takes a very long time to run the regressions. If this is urgent, I will look for more powerful compute.

No worries, this isn't absolutely critical to the operation of the toolkit... (yet!)
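
To make the process-management hypothesis concrete: if a runner parsed an evaluation command's output before that output was complete, or silently fell back to 0 when no score could be parsed, it would print 0.0000 even though the same command works when run by hand. This is only an illustration of that class of bug, not a diagnosis confirmed in this thread. The sketch below shows a race-free pattern under stated assumptions: `EvalRunner`, `runAndParse`, and `parseScore` are hypothetical names, and the command is assumed to print a single trec_eval-style line of the form `metric  all  score`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.OptionalDouble;

// Hypothetical helper for illustration; not Anserini's actual implementation.
class EvalRunner {
  /** Parses the trailing score from a line such as "map  all  0.2281". */
  static OptionalDouble parseScore(String output) {
    String[] tokens = output.trim().split("\\s+");
    if (tokens.length < 3) {
      return OptionalDouble.empty();  // no "metric all score" triple present
    }
    try {
      return OptionalDouble.of(Double.parseDouble(tokens[tokens.length - 1]));
    } catch (NumberFormatException e) {
      return OptionalDouble.empty();
    }
  }

  /** Runs a single-metric evaluation command and returns its score, if any. */
  static OptionalDouble runAndParse(List<String> command) {
    try {
      Process process = new ProcessBuilder(command)
          .redirectError(ProcessBuilder.Redirect.INHERIT)  // stderr goes to the console, so it cannot fill a pipe
          .start();
      // readAllBytes() (Java 9+) returns only after the child closes stdout,
      // so the complete output is in hand before any parsing happens.
      String output =
          new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
      int exitCode = process.waitFor();
      return exitCode == 0 ? parseScore(output) : OptionalDouble.empty();
    } catch (IOException e) {
      return OptionalDouble.empty();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();  // preserve the interrupt for the caller
      return OptionalDouble.empty();
    }
  }

  public static void main(String[] args) {
    // Stand-in for an evaluation command that prints one metric line.
    OptionalDouble score =
        runAndParse(List.of("sh", "-c", "printf 'map\\tall\\t0.2281\\n'"));
    System.out.println(score.isPresent() ? "MAP: " + score.getAsDouble() : "MAP: no score parsed");
  }
}
```

Returning an empty `OptionalDouble` instead of defaulting to 0.0 keeps "no score was parsed" distinguishable from a genuine score of 0.0000 in the regression report.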

lintool mentioned this issue Apr 30, 2024

Align RunMsMarco with Fatjar regression instructions #2473

Closed

lintool changed the title ~~MS MARCO v2.1 regressions~~ Errors with new MS MARCO v2.1 and BEIR regressions Apr 30, 2024

lintool mentioned this issue May 2, 2024

Fix MS MARCO V2.1 repo experiments on segmented doc collection #2483

Merged

wu-ming233 mentioned this issue May 2, 2024

Fixed BEIR regression typo #2484

Merged

lintool closed this as completed May 3, 2024
