Can't run 2CR on pre-built indexes directly on fatjar - can't read YAML files #2427

Closed
lintool opened this issue Mar 28, 2024 · 14 comments

@lintool
Member

lintool commented Mar 28, 2024

We need to read the YAML files from the fatjar, not depend on a local file:

$ wget https://repo1.maven.org/maven2/io/anserini/anserini/0.25.0/anserini-0.25.0-fatjar.jar

$ java -cp anserini-0.25.0-fatjar.jar io.anserini.reproduce.RunMsMarco
Exception in thread "main" java.io.FileNotFoundException: src/main/java/io/anserini/reproduce/msmarco-v1-passage.yaml (No such file or directory)
	at java.base/java.io.FileInputStream.open0(Native Method)
	at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
	at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
	at java.base/java.io.FileInputStream.<init>(FileInputStream.java:112)
	at io.anserini.reproduce.RunMsMarco.main(RunMsMarco.java:41)
@ArthurChen189 ArthurChen189 self-assigned this Mar 28, 2024
@16BitNarwhal
Member

working on this

@ArthurChen189 ArthurChen189 removed their assignment Apr 7, 2024
@ArthurChen189
Member

Sorry, I was prepping for final exams and working on hackathon stuff. I am happy to help @16BitNarwhal if there are any questions :)

@lintool
Member Author

lintool commented Apr 27, 2024

@16BitNarwhal any progress on this?

@16BitNarwhal
Member

16BitNarwhal commented Apr 27, 2024

I was able to get RunMsMarco to read the YAML file from within the fatjar, but the fatjar is still not completely self-contained: it can only be run from the repo root directory, because it needs access to bin/run.sh, which (I think) runs all fatjars from target/.

I'm having a bit of trouble proceeding from here

@lintool
Member Author

lintool commented Apr 27, 2024

I think you need to read directly from a jar, something like this? https://stackoverflow.com/questions/20389255/reading-a-resource-file-from-within-jar
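A minimal sketch of that approach, assuming the YAML file is packaged as a classpath resource inside the fatjar. The class name `ResourceReader` and the default resource name in `main` are hypothetical and not taken from the Anserini code; the point is only that `getResourceAsStream` resolves names against the classpath (including the jar), whereas `FileInputStream` resolves them against the local filesystem.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ResourceReader {
  /**
   * Reads a classpath resource fully into a String.
   * The name is relative to the classpath root, e.g. "some/package/file.yaml".
   */
  public static String readResource(String name) throws IOException {
    // The class loader looks inside the jar(s) on the classpath,
    // so this works no matter which directory the JVM was started from.
    try (InputStream in = ResourceReader.class.getClassLoader().getResourceAsStream(name)) {
      if (in == null) {
        throw new IOException("Resource not found on classpath: " + name);
      }
      return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical default resource name, for illustration only.
    String name = args.length > 0 ? args[0] : "reproduce/msmarco-v1-passage.yaml";
    System.out.println(readResource(name));
  }
}
```

The contents can then be handed to whatever YAML parser the project already uses. Note that `readAllBytes` requires Java 9 or later.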

@lintool
Member Author

lintool commented Apr 28, 2024

#2469 was a first step, but we have another issue:

% java -cp `ls *-fatjar.jar` io.anserini.reproduce.RunMsMarco
# Running condition "bm25-default": BM25 (k1=0.9, b=0.4) 

  - topic_key: msmarco-v1-passage.dev

    Running retrieval command: bin/run.sh io.anserini.search.SearchCollection -threads 16 -index msmarco-v1-passage -topics msmarco-v1-passage.dev -output runs/run.msmarco-v1-passage.bm25-default.msmarco-v1-passage.dev.txt -hits 1000 -bm25

The commands are using bin/run.sh, which is only available in the repo. I'll have to find a way to adjust this somehow.

@16BitNarwhal
Member

I think a solution could be to locate the fatjar in java and format the command string to use the located path i.e. java -cp $absolutefatjarpath ...

This way, running shouldn't depend on any files in the repo like bin/run.sh

@lintool
Member Author

lintool commented Apr 28, 2024

Yea, I tried that... I was playing with something like

java -cp `ls . target 2> /dev/null | grep fatjar` ...

And then it got too janky. I'm thinking a reasonable solution might be to just store the sh scripts in the fatjar, and just extract the scripts into ., and then use the scripts to launch.

java -cp $absolutefatjarpath ...

This is doable also, but you just need to define the variable first... which isn't too bad, but just one more thing to do...
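The "store the scripts in the fatjar and extract them" idea could look roughly like the sketch below. It assumes the script is bundled as a classpath resource; the class name `ScriptExtractor` and any resource name passed to it (for example `bin/run.sh`) are hypothetical, and this is not what the thread ultimately settled on.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ScriptExtractor {
  /**
   * Copies a classpath resource into targetDir, keeping only its file name,
   * and marks the copy as executable. Returns the path of the extracted file.
   */
  public static Path extract(String resourceName, Path targetDir) throws IOException {
    Path target = targetDir.resolve(Path.of(resourceName).getFileName());
    try (InputStream in = ScriptExtractor.class.getClassLoader().getResourceAsStream(resourceName)) {
      if (in == null) {
        throw new IOException("Resource not found on classpath: " + resourceName);
      }
      Files.createDirectories(targetDir);
      // Overwrite any stale copy left behind by an earlier run.
      Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }
    // Best effort: the return value is ignored on filesystems without an executable bit.
    target.toFile().setExecutable(true);
    return target;
  }
}
```

A caller would extract into the current directory with something like `ScriptExtractor.extract("bin/run.sh", Path.of("."))` and then launch the extracted script.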

@lintool
Member Author

lintool commented Apr 28, 2024

> I think a solution could be to locate the fatjar in java and format the command string to use the located path i.e. java -cp $absolutefatjarpath ...
>
> This way, running shouldn't depend on any files in the repo like bin/run.sh

BTW, we need to both run in the repo (i.e., cloned copy) and also in an arbitrary location... so by the time you get this working, it'll end up looking pretty janky also.

@16BitNarwhal
Member

> Yea, I tried that... I was playing with something like
>
> java -cp `ls . target 2> /dev/null | grep fatjar` ...
>
> And then it got too janky. I'm thinking a reasonable solution might be to just store the sh scripts in the fatjar, and just extract the scripts into ., and then use the scripts to launch.
>
> java -cp $absolutefatjarpath ...
>
> This is doable also, but you just need to define the variable first... which isn't too bad, but just one more thing to do...

I was thinking of something along these lines:
String path = new File(MyClass.class.getProtectionDomain().getCodeSource().getLocation().toURI()).getPath();
and then plugging that path into the command in RunMsMarco.java.

I think this can reliably find the fatjar file for any location
https://stackoverflow.com/questions/320542/how-to-get-the-path-of-a-running-jar-file
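As a rough sketch of how the located path could be plugged into a command, assuming the code runs from a jar or class directory on the local filesystem: the class `FatjarLocator` and its two methods are hypothetical names for illustration, not the code that ended up in RunMsMarco.java.

```java
import java.io.File;
import java.net.URISyntaxException;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FatjarLocator {
  /**
   * Returns the filesystem path of the jar (or class directory) that the
   * given class was loaded from.
   */
  public static String locate(Class<?> anchor) throws URISyntaxException {
    CodeSource source = anchor.getProtectionDomain().getCodeSource();
    if (source == null) {
      // Happens for classes loaded by the bootstrap loader, e.g. java.lang.String.
      throw new IllegalStateException("No code source for " + anchor.getName());
    }
    return new File(source.getLocation().toURI()).getPath();
  }

  /** Builds the argument list for: java -cp <classpath> <mainClass> <args...> */
  public static List<String> javaCommand(String classpath, String mainClass, String... args) {
    List<String> command = new ArrayList<>(Arrays.asList("java", "-cp", classpath, mainClass));
    command.addAll(Arrays.asList(args));
    return command;
  }

  public static void main(String[] args) throws URISyntaxException {
    String fatjar = locate(FatjarLocator.class);
    // Prints the command that would replace the bin/run.sh invocation.
    System.out.println(String.join(" ",
        javaCommand(fatjar, "io.anserini.search.SearchCollection", "-hits", "1000")));
  }
}
```

Keeping the command as a list of arguments (rather than one formatted string) means it can be passed directly to `ProcessBuilder` without worrying about spaces in the jar path.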

@lintool
Member Author

lintool commented Apr 28, 2024

Want to give it a try?

@16BitNarwhal
Member

Yes!

@16BitNarwhal
Member

I replaced bin/run.sh and bin/trec_eval with java -cp $fatjar and java -cp $fatjar trec_eval, respectively, to remove the dependence on those files.

I think RunMsMarco still depends on tools/ in the repo though

@lintool
Member Author

lintool commented Apr 30, 2024

Closed by #2476.

@lintool lintool closed this as completed Apr 30, 2024