Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set native path with gradle #921

Merged
merged 30 commits into from
May 13, 2023
Merged

Set native path with gradle #921

merged 30 commits into from
May 13, 2023

Conversation

kermitt2
Copy link
Owner

@kermitt2 kermitt2 commented Jun 1, 2022

See #688

As dynamically loading native libraries is not possible with JDK > 10 (or with complicated hacks depending on the JVM version), this PR tries to set the java.library.path at launch via Gradle.

The current version covers lin64 only (service, training, evaluation)

To do:

@kermitt2 kermitt2 marked this pull request as draft June 1, 2022 13:00
…h, remove useless stuff for macOs, remove the jep library in grobid-home
@lfoppiano
Copy link
Collaborator

I've made some small progress (on the macOs).

In brief:

  • I've created the task called run-server and I've extended the library path using the conda environment.
  • I've removed the libjep.so from the mac os library path because, depending how it was built might require a python version that is not installed. If we pick it up from the environment the python version will match. E.g. the grobid-home jep was asking for python 3.6 and I have installed 3.8.
  • There is an hardcoded path lib/python3.8/site-packages/jep which needs to be fixed, eventually by visiting the CONDA_PREFIX and find the python version. Here we might need to add additional virtual environment support I suppose.

I can say "It works on my machine" 😂

@coveralls
Copy link

coveralls commented Jun 2, 2022

Coverage Status

Coverage increased (+0.008%) to 39.979% when pulling 46d6a05 on gradle-jep-path into cfa6fe4 on master.

@kermitt2
Copy link
Owner Author

kermitt2 commented Jun 2, 2022

Some feedback/question on your changes:

I've created the task called run-server

There's already a run task for the service

tasks.run {

Isn't redundant?

and I've extended the library path using the conda environment.

Does it make sense to have JEP native libraries in a python environment - given that it could be conda, python virtual environment, poetry, etc.? It seems opening plenty of cases and compatibility issues (it's also confusing with the DeLFT environment indicated in grobid.yaml).

I've removed the libjep.so from the mac os library path because, depending how it was built might require a python version that is not installed. If we pick it up from the environment the python version will match. E.g. the grobid-home jep was asking for python 3.6 and I have installed 3.8.

This is related to #689. I though first about building automatically with Docker and embedding plenty of JEP native variants, but too complicated...

In the latest version, I added a script to download and install JEP temporary, rebuild locally, and install the platform/python version compatible native JEP library under the GROBID native library path, see https://grobid.readthedocs.io/en/latest/Deep-Learning-models/#classic-python-and-virtualenv (point 4) and ./grobid-home/scripts/install_jep_lib.sh
I think it's a good way to go with linux and macos platform without too much complexity.
The script could be called via gradle in the future, and there should then always be a compatible library under grobid-home native library path.

There is an hardcoded path lib/python3.8/site-packages/jep which needs to be fixed, eventually by visiting the CONDA_PREFIX and find the python version. Here we might need to add additional virtual environment support I suppose.

Following the script, the JEP library would always be under grobid-home, without python specific environment, I think it should work all the time?

@lfoppiano
Copy link
Collaborator

There's already a run task for the service

[...]

Isn't redundant?

Yes 🦢 😓 😅

Following the script, the JEP library would always be under grobid-home, without python specific environment, I think it should work all the time?

I personally prefer to use the virtual environment, however, I'm good with using what you've already implemented. 😉

@lfoppiano
Copy link
Collaborator

lfoppiano commented Jun 27, 2022

In the latest version, I added a script to download and install JEP temporary, rebuild locally, and install the platform/python version compatible native JEP library under the GROBID native library path, see https://grobid.readthedocs.io/en/latest/Deep-Learning-models/#classic-python-and-virtualenv (point 4) and ./grobid-home/scripts/install_jep_lib.sh I think it's a good way to go with linux and macos platform without too much complexity. The script could be called via gradle in the future, and there should then always be a compatible library under grobid-home native library path.

I've checked it out. After more than one week start again it's hard 😅
I noticed that after checking it out you run python setup.py build install. Is the install required? I'm asking because the install is going to install jep also in local python installation, so I'm not sure is necessary to also copy jep under the grobid-home.
I've modified the script in local to run on mac os (and removed the install) but when I run grobid it lack the correct python version. I've run the script using a conda virtual environment to avoid having stuff installed in my root env, and I suppose the python version does not match with the root env.

I'm wondering, can we really avoid dealing with the mess of the virtual envs?


Sorry if I'm insisting on this...
In this way the python version can be inferred by looking within the virtual environment. This implementation could be improved indeed with some more robustness but it's to give an idea:

    tasks.run {
        workingDir = rootProject.rootDir
        def condaEnv = "${System.env.CONDA_PREFIX}/lib"
        def pythonDirectory = file(condaEnv).listFiles({it.toString().contains("/lib/python")} as FileFilter)?.first()
        pythonVersion = (pythonDirectory =~ /python([0-9]\.[0-9]+)/)[0][1]

        systemProperty "java.library.path","${System.getProperty('java.library.path')}:" +
            "${System.env.CONDA_PREFIX}/lib:" +
            "${System.env.CONDA_PREFIX}/lib/python${pythonVersion}/site-packages/jep"
    }

@lfoppiano
Copy link
Collaborator

The last commits are mostly commented code, I just want to keep it somewhere 😄

@kermitt2
Copy link
Owner Author

kermitt2 commented Jul 15, 2022

@lfoppiano The PR was normally just about setting the native path of JNI JEP under grobid-home/lib from gradle, so that we can run Grobid with java 10+ with Linux and Mac OS. We are trying to do several other things here and I think this makes it very confusing and we might never arrive to something that can be merged :)

  1. So I would leave out the part related to the python environment for the moment out of this PR, in particular not mixing the gradle part and the environment. This is very experimental I think. So let's not need to worry here about how the native JEP JNI library is coming under grobid-home/lib in this particular PR.

  2. We could do another PR to clarify the JEP install and how to use environment with GROBID+DeLFT. I think we should keep it simple and maintainable. We can simply create an environment for example (with whatever python environment tool), and then do everything with it is activated (install Grobid, run the JEP install script and then start Grobid within this environment). We might not use something else, because a) it is too complicated via Gradle to consider all possible environment cases and paths and b) it currently doesn't work: the python env, path given in the Grobid config is not working with JEP and I don't know why (either everything has to run global or everything within a common python environment activated).

  3. Setting the JEP JNI library path from Gradle is working with Linux now, but the issue is to have it working also with Mac OS, so make gradle able to detect the platform and set the native path appropriately grobid/lib/lin-64or grobid/lib/mac-64. If we can solve this, we can merge the PR.

What do you think?

@lfoppiano
Copy link
Collaborator

lfoppiano commented Jul 20, 2022

@kermitt2 OK for me to just work on assigning the paths for Mac Os and Linux. Should be implemented in 46d6a05
My bad for insisting on the virtual environment, it's just that I cannot really test without them.

@kermitt2 kermitt2 added this to the 0.7.3 milestone Nov 7, 2022
@kermitt2 kermitt2 marked this pull request as ready for review April 23, 2023 11:15
@kermitt2
Copy link
Owner Author

kermitt2 commented Apr 23, 2023

I went through the difference run tasks and adapted gradle for setting java.library.path in every cases.

Tested on Linux-64, it works with OpenJDK 1.15 for the service, training commands, tests and evaluation commands (native Wapiti and JEP). For the batch mode, the -Djava.library.path= need to be added to the command line, which makes it bloated but that's java life - the documentation is updated for that too.

@lfoppiano could you maybe test all these different run cases on Mac ? Normally that should work the same, but just to be sure before merging.

@lfoppiano
Copy link
Collaborator

I could not install easily JDK 15, so I tested with JDK 17.
Meanwhile:

  • I've updated gradle to version 7.2 (7.1.1 did not work due to incompatibility with the JDK bytecode)
  • I've added jvmArgs "--add-opens", "java.base/java.lang=ALL-UNNAMED" as parameters to make it working

@kermitt2 kermitt2 merged commit 9794136 into master May 13, 2023
@lfoppiano lfoppiano deleted the gradle-jep-path branch June 12, 2023 21:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants