LAREX is a semi-automatic open-source tool for layout analysis on early printed books. It uses a rule-based connected components approach which is very fast, easy for the user to follow, and allows intuitive manual correction where necessary. The PageXML format is used to support integration into existing OCR workflows. Evaluations showed that LAREX provides an efficient and flexible way to segment pages of early printed books.
Please feel free to visit the tool homepage and the web application.
In the last few weeks a new, browser-based GUI has been built from scratch. This also required substantial changes to the rest of the code. We are now working on a step-by-step adaptation and integration of the existing functionality. Feel free to start testing right away, but please keep in mind that this is a work in progress.
To run LAREX, the image processing library OpenCV, including its Java bindings, is required. We recommend using version 2.4.9. Newer versions (3.X) can be used, but some adaptations to the code will be necessary. LAREX expects the corresponding .jar to be located in src/main/webapp/WEB-INF/lib and to be called "opencv.jar".
apt-get install tomcat7
(or use tomcat8)
apt-get install maven
apt-get install libopencv-dev
apt-get install openjdk-8-jdk
git clone https://github.com/chreul/LAREX.git
mkdir LAREX/Larex/src/main/webapp/WEB-INF/lib
ln -s /usr/share/java/opencv.jar LAREX/Larex/src/main/webapp/WEB-INF/lib/opencv.jar
Run mvn clean install -f LAREX/Larex/pom.xml.
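The symlink step above assumes that the OpenCV Java bindings were installed as /usr/share/java/opencv.jar; on some systems the jar may be missing or carry a different name, in which case the link dangles and the build or the application may fail later. The following is a minimal sketch for checking the link. The helper name check_opencv_jar is hypothetical and not part of LAREX.

```shell
#!/bin/sh
# Hypothetical helper (not part of LAREX): report whether the given
# opencv.jar path is a dangling symlink, missing, or readable.
check_opencv_jar() {
    jar_path="$1"
    if [ -L "$jar_path" ] && [ ! -e "$jar_path" ]; then
        # The link exists, but its target does not.
        echo "dangling symlink: $jar_path"
        return 1
    elif [ ! -r "$jar_path" ]; then
        echo "missing or unreadable: $jar_path"
        return 1
    fi
    echo "ok: $jar_path"
}

# Path taken from the steps above; adjust it if the repository was cloned elsewhere.
check_opencv_jar "LAREX/Larex/src/main/webapp/WEB-INF/lib/opencv.jar" || true
```

If the check reports a dangling symlink, locate the jar that your OpenCV installation actually provides and point the link at that file instead.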
Either:
sudo ln -s "$PWD/LAREX/Larex/target/Larex.war" /var/lib/tomcat7/webapps/Larex.war
or
cp LAREX/Larex/target/Larex.war /var/lib/tomcat7/webapps/Larex.war
Set LD_LIBRARY_PATH to $CMAKE_INSTALL_PREFIX/share/OpenCV/java when starting the Tomcat server.
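As a rough sketch, the variable could be set as shown below. This assumes OpenCV was built from source with CMake's default install prefix, /usr/local; if a different prefix was used, or OpenCV came from a distribution package, the directory will differ.

```shell
# Assumption: OpenCV was built with CMake's default install prefix.
# Replace /usr/local with the prefix used for your build.
CMAKE_INSTALL_PREFIX="${CMAKE_INSTALL_PREFIX:-/usr/local}"
OPENCV_JAVA_DIR="$CMAKE_INSTALL_PREFIX/share/OpenCV/java"

# Prepend the OpenCV native library directory, keeping any existing entries.
LD_LIBRARY_PATH="$OPENCV_JAVA_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH

echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```

One option is to place these lines in bin/setenv.sh under the Tomcat base directory, which Tomcat's catalina.sh reads on startup if the file exists. A Tomcat started as a system service may take its environment from elsewhere, so check how your installation is launched.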
It is recommended to use Eclipse.
Rename your OpenCV .jar to "opencv.jar" and the corresponding .dll to "opencv.dll", then copy both to src/main/webapp/WEB-INF/lib.
In Eclipse go to Help -> Install New Software -> Work with neon -> Install Web, XML, Java EE and OSGi Enterprise Development
Install Maven as seen above and build the project.
Download the most recent Tomcat version from http://tomcat.apache.org/download-90.cgi.
Select the web perspective and add the Tomcat server.
Go to localhost:8080/Larex.
You can add your own books by copying them to src/webapp/resources/books.
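Copying a book could look roughly like the sketch below. It assumes that a book is a single folder of page images and that the books directory is given relative to the project directory. Both are assumptions, and the helper name add_book is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of LAREX): copy a book folder into the
# books directory. The default target is the path given above.
add_book() {
    book_src="${1%/}"                              # drop a trailing slash
    books_dir="${2:-src/webapp/resources/books}"   # assumed relative to the project directory
    if [ ! -d "$book_src" ]; then
        echo "not a directory: $book_src" >&2
        return 1
    fi
    mkdir -p "$books_dir"
    cp -r "$book_src" "$books_dir/"
}

# Example call (placeholder path):
#   add_book /path/to/my_book
```

Depending on how the application is deployed, a rebuild or redeploy may be needed before the new book shows up.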
.tif support and the PageXML export will be added in the very near future.
Reul, C., Springmann, U., and Puppe, F.: LAREX - A semi-automatic open-source Tool for Layout Analysis and Region Extraction on Early Printed Books. Accepted for oral presentation at DATeCH 2017. Draft available at arXiv.
Reul, C., Dittrich, M., and Gruner, M.: Case Study of a highly automated Layout Analysis and OCR of an incunabulum: ‘Der Heiligen Leben’ (1488). Accepted for oral presentation at DATeCH 2017. Draft available at arXiv.