Tesseract4Android

Fork of tess-two, rewritten from scratch to build with CMake and to support the latest Android Studio and Tesseract OCR.

The Java/JNI wrapper files and tests for Leptonica / Tesseract are based on the tess-two project, which is based on Tesseract Tools for Android.

Dependencies

This project uses additional libraries (with their own specific licenses):

Prerequisites

  • Android 4.1 (API 16) or higher
  • Trained data files (v4.0.0) for each language you want to use. The data files must be copied to the Android device, into a directory named tessdata.
  • The application must hold the READ_EXTERNAL_STORAGE permission to access the tessdata directory.
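
A missing or misplaced data file is a common cause of initialization failure, so it can help to check the directory layout before calling the library. The following is a minimal sketch in plain Java, assuming each language file is named `<lang>.traineddata` and sits directly inside `tessdata`. The class `TessdataCheck` and its method are hypothetical helpers, not part of Tesseract4Android.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of the library) for checking the tessdata layout. */
public class TessdataCheck {

    /**
     * Returns the language codes whose "<lang>.traineddata" file is missing
     * from the "tessdata" subdirectory of dataPath.
     */
    public static List<String> missingLanguages(File dataPath, String... languages) {
        File tessdata = new File(dataPath, "tessdata");
        List<String> missing = new ArrayList<>();
        for (String lang : languages) {
            File trained = new File(tessdata, lang + ".traineddata");
            if (!trained.isFile()) {
                missing.add(lang);
            }
        }
        return missing;
    }
}
```

An empty list from `missingLanguages(dataPath, "eng")` means the expected file is in place. It does not verify that the file is a valid v4.0.0 data file.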

Variants

This library is available in two variants.

  • Standard - Single-threaded. Best for single-core processors or when running multiple Tesseract instances in parallel.
  • OpenMP - Multi-threaded. Provides better performance on multi-core processors when using only a single Tesseract instance.

Usage

You can get a compiled version of Tesseract4Android from JitPack.io.

  1. Add the JitPack repository to your project root build.gradle file at the end of repositories:
allprojects {
    repositories {
        ...
        maven { url 'https://jitpack.io' }
    }
}
  2. Add the dependency to your app module build.gradle file:
dependencies {
    // To use Standard variant:
    implementation 'cz.adaptech:tesseract4android:4.1.1'

    // To use OpenMP variant:
    // NOTE: This variant is currently unavailable due to issues with JitPack. You must compile it yourself.
    //implementation 'cz.adaptech:tesseract4android-openmp:4.1.1'
}
  3. Use the TessBaseAPI class in your code:
// Create Tesseract instance
TessBaseAPI tess = new TessBaseAPI();

// The given path must contain a `tessdata` subdirectory holding the `*.traineddata` language files
String dataPath = new File(Environment.getExternalStorageDirectory(), "tesseract").getAbsolutePath();

// Initialize API for specified language (can be called multiple times during Tesseract lifetime)
if (!tess.init(dataPath, "eng")) {
    // Error initializing Tesseract (wrong data path or language) 
    tess.recycle();
    return;
}

// Set the image, then run recognition and get the result (can be called multiple times during Tesseract lifetime)
tess.setImage(image);
String text = tess.getUTF8Text();

// Release Tesseract when you don't want to use it anymore
tess.recycle();
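
If recognition throws, the snippet above would skip the final `recycle()` call. One way to guard against that is a try/finally block. The sketch below shows the pattern in plain Java. Because `TessBaseAPI` needs an Android runtime, it uses a hypothetical `Engine` interface that mirrors only the `init`, `getUTF8Text`, and `recycle` calls from the snippet above. `setImage` is left out because it depends on Android image types. Neither `Engine` nor `OcrLifecycleSketch` is part of the library.

```java
/** Illustrative sketch of the init / recognize / recycle lifecycle. */
public class OcrLifecycleSketch {

    /** Hypothetical stand-in for the three TessBaseAPI calls used above. */
    public interface Engine {
        boolean init(String dataPath, String language);
        String getUTF8Text();
        void recycle();
    }

    /**
     * Returns the recognized text, or null if initialization fails.
     * The engine is recycled on every path, including exceptions.
     */
    public static String recognize(Engine engine, String dataPath, String language) {
        try {
            if (!engine.init(dataPath, language)) {
                return null; // wrong data path or language
            }
            return engine.getUTF8Text();
        } finally {
            engine.recycle();
        }
    }
}
```

With the real class, the same shape would apply: put the `init`, `setImage`, and `getUTF8Text` calls inside the `try` and the `recycle()` call in `finally`.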

Building

You can open the project in Android Studio and build the AAR, or you can use gradlew from the command line.

To build the release version of the library, use the task tesseract4android:assembleRelease. After a successful build, the resulting AAR files are in the <project dir>/tesseract4Android/build/outputs/aar/ directory.

Android Studio

  • Open this project in Android Studio.
  • Open the Gradle panel, expand Tesseract4Android / :tesseract4Android / Tasks / other and run assembleRelease.

GradleW

  • In the project directory, create a local.properties file containing:
sdk.dir=c\:\\your\\path\\to\\android\\sdk
ndk.dir=c\:\\your\\path\\to\\android\\ndk

Note: in Windows paths, special characters such as : and \ must be escaped with a backslash, as in the example above.

  • Run gradlew tesseract4android:assembleRelease from the command line.
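
The escaping in local.properties follows the standard Java properties format, so `java.util.Properties` can produce it for you. The sketch below writes the two keys from the example above and lets `Properties.store` apply the escaping. The class name `LocalPropertiesSketch` and the sample paths are illustrative assumptions only.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.Properties;

/** Illustrative sketch: render local.properties content with correct escaping. */
public class LocalPropertiesSketch {

    /** Returns properties-file text for the given SDK and NDK paths. */
    public static String render(String sdkDir, String ndkDir) {
        Properties props = new Properties();
        props.setProperty("sdk.dir", sdkDir);
        props.setProperty("ndk.dir", ndkDir);
        StringWriter out = new StringWriter();
        try {
            // store() escapes ':' and '\' in values and adds a timestamp comment line
            props.store(out, null);
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for a StringWriter
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Plain Windows paths go in; the escaped form comes out.
        System.out.print(render("c:\\your\\path\\to\\android\\sdk",
                                "c:\\your\\path\\to\\android\\ndk"));
    }
}
```

The output contains a leading `#` comment with the current date followed by the two entries. Their order is not guaranteed.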

License

Copyright 2019 Adaptech s.r.o., Robert Pösel

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
