This repository has been archived by the owner on Mar 17, 2022. It is now read-only.

Tesseract 4.0 Support? #196

Closed
seantibb opened this issue Mar 7, 2017 · 49 comments

@seantibb

seantibb commented Mar 7, 2017

First, I love tess-two...really :). I was just reading through the tesseract-ocr wiki (https://github.com/tesseract-ocr/tesseract/wiki/4.0-Accuracy-and-Performance) and noticed there are some major performance gains with 4.0. Is there anything I can do to help update tess-two to support 4.0 as well?

Thanks!

@rmtheis
Owner

rmtheis commented Mar 21, 2017

Thanks. I definitely want to update to support Tesseract 4.0 for the reasons you point to. I'll need help to do it for sure, and I appreciate all the contributions from you and everyone else!

There are two things that contributors can help with right now that will help toward supporting Tesseract 4:

  1. Investigation of Crashing reported on 64-bit devices #197. This bug needs more info as to:
  • which architectures (x86, x86_64, armeabi-v7a, arm64-v8a, mips, mips64) it happens on
  • which versions of tess-two it happens on
  • which commit introduced the crash

The crash is reproducible on emulators, so having a 64-bit device isn't a requirement for looking into this.

  2. Many of the changes in Tesseract 3.05 are back-ports of Tesseract 4 code, so tess-two support for Tesseract 3.05 will be a step in the right direction toward supporting Tesseract 4. When I have a chance I plan to upload a branch that's a work in progress for supporting Tesseract 3.05. I'll be needing some help getting that branch working. I plan to update this issue when I upload that branch.

@rmtheis
Owner

rmtheis commented Apr 3, 2017

Update: I've pushed code to the master branch that runs Tesseract 3.05.00. The problems I had been having with an earlier version of the Tesseract code have been resolved. I plan to make a release on Bintray/JCenter with these new changes soon.

@rmtheis
Owner

rmtheis commented Apr 11, 2017

Update: The Tesseract 3.05.00 code has been released in tess-two 6.3.0.

I have pushed a branch called tesseract4 that's a work in progress for Tesseract 4.0. It builds, but it's not working as of right now.

@jasonwedepohl

Tesseract 4.0's LSTM is "much more memory-intensive" according to the doc on accuracy and performance. I can't find the specs of the test machine, but it is possible that the memory constraints of most mobile devices will slow down the engine. I did read somewhere that the plan is to mark the original Tesseract engine as obsolete, so I hope that LSTM can really perform better on devices with 1 to 2 GB of RAM.

@rmtheis
Owner

rmtheis commented Jun 21, 2017

06-20 20:54:51.936 1354-1354/? A/DEBUG: *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
06-20 20:54:51.936 1354-1354/? A/DEBUG: Build fingerprint: 'Android/sdk_google_phone_x86/generic_x86:6.0/MASTER/3738108:userdebug/test-keys'
06-20 20:54:51.936 1354-1354/? A/DEBUG: Revision: '0'
06-20 20:54:51.936 1354-1354/? A/DEBUG: ABI: 'x86'
06-20 20:54:51.936 1354-1354/? A/DEBUG: pid: 3428, tid: 3441, name: ationTestRunner  >>> com.googlecode.tesseract.android.test <<<
06-20 20:54:51.936 1354-1354/? A/DEBUG: signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
06-20 20:54:51.938 1354-1354/? A/DEBUG:     eax 00000000  ebx 00000d64  ecx 00000d71  edx 00000006
06-20 20:54:51.938 1354-1354/? A/DEBUG:     esi ae7b9980  edi 00000002
06-20 20:54:51.938 1354-1354/? A/DEBUG:     xcs 00000073  xds 0000007b  xes 0000007b  xfs 0000004f  xss 0000007b
06-20 20:54:51.938 1354-1354/? A/DEBUG:     eip b72dbf26  ebp 00000d71  esp ae7b83e0  flags 00200202
06-20 20:54:51.951 1354-1354/? A/DEBUG: backtrace:
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #00 pc 00083f26  /system/lib/libc.so (tgkill+22)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #01 pc 000815f8  /system/lib/libc.so (pthread_kill+70)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #02 pc 00027205  /system/lib/libc.so (raise+36)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #03 pc 000209e4  /system/lib/libc.so (abort+80)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #04 pc 0012b127  /data/app/com.googlecode.tesseract.android.test-1/lib/x86/libtess.so (ERRCODE::error(char const*, TessErrorLogCode, char const*, ...) const+263)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #05 pc 000fdb6a  /data/app/com.googlecode.tesseract.android.test-1/lib/x86/libtess.so (tesseract::ImageData::PreScale(int, int, float*, int*, int*, GenericVector<TBOX>*) const+138)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #06 pc 0018dba6  /data/app/com.googlecode.tesseract.android.test-1/lib/x86/libtess.so (tesseract::Input::PrepareLSTMInputs(tesseract::ImageData const&, tesseract::Network const*, int, tesseract::TRand*, float*)+70)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #07 pc 00195edb  /data/app/com.googlecode.tesseract.android.test-1/lib/x86/libtess.so (tesseract::LSTMRecognizer::RecognizeLine(tesseract::ImageData const&, bool, bool, bool, float, float*, tesseract::NetworkIO*, tesseract::NetworkIO*)+155)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #08 pc 00195621  /data/app/com.googlecode.tesseract.android.test-1/lib/x86/libtess.so (tesseract::LSTMRecognizer::RecognizeLine(tesseract::ImageData const&, bool, bool, double, bool, UNICHARSET const*, TBOX const&, float, bool, tesseract::PointerVector<WERD_RES>*)+705)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #09 pc 000b852a  /data/app/com.googlecode.tesseract.android.test-1/lib/x86/libtess.so (tesseract::Tesseract::LSTMRecognizeWord(BLOCK const&, ROW*, WERD_RES*, tesseract::PointerVector<WERD_RES>*)+426)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #10 pc 000a0f45  /data/app/com.googlecode.tesseract.android.test-1/lib/x86/libtess.so (tesseract::Tesseract::classify_word_pass1(tesseract::WordData const&, WERD_RES**, tesseract::PointerVector<WERD_RES>*)+117)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #11 pc 0009df0a  /data/app/com.googlecode.tesseract.android.test-1/lib/x86/libtess.so (tesseract::Tesseract::RetryWithLanguage(tesseract::WordData const&, void (tesseract::Tesseract::*)(tesseract::WordData const&, WERD_RES**, tesseract::PointerVector<WERD_RES>*), bool, WERD_RES**, tesseract::PointerVector<WERD_RES>*)+170)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #12 pc 00098935  /data/app/com.googlecode.tesseract.android.test-1/lib/x86/libtess.so (tesseract::Tesseract::classify_word_and_language(int, PAGE_RES_IT*, tesseract::WordData*)+453)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #13 pc 00099666  /data/app/com.googlecode.tesseract.android.test-1/lib/x86/libtess.so (tesseract::Tesseract::RecogAllWordsPassN(int, ETEXT_DESC*, PAGE_RES_IT*, GenericVector<tesseract::WordData>*)+774)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #14 pc 0009ac80  /data/app/com.googlecode.tesseract.android.test-1/lib/x86/libtess.so (tesseract::Tesseract::recog_all_words(PAGE_RES*, ETEXT_DESC*, TBOX const*, char const*, int)+464)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #15 pc 0008544a  /data/app/com.googlecode.tesseract.android.test-1/lib/x86/libtess.so (tesseract::TessBaseAPI::Recognize(ETEXT_DESC*)+890)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #16 pc 00083d56  /data/app/com.googlecode.tesseract.android.test-1/lib/x86/libtess.so (tesseract::TessBaseAPI::GetUTF8Text()+70)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #17 pc 0026ba0d  /data/app/com.googlecode.tesseract.android.test-1/lib/x86/libtess.so (Java_com_googlecode_tesseract_android_TessBaseAPI_nativeGetUTF8Text+77)
06-20 20:54:51.952 1354-1354/? A/DEBUG:     #18 pc 00022d2c  /data/app/com.googlecode.tesseract.android.test-1/oat/x86/base.odex (offset 0x12000) (java.lang.String com.googlecode.tesseract.android.TessBaseAPI.nativeGetUTF8Text(long)+128)
06-20 20:54:51.953 1354-1354/? A/DEBUG:     #19 pc 00024fb5  /data/app/com.googlecode.tesseract.android.test-1/oat/x86/base.odex (offset 0x12000) (java.lang.String com.googlecode.tesseract.android.TessBaseAPI.getUTF8Text()+185)
06-20 20:54:51.953 1354-1354/? A/DEBUG:     #20 pc 000171b6  /data/app/com.googlecode.tesseract.android.test.test-1/oat/x86/base.odex (offset 0xe000) (void com.googlecode.tesseract.android.test.TessBaseAPITest.testChoiceIterator()+378)
06-20 20:54:51.953 1354-1354/? A/DEBUG:     #21 pc 00137a82  /system/lib/libart.so (art_quick_invoke_stub+338)
06-20 20:54:51.953 1354-1354/? A/DEBUG:     #22 pc 001435c4  /system/lib/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+212)
06-20 20:54:51.953 1354-1354/? A/DEBUG:     #23 pc 0050f858  /system/lib/libart.so (art::InvokeMethod(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, _jobject*, _jobject*, unsigned int)+1736)
06-20 20:54:51.953 1354-1354/? A/DEBUG:     #24 pc 0048c5e3  /system/lib/libart.so (art::Method_invoke(_JNIEnv*, _jobject*, _jobject*, _jobject*)+80)
06-20 20:54:51.953 1354-1354/? A/DEBUG:     #25 pc 72a3aca4  /data/dalvik-cache/x86/system@framework@boot.oat (offset 0x1eb2000)
06-20 20:54:51.999 1354-1354/? A/DEBUG: Tombstone written to: /data/tombstones/tombstone_00

@kirantpatil

Hi All,

Any updates on this issue?

@kirantpatil

Can we use tess-two with Tesseract 4.0?

@amin1985

In Tesseract 4, see dotproductsse.cpp and dotproductavx.cpp:
https://github.com/tesseract-ocr/tesseract/blob/197b89b6ac8ca61c0feeb88479cecea6600b8733/arch/dotproductavx.cpp
fprintf(stderr, "DotProductAVX can't be used on Android\n");

The code says that "AVX" and "SSE" can't be used on Android.
What is AVX?
"Intel® Advanced Vector Extensions (Intel® AVX) has been extended to support 256-bit instruction size on 64-bit processors."
So it's a hardware feature of the CPU architecture, patented by Intel and available on Intel and AMD CPUs:
https://en.wikipedia.org/wiki/Advanced_Vector_Extensions

As far as I know, none of the Android CPUs support it (Intel doesn't allow it).
That would mean LSTM will not be available on Android, or if it is, only on new devices,
or only once an LSTM OCR optimized for Android is released (if that's possible).

Or maybe I'm wrong about this?
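For context on the question above: AVX and SSE are x86 instruction-set extensions, while most Android devices use ARM CPUs, whose SIMD extension is NEON. Which of these a given native build targets can be checked at compile time through macros that GCC and Clang predefine. The following is only a minimal sketch under that assumption; `TargetSimdName` and `IsX86Only` are made-up names for illustration and are not part of Tesseract or tess-two.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper (not part of Tesseract): reports which SIMD
// instruction set the compiler was told to target for this build.
// The macros checked here are predefined by GCC and Clang.
inline const char* TargetSimdName() {
#if defined(__AVX__)
  return "AVX";     // x86 / x86_64 build with AVX enabled
#elif defined(__SSE4_1__)
  return "SSE4.1";  // x86 / x86_64 build with SSE4.1 enabled
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
  return "NEON";    // ARM build with NEON enabled
#else
  return "scalar";  // no SIMD extension selected, plain C++ only
#endif
}

// Hypothetical helper: true when the named instruction set exists only
// on x86 CPUs, so code using it cannot run on an ARM Android device.
inline bool IsX86Only(const char* simd_name) {
  assert(simd_name != nullptr);
  return std::strcmp(simd_name, "AVX") == 0 ||
         std::strcmp(simd_name, "SSE4.1") == 0;
}
```

On an ARM build the first function would report "NEON" or "scalar", which suggests the missing piece is an ARM or plain C++ code path rather than AVX hardware.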

@Wikinaut

Wikinaut commented Oct 6, 2017

Have you tried to compile and build the recent Tesseract 4.0 https://github.com/tesseract-ocr/tesseract version?

@ruthloeser

Hi,
I am trying to compile and run Tesseract 4, and I get:
I/Tesseract(native): Initialized Tesseract API with language=eng
A/libc: Fatal signal 6 (SIGABRT), code -6 in tid 23391 (le.tess_two_app)
Any idea what is causing this error?

@rezaee

rezaee commented Mar 13, 2018

When will tess-two support tesseract4?

@magamine

any news for tesseract 4 ?

@avielas

avielas commented Mar 27, 2018

what about tesseract 4 ?

@nirajan-pant

I found this: Tesseract OCR iOS, a framework for iOS 7+ that is also compiled for armv7s and arm64, with its Tesseract version updated to 4.00.00alpha, at https://github.com/chaoskyme/Tesseract-OCR-iOS

Will this help to figure out the compile issues for Android?

@avielas

avielas commented May 14, 2018

Sounds interesting, but I don't think it helps, because the major challenge is the JNI interface, which exists only on the Android side.

@rezaee

rezaee commented May 18, 2018

Dear @rmtheis ,
Thanks for your great work. But as a beginner to mid-level programmer, I don't know exactly how I can help with porting Tesseract 4 to Android. Maybe if you could explain in more detail or break the project down into smaller tasks, we could help you get it done sooner.

@ghost

ghost commented May 18, 2018

I'd like to contribute too, but this is my first time and my first post here, and I don't know how to go about it.

@AbdelsalamHaa

Has anyone managed to use tess-two with Tesseract 4 so far? Is there any way to get Tesseract 4.0 to work on Android?
Thank you so much.

@ghost

ghost commented May 27, 2018

Maybe the owner has left the project?

@hejin

hejin commented Jun 4, 2018

Hi guys, I think we may have asked too much of the project contributors.

Optimizing LSTM/RNN inference performance and resource usage on mobile/embedded platforms is not the piece of cake it might seem.

For those who wish to contribute, my suggestion is to first get the latest stable release (Tesseract 3.05) running on Android with pure JNI/C++ code. This project (code) by @rmtheis and others already provides enough HOWTO information. They have no duty to answer all the questions, since it's an OPEN SOURCE project!!! Let's appreciate the great work by @rmtheis et al.

@rmtheis
Owner

rmtheis commented Jul 3, 2018

I don't know when I'll have time to work on updating this project to use the Tesseract 4 beta. If anyone wants to take this task on, please have at it!

One smaller (but still pretty big) task that would help toward that effort would be to make a pull request that gets Travis CI working on this project. What I have in mind is a Travis configuration that builds the project and then runs the instrumented tests on emulators for armv7, armv8, x86, and x86-64.

@avielas

avielas commented Jul 20, 2018

I checked out the tesseract4 branch (from the tess-two repository) and successfully ran './gradlew assemble', with the tests passing at an accuracy of 89%. Can I use tess-two now with Tesseract 4 support?
If not, what else should I do to get this support in my Android app?
I also ran my application's tests with the compiled tess-two (tesseract4 branch), but I get exactly the same results as with the master branch.

@avielas

avielas commented Jul 25, 2018

@rmtheis can you please answer my question?

@hadar-ayoub

Hi,

I think we need a list of the remaining tasks to fully integrate Tesseract 4 into this library.

@rmtheis What can I do to contribute to it?

Regards,
Ayoub

@rmtheis
Owner

rmtheis commented Aug 12, 2018

Currently the tesseract4 branch builds successfully with NDK r16b, and the legacy OEM mode 0 works, but I'm seeing the following crash when running with v4 training data and the LSTM OEM mode 1:

2018-08-12 13:46:25.744 8943-8943/? A/DEBUG: *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
2018-08-12 13:46:25.744 8943-8943/? A/DEBUG: Build fingerprint: 'google/sdk_gphone_x86/generic_x86:9/PPP4.180612.007/4860066:userdebug/dev-keys'
2018-08-12 13:46:25.744 8943-8943/? A/DEBUG: Revision: '0'
2018-08-12 13:46:25.744 8943-8943/? A/DEBUG: ABI: 'x86'
2018-08-12 13:46:25.744 8943-8943/? A/DEBUG: pid: 8924, tid: 8940, name: ationTestRunner  >>> com.googlecode.tesseract.android.test <<<
2018-08-12 13:46:25.745 8943-8943/? A/DEBUG: signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
2018-08-12 13:46:25.745 8943-8943/? A/DEBUG:     eax 00000000  ebx 000022dc  ecx 000022ec  edx 00000006
2018-08-12 13:46:25.745 8943-8943/? A/DEBUG:     edi 000022dc  esi d597f1b0
2018-08-12 13:46:25.746 8943-8943/? A/DEBUG:     ebp 00000000  esp d597f168  eip f2b94b59
2018-08-12 13:46:25.782 8943-8943/? A/DEBUG: backtrace:
2018-08-12 13:46:25.782 8943-8943/? A/DEBUG:     #00 pc 00000b59  [vdso:f2b94000] (__kernel_vsyscall+9)
2018-08-12 13:46:25.783 8943-8943/? A/DEBUG:     #01 pc 0001fdf8  /system/lib/libc.so (syscall+40)
2018-08-12 13:46:25.783 8943-8943/? A/DEBUG:     #02 pc 00022ed3  /system/lib/libc.so (abort+115)
2018-08-12 13:46:25.783 8943-8943/? A/DEBUG:     #03 pc 00145cea  /data/app/com.googlecode.tesseract.android.test-L9dyqNXunn4MUzRlrLu1rg==/lib/x86/libtess.so (ERRCODE::error(char const*, TessErrorLogCode, char const*, ...) const+266)
2018-08-12 13:46:25.783 8943-8943/? A/DEBUG:     #04 pc 000e3502  /data/app/com.googlecode.tesseract.android.test-L9dyqNXunn4MUzRlrLu1rg==/lib/x86/libtess.so (_ZN9tesseract9Tesseract24init_tesseract_lang_dataEPKcS2_S2_NS_13OcrEngineModeEPPciPK13GenericVectorI6STRINGESA_bPNS_15TessdataManagerE+1170)
2018-08-12 13:46:25.783 8943-8943/? A/DEBUG:     #05 pc 000e3d9e  /data/app/com.googlecode.tesseract.android.test-L9dyqNXunn4MUzRlrLu1rg==/lib/x86/libtess.so (_ZN9tesseract9Tesseract14init_tesseractEPKcS2_S2_NS_13OcrEngineModeEPPciPK13GenericVectorI6STRINGESA_bPNS_15TessdataManagerE+606)
2018-08-12 13:46:25.783 8943-8943/? A/DEBUG:     #06 pc 0008448a  /data/app/com.googlecode.tesseract.android.test-L9dyqNXunn4MUzRlrLu1rg==/lib/x86/libtess.so (_ZN9tesseract11TessBaseAPI4InitEPKciS2_NS_13OcrEngineModeEPPciPK13GenericVectorI6STRINGESA_bPFbRKS7_PS6_IcEE+474)
2018-08-12 13:46:25.783 8943-8943/? A/DEBUG:     #07 pc 0008429b  /data/app/com.googlecode.tesseract.android.test-L9dyqNXunn4MUzRlrLu1rg==/lib/x86/libtess.so (_ZN9tesseract11TessBaseAPI4InitEPKcS2_NS_13OcrEngineModeEPPciPK13GenericVectorI6STRINGESA_b+107)
2018-08-12 13:46:25.783 8943-8943/? A/DEBUG:     #08 pc 002a80f4  /data/app/com.googlecode.tesseract.android.test-L9dyqNXunn4MUzRlrLu1rg==/lib/x86/libtess.so (Java_com_googlecode_tesseract_android_TessBaseAPI_nativeInitOem+100)

@wolfhe

wolfhe commented Sep 11, 2018

@rmtheis what's your testing environment? If it's running on Android/ARM instead of an x86 emulator, I suspect there are some issues in the project build settings: the stack trace shows it's running some x86 code.

@wolfhe

wolfhe commented Sep 11, 2018

Hi guys, since many of us are interested in the 4.0 stuff, why not try to build and run it and report issues here? The steps might look like this:

  1. Read the tess-two wiki page and try to build it with Tesseract 4.0 beta.x.
  2. Run your self-built tess-two on real Android phones running various Android versions with the traditional OCR engine, and report issues here.
  3. Run your self-built tess-two on real Android phones running various Android versions with the fancy LSTM engine, and report issues here.

If we can just make the LSTM engine run on an Android phone (even without any architecture-native optimization, e.g. hand-written NEON code, the ARM counterpart of SSE/AVX on x86), it would be a great leap ahead.

comments?

@hejin

hejin commented Sep 13, 2018

The ANDROID_BUILD macro in tess-two/jni/com_googlecode_tesseract_android/src/ looks problematic.

The current tess-two build (branch tesseract4.0) doesn't define this macro, so the LSTM code is enabled in the actual Android build.

The lucky thing is that there is some defensive coding in the tesseract/arch sources that simply disables the x86 SSE/AVX optimizations at compile time:

// from dotproductsse.cpp

#if !defined(__SSE4_1__)
// This code can't compile with "-msse4.1", so use dummy stubs.

#include "dotproductsse.h"
#include <stdio.h>
#include <stdlib.h>

namespace tesseract {
double DotProductSSE(const double* u, const double* v, int n) {
  fprintf(stderr, "DotProductSSE can't be used on Android\n");
  abort();
}
int32_t IntDotProductSSE(const int8_t* u, const int8_t* v, int n) {
  fprintf(stderr, "IntDotProductSSE can't be used on Android\n");
  abort();
}
}  // namespace tesseract

#else  // !defined(__SSE4_1__)
// Non-Android code here

I'm not sure whether this call to abort() is what people are seeing at run time when they try to launch tess-two with the LSTM engine on Android.
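One way such stubs could avoid aborting at run time is to fall back to plain C++ loops when the SSE path is not compiled in. The sketch below only illustrates that idea and is not taken from the Tesseract sources; the names `DotProductPortable` and `IntDotProductPortable` are hypothetical, and the signatures simply mirror the stubs quoted above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical portable replacement for an aborting stub: a scalar
// dot product of two double arrays of length n.
inline double DotProductPortable(const double* u, const double* v, int n) {
  assert(n >= 0);
  double total = 0.0;
  for (int i = 0; i < n; ++i) {
    total += u[i] * v[i];
  }
  return total;
}

// Hypothetical portable replacement for the int8 variant. Each product
// is widened to 32 bits before accumulation so that values such as
// -128 * -128 do not overflow.
inline int32_t IntDotProductPortable(const int8_t* u, const int8_t* v, int n) {
  assert(n >= 0);
  int32_t total = 0;
  for (int i = 0; i < n; ++i) {
    total += static_cast<int32_t>(u[i]) * static_cast<int32_t>(v[i]);
  }
  return total;
}
```

A scalar loop like this would be slower than a SIMD version, but it runs on every ABI, which is the trade-off a fallback accepts.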

@rmtheis

@wolfhe

wolfhe commented Sep 13, 2018

@hejin does this mean the LSTM feature was intended to be disabled in android?

@hejin

hejin commented Sep 13, 2018

Yep, it looks like the Tesseract 4.0 authors didn't want to enable the LSTM feature on the Android platform too early because of potential resource exhaustion issues, so they use the ANDROID_BUILD macro to disable it temporarily. However, the tess-two JNI build doesn't appear to define the ANDROID_BUILD macro (please correct me if I'm wrong, @rmtheis), so the LSTM feature ends up enabled in the tess-two tesseract4.0 branch. As a defensive measure against x86 AVX/SSE instructions being wrongly used on ARM platforms, the people who optimized the LSTM operators replaced the optimized subroutines with a call to abort(), which fires when that unexpected case does happen!

@Robyer
Contributor

Robyer commented Dec 19, 2018

The final version of Tesseract 4.0 was released a few weeks ago. Is there any new progress, or an estimate of when it will be integrated into tess-two?

EDIT: Someone said here that he was able to compile Tesseract for Android (without tess-two) - https://groups.google.com/d/msg/tesseract-ocr/zuZYuz12oQc/VCavzreVCQAJ

@rmtheis
Owner

rmtheis commented Dec 30, 2018

@Robyer I won't have time to update tess-two for Tesseract 4.0 anytime soon. This project is in need of someone familiar with C++ to take this task on! I'm happy to review and test proposed changes. Please don't hesitate to contribute yourself if you're at all inclined to do so -- your past contributions have been hugely helpful.

I'm not sure what to make of the linked comment about the cmake build. Please share your results if you end up looking into that approach.

@Robyer
Contributor

Robyer commented Jan 9, 2019

@rmtheis Will you have time to help me understand the current build configuration that you use for native code? I tried to rework building your native code to standard ndkBuild in Gradle (I wanted to have proper native code completion and debugging in Android Studio) by removing your custom tasks, specifying jni.srcDirs = ['jni'] sourceset and adding this into tess-two build.gradle file (and similar to eyes-two):

android {
    externalNativeBuild {
        ndkBuild {
            path file('jni/Android.mk')
        }
    }
}

but there were some errors with references to liblept. It seems both tess-two and eyes-two depend on leptonica, but tess-two also depends on eyes-two. The problem is that eyes-two can't compile leptonica itself; it expects a prebuilt leptonica library, which is compiled by tess-two. So there is a kind of circular reference that only works in your manual compilation.

I think we should separate leptonica into its own module and then make tess-two and eyes-two modules directly dependent on leptonica module. But I don't understand the Android.mk files and the sources enough to easily do that. Perhaps you can help with that?

So far I have prepared PR #256 to make the project work properly in the latest Android Studio. Then if you look at Robyer@572c2f1 you will see changes to use ndkBuild in Gradle, but the Android.mk/Application.mk files need to be modified to make it compile. It doesn't know how to compile liblept.so, which is needed by libhydrogen.so.

@rmtheis
Owner

rmtheis commented Jan 12, 2019

@Robyer Agreed that using externalNativeBuild would be better than the custom task calling out to the command line. I ended up using the command line approach after giving up on getting externalNativeBuild to work. I don't recall what the sticking point was at the time.

I'm not aware of anywhere that the tess-two module depends on the eyes-two module, and the intent is to not have that type of circular dependency. I agree that it would be a better design to have Leptonica as a separate module, but the overall legacy project structure is so time-consuming for me to rearrange that I'd be reluctant to take that project on. Like you mention, it probably would require substantial changes to the Android.mk files and so on.

When I try building your ndkBuildGradle branch, I see the issue you mentioned with libhydrogen and liblept. I'm not sure how to resolve that issue. When I remove the eyes-two module from the project and try again, it starts building but then fails with the mystery error make (e=87): The parameter is incorrect. By the way, I've been using NDK r16b, which is an older version.

@rhardih

rhardih commented Jan 16, 2019

I've used tess-two in the past, but since going native and basically only needing the .so files, I've switched to a more direct way of building tesseract, just using the sdk/ndk.

I'm not sure if this information is directly transferable to the build issues of tess-two, but just in case, I've got a working build chain for Tesseract 4.0.0 that might help as an example:

https://github.com/rhardih/bad/blob/master/tesseract/tesseract-4.0.0.Dockerfile

It obviously depends on Leptonica as well, which is also included:

https://github.com/rhardih/bad/blob/master/leptonica/leptonica.Dockerfile

If these are completely unhelpful, please disregard. :)

@zsmartercn

Hi all,
We have ported Tesseract 4.0 (final) to Android, based on tess-two, and rewritten the dot product function with ARM NEON. The project also includes a full OCR demo app.
Please see https://github.com/zsmartercn/Tess4Android.
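To give an idea of what a dot product written with ARM NEON can look like, here is a minimal sketch of an int8 dot product using NEON intrinsics, with a scalar loop for leftover elements and for non-ARM builds. It is not the code from Tess4Android or Tesseract; the function name `IntDotProductNeonSketch` is hypothetical, and the signature just mirrors the `IntDotProductSSE` stub quoted earlier in this thread.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Hypothetical sketch: int8 dot product, vectorized with NEON when the
// build targets it, otherwise computed entirely by the scalar loop.
inline int32_t IntDotProductNeonSketch(const int8_t* u, const int8_t* v, int n) {
  assert(n >= 0);
  int32_t total = 0;
  int i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
  int32x4_t acc = vdupq_n_s32(0);      // four 32-bit partial sums
  for (; i + 8 <= n; i += 8) {
    int8x8_t a = vld1_s8(u + i);       // load 8 signed bytes from u
    int8x8_t b = vld1_s8(v + i);       // load 8 signed bytes from v
    int16x8_t prod = vmull_s8(a, b);   // widening multiply to 16 bits
    acc = vpadalq_s16(acc, prod);      // pairwise add into 32-bit lanes
  }
  int32_t lanes[4];
  vst1q_s32(lanes, acc);               // spill the partial sums
  total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
  // Scalar tail: remaining elements, or all of them without NEON.
  for (; i < n; ++i) {
    total += static_cast<int32_t>(u[i]) * static_cast<int32_t>(v[i]);
  }
  return total;
}
```

Processing eight bytes per iteration keeps the example short; a production version would likely unroll further and tune for the target core.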

@Robyer
Contributor

Robyer commented Jan 24, 2019

@zsmartercn Hi, is it intentional that you squashed all your changes into a single first commit? That makes it impractical to cherry-pick potential fixes or changes back to the tess-two repository. Perhaps you could make pull requests with the important changes that tess-two users could benefit from?

@Robyer
Contributor

Robyer commented Jan 25, 2019

Success!

I created a new AS project from scratch so I could use the default directory structure and configure CMake instead of ndkBuild, and after various changes I'm finally able to successfully compile and use Tesseract 4.0, even with LSTM (it seems). Debugging, code completion and other things also work nicely in Android Studio 3.3.

Because of the completely reworked project structure I won't be able to provide PR for tess-two though. After I clean my code and changes, I will publish it as a separate repository.

@rmtheis
Owner

rmtheis commented Jan 27, 2019

Excellent--thanks @zsmartercn and @Robyer, for your contributions to open source. I'm looking forward to trying out your projects, and I'll plan to merge your changes for Tesseract 4 support back into this project when I have some time.

@AmitPrajapati1902

AmitPrajapati1902 commented Jan 28, 2019

@Robyer When will you publish the latest code with the CMake build? Please provide some details on how to prepare the current @zsmartercn repo for a CMake-based build.

@Robyer
Contributor

Robyer commented Jan 28, 2019

Here it is! https://github.com/adaptech-cz/Tesseract4Android 🎉

Note eyes-two is not included yet. Monitor changes from tess-two are not implemented either - it should be reworked to use PROGRESS_FUNC2 instead of editing PROGRESS_FUNC and ETEXT_DESC directly.

@rmtheis Why does your tesseract4 branch have this "Add hack to handle log2" commit? What does it do?

@AmitPrajapati1902

@Robyer Thanks man, it works great.

@rmtheis
Owner

rmtheis commented Jan 30, 2019

@Robyer log2 was unavailable, so that commit manually replaced instances of that method call with a replacement that's mathematically equivalent in order to get the code to build, similar to what the Tesseract 4 code now has here: https://github.com/tesseract-ocr/tesseract/blob/9fd8f471f371117c2e5dff5474495218fba63e8c/src/lstm/weightmatrix.cpp#L29
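The kind of mathematically equivalent replacement described here can be sketched with the change-of-base identity log2(x) = ln(x) / ln(2). This is a minimal illustration only; the helper name `Log2Compat` is hypothetical and is not the name used in the tess-two commit or in the Tesseract sources.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for log2 on toolchains whose math library does
// not provide it: uses the identity log2(x) = ln(x) / ln(2).
inline double Log2Compat(double x) {
  assert(x > 0.0);  // the logarithm is only defined for positive input
  return std::log(x) / std::log(2.0);
}
```

For example, `Log2Compat(8.0)` evaluates to approximately 3, up to floating-point rounding.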

@Robyer
Contributor

Robyer commented Jan 30, 2019

@rmtheis I see, that explains why I didn't experience the missing log2 problem myself. Thanks.

@irann93

irann93 commented Jan 31, 2019

@zsmartercn @Robyer Thanks for the effort. It works!!

@ygyin-ivy

ygyin-ivy commented Jun 5, 2019

I fixed this issue using:
lept 1.76.0 (should not be 1.74.*)
tesseract https://codeload.github.com/tesseract-ocr/tesseract/tar.gz/4.0.0

My Android.mk for tesseract, which enables the LSTM model, is attached as Android.zip.

EXPLICIT_SRC_EXCLUDES should include fileio.cpp (used only for training) to remove the dependency on glob.c; alternatively, download a local copy of glob.c.

When building on Windows, the maximum path length should be < 251, so I renamed com_googlecode_*_android to *.

The APK runs correctly on phones from API 19 to API 23 (armeabi-v7a).


@alexcohn
Contributor

See also https://github.com/alexcohn/tess-two/tree/4.1

@denzerd

denzerd commented Aug 19, 2019

Hi,

great work. A small suggestion, perhaps it would be nice to put some warning on the front page/README of this project in order to inform that there is a different repo with Tesseract-4.1.0 available. I wasted a lot of hours today because I got different results between this project and the command line, until I finally realised that the versions are different.

Best regards

@rmtheis
Owner

rmtheis commented Oct 20, 2019

I'm wrapping up the maintenance on this repo and I don't plan on making updates in the future. Note that updates to support Tesseract 4.0 have been made on other forks of this repo such as https://github.com/alexcohn/tess-two/tree/4.1.

Thanks everyone, for your interest and support!
