Commit
#579 Updated to tesseract 5.2
charlesw committed Nov 8, 2022
1 parent 8c08c79 commit 4a7e1db
Showing 18 changed files with 107 additions and 40 deletions.
3 changes: 2 additions & 1 deletion ChangeLog.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
### Version 5.0
* Upgraded to Tesseract 5.0 [Issue 579](https://github.com/charlesw/tesseract/issues/579)
* Upgraded to Tesseract 5.2 [Issue 579](https://github.com/charlesw/tesseract/issues/579)
* Fixed Fix dynamic linking on macos [Issue #588](https://github.com/charlesw/tesseract/issues/588)
* Fixed null reference exception when executing assembly is not available [Issue 591](https://github.com/charlesw/tesseract/issues/591)

@@ -8,6 +8,7 @@
* Setting regions of interest doesn't work [Issue 489](https://github.com/charlesw/tesseract/issues/489)
* PageSegMode.SingleBlockVertText does not work [Issue 490](https://github.com/charlesw/tesseract/issues/490)
* Unz files don't work [Issue 594](https://github.com/charlesw/tesseract/issues/594)
* Removed support for dotnet 4.0 and 4.5

### Version 4.1.1

18 changes: 9 additions & 9 deletions docs/Compling_tesseract_and_leptonica.md
@@ -10,7 +10,7 @@ The following also differ from [[Compiling-Tesseract-and-Leptonica]] in that the
The main benefit of this is that it's possible to compile tesseract against the leptonica dll rather than statically
linking leptonica into tesseract which increases file size (since the leptonica dll is still required).

1. Install Visual Studio 2019
1. Install Visual Studio 2022
2. Install CMake (ensure it's on your path)
3. Install [vcpkg](https://github.com/Microsoft/vcpkg/)
* Note: I also set an environment variable VCPKG_HOME to this directory and added it to path for convenience
@@ -21,13 +21,13 @@ linking leptonica into tesseract which increases file size (since the leptonica
vcpkg install giflib:x86-windows-static libjpeg-turbo:x86-windows-static liblzma:x86-windows-static libpng:x86-windows-static tiff:x86-windows-static zlib:x86-windows-static
vcpkg install giflib:x64-windows-static libjpeg-turbo:x64-windows-static liblzma:x64-windows-static libpng:x64-windows-static tiff:x64-windows-static zlib:x64-windows-static
git clone https://github.com/DanBloomberg/leptonica.git & cd leptonica
git checkout -b 1.80.0 1.80.0
git checkout -b 1.82.0 1.82.0
mkdir vs16-x86 & cd vs16-x86
cmake .. -G "Visual Studio 16 2019" -A Win32 -DSW_BUILD=OFF -DBUILD_SHARED_LIBS=ON -DCMAKE_TOOLCHAIN_FILE=%VCPKG_HOME%\scripts\buildsystems\vcpkg.cmake -DVCPKG_TARGET_TRIPLET=x86-windows-static -DCMAKE_INSTALL_PREFIX=..\..\build\x86
cmake .. -G "Visual Studio 17 2022" -A Win32 -DSW_BUILD=OFF -DBUILD_SHARED_LIBS=ON -DCMAKE_TOOLCHAIN_FILE=%VCPKG_HOME%\scripts\buildsystems\vcpkg.cmake -DVCPKG_TARGET_TRIPLET=x86-windows-static -DCMAKE_INSTALL_PREFIX=..\..\build\x86
cmake --build . --config Release --target install
cd ..
mkdir vs16-x64 & cd vs16-x64
cmake .. -G "Visual Studio 16 2019" -A x64 -DSW_BUILD=OFF -DBUILD_SHARED_LIBS=ON -DCMAKE_TOOLCHAIN_FILE=%VCPKG_HOME%\scripts\buildsystems\vcpkg.cmake -DVCPKG_TARGET_TRIPLET=x64-windows-static -DCMAKE_INSTALL_PREFIX=..\..\build\x64
cmake .. -G "Visual Studio 17 2022" -A x64 -DSW_BUILD=OFF -DBUILD_SHARED_LIBS=ON -DCMAKE_TOOLCHAIN_FILE=%VCPKG_HOME%\scripts\buildsystems\vcpkg.cmake -DVCPKG_TARGET_TRIPLET=x64-windows-static -DCMAKE_INSTALL_PREFIX=..\..\build\x64
cmake --build . --config Release --target install
```
4. Build Tesseract:
@@ -36,13 +36,13 @@ linking leptonica into tesseract which increases file size (since the leptonica
```
git clone https://github.com/tesseract-ocr/tesseract.git
cd tesseract
git checkout -b 4.1.1 4.1.1
mkdir vs16-x86 & cd vs16-x86
cmake .. -G "Visual Studio 16 2019" -A Win32 -DAUTO_OPTIMIZE=OFF -DSW_BUILD=OFF -DBUILD_TRAINING_TOOLS=OFF -DCMAKE_INSTALL_PREFIX=..\..\build\x86
git checkout -b 5.2.0 5.2.0
mkdir vs17-x86 & cd vs17-x86
cmake .. -G "Visual Studio 17 2022" -A Win32 -DAUTO_OPTIMIZE=OFF -DSW_BUILD=OFF -DBUILD_TRAINING_TOOLS=OFF -DCMAKE_INSTALL_PREFIX=..\..\build\x86
cmake --build . --config Release --target install
cd ..
mkdir vs16-x64 & cd vs16-x64
cmake .. -G "Visual Studio 16 2019" -A x64 -DAUTO_OPTIMIZE=OFF -DSW_BUILD=OFF -DBUILD_TRAINING_TOOLS=OFF -DCMAKE_INSTALL_PREFIX=..\..\build\x64
mkdir vs17-x64 & cd vs17-x64
cmake .. -G "Visual Studio 17 2022" -A x64 -DAUTO_OPTIMIZE=OFF -DSW_BUILD=OFF -DBUILD_TRAINING_TOOLS=OFF -DCMAKE_INSTALL_PREFIX=..\..\build\x64
cmake --build . --config Release --target install
```

59 changes: 59 additions & 0 deletions docs/Compling_tesseract_and_leptonica.md.bak
@@ -0,0 +1,59 @@
# Compling tesseract and leptonica.md
* [Index](./ReadMe.md)

## Notes
Build instructions for Tesseract 4.1.1 and leptonica 1.80.0. Please note that build systems do change so while the following
has been tested with the listed versions building against any other versions including master may not work as expected and
aren't supported.

The following also differ from [[Compiling-Tesseract-and-Leptonica]] in that they use vcpkg to manage the dependencies.
The main benefit of this is that it's possible to compile tesseract against the leptonica dll rather than statically
linking leptonica into tesseract which increases file size (since the leptonica dll is still required).

1. Install Visual Studio 2022
2. Install CMake (ensure it's on your path)
3. Install [vcpkg](https://github.com/Microsoft/vcpkg/)
* Note: I also set an environment variable VCPKG_HOME to this directory and added it to path for convenience

4. Build Leptonica:

```
vcpkg install giflib:x86-windows-static libjpeg-turbo:x86-windows-static liblzma:x86-windows-static libpng:x86-windows-static tiff:x86-windows-static zlib:x86-windows-static
vcpkg install giflib:x64-windows-static libjpeg-turbo:x64-windows-static liblzma:x64-windows-static libpng:x64-windows-static tiff:x64-windows-static zlib:x64-windows-static
git clone https://github.com/DanBloomberg/leptonica.git & cd leptonica
git checkout -b 1.82.0 1.82.0
mkdir vs16-x86 & cd vs16-x86
cmake .. -G "Visual Studio 17 2022" -A Win32 -DSW_BUILD=OFF -DBUILD_SHARED_LIBS=ON -DCMAKE_TOOLCHAIN_FILE=%VCPKG_HOME%\scripts\buildsystems\vcpkg.cmake -DVCPKG_TARGET_TRIPLET=x86-windows-static -DCMAKE_INSTALL_PREFIX=..\..\build\x86
cmake --build . --config Release --target install
cd ..
mkdir vs16-x64 & cd vs16-x64
cmake .. -G "Visual Studio 17 2022" -A x64 -DSW_BUILD=OFF -DBUILD_SHARED_LIBS=ON -DCMAKE_TOOLCHAIN_FILE=%VCPKG_HOME%\scripts\buildsystems\vcpkg.cmake -DVCPKG_TARGET_TRIPLET=x64-windows-static -DCMAKE_INSTALL_PREFIX=..\..\build\x64
cmake --build . --config Release --target install
```
4. Build Tesseract:


```
git clone https://github.com/tesseract-ocr/tesseract.git
cd tesseract
git checkout -b 5.2.0 5.2.0
mkdir vs17-x86 & cd vs17-x86
cmake .. -G "Visual Studio 17 2022" -A Win32 -DAUTO_OPTIMIZE=OFF -DSW_BUILD=OFF -DBUILD_TRAINING_TOOLS=OFF -DCMAKE_INSTALL_PREFIX=..\..\build\x86
cmake --build . --config Release --target install
cd ..
mkdir vs17-x64 & cd vs17-x64
cmake .. -G "Visual Studio 17 2022" -A x64 -DAUTO_OPTIMIZE=OFF -DSW_BUILD=OFF -DBUILD_TRAINING_TOOLS=OFF -DCMAKE_INSTALL_PREFIX=..\..\build\x64
cmake --build . --config Release --target install
```

### Leptonica Notes:

* Leptonica now needs to be built to use shared libraries (dlls) explicitly; this is accomplished by setting ``BUILD_SHARED_LIBS`` to ``ON`` (``-DBUILD_SHARED_LIBS=ON``)
* Using [Self build](https://github.com/SoftwareNetwork/sw) hasn't been tested and is disabled using ``SW_BUILD=OFF``.

### Tesseract Notes:

* For portability, architecture optimizations have been disabled using ``-DAUTO_OPTIMIZE=OFF``.
This, however, disables platform-specific optimizations (AVX, SSE4.1, etc.) which would likely
result in better performance if you're guaranteed they will be available.
* Like leptonica, Self Build has also been disabled using ``-DSW_BUILD=OFF``.
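
The two Tesseract configure commands above differ only in the `-A` platform and the install prefix. As a rough illustration (not part of the repository), a small Python helper could generate them. The names `tesseract_configure_args` and `build_args` are hypothetical; the flags are copied from the commands above.

```python
import subprocess
from typing import List

# Generator and -A platform names are taken from the commands above.
GENERATOR = "Visual Studio 17 2022"
PLATFORMS = {"x86": "Win32", "x64": "x64"}


def tesseract_configure_args(arch: str) -> List[str]:
    """Return the cmake configure command for one architecture as an argument list."""
    platform = PLATFORMS[arch]  # raises KeyError for an unknown architecture
    return [
        "cmake", "..",
        "-G", GENERATOR,
        "-A", platform,
        "-DAUTO_OPTIMIZE=OFF",           # portable build, no AVX/SSE4.1 paths
        "-DSW_BUILD=OFF",                # Self Build disabled
        "-DBUILD_TRAINING_TOOLS=OFF",
        f"-DCMAKE_INSTALL_PREFIX=..\\..\\build\\{arch}",
    ]


def build_args() -> List[str]:
    """Return the cmake build-and-install command shared by both architectures."""
    return ["cmake", "--build", ".", "--config", "Release", "--target", "install"]


if __name__ == "__main__":
    for arch in PLATFORMS:
        # list2cmdline applies Windows quoting, so the generator name stays one argument.
        print(subprocess.list2cmdline(tesseract_configure_args(arch)))
        print(subprocess.list2cmdline(build_args()))
```

Keeping the commands as argument lists means they could also be passed to `subprocess.run` directly, without going through a shell.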
Empty file added src/InternalTrace.3044.log
Empty file.
Empty file added src/InternalTrace.3144.log
Empty file.
Empty file added src/InternalTrace.3536.log
Empty file.
Empty file added src/InternalTrace.7132.log
Empty file.
Empty file added src/InternalTrace.8476.log
Empty file.
6 changes: 3 additions & 3 deletions src/Tesseract.Drawing/Tesseract.Drawing.csproj
@@ -14,8 +14,8 @@
<PackageProjectUrl>https://github.com/charlesw/tesseract/</PackageProjectUrl>
<RepositoryUrl>https://github.com/charlesw/tesseract/</RepositoryUrl>
<PackageTags>Tesseract Ocr</PackageTags>
<Version>4.1.1</Version>
<AssemblyVersion>4.1.1.0</AssemblyVersion>
<Version>5.2.0</Version>
<AssemblyVersion>5.2.0</AssemblyVersion>
<NeutralLanguage></NeutralLanguage>
<PackageLicenseExpression>Apache-2.0</PackageLicenseExpression>
<RootNamespace>Tesseract</RootNamespace>
@@ -35,7 +35,7 @@


<ItemGroup>
<PackageReference Include="System.Drawing.Common" Version="5.0.0" />
<PackageReference Include="System.Drawing.Common" Version="6.0.0" />
</ItemGroup>

<ItemGroup>
18 changes: 9 additions & 9 deletions src/Tesseract.Net48Tests/Tesseract.Net48Tests.csproj
@@ -28,8 +28,8 @@
<Compile Include="..\Tesseract.Tests\Leptonica\PixATests.cs" Link="Leptonica\PixATests.cs" />
<Compile Include="..\Tesseract.Tests\Leptonica\PixTests\ImageManipulationTests.cs" Link="Leptonica\PixTests\ImageManipulationTests.cs" />
<Compile Include="..\Tesseract.Tests\Leptonica\PixTests\PixDataAccessTests.cs" Link="Leptonica\PixTests\PixDataAccessTests.cs" />
<Compile Include="..\Tesseract.Tests\ResultIteratorTests\FontAttributesTests.cs" Link="ResultIteratorTests\FontAttributesTests.cs" />
<Compile Include="..\Tesseract.Tests\ResultIteratorTests\OfAnEmptyPixTests.cs" Link="ResultIteratorTests\OfAnEmptyPixTests.cs" />
<Compile Include="..\Tesseract.Tests\ResultIteratorTests\FontAttributesTests.cs" Link="ResultIteratorTests\FontAttributesTests.cs" />
<Compile Include="..\Tesseract.Tests\ResultIteratorTests\OfAnEmptyPixTests.cs" Link="ResultIteratorTests\OfAnEmptyPixTests.cs" />
<Compile Include="..\Tesseract.Tests\PageSerializer.cs" Link="PageSerializer.cs" />
<Compile Include="..\Tesseract.Tests\ResultRendererTests.cs" Link="ResultRendererTests.cs" />
<Compile Include="..\Tesseract.Tests\TesseractResultSet.cs" Link="TesseractResultSet.cs" />
@@ -40,12 +40,12 @@
</ItemGroup>

<ItemGroup>
<PackageReference Include="nunit" Version="3.12.0" />
<PackageReference Include="NUnit3TestAdapter" Version="3.17.0">
<PackageReference Include="nunit" Version="3.13.3" />
<PackageReference Include="NUnit3TestAdapter" Version="4.3.0">
<PrivateAssets>all</PrivateAssets>
<IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets>
</PackageReference>
<PackageReference Include="Microsoft.NET.Test.Sdk" Version="16.8.0" />
<PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.3.2" />
</ItemGroup>

<ItemGroup>
@@ -210,12 +210,12 @@
</ItemGroup>

<Target Name="SymlinkLinuxDependencies" AfterTargets="AfterBuild" Condition=" '$([System.Runtime.InteropServices.RuntimeInformation]::IsOSPlatform($([System.Runtime.InteropServices.OSPlatform]::Linux)))' ">
<Exec Command="ln -sf /usr/lib/x86_64-linux-gnu/liblept.so $(OutDir)x64/libleptonica-1.80.0.so"/>
<Exec Command="ln -sf /usr/lib/x86_64-linux-gnu/libtesseract.so.4 $(OutDir)x64/libtesseract41.so"/>
<Exec Command="ln -sf /usr/lib/x86_64-linux-gnu/liblept.so $(OutDir)x64/libleptonica-1.80.0.so" />
<Exec Command="ln -sf /usr/lib/x86_64-linux-gnu/libtesseract.so.4 $(OutDir)x64/libtesseract41.so" />
</Target>

<Target Name="SymlinkMacOSDependencies" AfterTargets="AfterBuild" Condition=" '$([System.Runtime.InteropServices.RuntimeInformation]::IsOSPlatform($([System.Runtime.InteropServices.OSPlatform]::OSX)))' ">
<Exec Command="ln -sf /usr/local/lib/liblept.dylib $(OutDir)x64/libleptonica-1.80.0.dylib"/>
<Exec Command="ln -sf /usr/local/lib/libtesseract.dylib $(OutDir)x64/libtesseract41.dylib"/>
<Exec Command="ln -sf /usr/local/lib/liblept.dylib $(OutDir)x64/libleptonica-1.80.0.dylib" />
<Exec Command="ln -sf /usr/local/lib/libtesseract.dylib $(OutDir)x64/libtesseract41.dylib" />
</Target>
</Project>
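
The `Exec` steps above rely on `ln -sf`, which replaces whatever link already sits at the destination. A minimal Python sketch of that behaviour, for illustration only: `force_symlink` is a hypothetical helper, not something in this repository, and the demo writes into a temporary directory rather than the real `$(OutDir)`.

```python
import os
import tempfile


def force_symlink(target: str, link_path: str) -> None:
    """Point link_path at target, replacing an existing file or link (roughly `ln -sf`)."""
    # lexists() is true even for a dangling symlink, which exists() would miss.
    if os.path.lexists(link_path):
        os.remove(link_path)
    os.symlink(target, link_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        link = os.path.join(out_dir, "libtesseract41.so")
        # The target does not need to exist for the link to be created.
        force_symlink("/usr/lib/x86_64-linux-gnu/libtesseract.so.4", link)
        # A second call overwrites the first link instead of failing.
        force_symlink("/usr/lib/x86_64-linux-gnu/libtesseract.so.4", link)
        print(os.readlink(link))
```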
6 changes: 3 additions & 3 deletions src/Tesseract.NetCore31Tests/Tesseract.NetCore31Tests.csproj
@@ -37,9 +37,9 @@
</ItemGroup>

<ItemGroup>
<PackageReference Include="Microsoft.NET.Test.Sdk" Version="16.8.0" />
<PackageReference Include="nunit" Version="3.12.0" />
<PackageReference Include="NUnit3TestAdapter" Version="3.17.0">
<PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.3.2" />
<PackageReference Include="nunit" Version="3.13.3" />
<PackageReference Include="NUnit3TestAdapter" Version="4.3.0">
<PrivateAssets>all</PrivateAssets>
<IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets>
</PackageReference>
13 changes: 12 additions & 1 deletion src/Tesseract.Tests/Leptonica/ColorTests.cs
@@ -22,7 +22,7 @@ public void Color_CastColorToNetColor()
Assert.That(castColor.A, Is.EqualTo(color.Alpha));
}
#endif

[TestCase]
public void Color_ConvertColorToNetColor()
{
@@ -33,5 +33,16 @@ public void Color_ConvertColorToNetColor()
Assert.That(castColor.B, Is.EqualTo(color.Blue));
Assert.That(castColor.A, Is.EqualTo(color.Alpha));
}

[TestCase]
public void Color_ConvertNetColorToColor()
{
var color = System.Drawing.Color.FromArgb(100, 150, 200);
var castColor = color.ToPixColor();
Assert.That(color.R, Is.EqualTo(castColor.Red));
Assert.That(color.G, Is.EqualTo(castColor.Green));
Assert.That(color.B, Is.EqualTo(castColor.Blue));
Assert.That(color.A, Is.EqualTo(castColor.Alpha));
}
}
}
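
The added test checks that each channel survives the conversion from a .NET color to a Leptonica color, and that a color built from three components is fully opaque. The following Python sketch models that check. `PixColor`, `from_rgb` and `to_pix_color` here are simplified stand-ins written for illustration, not the library's implementation.

```python
from dataclasses import dataclass
from typing import Tuple

Argb = Tuple[int, int, int, int]  # (alpha, red, green, blue), each 0..255


@dataclass(frozen=True)
class PixColor:
    """Stand-in for a four-channel color with named components."""
    red: int
    green: int
    blue: int
    alpha: int = 255


def from_rgb(red: int, green: int, blue: int) -> Argb:
    """Mirror Color.FromArgb(r, g, b): alpha defaults to fully opaque (255)."""
    return (255, red, green, blue)


def to_pix_color(color: Argb) -> PixColor:
    """Copy each channel across unchanged, rejecting out-of-range values."""
    if any(not 0 <= channel <= 255 for channel in color):
        raise ValueError("each channel must be in the range 0..255")
    alpha, red, green, blue = color
    return PixColor(red=red, green=green, blue=blue, alpha=alpha)


if __name__ == "__main__":
    source = from_rgb(100, 150, 200)
    converted = to_pix_color(source)
    # Same comparisons as the test: every channel must match after conversion.
    assert (converted.alpha, converted.red, converted.green, converted.blue) == source
    print(converted)
```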
4 changes: 2 additions & 2 deletions src/Tesseract.sln
@@ -1,6 +1,6 @@
Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio Version 16
VisualStudioVersion = 16.0.29409.12
# Visual Studio Version 17
VisualStudioVersion = 17.3.32929.385
MinimumVisualStudioVersion = 10.0.40219.1
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Tesseract", "Tesseract\Tesseract.csproj", "{AB8F7CF1-E75B-4BD3-8853-2348ECDEA969}"
EndProject
20 changes: 8 additions & 12 deletions src/Tesseract/Tesseract.csproj
@@ -1,6 +1,6 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<TargetFrameworks>netstandard2.0;net40;net45;net48</TargetFrameworks>
<TargetFrameworks>netstandard2.0;net47;net48</TargetFrameworks>
</PropertyGroup>
<PropertyGroup>
<AllowUnsafeBlocks>true</AllowUnsafeBlocks>
@@ -9,25 +9,21 @@
<Authors>Charles Weld</Authors>
<Company />
<Product>Tesseract</Product>
<Description>Tesseract 4 adds a new neural net (LSTM) based OCR engine which is focused on line recognition, but also still supports the legacy Tesseract OCR engine of Tesseract 3 which works by recognizing character patterns. Compatibility with Tesseract 3 is enabled by using the Legacy OCR Engine mode (--oem 0). It also needs traineddata files which support the legacy engine, for example those from the tessdata repository.</Description>
<Description>Tesseract 5 adds a new neural net (LSTM) based OCR engine which is focused on line recognition, but also still supports the legacy Tesseract OCR engine of Tesseract 3 which works by recognizing character patterns. Compatibility with Tesseract 3 is enabled by using the Legacy OCR Engine mode (--oem 0). It also needs traineddata files which support the legacy engine, for example those from the tessdata repository.</Description>
<Copyright>Copyright 2012-2020 Charles Weld</Copyright>
<PackageReleaseNotes>https://github.com/charlesw/tesseract/blob/master/ChangeLog.md</PackageReleaseNotes>
<PackageProjectUrl>https://github.com/charlesw/tesseract/</PackageProjectUrl>
<RepositoryUrl>https://github.com/charlesw/tesseract/</RepositoryUrl>
<PackageTags>Tesseract Ocr</PackageTags>
<Version>4.1.1</Version>
<AssemblyVersion>4.1.1.0</AssemblyVersion>
<Version>5.2.0</Version>
<AssemblyVersion>5.2.0</AssemblyVersion>
<NeutralLanguage></NeutralLanguage>
<PackageLicenseExpression>Apache-2.0</PackageLicenseExpression>
</PropertyGroup>
<!-- .NET 4.0 references, compilation flags and build options -->
<PropertyGroup Condition=" '$(TargetFramework)' == 'net40'">
<DefineConstants>NET40;NETFULL;SYSTEM_DRAWING_SUPPORT</DefineConstants>
</PropertyGroup>

<!-- .NET 4.5 references, compilation flags and build options -->
<PropertyGroup Condition=" '$(TargetFramework)' == 'net45'">
<DefineConstants>NET45;NETFULL;SYSTEM_DRAWING_SUPPORT</DefineConstants>
<!-- .NET 4.7 references, compilation flags and build options -->
<PropertyGroup Condition=" '$(TargetFramework)' == 'net47'">
<DefineConstants>NET47;NETFULL;SYSTEM_DRAWING_SUPPORT</DefineConstants>
</PropertyGroup>

<!-- .NET 4.8 references, compilation flags and build options -->
@@ -71,4 +67,4 @@
<CopyToOutputDirectory>Never</CopyToOutputDirectory>
</None>
</ItemGroup>
</Project>
</Project>
Binary file modified src/Tesseract/x64/leptonica-1.82.0.dll
Binary file not shown.
Binary file modified src/Tesseract/x64/tesseract.exe
Binary file not shown.
Binary file modified src/Tesseract/x86/leptonica-1.82.0.dll
Binary file not shown.
Binary file modified src/Tesseract/x86/tesseract.exe
Binary file not shown.
