diff --git "a/User-Projects-\342\200\223-3rdParty.md" "b/User-Projects-\342\200\223-3rdParty.md"
index 3e5f0bc..180ef06 100644
--- "a/User-Projects-\342\200\223-3rdParty.md"
+++ "b/User-Projects-\342\200\223-3rdParty.md"
@@ -1,115 +1,124 @@
 # GUIs and Other Projects using Tesseract OCR
-## 1. GUIs
-
-| **Name** | **Linux** | **Mac** | **Windows** | **License** | **Description** |
-|:---------|:----------|:--------|:------------|:------------|:----------------|
-| [normcap](https://github.com/dynobo/normcap) | X | X | X | GPL v3 | OCR powered screen-capture tool to capture information instead of images. |
-| [gImageReader](https://github.com/manisandro/gImageReader) | X | | X | GPL v3 | A graphical GTK frontend to tesseract-ocr |
-| [VietOCR](http://vietocr.sourceforge.net/) | X | X | X | Apache 2.0 | A GUI frontend for Tesseract OCR engine. Supports optical character recognition for Vietnamese and other languages supported by Tesseract |
-| [NeOCR](https://sourceforge.net/projects/ne-ocr/) | | | X | Freeware | A GUI frontend for Tesseract 4.0 OCR engine. |
-| [Free-Ocr-Windows-Desktop](https://github.com/A9T9/Free-Ocr-Windows-Desktop/releases)| | | X | [GNU AGPL v3](https://github.com/A9T9/Free-Ocr-Windows-Desktop/blob/master/LICENSE) | Free OCR application for the Windows Desktop - Essentially a graphical user interface (GUI) for the Tesseract OCR engine. The application also includes support for reading and scanned PDF files |
-| [YAGF](http://sourceforge.net/projects/yagf-ocr/) | X | | | GPL v3 | A graphical front-end for cuneiform and tesseract |
-| [OCR2Text](https://github.com/writecrow/ocr2text) | X | X | X | MIT | CLI tool for batch-processing PDF to TXT |
-| [OCRFeeder](https://wiki.gnome.org/action/show/Apps/OCRFeeder) | X | | | GPL v3 | OCRFeeder is a document layout analysis and optical character recognition system |
-| [Lector](https://github.com/zdenop/lector) | X | | X | GPL v2 | A graphical ocr solution for GNU/Linux based on Python, Qt4 and Tesseract OCR |
-| [Tesseract-OCR QT4 gui](https://github.com/zdenop/tesseract-ocr-qt4gui) | X | | | Apache 2.0 | Tesseract-OCR QT4 gui is a simple GUI for tesseract |
-| [Lime OCR](http://code.google.com/p/lime-ocr/) | | | X | GPL v3 | A simple, free OCR software for Windows using tesseract-ocr engine |
-| [Ocrivist](http://www.ocrivist.com/) | X | | | GPL v3 | Ocrivist is a utility which makes it possible to scan and OCR books and other printed documents to PDF or Djvu format |
-| [Tesseract-GUI](http://tesseract-gui.sourceforge.net) | X | | | GPL v2 | Tessract-GUI is not a front-end for tesseract-ocr, it is just a graphical way to use it with simple image manipulation through ImageMagick |
-| [QTesseract](http://code.google.com/p/qtesseract/) | X | | | LGPL v3 | QT GUI for the Tesseract OCR |
-| [dpScreenOCR](https://danpla.github.io/dpscreenocr/) | X | | X | zlib | Program to recognize text on screen |
-| [pmOCR](http://github.com/deajan/pmocr/) | X | | | BSD | Batch OCR tool, also file monitor event OCR with tesseract |
-| [tesseract4java](http://github.com/tesseract4java/tesseract4java) | X | X | X | GPLv3 | A cross-platform GUI for training and running Tesseract with advanced features like batch recognition and accuracy evaluation |
-| [Linux-Intelligent-OCR-Solution(lios)](https://github.com/Nalin-x-Linux/lios-3.git) | X | | | GPLv3 | A GUI for scanning, running and training Tesseract with total accessibility for visually impaired and advanced features like Scanner Brightness optimizer, Text-Cleaner, etc |
-[SunnyPage OCR](http://www.sunnypage.ge/en/) | | | X | Proprietary | A GUI frontend for Tesseract OCR engine with automatic adjustment of image brightness, image processing and PDF support.|
-| [PDF OCR X](http://solutions.weblite.ca/pdfocrx/) | | X | X | Proprietary | PDF OCR is a simple drag-and-drop utility for Mac OS X and Windows, that converts your PDFs and images into text documents or searchable PDF files |
-| [TaxWorkFlow](https://thetaxworkflow.com)| | | X | Proprietary | TaxWorkFlow is an accounting practice management application that includes GUI frontend for Tesseract OCR engine. The app supports AVX and allows to create OCR'ed PDF files of selected resolution and compression from PDF files and 100+ image file formats. |
-| [AmhOCR](https://github.com/KumnegerH/AmhOCR) | | | X | GPLv3 | Tesseract Powered Windows Desktop OCR Application With Multiple Pre/Post Processing GUI |
-| [TesseractStudio.Net Github](https://github.com/OpaitSoftware/TesseractStudio.Net) | | | X | Proprietary | (Exe, SourceCode Not Available,Site Urls are Dead) A graphical interface to tesseract 4.0 |
-| [TesseractStudio.Net](https://www.opait.com/TessStudio/index.html) | | | X | Proprietary | A graphical interface to tesseract 4.0 |
-
-
-## 2. Online OCR services
-
- * [OCR.net](http://ocr.net): Powered by PDF OCR X in back-end. Converts PDFs and Images to Text or searchable PDF.
- * [WeOCR](http://ocr1.sc.isc.tohoku.ac.jp/e1/): is a platform for Web-enabled OCR (Optical Character Reader/Recognition) systems that enables people to use character recognition over networks
- * [CustomOCR](http://www.customocr.com/index.php?r=site/page&view=demos.tesseract_ocr)
- * [Free OCR](http://www.free-ocr.com/)
- * [i2OCR](http://www.i2ocr.com/)
- * [Indic-OCR OCR Service](https://indic-ocr.github.io/ocrservice/) An online OCR service for Indian languages
-
-## 3. Mobile
-
- * _Android_:
-   * [tess-two](https://github.com/rmtheis/tess-two) - A fork of Tesseract Tools for Android [tesseract-android-tools](http://code.google.com/p/tesseract-android-tools/) that adds some additional functions.
-   * [Tesseract4Android](https://github.com/adaptech-cz/Tesseract4Android) - A fork of [tess-two](https://github.com/rmtheis/tess-two) rewritten from scratch to support latest version of Tesseract OCR.
-   * [textfairy](https://play.google.com/store/apps/details?id=com.renard.ocr) Android OCR App with source code at [github.com](https://github.com/renard314/textfairy)
-   * [Character Recognition](https://play.google.com/store/apps/details?id=org.atai.TessUI#?t=W251bGwsMSwxLDUwMSwib3JnLmF0YWkuVGVzc1VJIl0.) Android OCR App with source code at [gitorious.org](https://gitorious.org/character-recognition/)
-   * [tesseract-android-tools](http://code.google.com/p/tesseract-android-tools/): set of Android APIs (archived in Google Code Archive at 2013-01-28)
-   * [Mobile OCR](http://code.google.com/p/mobileocr/): The goal of Mobile OCR is to create an application for the Android platform that will recognize text from an image taken by the phone's camera. The application will be fully accessible to low vision and blind users
-   * [Across India](https://indic-ocr.github.io/acrossindia/): An app which lets users take pictures of sign boards in Indian Languages or English and transliterate it to the language that they can read.
-
- * _iOS_:
-   * [Tesseract-OCR-iOS](https://github.com/gali8/Tesseract-OCR-iOS) - Tesseract OCR iOS is a Framework for iOS7+, compiled also for armv7s and arm64.
-   * [OCR-iOS-Example](https://github.com/robmathews/OCR-iOS-Example) - a simple example of how to do optical character recognition (OCR) on iOS.
-   * [Tesseract-iPhone-Demo ](https://github.com/nolanbrown/Tesseract-iPhone-Demo) - example based on tesseract 2.04.
- * _More OS_:
-   * [ScanBizCards](http://www.scanbizcards.com): Mobile solution for business card scanning. _Requirements:_ iPhone 4/iPhone 3/Android 2.0
-
- * _macOS_:
-   * [Tesseract macOS](https://github.com/scott0123/Tesseract-macOS) - Tesseract OCR framework for macOS, supporting both Objective C and swift. Compiled for both x86 and arm64.
+## 1. Desktop Applications
+
+| **Name** | **Platforms** | **License** | **Description** |
+|:---------|:------------|:------------|:----------------|
+| **(a9t9) Free OCR for Windows Desktop** [[website](https://ocr.space/blog/p/free-ocr-windows.html)] [[GitHub](https://github.com/A9T9/Free-Ocr-Windows-Desktop/)]| Windows | [GNU AGPL v3](https://github.com/A9T9/Free-Ocr-Windows-Desktop/blob/master/LICENSE) | Free OCR application for the Windows Desktop - Essentially a graphical user interface (GUI) for the Tesseract OCR engine. The application also includes support for reading and scanned PDF files |
+| **AmhOCR** [[Github](https://github.com/KumnegerH/AmhOCR)] | Windows | GPLv3 | Tesseract Powered Windows Desktop OCR Application With Multiple Pre/Post Processing GUI |
+| **dpScreenOCR** [[website](https://danpla.github.io/dpscreenocr/)] [[GitHub](https://github.com/danpla/dpscreenocr/)] | Linux, Windows | zlib | Program to recognize text on screen |
+| **gImageReader** [[GitHub](https://github.com/manisandro/gImageReader)] | Linux, Windows | GPL v3 | A graphical GTK frontend to tesseract-ocr |
+| **Lector** [[GitHub](https://github.com/zdenop/lector)] | Linux, Windows | GPL v2 | A graphical ocr solution for GNU/Linux based on Python, Qt4 and Tesseract OCR |
+| **Lime OCR** [[Google Code](http://code.google.com/p/lime-ocr/)] | Windows | GPL v3 | A simple, free OCR software for Windows using tesseract-ocr engine |
+| **Linux-Intelligent-OCR-Solution (lios)** [[GitHub](https://github.com/Nalin-x-Linux/lios-3)] | Linux | GPLv3 | A GUI for scanning, running and training Tesseract with total accessibility for visually impaired and advanced features like Scanner Brightness optimizer, Text-Cleaner, etc |
+| **NeOCR** [[SourceForge](https://sourceforge.net/projects/ne-ocr/)] | Windows | Freeware | A GUI frontend for Tesseract 4.0 OCR engine. |
+| **NormCap** [[website](https://dynobo.github.io/normcap/)] [[GitHub](https://github.com/dynobo/normcap)] | Linux, macOS, Windows | GPL v3 | OCR-powered screen-capture tool to capture information instead of images. |
+| **OCR2Text** [[GitHub](https://github.com/writecrow/ocr2text)] | Linux, macOS, Windows | MIT | CLI tool for batch-processing PDF to TXT |
+| **OCRFeeder** [[Wiki](https://wiki.gnome.org/action/show/Apps/OCRFeeder)] [[Gitlab](https://gitlab.gnome.org/GNOME/ocrfeeder)] | Linux | GPL v3 | OCRFeeder is a document layout analysis and optical character recognition system |
+| **Ocrivist** [[Google Code](https://code.google.com/archive/p/ocrivist/)] | Linux | GPL v3 | Ocrivist is a utility which makes it possible to scan and OCR books and other printed documents to PDF or Djvu format |
+| **Opait TessPro** [[website](https://www.opait.com/TessPro/)] | Windows | [Proprietary](https://github.com/OpaitSoftware/TesseractStudio.Net/blob/master/LICENSE.md) | A graphical interface to tesseract 4.0 |
+| **Opait TessStudio** [[website](https://www.opait.com/TessStudio/)] | Windows | [Freeware](https://github.com/OpaitSoftware/TesseractStudio.Net/blob/master/LICENSE.md) | A limited, freeware version of Opait TessPro |
+| **PDF OCR X** [[website](http://solutions.weblite.ca/pdfocrx/)] | macOS, Windows | Proprietary | PDF OCR is a simple drag-and-drop utility for Mac OS X and Windows, that converts your PDFs and images into text documents or searchable PDF files |
+| **pmOCR** [[GitHub](http://github.com/deajan/pmocr/)] | Linux | BSD | Batch OCR tool, also file monitor event OCR with tesseract |
+| **QTesseract** [[Google Code](http://code.google.com/p/qtesseract/)] | Linux | LGPL v3 | QT GUI for the Tesseract OCR |
+| **SunnyPages OCR** [[website](http://www.sunnypages.eu/)] | Windows | Proprietary | A GUI frontend for Tesseract OCR engine with automatic adjustment of image brightness, image processing and PDF support. |
+| **TaxWorkFlow** [[website](https://thetaxworkflow.com)] | Windows| Proprietary | TaxWorkFlow is an accounting practice management application that includes GUI frontend for Tesseract OCR engine. The app supports AVX and allows to create OCR'ed PDF files of selected resolution and compression from PDF files and 100+ image file formats. |
+| **tesseract4java** [[GitHub](http://github.com/tesseract4java/tesseract4java)] | Linux, macOS, Windows | GPLv3 | A cross-platform GUI for training and running Tesseract with advanced features like batch recognition and accuracy evaluation |
+| **Tesseract-GUI** [[SourceForge](http://tesseract-gui.sourceforge.net)] | Linux | GPL v2 | Tessract-GUI is not a front-end for tesseract-ocr, it is just a graphical way to use it with simple image manipulation through ImageMagick |
+| **Tesseract-OCR QT4 gui** [[GitHub](https://github.com/zdenop/tesseract-ocr-qt4gui)] | Linux | Apache 2.0 | Tesseract-OCR QT4 gui is a simple GUI for tesseract |
+| **VietOCR** [[SourceForge](http://vietocr.sourceforge.net/)] | Linux, macOS, Windows | Apache 2.0 | A GUI frontend for Tesseract OCR engine. Supports optical character recognition for Vietnamese and other languages supported by Tesseract |
+| **YAGF** [[Sourceforge](http://sourceforge.net/projects/yagf-ocr/)] | Linux | GPL v3 | A graphical front-end for cuneiform and tesseract |
+| **gscan2pdf** [[SourceForge](http://gscan2pdf.sourceforge.net/)] | Linux | GPL v3 | A Gtk3 application for producing PDFs or DjVu documents from scans |
+| **Bindery** [[website](https://blender3d.github.io/Bindery/)] [[GitHub](https://github.com/Blender3D/Bindery)] | Linux, Windows | Unknown | A PyQt4 application for binding processed images from Scan Tailor into PDFs or DjVu documents |
+
+## 2. Web Applications
+
+### Hosted Only
+
+ * **OCR.net** [[website](http://ocr.net)] Converts PDFs and Images to Text or searchable PDF. Powered by PDF OCR X in back-end.
+ * **WeOCR** [[website](http://ocr1.sc.isc.tohoku.ac.jp/e1/)] a platform for Web-enabled OCR (Optical Character Reader/Recognition) systems that enables people to use character recognition over networks
+ * **CustomOCR** [[website](http://www.customocr.com/index.php?r=site/page&view=demos.tesseract_ocr)]
+ * **Free OCR** [[website](http://www.free-ocr.com/)]
+ * **i2OCR** [[website](http://www.i2ocr.com/)]
+ * **Indic-OCR OCR Service** [[website](https://indic-ocr.github.io/ocrservice/)] [[GitHub](https://github.com/indic-ocr/ocrservice/)] An online OCR service for Indian languages
+
+### Self-Hosted
+
+ 1. **Simple OCR Web Server** [[GitHub](https://github.com/ybur-yug/python_ocr_tutorial)] using python, flask, tesseract-ocr, and leptonica
+ 1. **OpenOCR** [[GitHub](https://github.com/tleyden/open-ocr)] makes it simple to host your own OCR REST API.
+ 1. **tesseract-web-service** [[GitHub](https://github.com/guitarmind/tesseract-web-service)] is an implementation of RESTful web service for tesseract-OCR using tornado
+
+## 3. Mobile Applications
+
+| **Name** | **Platform** | **Description** |
+|:---------|:------------|:----------------|
+| **NeOCR** [[Google Play](https://play.google.com/store/apps/details?id=np.edu.ku.ilprl.neocr)] | Android | Also available for Windows |
+| **tess-two** [[GitHub](https://github.com/rmtheis/tess-two)] | Android | A fork of **Tesseract Tools for Android** [[Google Code](http://code.google.com/p/tesseract-android-tools/)] with additional functions |
+| **Text Fairy** [[Google Play](https://play.google.com/store/apps/details?id=com.renard.ocr)] [[GitHub](https://github.com/renard314/textfairy)] | Android | |
+| **Character Recognition** [[Gitorious](https://gitorious.org/character-recognition/)] | Android | |
+| **Mobile OCR** [[Google Code](http://code.google.com/p/mobileocr/)] | Android | Recognizes photo text for accessibility by low vision and blind users |
+| **Across India** [[website](https://indic-ocr.github.io/acrossindia/)] [[GitHub](https://github.com/indic-ocr/acrossindia/)] | Android | Transliterates photo text in Indian languages or English between different scripts |
+| **OCR-iOS-Example** [[GitHub](https://github.com/robmathews/OCR-iOS-Example)] | iOS | Example application |
+| **Tesseract-iPhone-Demo** [[GitHub](https://github.com/nolanbrown/Tesseract-iPhone-Demo)] | iOS | Example based on tesseract 2.04. |
+| **ScanBizCards** [[website](http://www.scanbizcards.com)] | Android, iOS | Business card scanning |
+| **My Expenses** [[F-Droid](https://f-droid.org/en/packages/org.totschnig.myexpenses/)] [[GitHub](https://github.com/mtotschnig/MyExpenses)] | Android | Tracks income and expenses; uses **Totschnig OCR** for scanning receipts |
+| **Shubham Tyagi OCR** [[F-Droid](https://f-droid.org/packages/io.github.subhamtyagi.ocr/)] [[GitHub](https://github.com/SubhamTyagi/android-ocr)] | Android | based on Tesseract 5 using **Tesseract4Android** |
+
 ## 4. Others (Utilities, Tools, Command-Line Interfaces [CLI], etc)
 ### A. PDF to Searchable PDF tools (ie: any tool which can also handle a non-searchable PDF as an input):
- 1. [OCRmyPDF](https://github.com/jbarlow83/OCRmyPDF) - Adds OCR text layer to scanned PDF files and images, allowing them to be searched. Processes pages in parallel on multi-core CPUs. Keeps exact resolution of original embedded images without recompressing JPEGs, when possible. Includes image several preprocessing options, detailed documentation, and support for many exotic PDFs.
- 1. [pdf2pdfocr](https://github.com/LeoFCardoso/pdf2pdfocr) is a tool to OCR a PDF (or supported images) and add a text layer in the original file making it a searchable PDF. It is a python script that uses tesseract and other open source tools. Linux, macOS and Windows supported.
- 1. [pdf2searchablepdf](https://github.com/ElectricRCAircraftGuy/PDF2SearchablePDF) - a tool which allows converting any non-searchable PDF, OR any entire directory of images, to a searchable PDF
+ 1. **OCRmyPDF** [[GitHub](https://github.com/jbarlow83/OCRmyPDF)] Adds OCR text layer to scanned PDF files and images, allowing them to be searched. Processes pages in parallel on multi-core CPUs. Keeps exact resolution of original embedded images without recompressing JPEGs, when possible. Includes several image preprocessing options, detailed documentation, and support for many exotic PDFs.
+ 1. **pdf2pdfocr** [[GitHub](https://github.com/LeoFCardoso/pdf2pdfocr)] A tool to OCR a PDF (or supported images) and add a text layer in the original file making it a searchable PDF. It is a python script that uses tesseract and other open source tools. Linux, macOS and Windows supported.
+ 1. **pdf2searchablepdf** [[GitHub](https://github.com/ElectricRCAircraftGuy/PDF2SearchablePDF)] A tool for converting non-searchable PDFs or directories of images to searchable PDFs
-### B. Others:
- 1. [ocr-fileformat](https://github.com/UB-Mannheim/ocr-fileformat) - Validate and transform between OCR file formats (hOCR, ALTO, PAGE, FineReader)
- 1. [Tess4J](https://github.com/nguyenq/tess4j) - A Java JNA wrapper for Tesseract OCR API.
- 1. [Traineddata inspector](https://mazoea.com/te/traineddata/) - to inspect some of the internals of traineddata files
- 1. [TopOCR](http://www.topocr.com/) - high Quality OCR for Cameras with tesseract-ocr support (paid product)
- 1. [Simple OCR Web Server](https://github.com/ybur-yug/python_ocr_tutorial) using python, flask, tesseract-ocr, and leptonica
- 1. [Display OCR](https://github.com/arturaugusto/display_ocr) is OpenCV-Python + python-tesseract real-time image preprocess and OCR of 7 segments font.
- 1. [OpenOCR](https://github.com/tleyden/open-ocr) makes it simple to host your own OCR REST API.
- 1. https://github.com/guitarmind/tesseract-web-service is An implementation of RESTful web service for tesseract-OCR using tornado
- 1. [RasterEdge .NET Image SDK - OCR Recognition](http://www.rasteredge.com/dotnet-imaging/addon-ocr-sdk/) is robust, high-performance recognition application of royalty-free distribution for desktop or server applications.
- 1. [DevScope OCR SDK](http://www.devscope.net/products/DevScopeOCR) is an Optical Character Recognition toolkit engine based on Tesseract OCR v3 that allows to develop applications using Microsoft .NET framework
- 1. [Paperwork](https://github.com/jflesch/paperwork) - using OCR to grep dead trees the easy way (requires pyocr)
- 1. [Aletheia](http://www.primaresearch.org/tools.php) - An Advanced Document Layout and Text Ground-Truthing System for Production Environments
- 1. [gscan2pdf](http://gscan2pdf.sourceforge.net/) a GUI to produce PDFs or DjVus from scanned documents
- 1. [Audiveris](http://audiveris.kenai.com/) is an open-source Optical Music Recognition software which processes the image of a music sheet to automatically provide symbolic music information in MusicXML standard.
- 1. [Ocrivist](https://code.google.com/p/ocrivist/) is a utility which makes it possible to scan and OCR books and other printed documents to PDF or Djvu format.
- 1. [thu-ipv6-login](https://code.google.com/p/thu-ipv6-login/) a python script for IPv6 authentication in Tsinghua University with support for OCR of authcode
- 1. [Wolfram Mathematica 9.0](http://www.wolfram.com/mathematica/) use tesseract for [recognizing text](https://groups.google.com/d/msg/tesseract-ocr/NmxFclHcsAE/-KaiT5oJ8oQJ)
- 1. [node-dv](https://github.com/creatale/node-dv) is a node.js library for processing and understanding scanned documents
- 1. [hocr-tools](https://github.com/tmbdev/hocr-tools) - python tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. They include hocr-pdf tool for creating searchable pdf.
- 1. [PyPDFOCR](https://github.com/virantha/pypdfocr) - Tesseract-OCR based PDF filing
- 1. [ChronoScan](http://www.chronoscan.org) is a complete suite for document Scanning & Data Entry
- 1. [speedy-ocr](http://www.donaldmarang.org/speedy-ocr.html) utility to simplify scanning and OCR focus to help blind and visually impaired community. It is part of [Vinux project](http://vinuxproject.org).
- 1. [Project VIRAL](http://apps.man.poznan.pl/trac/varico/wiki) Varico Invoice Recognition with Assisted Learning
- 1. [Bindery](http://blender3d.github.com/Bindery/): A simple GUI for binding post processed scanned pages into digital documents
- 1. [Clarify](http://code.google.com/p/clarify): Clarify helps you OCR 'image-only' PDFs. Your input is a PDF that you normally cannot extract text from. The output is text. Clarify is a python module that wraps up tesseract-ocr, xpdf and netpbm. _Requirements:_ python, tesseract-ocr, xpdf, netpbm
- 1. [hOcr2Pdf.NET](http://hocrtopdf.codeplex.com/documentation): hOcr2Pdf.NET is a library that programmers can use to create highly compressed, searchable pdf's for applications. _Requirements:_ .NET 2.0 or higher, Tesseract 3.0, JBig2.exe
- 1. [PDFBeads](http://rubyforge.org/projects/pdfbeads): convert scanned images to a single searchable PDF file based on hOCR files. _Requirements:_ ruby, RMagick, hpricot
- 1. [ExactImage/hocr2pdf](http://www.exactcode.com/site/open_source/exactimage/hocr2pdf/): creates a Searchable PDF from hOCR input. _Requirements:_ libagg
- 1. [HocrConverter](https://github.com/jbrinley/HocrConverter): creates PDFs and plain text from hOCR documents. _Requirements:_ python, reportlab
- 1. [HocrToPdf.java](http://www.acoveo.com/acoveo/files/HocrToPdf.java): java source for very basic hOCR to PDF converter. Compiled version can be found at project [modi2hocr](http://code.google.com/p/modi2hocr/source/browse/trunk/). _Requirements:_ java, jericho, iText2
- 1. [hOcr2Pdf.NET](http://hocrtopdf.codeplex.com/): is a .NET library to convert .hocr html produced by Tesseract or Cuneiform into searchable pdfs using HtmlAgilityPack and iTextSharp. _Requirements:_ C#.
- 1. [Tally-Ho](http://code.google.com/p/tallyho/): Tally-Ho is a screen reader intended for sites like google books
- 1. [Mayan EDMS](http://rosarior.github.com/mayan/): Document management system with tesseract as its base
- 1. [Olena](http://git.lrde.epita.fr/?p=olena.git;a=summary): a generic and efficient image processing platform (tesseract is used in its part called [scribo](http://git.lrde.epita.fr/?p=olena.git;a=tree))
- 1. [ocrodjvu](http://jwilk.net/software/ocrodjvu) is a wrapper for OCR systems, that allows you to perform OCR on DjVu files
- 1. [PaRADIIT](https://sites.google.com/site/paradiitproject/home) (Pattern Redundancy Analysis for Document Image Indexation & Transcription) is a project initiated and sponsored by 2 successive Google DH awards. It aims to turn ancient books, especially from the Renaissance, into accessible digital libraries.
- 1. [The ISRI Analytic Tools](https://github.com/eddieantonio/isri-ocr-evaluation-tools) consist of 17 tools for measuring the performance of and experimenting with OCR output.
- 1. [Indic Messenger](https://indic-ocr.github.io/indicmessenger) A Facebook chat bot which can OCR images containing Indian/English text and transliterate it to other Indian scripts.
- 1. [LibreOCR](https://indic-ocr.github.io/LibreOCR/) A [LibreOffice](http://www.libreoffice.org) extension which can convert an image to OCT and open in the Writer application.
- 1. [hertzg/tesseract-server](https://github.com/hertzg/tesseract-server/) A lightweight, docker based, mutli-arch, stateless JSON HTTP API service for tesseract.
+### B. Libraries and SDKs
+
+ 1. **Tesseract macOS** [[GitHub](https://github.com/scott0123/Tesseract-macOS)] - Tesseract OCR framework for macOS, supporting both Objective C and swift. Compiled for both x86 and arm64.
+ 1. **Tess4J** [[GitHub](https://github.com/nguyenq/tess4j)] - A Java JNA wrapper for Tesseract OCR API.
+ 1. **Tesseract Android Tools** [[Google Code](http://code.google.com/p/tesseract-android-tools/)] - A set of Android APIs (archived in Google Code Archive at 2013-01-28)
+ 1. **Tesseract-OCR-iOS** [[GitHub](https://github.com/gali8/Tesseract-OCR-iOS)] - Tesseract OCR iOS is a Framework for iOS7+, compiled also for armv7s and arm64.
+ 1. **Tesseract4Android** [[GitHub](https://github.com/adaptech-cz/Tesseract4Android)] - A fork of **tess-two** [[GitHub](https://github.com/rmtheis/tess-two)] rewritten from scratch to support latest version of Tesseract OCR.
+ 1. **DevScope OCR SDK** [[website](http://www.devscope.net/products/DevScopeOCR)] A Microsoft .NET framework based on Tesseract v3
+ 1. **RasterEdge .NET Image SDK - OCR Recognition** [[website](http://www.rasteredge.com/dotnet-imaging/addon-ocr-sdk/)] is robust, high-performance recognition application of royalty-free distribution for desktop or server applications.
+ 1. **node-dv** [[GitHub](https://github.com/creatale/node-dv)] A node.js library for processing and understanding scanned documents
+ 1. **hocr-tools** [[GitHub](https://github.com/tmbdev/hocr-tools)] Python tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. They include hocr-pdf tool for creating searchable pdf.
+ 1. **Clarify** [[Google Code](http://code.google.com/p/clarify)] A Python module that outputs text from 'image-only' PDFs. (Wraps tesseract-ocr, xpdf and netpbm. Requires python, tesseract-ocr, xpdf, netpbm.)
+ 1. **Tesseract Server (OCR over HTTP)** [[GitHub](https://github.com/hertzg/tesseract-server/)] A lightweight, docker based, multi-arch, stateless JSON HTTP API service for tesseract
+ 1. **Totschnig OCR** [[F-Droid](https://f-droid.org/en/packages/org.totschnig.ocr.tesseract/)] [[GitHub](https://github.com/mtotschnig/OCR)] A library for adding OCR features to Android applications. Currently used by **My Expenses** for scanning receipts.
+ 1. **hOcr2Pdf.NET** [[website](http://hocrtopdf.codeplex.com/documentation)] A .NET 2.0 library for creating highly compressed, searchable PDFs using Tesseract 3.0. (Requires JBig2.exe)
+ 1. **PDFBeads** [[RubyForge](http://rubyforge.org/projects/pdfbeads)] A Ruby library that converts scanned images to single searchable PDF file based on hOCR files. _Requirements:_ ruby, RMagick, hpricot
+ 1. **HocrConverter** [[GitHub](https://github.com/jbrinley/HocrConverter)] A python library that creates PDFs and plain text from hOCR documents. _Requirements:_ python, reportlab
+ 1. **hOcr2Pdf.NET** [[website](http://hocrtopdf.codeplex.com/)] A .NET library to convert .hocr html produced by Tesseract or Cuneiform into searchable pdfs using HtmlAgilityPack and iTextSharp. _Requirements:_ C#.
+
+### C. Utilities
+
+ 1. **Traineddata inspector** [[GitHub](https://github.com/mazoea/te-pytraineddata)] - to inspect some of the internals of traineddata files
+ 1. **ocr-fileformat** [[GitHub](https://github.com/UB-Mannheim/ocr-fileformat)] - Validate and transform between OCR file formats (hOCR, ALTO, PAGE, FineReader)
+ 1. **TopOCR** [[website](http://www.topocr.com/)] - high Quality OCR for Cameras with tesseract-ocr support (paid product)
+ 1. **Display OCR** [[GitHub](https://github.com/arturaugusto/display_ocr)] is OpenCV-Python + python-tesseract real-time image preprocess and OCR of 7 segments font.
+ 1. **Paperwork** [[GitHub](https://github.com/jflesch/paperwork)] - using OCR to grep dead trees the easy way (requires pyocr)
+ 1. **Aletheia** [[website](http://www.primaresearch.org/tools.php)] - An Advanced Document Layout and Text Ground-Truthing System for Production Environments
+ 1. **Audiveris** [[website](http://audiveris.kenai.com/)] An open-source Optical Music Recognition software which processes the image of a music sheet to automatically provide symbolic music information in MusicXML standard.
+ 1. **Ocrivist** [[Google Code](https://code.google.com/p/ocrivist/)] Scan and OCR books and other printed documents to PDF or Djvu format
+ 1. **thu-ipv6-login** [[Google Code](https://code.google.com/p/thu-ipv6-login/)] A python script for IPv6 authentication in Tsinghua University with support for OCR of authcode
+ 1. **Wolfram Mathematica** [[website](http://www.wolfram.com/mathematica/)] A technical computing platform that uses Tesseract for text recognition [[source](https://groups.google.com/d/msg/tesseract-ocr/NmxFclHcsAE/-KaiT5oJ8oQJ)]
+ 1. **PyPDFOCR** [[GitHub](https://github.com/virantha/pypdfocr)] Tesseract-OCR based PDF filing
+ 1. **ChronoScan** [[website](http://www.chronoscan.org)] A complete suite for document Scanning & Data Entry
+ 1. **speedy-ocr** [[website](http://www.donaldmarang.org/speedy-ocr.html)] A utility to simplify scanning and OCR focus to help blind and visually impaired community. It is part of [Vinux project](http://vinuxproject.org).
+ 1. **Project VIRAL** [[website](http://apps.man.poznan.pl/trac/varico/wiki)] **V**arico **I**nvoice **R**ecognition with **A**ssisted **L**earning
+ 1. **ExactImage/hocr2pdf** [[website](http://www.exactcode.com/site/open_source/exactimage/hocr2pdf/)] creates a Searchable PDF from hOCR input. _Requirements:_ libagg
+ 1. **HocrToPdf.java** modi2hocr [[website](http://www.acoveo.com/acoveo/files/HocrToPdf.java)] [[Google Code](http://code.google.com/p/modi2hocr/source/browse/trunk/)] An example Java source for very basic hOCR to PDF conversion. _Requirements:_ java, jericho, iText2
+ 1. **Tally-Ho** [[Google Code](http://code.google.com/p/tallyho/)] A screen reader intended for sites like Google Books
+ 1. **Mayan EDMS** [[Gitlab](https://gitlab.com/mayan-edms/mayan-edms)] A document management system with tesseract as its base
+ 1. **Olena** [[source](http://git.lrde.epita.fr/?p=olena.git;a=summary)] A generic and efficient image processing platform (tesseract is used in its part called [scribo](http://git.lrde.epita.fr/?p=olena.git;a=tree))
+ 1. **ocrodjvu** [[website](http://jwilk.net/software/ocrodjvu)] A wrapper for performing OCR on DjVu files
+ 1. **PaRADIIT** [[website](https://sites.google.com/site/paradiitproject/home)] **Pa**ttern **R**edundancy **A**nalysis for **D**ocument **I**mage **I**ndexation & **T**ranscription is a Google DH project that aims to turn ancient books, especially from the Renaissance, into accessible digital libraries.
+ 1. **The ISRI Analytic Tools** [[GitHub](https://github.com/eddieantonio/isri-ocr-evaluation-tools)] A set of 17 tools for measuring the performance of and experimenting with OCR output.
+ 1. **Indic Messenger** [[website](https://indic-ocr.github.io/indicmessenger)] [[GitHub](https://github.com/indic-ocr/indicmessenger)] A Facebook chat bot which can OCR images containing Indian/English text and transliterate it to other Indian scripts.
+ 1. **LibreOCR** [[website](https://indic-ocr.github.io/LibreOCR/)] [[GitHub](https://github.com/indic-ocr/LibreOCR/)] A [LibreOffice](http://www.libreoffice.org) Writer extension for converting images to OCT
 ### IMPACT related