Prepare for release
jeroen committed Jan 26, 2018
1 parent 9d57df2 commit 3e272f1
Showing 4 changed files with 20 additions and 9 deletions.
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -1,7 +1,7 @@
 Package: tesseract
 Type: Package
-Title: Open Source OCR Engine for R
-Version: 1.7.9000
+Title: Open Source OCR Engine
+Version: 1.8
 Author: Jeroen Ooms
 Maintainer: Jeroen Ooms <[email protected]>
 Description: Bindings to 'Tesseract': An OCR engine with unicode (UTF-8) support
2 changes: 1 addition & 1 deletion R/tesseract.R
@@ -4,7 +4,7 @@
 #' are reading. Works best for images with high contrast, little noise and horizontal text.
 #'
 #' Tesseract uses training data to perform OCR. Most systems default to English
-#' training data. To improve OCR performance for other langauges you can to install the
+#' training data. To improve OCR performance for other languages you can to install the
 #' training data from your distribution. For example to install the spanish training data:
 #'
 #' - [tesseract-ocr-spa](https://packages.debian.org/testing/tesseract-ocr-spa) (Debian, Ubuntu)
21 changes: 16 additions & 5 deletions README.md
@@ -81,12 +81,23 @@ brew install tesseract
 ```
 Tesseract uses training data to perform OCR. Most systems default to English
-training data. To improve OCR performance for other langauges you can to install the
-training data from your distribution. For example to install the spanish training data:
+training data. To improve OCR results for other langauges you can to install the
+appropriate training data. On Windows and OSX you can do this in R using
+`tesseract_download()`:
+```r
+tesseract_download('fra')
+```
+On Linux you need to install the appropriate training data from your distribution.
+For example to install the spanish training data:
 - [tesseract-ocr-spa](https://packages.debian.org/testing/tesseract-ocr-spa) (Debian, Ubuntu)
 - [tesseract-langpack-spa](https://apps.fedoraproject.org/packages/tesseract-langpack-spa) (Fedora, EPEL)
-On other platforms you can manually download training data from [github](https://github.com/tesseract-ocr/tessdata)
-and store it in a path on disk that you pass in the `datapath` parameter. Alternatively
-you can set a default path via the `TESSDATA_PREFIX` environment variable.
+Alternatively you can manually download training data from [github](https://github.com/tesseract-ocr/tessdata)
+and store it in a path on disk that you pass in the `datapath` parameter or set a default path via the
+`TESSDATA_PREFIX` environment variable. Note that the Tesseract 4 and Tesseract 3 use different
+training data format. Make sure to download training data from the branch that matches your libtesseract version.
2 changes: 1 addition & 1 deletion man/tesseract.Rd

(Generated file; diff not rendered.)
