tesseract-ocr/langdata

Latest commit: bc528bf · Mar 9, 2024

langdata

Source training data for Tesseract, covering many languages.

Want to re-train Tesseract for a specific language by modifying or augmenting the original training data? Then you have come to the right place!

If you only need a language data set to run Tesseract, look at our tessdata repository instead.

To re-create the training of a single language, lang, you need the following:

  • All the data in the lang directory.
  • The corresponding unicharset/xheights files for the script(s) used by lang.
  • All the remaining non-lang-specific files in the top-level directory, such as font_properties.
  • The fonts needed to train the language. Some languages were trained with commercially available fonts, so to reproduce the training exactly you will need to buy them; otherwise, use substitutes.
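
As a rough illustration, the first three items in the list above can be checked against a local checkout with a short Python sketch. The helper name collect_training_inputs is hypothetical, and the file naming it relies on (top-level <Script>.unicharset and <Script>.xheights files) is an assumption rather than something this README states. The caller also has to supply the script names, since the mapping from language to script is not described here.

```python
from pathlib import Path


def collect_training_inputs(root, lang, scripts):
    """Gather the files needed to re-create training for one language.

    Assumptions (not confirmed by the README): script files are named
    <Script>.unicharset and <Script>.xheights and live in the top-level
    directory; every other visible top-level file is treated as shared.
    """
    root = Path(root)
    lang_dir = root / lang
    missing = []

    # 1. All the data in the lang directory.
    lang_files = []
    if lang_dir.is_dir():
        lang_files = sorted(p for p in lang_dir.rglob("*") if p.is_file())
    if not lang_files:
        missing.append(lang_dir)

    # 2. The unicharset/xheights files for each script used by lang.
    script_files = []
    for script in scripts:
        for suffix in ("unicharset", "xheights"):
            path = root / f"{script}.{suffix}"
            if path.is_file():
                script_files.append(path)
            else:
                missing.append(path)

    # 3. Remaining non-lang-specific top-level files, such as font_properties.
    shared_files = sorted(
        p for p in root.iterdir()
        if p.is_file()
        and not p.name.startswith(".")
        and p.suffix not in (".unicharset", ".xheights")
    )

    return {
        "lang_files": lang_files,
        "script_files": script_files,
        "shared_files": shared_files,
        "missing": missing,
    }


if __name__ == "__main__":
    # Hypothetical usage: "eng" and "Latin" are example arguments only.
    report = collect_training_inputs(".", "eng", ["Latin"])
    for key, paths in report.items():
        print(key, [str(p) for p in paths])
```

The sketch only reports what is present or missing on disk. It cannot check the fourth item: fonts have to be obtained separately.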