This project provides a framework for reading labels in the context of natural history collections.

NHMDenmark/NHMDlabelreader

Labels reader

This project aims to provide automated reading of natural history labels such as herbarium labels and archive cards.

The project includes Python scripts and ideas for automated reading of machine-typed labels (handwritten labels are not yet supported) and of Data Matrix codes or QR codes.

Requirements

The following must be installed on the system. On macOS, I install these via MacPorts.

For OCR using tesseract:

tesseract
tesseract-dan
tesseract-eng
tesseract-deu
tesseract-lat

For reading PDF files:

imagemagick
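
The requirements above can be checked from Python before running the scripts. The following is a minimal sketch, not part of the project: the function names are hypothetical, the language codes dan, eng, deu and lat are inferred from the package names listed above, and the ImageMagick executable is assumed to be called either magick or convert.

```python
import shutil
import subprocess

# Language codes inferred from the tesseract-* package names (an assumption).
REQUIRED_LANGS = {"dan", "eng", "deu", "lat"}


def parse_list_langs(output):
    """Turn the text printed by `tesseract --list-langs` into a set of codes."""
    lines = [line.strip() for line in output.splitlines()]
    # Skip empty lines and the header line ("List of available languages ...").
    return {ln for ln in lines if ln and not ln.lower().startswith("list of")}


def missing_langs(output, required=REQUIRED_LANGS):
    """Return the required language codes that are absent from the output."""
    return set(required) - parse_list_langs(output)


def check_system():
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = []
    if shutil.which("tesseract") is None:
        problems.append("tesseract executable not found on PATH")
    else:
        result = subprocess.run(
            ["tesseract", "--list-langs"],
            capture_output=True, text=True, check=False,
        )
        # Some tesseract versions print the list to stderr, so read both streams.
        missing = missing_langs(result.stdout + "\n" + result.stderr)
        if missing:
            problems.append("missing tesseract languages: " + ", ".join(sorted(missing)))
    if shutil.which("magick") is None and shutil.which("convert") is None:
        problems.append("ImageMagick executable (magick or convert) not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_system() or ["all system requirements found"]:
        print(problem)
```

Keeping the parsing in a separate function makes it possible to test the language check without tesseract installed.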

Create a virtual environment

python3 -m venv venv

Install the requirements via pip into the virtual environment

source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

The binary wheels on PyPI for the current version 2.2.0 of zxing-cpp have a problem, so the package must be built from the source package instead:

pip uninstall zxing-cpp
python -m pip install zxing-cpp==2.2.0 --no-binary zxing-cpp
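
After the reinstall, the installed version can be confirmed from within the virtual environment. This is a small sketch using only the standard library. The helper names are hypothetical, and the check only reports the version number. It cannot tell whether the package was built from source or installed from a wheel.

```python
from importlib import metadata

EXPECTED_ZXING_VERSION = "2.2.0"  # the version named in this README


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def describe(dist_name, expected):
    """Summarize whether the expected version of a distribution is installed."""
    found = installed_version(dist_name)
    if found is None:
        return f"{dist_name} is not installed"
    if found != expected:
        return f"{dist_name} {found} is installed, expected {expected}"
    return f"{dist_name} {found} is installed"


if __name__ == "__main__":
    print(describe("zxing-cpp", EXPECTED_ZXING_VERSION))
```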

Development

Testing

To run the tests using pytest, run the following from the directory that contains this README file.

source venv/bin/activate
pytest tests

Check the output for any failures.

Packaging

To create wheel and source packages ready for distribution, run:

source venv/bin/activate
pip install --upgrade -r build_requirements.txt
python -m build

This creates a dist directory containing the two package files. To install the wheel file into another virtual environment, run:

python -m venv venv2
source venv2/bin/activate
pip install --upgrade pip
pip install dist/NHMDlabelreader-0.0.1-py3-none-any.whl
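
As a sanity check on the build output described above, the contents of the dist directory can be listed from Python. This is a minimal sketch: the function name is hypothetical, and it assumes the source package is a .tar.gz archive, which is common for python -m build but not stated in this README.

```python
from pathlib import Path


def summarize_dist(dist_dir="dist"):
    """Group the files in a build output directory into wheels and source archives."""
    names = sorted(p.name for p in Path(dist_dir).iterdir() if p.is_file())
    return {
        "wheels": [n for n in names if n.endswith(".whl")],
        # Assumption: the source package is a gzipped tarball.
        "sdists": [n for n in names if n.endswith(".tar.gz")],
    }


if __name__ == "__main__":
    if Path("dist").is_dir():
        summary = summarize_dist("dist")
        print("wheels:", summary["wheels"])
        print("sdists:", summary["sdists"])
    else:
        print("no dist directory found; run `python -m build` first")
```

A successful build should report exactly one file in each group.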

For more instructions on how to configure setup.cfg, see the setuptools quickstart.

We are not currently publishing this package to PyPI. To upload to PyPI, follow these instructions.

GitHub Actions

Currently there are two GitHub Actions workflows, both of which must be activated manually in the repository on github.com. For more advanced workflows, see Ole Engstrøm's IKPLS repository.

Documentation

Additional documentation can be found in docs.

spidercardreader

This script parses archive cards from the Ole Bøggild collection of Danish spiders.

butterflyatlasreader

This script parses a table of taxa from the butterfly atlas book.

csadcardreader

This script attempts to parse information on archive cards from the C-SAD Botany collection at NHMD.
