This patch release adds support for Automatic Mixed Precision (AMP) in PyTorch training to docTR, along with artefact object detection.

Note: doctr 0.4.1 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.

Highlights

Automatic Mixed Precision (AMP) ⚡

Training scripts with the PyTorch back-end now benefit from AMP to reduce the memory footprint and potentially increase the maximum batch size! This comes in especially handy for text detection, which requires high spatial resolution inputs!
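For readers who have not used AMP before, here is a minimal sketch of what a mixed-precision training step typically looks like in PyTorch. It is not the code of the docTR training scripts: the tiny model, the random data and the `train_step` helper are hypothetical stand-ins. It uses `torch.cuda.amp.autocast` and `torch.cuda.amp.GradScaler`, which match the PyTorch versions this release targets (recent PyTorch versions prefer the equivalent helpers under `torch.amp`).

```python
import torch
from torch import nn

# AMP is only enabled when a CUDA device is available; on CPU both helpers
# below are disabled and the step runs in plain float32.
use_amp = torch.cuda.is_available()
device = torch.device("cuda" if use_amp else "cpu")

# Hypothetical stand-in for a segmentation-style model (not a docTR architecture)
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 1, kernel_size=1),
).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()
# The scaler multiplies the loss before backward() to avoid float16 gradient underflow
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)


def train_step(images: torch.Tensor, targets: torch.Tensor) -> float:
    """Run one optimization step, in mixed precision when AMP is enabled."""
    images, targets = images.to(device), targets.to(device)
    optimizer.zero_grad()
    # Inside autocast, eligible ops run in float16 while sensitive ops stay in float32
    with torch.cuda.amp.autocast(enabled=use_amp):
        loss = criterion(model(images), targets)
    scaler.scale(loss).backward()  # backward pass on the scaled loss
    scaler.step(optimizer)         # unscales gradients, skips the step on inf/NaN
    scaler.update()                # adapts the scale factor for the next step
    return loss.item()


# Random batch, only here to exercise the step
loss_value = train_step(torch.rand(2, 3, 64, 64), torch.rand(2, 1, 64, 64).round())
```

The inputs stay in float32 throughout, so nothing has to change on the dataset side for this to work.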

Artefact detection 🛸

Document understanding goes beyond textual elements, as information can be encoded in other visual forms. For this reason, we have extended the range of supported tasks by adding object detection. This will be focused on non-textual elements in documents, including QR codes, barcodes, ID pictures, and logos.

Here are some early results:

This release comes with a training & validation set, DocArtefacts, and a reference training script. Keep an eye out for the models we will be publishing in the next release!
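Object detection results are commonly scored by comparing predicted boxes with ground truth boxes through their Intersection over Union (IoU). The sketch below illustrates that general idea in plain Python. It is not the metric implemented in docTR: the `Box` alias, the `box_iou` helper and the example coordinates are assumptions made for illustration.

```python
from typing import Tuple

# Hypothetical box format for this sketch: (xmin, ymin, xmax, ymax) in pixels
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Width and height of the overlapping region (0 when the boxes are disjoint)
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Made-up ground truth and prediction for a QR code
qr_code_gt: Box = (10.0, 10.0, 30.0, 30.0)
qr_code_pred: Box = (20.0, 10.0, 40.0, 30.0)
iou = box_iou(qr_code_gt, qr_code_pred)
```

A prediction is then usually counted as correct when its IoU with a ground truth box of the same class exceeds a chosen threshold, 0.5 being a frequent choice.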

Get more of docTR with Colab tutorials 📖

You've been waiting for it: from now on, we will regularly add new tutorials for docTR in the form of Jupyter notebooks that you can open and run locally or on Google Colab, for instance!

Check the new page in the documentation for an up-to-date list of all our community notebooks: https://mindee.github.io/doctr/latest/notebooks.html

Breaking changes

Deprecated support of FP16 for datasets

Reduced floating-point precision can be leveraged in deep learning to decrease the memory footprint of training. The common data type float32 has a lower-resolution counterpart, float16, which is usually only supported on GPU for common deep learning operations. Initially, we planned to make all our operations available in both data types to reduce the memory footprint.

However, with the latest developments in deep learning frameworks and their Automatic Mixed Precision mechanisms, this is no longer required and only adds constraints on the development side. We have thus deprecated this feature in our datasets and predictors:

| 0.4.0 | 0.4.1 |
| --- | --- |
| `>>> from doctr.datasets import FUNSD`<br>`>>> ds = FUNSD(train=True, download=True, fp16=True)`<br>`>>> print(getattr(ds, "fp16"))`<br>`True` | `>>> from doctr.datasets import FUNSD`<br>`>>> ds = FUNSD(train=True, download=True)`<br>`>>> print(getattr(ds, "fp16", None))`<br>`None` |
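To make the float16 versus float32 trade-off described in this section concrete, here is a small sketch that relies only on Python's standard library (the `struct` format characters `e` and `f` are IEEE 754 half and single precision). It illustrates the data types themselves and is not docTR code: the `roundtrip` helper and the 1024x1024 RGB image size are assumptions chosen for the example.

```python
import struct

HALF, SINGLE = "<e", "<f"  # IEEE 754 binary16 and binary32


def roundtrip(value: float, fmt: str) -> float:
    """Store a Python float in the given format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


# Storage cost per value
bytes_half = struct.calcsize(HALF)
bytes_single = struct.calcsize(SINGLE)

# Rounding error introduced when storing 0.1 in each format
err_half = abs(roundtrip(0.1, HALF) - 0.1)
err_single = abs(roundtrip(0.1, SINGLE) - 0.1)

# Memory needed for one 1024x1024 RGB image, in MiB
num_values = 1024 * 1024 * 3
mib_half = num_values * bytes_half / 2**20
mib_single = num_values * bytes_single / 2**20

print(f"float16: {bytes_half} bytes/value, {mib_half} MiB/image, error {err_half:.1e}")
print(f"float32: {bytes_single} bytes/value, {mib_single} MiB/image, error {err_single:.1e}")
```

Halving the storage comes at the cost of a much coarser resolution, which is why frameworks with AMP choose the precision per operation rather than casting everything up front.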

Detailed changes

New features

Adds Arabic to supported vocabs in #514 (@mzeidhassan)
Adds XML export method to DocumentBuilder in #544 (@felixdittrich92)
Adds flags to control the behaviour with rotated elements in #551 (@charlesmindee)
Adds unittest to ensure headers are correct in #556 (@fg-mindee)
Adds isort ordering & dedicated CI check in #557 (@fg-mindee)
Adds IIIT-5K to supported datasets in #589 (@felixdittrich92)
Adds support of AMP to all PyTorch training scripts in #604 (@fg-mindee)
Adds DocArtefacts dataset for object detection on non-textual elements in #583 (@SiddhantBahuguna)
Speeds up CTC decoding in PyTorch by 10x in #633 (@fg-mindee)
Added train script for artefact detection in #593 (@SiddhantBahuguna)
Added GPU support for classification and improved memory pinning in #629 (@fg-mindee)
Added an object detection metric in #628 (@fg-mindee)
Split DocArtefacts into subsets and updated its class mapping in #601 (@fg-mindee)
Added README specific for the API with route examples in #612 (@fg-mindee)
Added SVT dataset integration in #620 (@felixdittrich92)
Added links to tutorial notebooks in the documentation in #619 (@fg-mindee)
Added new architectures to model selection in demo in #600 (@fg-mindee)
Added det/reco_predictor arch in OCRPredictor.__repr__ in #595 (@RBMindee)
Improves coverage by adding missing unittests in #545 (@fg-mindee)
Resolve both lines and blocks by default when building a doc in #548 (@charlesmindee)
Relocated test/ to tests/ and made contribution process easier in #598 (@fg-mindee)
Fixed Makefile by converting spaces to tabs in #615 (@fg-mindee)
Updated flake8 config to spot unused imports & undefined variables in #623 (@fg-mindee)
Adds 2 new rotation flags in the ocr_predictor in #632 (@charlesmindee)

Bug fixes

Fixed evaluation script clipping issue in #522 (@charlesmindee)
Fixed API template issues with new httpx version in #535 (@fg-mindee)
Fixed TransformerDecoder for PyTorch 1.10 in #539 (@fg-mindee)
Fixed a bug in resolve_lines in #537 (@charlesmindee)
Fixed target computation in MASTER model (PyTorch backend) in #546 (@charlesmindee)
Fixed Portuguese entry in VOCAB in #571 (@fmobrj)
Fixed header check typo in #557 (@fg-mindee)
Fixed Keras version constraint in #579 (@fg-mindee)
Updated Streamlit version in demo app in #611 (@charlesmindee)
Updated environment collection script in #575 (@fg-mindee)
Removed console print in builder in #566 (@fg-mindee)
Fixed docstring and export as xml dim bug in #586 (@felixdittrich92)
Fixed README instruction for page synthesis in #590 (@fg-mindee)
Added missing console log and removed TensorBoard in #626 (@fg-mindee)
Fixed docstrings of datasets in #603 (@felixdittrich92)
Fixed documentation build requirements in #549 (@fg-mindee)

Improvements

Applied post release modifications in #520 (@fg-mindee)
Updated benchmark entry of crnn_mobilenet_v3 small in #523 (@charlesmindee)
Updated crnn_mobilenet_v3_large performance in doc (TF) in #526 (@charlesmindee)
Added automatic detection of rotated bbox in training utils in #534 (@fg-mindee)
Cleaned rotation transforms in #536 (@fg-mindee)
Updated library name spelling in #541 (@fg-mindee)
Updates README of detection training in #542 (@K-for-Code)
Updated package index in #543 (@fg-mindee)
Updated README in #555 (@fg-mindee)
Updated CONTRIBUTING and issue templates in #560 (@fg-mindee)
Removed unused imports and prevented XML attacks in #582 (@fg-mindee)
Updated references to demo in README in #599 (@fg-mindee)
Updated readme and help in analyze.py in #596 (@RBMindee)
Specified that the API template only supports images for now in #609 (@fg-mindee)
Updated command to install tf/pytorch build in #614 (@charlesmindee)
Added checkpoint format to gitignore in #613 (@fg-mindee)
Specified comment in SAR about symbol encoding in #617 (@fg-mindee)
Drops support of np.float16 in #627 (@fg-mindee)

New Contributors

Our thanks & a warm welcome to the following people for their first contributions: @mzeidhassan @K-for-Code @felixdittrich92 @SiddhantBahuguna @RBMindee @thentgesMindee 🙏

Full Changelog: v0.4.0...v0.4.1

v0.4.1: Enables AMP training and adds support of artefact object detection