
Releases: mindee/doctr

v0.10.0

21 Oct 08:37
d5dbc73

Note: docTR 0.10.0 requires Python >= 3.9
Note: docTR 0.10.0 requires either TensorFlow >= 2.15.0 or PyTorch >= 2.0.0

What's Changed

Soft Breaking Changes (TensorFlow backend only) 🛠

  • Changed the saving format from /weights to .weights.h5

NOTE: Please update your custom trained models and the models you uploaded to the Hugging Face Hub: this will be the last release that supports manual loading from /weights.

New features

Disable page orientation classification

  • If you deal with documents that contain only small rotations (~ -45 to 45 degrees), you can disable the page orientation classification to speed up inference.
  • This will only have an effect with assume_straight_pages=False and/or straighten_pages=True and/or detect_orientation=True.
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True, assume_straight_pages=False, disable_page_orientation=True)

Disable crop orientation classification

  • If you deal with documents that contain only horizontal text, you can disable the crop orientation classification to speed up inference.
  • This will only have an effect with assume_straight_pages=False and/or straighten_pages=True.
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True, assume_straight_pages=False, disable_crop_orientation=True)

Loading custom exported orientation classification models

You can now load your custom trained orientation models; the following snippet demonstrates how:

from doctr.io import DocumentFile
from doctr.models import ocr_predictor, mobilenet_v3_small_page_orientation, mobilenet_v3_small_crop_orientation
from doctr.models.classification.zoo import crop_orientation_predictor, page_orientation_predictor

custom_page_orientation_model = mobilenet_v3_small_page_orientation("<PATH_TO_CUSTOM_EXPORTED_ONNX_MODEL>")
custom_crop_orientation_model = mobilenet_v3_small_crop_orientation("<PATH_TO_CUSTOM_EXPORTED_ONNX_MODEL>")

predictor = ocr_predictor(pretrained=True, assume_straight_pages=False, detect_orientation=True)

# Overwrite the default orientation models
predictor.crop_orientation_predictor = crop_orientation_predictor(custom_crop_orientation_model)
predictor.page_orientation_predictor = page_orientation_predictor(custom_page_orientation_model)

Full Changelog: v0.9.0...v0.10.0

v0.9.0

08 Aug 13:57
894eafd

Note: docTR 0.9.0 requires Python >= 3.9
Note: docTR 0.9.0 requires either TensorFlow >= 2.11.0 or PyTorch >= 1.12.0.

What's Changed

Soft Breaking Changes 🛠

  • The default detection model changed from db_resnet50 to fast_base.
    NOTE: This can be reverted by passing the detection architecture explicitly: predictor = ocr_predictor(det_arch="db_resnet50", pretrained=True)
  • The default value of resolve_blocks changed from True to False.
    NOTE: This can be reverted by passing resolve_blocks=True to the ocr_predictor

New features

✨ Installation ✨

We have split docTR into optional parts to make it more lightweight and to exclude parts that are not required for inference.
Optional parts are:

  • visualization (to support .show())
  • html support (to support .from_url(...))
  • contribution module
# for TensorFlow without any optional dependencies
pip install "python-doctr[tf]"

# for PyTorch without any optional dependencies
pip install "python-doctr[torch]"

# Install PyTorch and all available optional parts
pip install "python-doctr[torch,viz,html,contrib]"

✨ ONNX and OnnxTR ✨

We have built a standalone library that provides a very lightweight way to use existing docTR models exported to ONNX, or your own custom ones.

Benefits:

  • known docTR interface (ocr_predictor, etc.)
  • no PyTorch or TensorFlow required: built on top of onnxruntime
  • more lightweight package with lower inference latency and fewer required resources
  • 8-bit quantized models for faster inference on CPU

Give it a try and check it out: OnnxTR
docTR docs: ONNX / OnnxTR


What's Changed

Breaking Changes 🛠

  • [models] Change default model to fast_base - soft breaking change by @felixdittrich92 in #1588
  • [misc] update README & fix mypy & change resolve blocks default by @felixT2K in #1686

Full Changelog: v0.8.1...v0.9.0

v0.8.1

04 Mar 14:50
62d94ff

Note: doctr 0.8.1 requires either TensorFlow >= 2.11.0 or PyTorch >= 1.12.0.

v0.8.0

28 Feb 13:13
67d1087

Note: doctr 0.8.0 requires either TensorFlow >= 2.11.0 or PyTorch >= 1.12.0.

What's Changed

Breaking Changes 🛠

  • db_resnet50_rotation (PyTorch) and linknet_resnet18_rotation (TensorFlow) are removed (All models can handle rotated documents now)
  • .show(doc) changed to .show()

New features

  • All models have pretrained checkpoints now by @odulcy-mindee
  • All detection models were retrained on rotated samples by @odulcy-mindee
  • Improved orientation detection for documents rotated between -90 and 90 degrees by @felixdittrich92
  • Conda deployment job & recipe added by @frgfm
  • Official docTR docker images are added by @odulcy-mindee => docker-images
  • New benchmarks and documentation improvements by @felixdittrich92
  • WildReceipt dataset added by @HamzaGbada
  • EarlyStopping callback added to all training scripts by @SkaarFacee
  • Hook mechanism added to ocr_predictor to manipulate the detection predictions in the middle of the pipeline to suit your needs by @felixdittrich92
from doctr.models import ocr_predictor

class CustomHook:
    def __call__(self, loc_preds):
        # Manipulate the location predictions here
        # 1. The output structure needs to be the same as the input location predictions
        # 2. Be aware that the coordinates are relative and need to be between 0 and 1
        return loc_preds

my_hook = CustomHook()

predictor = ocr_predictor(pretrained=True)
# Add a hook in the middle of the pipeline
predictor.add_hook(my_hook)
# You can also add multiple hooks which will be executed sequentially
for hook in [my_hook, my_hook, my_hook]:
    predictor.add_hook(hook)
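As a minimal sketch of what a hook body might do, the class below drops low-score boxes and clamps coordinates back into [0, 1] while keeping the input nesting, in line with the two rules in the comments above. It assumes, hypothetically, that loc_preds is a list with one entry per page, each a list of (xmin, ymin, xmax, ymax, score) tuples; docTR's actual containers may be NumPy arrays with a different layout, so treat this as a structural illustration. ClampAndFilterHook and min_score are illustrative names, not part of docTR.

```python
from typing import List, Tuple

# Hypothetical layout: one list per page of (xmin, ymin, xmax, ymax, score),
# with coordinates relative to the page size (expected range [0, 1]).
Box = Tuple[float, float, float, float, float]


def _clamp(value: float) -> float:
    """Force a relative coordinate back into [0, 1]."""
    return min(max(value, 0.0), 1.0)


class ClampAndFilterHook:
    """Illustrative hook: drops low-score boxes and clamps coordinates."""

    def __init__(self, min_score: float = 0.3) -> None:
        self.min_score = min_score

    def __call__(self, loc_preds: List[List[Box]]) -> List[List[Box]]:
        # Return the same nesting as the input: one list of boxes per page.
        filtered = []
        for page_boxes in loc_preds:
            kept = [
                (_clamp(xmin), _clamp(ymin), _clamp(xmax), _clamp(ymax), score)
                for xmin, ymin, xmax, ymax, score in page_boxes
                if score >= self.min_score
            ]
            filtered.append(kept)
        return filtered
```

Called on plain data, ClampAndFilterHook(min_score=0.5)([[(-0.1, 0.2, 1.2, 0.4, 0.9), (0.1, 0.1, 0.2, 0.2, 0.1)]]) returns [[(0.0, 0.2, 1.0, 0.4, 0.9)]]: the low-score box is dropped and the out-of-range coordinates are clamped.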

Full Changelog: v0.7.0...v0.8.0

v0.7.0

09 Sep 13:23
75bddfc

Note: doctr 0.7.0 requires either TensorFlow >= 2.11.0 or PyTorch >= 1.12.0.
Note: We will release the missing PyTorch checkpoints with 0.7.1

What's Changed

Breaking Changes 🛠

  • We changed the preserve_aspect_ratio parameter to True by default in #1279
    => To restore the old behaviour you can pass preserve_aspect_ratio=False to the predictor instance
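To illustrate what preserving the aspect ratio means, here is a minimal sketch of the usual resize-then-pad computation: scale the page by the largest factor that fits the target size in both dimensions, then pad the remainder. The function name, the bottom/right padding placement and the 1024x1024 target in the example are assumptions for illustration, not docTR's exact preprocessing.

```python
from typing import Tuple


def resize_with_padding(
    height: int, width: int, target_h: int, target_w: int,
    preserve_aspect_ratio: bool = True,
) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return ((new_h, new_w), (pad_bottom, pad_right)) for a resize to the target size."""
    if not preserve_aspect_ratio:
        # Plain stretch: both axes are scaled independently, so shapes get distorted.
        return (target_h, target_w), (0, 0)
    # Largest uniform scale that keeps the whole page inside the target.
    scale = min(target_h / height, target_w / width)
    new_h = round(height * scale)
    new_w = round(width * scale)
    # The remaining area is filled with padding (assumed bottom/right here).
    return (new_h, new_w), (target_h - new_h, target_w - new_w)
```

For a 2000x1000 (height x width) portrait page and a 1024x1024 target, the sketch returns ((1024, 512), (0, 512)): text keeps its proportions and the right half is padding, whereas preserve_aspect_ratio=False stretches the page to 1024x1024.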

New features

Addition of the KIE predictor

The KIE predictor is more flexible than the OCR predictor, as your detection model can detect multiple classes in a document. For example, you can have a detection model that detects only dates and addresses in a document.

The KIE predictor makes it possible to combine a detector with multiple classes and a recognition model, with the whole pipeline already set up for you.

from doctr.io import DocumentFile
from doctr.models import kie_predictor

# Model
model = kie_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
# PDF
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Analyze
result = model(doc)

predictions = result.pages[0].predictions
for class_name in predictions.keys():
    list_predictions = predictions[class_name]
    for prediction in list_predictions:
        print(f"Prediction for {class_name}: {prediction}")

The KIE predictor results per page are a dictionary in which each key is a class name and its value is the list of predictions for that class.
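Because the per-page result is a plain mapping from class name to a list of predictions, post-processing is ordinary dictionary code. The sketch below keeps the most confident prediction per class; it uses simple dicts with hypothetical "value" and "confidence" keys as stand-ins for docTR's prediction objects, whose attribute names should be checked against the documentation.

```python
from typing import Dict, List, Optional


def best_per_class(predictions: Dict[str, List[dict]]) -> Dict[str, Optional[str]]:
    """Map each class name to the value of its most confident prediction (or None)."""
    best: Dict[str, Optional[str]] = {}
    for class_name, preds in predictions.items():
        if not preds:
            best[class_name] = None  # class known to the model but nothing found on this page
            continue
        top = max(preds, key=lambda p: p["confidence"])
        best[class_name] = top["value"]
    return best
```

For example, {"date": [{"value": "2023-01-01", "confidence": 0.9}, {"value": "2023", "confidence": 0.4}], "address": []} yields {"date": "2023-01-01", "address": None}.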

Full Changelog: v0.6.0...v0.7.0

v0.6.0

29 Sep 11:51
dcbb21f

Highlights of the release:

Note: doctr 0.6.0 requires either TensorFlow >= 2.9.0 or PyTorch >= 1.8.0.

Full integration with the Hugging Face Hub (docTR meets Hugging Face)

  • Loading from hub:
from doctr.io import DocumentFile
from doctr.models import ocr_predictor, from_hub
image = DocumentFile.from_images(['data/example.jpg'])
# Load a custom detection model from huggingface hub
det_model = from_hub('Felix92/doctr-torch-db-mobilenet-v3-large')
# Load a custom recognition model from huggingface hub
reco_model = from_hub('Felix92/doctr-torch-crnn-mobilenet-v3-large-french')
# You can easily plug these models into the OCR predictor
predictor = ocr_predictor(det_arch=det_model, reco_arch=reco_model)
result = predictor(image)
  • Pushing to the hub:
from doctr.models import recognition, login_to_hub, push_to_hf_hub
login_to_hub()
my_awesome_model = recognition.crnn_mobilenet_v3_large(pretrained=True)
push_to_hf_hub(my_awesome_model, model_name='doctr-crnn-mobilenet-v3-large-french-v1', task='recognition', arch='crnn_mobilenet_v3_large')

Documentation: https://mindee.github.io/doctr/using_doctr/sharing_models.html

Predefined datasets can also be used for the recognition task

from doctr.datasets import CORD
# Crop boxes as is (crops can be irregular)
train_set = CORD(train=True, download=True, recognition_task=True)
# Crop rotated boxes (always regular)
train_set = CORD(train=True, download=True, use_polygons=True, recognition_task=True)
img, target = train_set[0]

Documentation: https://mindee.github.io/doctr/using_doctr/using_datasets.html

New models (both frameworks)

  • classification: VisionTransformer (ViT)
  • recognition: Vision Transformer for Scene Text Recognition (ViTSTR)

Bug fixes recognition models

  • MASTER and SAR architectures are now operational in both frameworks (TensorFlow and PyTorch)

ONNX support (experimental)

  • All models can now be exported into ONNX format (only TF mobilenet left for 0.7.0)

NOTE: A full production pipeline with ONNX / build is planned for 0.7.0 (the models can only be exported up to the logits, without any post-processing included)

Further features

  • our demo is now also PyTorch compatible, thanks to @odulcy-mindee
  • it is now possible to detect the language of the extracted text, thanks to @aminemindee

What's Changed

Miscellaneous

  • [Refactor] commit tags by @felixdittrich92 in #871
  • Update io/pdf.py to new pypdfium2 API by @mara004 in #944
  • docs: Documentation the reason for keras version specifier by @frgfm in #958
  • [datasets] update IC / SROIE / FUNSD / CORD by @felixdittrich92 in #983
  • [datasets] revert whitespace filtering and fix svhn reco by @felixdittrich92 in #987
  • fix: update tensorflow-addons to match tensorflow version by @ianardee in #998
  • move transformers implementation to modules by @felixdittr...

v0.5.1

22 Mar 10:41
9d03085

This minor release includes: documentation improvements thanks to @felixdittrich92, bug fixes, rotation support extended to the TensorFlow backend, a switch from PyMuPDF to pypdfium2, and a nice integration with the Hugging Face Hub thanks to @fg-mindee!

Note: doctr 0.5.0 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.

Highlights

Improvement of the documentation

The documentation has been improved with a new theme and illustrations, and the docstrings have been completed and expanded.

[Figure: screenshot of the new documentation theme]

Rotated text detection extended to the TensorFlow backend

We provide weights for the linknet_resnet18_rotation model, which has been deeply modified: we implemented a new loss (based on Dice loss and focal loss), we changed the computation of the targets so that polygons are shrunk the same way they are in DBNet, which greatly improves the precision of the segmenter, and we trained the model while preserving the aspect ratio of the images.
All these improvements led to much better results, and the pretrained model is now very robust.
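The DBNet-style shrinking mentioned above can be illustrated with the offset formula from the DBNet paper, D = A(1 - r^2) / L, where A is the polygon area, L its perimeter and r the shrink ratio (0.4 in the paper). The sketch below only computes that offset distance with the shoelace formula; the actual polygon offsetting (usually delegated to a clipping library) is not shown, and docTR's own parameters may differ.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(points: List[Point]) -> float:
    """Shoelace formula (absolute value, so vertex order does not matter)."""
    acc = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        acc += x1 * y2 - x2 * y1
    return abs(acc) / 2.0


def polygon_perimeter(points: List[Point]) -> float:
    """Sum of edge lengths, closing the polygon back to its first vertex."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:] + points[:1]))


def shrink_offset(points: List[Point], r: float = 0.4) -> float:
    """DBNet-style shrink distance D = A * (1 - r**2) / L."""
    return polygon_area(points) * (1 - r ** 2) / polygon_perimeter(points)
```

For a 100x20 word box, A = 2000 and L = 240, so each edge is moved inward by about 2000 * 0.84 / 240 = 7 pixels. Thin boxes therefore shrink by a small absolute amount, which keeps adjacent words separated in the target map.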

Preserving the aspect ratio in the detection task

You can now choose to preserve the aspect ratio in the detection_predictor:

>>> from doctr.models import detection_predictor
>>> predictor = detection_predictor('db_resnet50_rotation', pretrained=True, assume_straight_pages=False, preserve_aspect_ratio=True)

This option can also be activated in the high level end-to-end predictor:

>>> from doctr.models import ocr_predictor
>>> model = ocr_predictor('linknet_resnet18_rotation', pretrained=True, assume_straight_pages=False, preserve_aspect_ratio=True)

Integration with the Hugging Face Hub

The artefact detection model is now available on the Hugging Face Hub. This is amazing!

In docTR, you can now use the from_hub() function, so the following two snippets are equivalent:

# Pretrained
from doctr.models.obj_detection import fasterrcnn_mobilenet_v3_large_fpn
model = fasterrcnn_mobilenet_v3_large_fpn(pretrained=True)

and:

# HF Hub
from doctr.models.obj_detection.factory import from_hub
model = from_hub("mindee/fasterrcnn_mobilenet_v3_large_fpn")

Breaking changes

Replacing the PyMuPDF dependency with pypdfium2, which is license-compatible

We replaced the PyMuPDF dependency with pypdfium2 because of a license-compatibility issue, so we lose the word and object extraction from source PDFs that PyMuPDF provided. It wasn't used by any model, so it is not a big issue, but we will work on re-integrating such a feature in the future.

Full Changelog: v0.5.0...v0.5.1

v0.5.0: Skew-aware OCR & extended model/dataset zoo

31 Dec 18:32
b9d8feb

This release adds support of rotated documents, and extends both the model & dataset zoos.

Note: doctr 0.5.0 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.

Highlights

🙃 😃 Rotation-aware text detection 🙃 😃

It's no secret: the focus of this release was to bring the same level of performance to rotated documents!

docTR is meant to be your best tool for seamless document processing, and it couldn't do without supporting a very natural & common augmentation of input documents. This large project was subdivided into three parts:

Straightening pages before text detection

We developed a heuristic-based method to estimate the page skew and rotate the page before forwarding it to any deep learning model. Our thanks to @Rob192 for his contribution on this part 🙏

This behaviour can be enabled to avoid retraining the text detection models. However, the heuristics approach has its limits in terms of robustness.
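A common heuristic of this kind, shown here only as an illustration and not necessarily the method docTR uses, estimates the skew as the median angle of detected text-line segments and then rotates the page by the opposite angle. Segments are given as pairs of (x, y) endpoints; the function name and the sign convention (y pointing down as in image coordinates, so a positive angle means the line descends to the right) are assumptions.

```python
import math
import statistics
from typing import List, Tuple

Segment = Tuple[Tuple[float, float], Tuple[float, float]]


def estimate_skew(segments: List[Segment]) -> float:
    """Median angle in degrees of the given line segments, folded into (-90, 90]."""
    angles = []
    for (x1, y1), (x2, y2) in segments:
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
        # A segment and its reverse describe the same line: fold into (-90, 90].
        if angle > 90:
            angle -= 180
        elif angle <= -90:
            angle += 180
        angles.append(angle)
    # The median is robust to a few outlier segments (e.g. vertical rules, stamps).
    return statistics.median(angles)
```

With two horizontal lines and one 45-degree outlier, the estimate stays at 0 degrees. This robustness to outliers is also where such heuristics reach their limits: if most detected segments are misleading, the median is too.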

Text detection training with rotated images


The core of this project was to enable our text detection models to produce non-degraded heatmaps & localization candidates when processing a rotated page.

Crop orientation resolution


Finally, once the localization candidates have been extracted, there is no guarantee that each of them reads from left to right. To remove this ambiguity, a lightweight image orientation classifier was added to refine the crops that are sent to text recognition!

🦓 A wider pretrained classification model zoo 🦓

The stability of trainings in deep learning for complex tasks has mostly been helped by leveraging transfer learning. As such, OCR tasks usually require a backbone as a feature extractor. For this reason, all checkpoints of classification models in both PyTorch & TensorFlow have been updated 🚀
Those were trained using our synthetic character classification dataset, for more details cf. Character classification training

🖼️ New public datasets join the fray

Thanks to @felixdittrich92, the list of supported datasets has considerably grown 🥳
This includes widely popular datasets used for benchmarks on OCR-related tasks, you can find the full list over here 👉 #587

Synthetic text recognition dataset

Additionally, we followed up on the existing CharGenerator by introducing WordGenerator:

  • generates an image of a word whose length is randomly sampled within a specified range, with characters randomly sampled from the specified vocab.
  • you can even pass a list of fonts so that each word's font family is randomly picked among them
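The sampling logic described above can be sketched as follows. Rendering the image is left out, and sample_word_spec, its parameters and the font file names are hypothetical, not WordGenerator's actual signature.

```python
import random
from typing import List, Optional, Tuple


def sample_word_spec(
    vocab: str, min_chars: int, max_chars: int, fonts: List[str],
    rng: Optional[random.Random] = None,
) -> Tuple[str, str]:
    """Sample a random word and the font family it should be rendered with."""
    rng = rng or random.Random()
    length = rng.randint(min_chars, max_chars)  # inclusive on both ends
    word = "".join(rng.choice(vocab) for _ in range(length))
    font = rng.choice(fonts)  # each word gets its own randomly picked font
    return word, font
```

Passing a seeded random.Random makes a synthetic dataset reproducible, e.g. sample_word_spec("abc", 2, 5, ["FreeMono.ttf", "FreeSans.ttf"], random.Random(0)).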

[Figure: WordGenerator samples rendered with font_size=32]

📑 New notebooks

Two new notebooks have made their way into the documentation:

  • producing searchable PDFs from docTR analysis results
  • introduction to document artefact detection (QR code, bar codes, ID pictures, etc.) with docTR


Breaking changes

Revamp of classification models

With the retraining of all classification backbones, several changes have been introduced:

  • Model naming: linknet16 --> linknet_resnet18
  • Architecture changes: all classification backbones are available with a classification head now.

Enforcing relative coordinates in datasets

In order to unify our data pipelines, we forced the conversion to relative coordinates on all datasets!

0.4.1:

>>> from doctr.datasets import FUNSD
>>> ds = FUNSD(train=True, download=True)
>>> img, target = ds[0]
>>> print(target['boxes'].dtype, target['boxes'].max())
(dtype('int64'), 862)

0.5.0:

>>> from doctr.datasets import FUNSD
>>> ds = FUNSD(train=True, download=True)
>>> img, target = ds[0]
>>> print(target['boxes'].dtype, target['boxes'].max())
(dtype('float32'), 0.98341835)
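The conversion itself is a per-axis division by the image size. A minimal sketch, assuming boxes in (xmin, ymin, xmax, ymax) pixel order and an image described by its height and width (to_relative is a hypothetical helper, not a docTR function):

```python
from typing import List, Tuple

BoxAbs = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax) in pixels


def to_relative(boxes: List[BoxAbs], height: int, width: int) -> List[Tuple[float, ...]]:
    """Divide x coordinates by the image width and y coordinates by its height."""
    return [
        (xmin / width, ymin / height, xmax / width, ymax / height)
        for xmin, ymin, xmax, ymax in boxes
    ]
```

For instance, to_relative([(10, 20, 50, 80)], height=200, width=100) returns [(0.1, 0.1, 0.5, 0.4)]. Once every dataset emits values in [0, 1], the same targets stay valid whatever resizing the training pipeline applies.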

v0.4.1: Enables AMP training and adds support of artefact object detection

22 Nov 11:22
74ff9ff

This patch release brings the support of AMP for PyTorch training to docTR along with artefact object detection.

Note: doctr 0.4.1 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.

Highlights

Automatic Mixed Precision (AMP) ⚡

Training scripts with the PyTorch back-end now benefit from AMP to reduce the RAM footprint and potentially increase the maximum batch size! This comes in especially handy for text detection, which requires high spatial resolution inputs!

Artefact detection 🛸

Document understanding goes beyond textual elements, as information can be encoded in other visual forms. For this reason, we have extended the range of supported tasks by adding object detection. This will be focused on non-textual elements in documents, including QR codes, barcodes, ID pictures, and logos.

[Figure: early artefact detection results]

This release comes with a training & validation set, DocArtefacts, and a reference training script. Keep an eye out for the models we will be releasing in the next version!

Get more of docTR with Colab tutorials 📖

You've been waiting for it: from now on, we will regularly add new tutorials for docTR in the form of Jupyter notebooks that you can open and run locally or on Google Colab, for instance!

Check the new page in the documentation to have an updated list of all our community notebooks: https://mindee.github.io/doctr/latest/notebooks.html

Breaking changes

Deprecated support of FP16 for datasets

Floating-point precision can be leveraged in deep learning to decrease the RAM footprint of trainings. The common data type float32 has a lower-resolution counterpart, float16, which is usually only supported on GPU for common deep learning operations. Initially, we were planning to make all our operations available in both precisions to reduce the memory footprint.

However, with the latest development of Deep Learning frameworks, and their Automatic Mixed Precision mechanism, this isn't required anymore and only adds more constraints on the development side. We thus deprecated this feature from our datasets and predictors:

0.4.0:

>>> from doctr.datasets import FUNSD
>>> ds = FUNSD(train=True, download=True, fp16=True)
>>> print(getattr(ds, "fp16"))
True

0.4.1:

>>> from doctr.datasets import FUNSD
>>> ds = FUNSD(train=True, download=True)
>>> print(getattr(ds, "fp16"))
None
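To make the "lower resolution" of float16 concrete, Python's standard struct module can round-trip a value through IEEE 754 half precision (format code 'e') and single precision (format code 'f'). This only illustrates the number formats; it is not how docTR or the frameworks store tensors.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Pack value with the given struct format ('e' = float16, 'f' = float32) and unpack it."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


half = round_trip(0.1, "e")
single = round_trip(0.1, "f")
print(half)    # 0.0999755859375
print(single)  # 0.10000000149011612
```

With only 10 mantissa bits, float16 already misses 0.1 in the fifth significant digit, which is why frameworks' AMP keeps sensitive operations in float32 instead of converting everything up front.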

New Contributors

Our thanks & warm welcome to the following persons for their first contributions: @mzeidhassan @K-for-Code @felixdittrich92 @SiddhantBahuguna @RBMindee @thentgesMindee 🙏

Full Changelog: v0.4.0...v0.4.1

v0.4.0: Full support of PyTorch and a growing pretrained model zoo

01 Oct 18:58
51663dd

This release brings the support of PyTorch out of beta, makes text recognition more robust, and provides light architectures for complex tasks.

Note: doctr 0.4.0 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.

Highlights

No more width limitation for text recognition

Some documents, such as French ID cards, include very long strings that can be challenging to transcribe.

This release enables a smart split/merge strategy for wide crops to avoid performance drops. Previously, the whole crop was analyzed at once; now it is split into reasonably sized crops, inference is performed on them as a batch, and the predictions are then merged together.

The following snippet:

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

doc = DocumentFile.from_images('path/to/img.png')
predictor = ocr_predictor(pretrained=True)
print(predictor(doc).pages[0])

used to yield:

Page(
  dimensions=(447, 640)
  (blocks): [Block(
    (lines): [Line(
      (words): [
        Word(value='1XXXXXX', confidence=0.0023),
        Word(value='1XXXX', confidence=0.0018),
      ]
    )]
    (artefacts): []
  )]
)

and now yields:

Page(
  dimensions=(447, 640)
  (blocks): [Block(
    (lines): [Line(
      (words): [
        Word(value='IDFRABERTHIER<<<<<<<<<<<<<<<<<<<<<<', confidence=0.49),
        Word(value='8806923102858CORINNE<<<<<<<6512068F6', confidence=0.22),
      ]
    )]
    (artefacts): []
  )]
)
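The split/merge strategy described above can be sketched in a few lines: cut a wide crop into overlapping windows whose width-to-height ratio stays below a threshold, recognize each window, then stitch consecutive strings on their longest suffix/prefix overlap. This is an illustration under assumptions (max_ratio, overlap and the exact-match merge are hypothetical choices); docTR's actual implementation and parameters may differ.

```python
from functools import reduce
from typing import List, Tuple


def split_crop(width: int, height: int, max_ratio: float = 8.0,
               overlap: float = 0.25) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges covering a crop of the given width."""
    chunk_w = max(1, int(height * max_ratio))
    if width <= chunk_w:
        return [(0, width)]  # narrow enough: no split needed
    step = max(1, int(chunk_w * (1 - overlap)))
    spans = []
    start = 0
    while start + chunk_w < width:
        spans.append((start, start + chunk_w))
        start += step
    spans.append((width - chunk_w, width))  # last window is right-aligned
    return spans


def merge_strings(left: str, right: str) -> str:
    """Join two partial transcriptions on their longest suffix/prefix overlap."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return left + right  # no overlap found: plain concatenation


def merge_all(parts: List[str]) -> str:
    """Merge the transcriptions of consecutive windows, left to right."""
    return reduce(merge_strings, parts) if parts else ""
```

For a 100x10 crop with max_ratio=4.0, split_crop returns [(0, 40), (30, 70), (60, 100)], and merge_all(["IDFRABER", "BERTHIER"]) gives "IDFRABERTHIER". Exact-match merging is fragile on repeated characters such as "<<<<", where it can collapse genuinely repeated runs; a more robust merge would also take the known window overlap into account.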

Framework specific predictors

PyTorch support is no longer in beta, so we made switching from one deep learning backend to the other more uniform 🙌 Predictors are designed to be the recommended interface for inference with your models!

0.3.1 (TensorFlow):

>>> from doctr.models import detection_predictor
>>> predictor = detection_predictor(pretrained=True)
>>> out = predictor(doc, training=False)

0.3.1 (PyTorch):

>>> from doctr.models import detection_predictor
>>> import torch
>>> predictor = detection_predictor(pretrained=True)
>>> predictor.model.eval()
>>> with torch.no_grad(): out = predictor(doc)

0.4.0:

>>> from doctr.models import detection_predictor
>>> predictor = detection_predictor(pretrained=True)
>>> out = predictor(doc)

An evergrowing model zoo 🦓

As PyTorch goes out of beta, we have bridged the gap between PyTorch & TensorFlow pretrained models' availability. Additionally, by leveraging our integration of light backbones, this release comes with lighter architectures for text detection and text recognition:

  • db_mobilenet_v3_large
  • crnn_mobilenet_v3_small
  • crnn_mobilenet_v3_large

The full list of supported architectures is available 👉 here

Demo live on HuggingFace Spaces

If you have enjoyed the Streamlit demo but prefer not to run it on your own hardware, feel free to check out the online version on Hugging Face Spaces.

Courtesy of @osanseviero for deploying it, and HuggingFace for hosting & serving 🙏

Breaking changes

Deprecated crnn_resnet31 & sar_vgg16_bn

After going over backbone compatibility and re-assessing whether all combinations should be trained, docTR is focusing on reproducing the architectures proposed by the papers' authors or improving upon them. As such, we have deprecated the following recognition models (which had no pretrained params): crnn_resnet31, sar_vgg16_bn.

Deprecated models.export

Since doctr.models.export was specific to TensorFlow and it didn't bring much more value than TensorFlow tutorials, we added instructions in the documentation and deprecated the submodule.

New features

Datasets

Resources to access data in efficient ways

IO

Features to manipulate input & outputs

Models

Deep learning model building and inference

Utils

Utility features relevant to the library use cases.

Transforms

Data transformations operations

Test

Verifications of the package well-being before release

Documentation

Online resources for potential users

References

Reference training scripts

  • Added option to select vocab in the training of character classification and text recognition #502 (@fg-mindee)

Others

Other tools and implementations

Bug fixes

Datasets

Models

Transforms

Utils

  • Fixed page synthesis for characters outside of latin-1 #496 (@fg-mindee)

Documentation

References

Others

Improvements

Datasets

Models

  • Deprecated doctr.models.export #463 (@fg-mindee)
  • Deprecated crnn_resnet31 & sar_vgg16_bn recognition models #468 (@fg-mindee)
  • Relocated DocumentBuilder to doctr.models.builder, split predictor into framework-specific objects #481 (@fg-mindee)
  • Added more robust argument checks in DocumentBuilder & refactored crop preparation and result processing in ocr predictors #497 (@fg-mindee)
  • Reflected changes of detection target formats on detection models #491 (@fg-mindee)

Utils

Documentation

Tests

References

  • Reflected changes of detection dataset target format #491 (@fg-mindee)

Others
