Releases: mindee/doctr
v0.10.0
Note: docTR 0.10.0 requires python >= 3.9
Note: docTR 0.10.0 requires either TensorFlow >= 2.15.0 or PyTorch >= 2.0.0
What's Changed
Soft Breaking Changes (TensorFlow backend only) 🛠
- Changed the saving format from /weights to .weights.h5
NOTE: Please update your custom trained models and HuggingFace hub uploaded models; this will be the last release supporting manual loading from /weights.
New features
- Added numpy 2.0 support @felixdittrich92
- New and updated notebooks were added @felixdittrich92 --> notebooks
- Custom orientation model loading @felixdittrich92
- Additional functionality to control the pipeline when dealing with rotated documents @milosacimovic @felixdittrich92
- Built-in datasets can now be loaded directly for detection with detection_task=True, comparable to the existing recognition_task=True @felixdittrich92
Disable page orientation classification
- If you deal with documents which contain only small rotations (~ -45 to 45 degrees), you can disable the page orientation classification to speed up inference.
- This will only have an effect with assume_straight_pages=False and/or straighten_pages=True and/or detect_orientation=True.
from doctr.models import ocr_predictor
model = ocr_predictor(pretrained=True, assume_straight_pages=False, disable_page_orientation=True)
Disable crop orientation classification
- If you deal with documents which contain only horizontal text, you can disable the crop orientation classification to speed up inference.
- This will only have an effect with assume_straight_pages=False and/or straighten_pages=True.
from doctr.models import ocr_predictor
model = ocr_predictor(pretrained=True, assume_straight_pages=False, disable_crop_orientation=True)
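The conditions stated for the two flags can be written down as a small helper. This is only a sketch of the rule as worded in these notes, not docTR code: both function names are hypothetical, and the default argument values are assumed to mirror the usual ocr_predictor defaults.

```python
def page_orientation_flag_has_effect(
    assume_straight_pages: bool = True,   # assumed default
    straighten_pages: bool = False,       # assumed default
    detect_orientation: bool = False,     # assumed default
) -> bool:
    """True if disable_page_orientation can change anything, per the notes."""
    return (not assume_straight_pages) or straighten_pages or detect_orientation


def crop_orientation_flag_has_effect(
    assume_straight_pages: bool = True,   # assumed default
    straighten_pages: bool = False,       # assumed default
) -> bool:
    """True if disable_crop_orientation can change anything, per the notes."""
    return (not assume_straight_pages) or straighten_pages


# With the assumed defaults neither flag would do anything.
print(page_orientation_flag_has_effect(), crop_orientation_flag_has_effect())
```

A check like this can be handy in configuration code to warn when a disable flag is set but cannot influence the pipeline.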
Loading custom exported orientation classification models
You can now load your custom trained orientation models, the following snippet demonstrates how:
from doctr.io import DocumentFile
from doctr.models import ocr_predictor, mobilenet_v3_small_page_orientation, mobilenet_v3_small_crop_orientation
from doctr.models.classification.zoo import crop_orientation_predictor, page_orientation_predictor
custom_page_orientation_model = mobilenet_v3_small_page_orientation("<PATH_TO_CUSTOM_EXPORTED_ONNX_MODEL>")
custom_crop_orientation_model = mobilenet_v3_small_crop_orientation("<PATH_TO_CUSTOM_EXPORTED_ONNX_MODEL>")
predictor = ocr_predictor(pretrained=True, assume_straight_pages=False, detect_orientation=True)
# Overwrite the default orientation models
predictor.crop_orientation_predictor = crop_orientation_predictor(custom_crop_orientation_model)
predictor.page_orientation_predictor = page_orientation_predictor(custom_page_orientation_model)
What's Changed
Breaking Changes 🛠
- [TF] First changes on the road to Keras v3 by @felixdittrich92 in #1724
- [Build] update minor version & update torch to >= 2.0 by @felixdittrich92 in #1747
New Features
- Disable page and crop orientation by @milosacimovic in #1735
Bug Fixes
- [Bug] fix straighten pages by @felixdittrich92 in #1697
- [Fix] Remove image padding after rotation correction with straighten_pages=True by @felixdittrich92 in #1731
- [datasets] Allow detection task for built-in datasets by @felixdittrich92 in #1717
- [Bug] Fix eval scripts + possible overflow in Resize by @felixdittrich92 in #1715
- [demo] Add missing viz dep for demo by @felixT2K in #1751
Improvements
- [Datasets] Add Vietnamese letters by @MinhChien9 in #1693
- feat: added ukrainian vocab by @holyCowMp3 in #1700
- [orientation] Enable usage of custom trained orientation models by @felixdittrich92 in #1708
- [demo] Automate doctr demo update via CI job by @felixdittrich92 in #1742
- [TF] Move model building & unify train scripts by @felixdittrich92 in #1744
- [demo/docs] Update notebook docs & minor demo update / fix by @felixT2K in #1755
- [Reconstitution] Improve reconstitution by @felixdittrich92 in #1750
Miscellaneous
- [misc] post release 0.9.1 by @felixT2K in #1689
- [build] NumPy 2.0 support by @felixdittrich92 in #1709
New Contributors
- @MinhChien9 made their first contribution in #1693
- @holyCowMp3 made their first contribution in #1700
- @milosacimovic made their first contribution in #1735
Full Changelog: v0.9.0...v0.10.0
v0.9.0
Note: docTR 0.9.0 requires python >= 3.9
Note: docTR 0.9.0 requires either TensorFlow >= 2.11.0 or PyTorch >= 1.12.0.
What's Changed
Soft Breaking Changes 🛠
- The default detection model changed from db_resnet50 to fast_base.
NOTE: Can be reverted by passing the detection model: predictor = ocr_predictor(det_arch="db_resnet50", pretrained=True)
- The default value of resolve_blocks changed from True to False.
NOTE: Can be reverted by passing resolve_blocks=True to the ocr_predictor
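To restore both previous defaults in one place, the two NOTEs can be combined. The following is a minimal sketch: the helper names (LEGACY_DEFAULTS, legacy_predictor_kwargs, build_legacy_predictor) are hypothetical, and only the keyword arguments det_arch, pretrained and resolve_blocks come from the notes.

```python
# Values taken from the two NOTEs above; the constant name is hypothetical.
LEGACY_DEFAULTS = {"det_arch": "db_resnet50", "resolve_blocks": True}


def legacy_predictor_kwargs(**overrides):
    """Merge the pre-0.9.0 defaults with caller overrides (overrides win)."""
    kwargs = {"pretrained": True, **LEGACY_DEFAULTS}
    kwargs.update(overrides)
    return kwargs


def build_legacy_predictor(**overrides):
    """Build an OCR predictor that behaves like the pre-0.9.0 default."""
    # Imported lazily so the kwargs helper works without docTR installed.
    from doctr.models import ocr_predictor

    return ocr_predictor(**legacy_predictor_kwargs(**overrides))


print(legacy_predictor_kwargs())
```

Keeping the legacy values in one dictionary makes it easy to drop them later once a project has been validated against the new defaults.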
New features
- Fast models got pretrained checkpoints by @odulcy-mindee @felixdittrich92
- Introducing a contributions module which replaces the obj detection and builds a place for more pipelines by @felixdittrich92
- Improved orientation detection by @felixdittrich92 @odulcy-mindee
- Improved and updated API template by @felixdittrich92
- Include objectness_score in results by @felixdittrich92
- Add word crop general orientation to output by @felixdittrich92
- Split library into parts (optional dependencies) by @felixdittrich92
- Add page orientation predictor by @felixdittrich92 @odulcy-mindee
- Add onnx inference doc by @felixdittrich92
✨ Installation ✨
We have split docTR into optional parts to make it more lightweight and to exclude parts which are not required for inference.
Optional parts are:
- visualization (to support .show())
- html support (to support .from_url(...))
- contribution module
# for TensorFlow without any optional dependencies
pip install "python-doctr[tf]"
# for PyTorch without any optional dependencies
pip install "python-doctr[torch]"
# Installs pytorch and all available optional parts
pip install "python-doctr[torch,viz,html,contrib]"
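Because the optional parts may now be missing at runtime, it can help to check for them before calling features that need them. The sketch below uses only the standard library; the function name is hypothetical, and the mapping from optional part to third-party module names is an assumption made for illustration, not something stated in these notes.

```python
from importlib.util import find_spec


def available_parts(part_to_modules):
    """Map each optional part to True if all of its top-level modules are importable."""
    return {
        part: all(find_spec(name) is not None for name in modules)
        for part, modules in part_to_modules.items()
    }


# ASSUMPTION: illustrative module names; check the package metadata for the real ones.
ASSUMED_PARTS = {"viz": ["matplotlib"], "html": ["weasyprint"]}
print(available_parts(ASSUMED_PARTS))
```

Only top-level module names should be passed, since find_spec on a dotted name imports the parent package first.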
✨ ONNX and OnnxTR ✨
We have built a standalone library to provide a super lightweight way to use existing docTR ONNX exported models or your custom ones.
Benefits:
- known docTR interface (ocr_predictor, etc.)
- no PyTorch or TensorFlow required - built on top of onnxruntime
- more lightweight package with faster inference latency and less required resources
- 8-Bit quantized models for faster inference on CPU
Give it a try and check it out: OnnxTR
docTR docs: ONNX / OnnxTR
What's Changed
Breaking Changes 🛠
- [models] Change default model to fast_base - soft breaking change by @felixdittrich92 in #1588
- [misc] update README & fix mypy & change resolve blocks default by @felixT2K in #1686
New Features
- [prototype] object det replacement / init contrib modules by @felixdittrich92 in #1534
Bug Fixes
- [FIX] Fix mistake in FASTConvLayer and tf reparameterization by @felixdittrich92 in #1506
- [Fix] sar_resnet31 TF + PT by @felixdittrich92 in #1513
- [Fix] crop orientation KIE by @felixdittrich92 in #1548
- [Fix] allign orientation train script to current orientation model (counter clockwise instead of clockwise) & make OrientationPredictor dynamic by @felixdittrich92 in #1559
- [Fix / transforms] RandomHorizontalFlip & RandomCrop by @felixdittrich92 in #1572
- [FIX] parseq onnx export by @felixdittrich92 in #1585
- [Fix] close PIL images when loading images to tensor/numpy by @helpmefindaname in #1598
- [conda] Fix meta.yaml package function name by @felixdittrich92 in #1603
- [bug] remove TF multiprocessing workers by @felixdittrich92 in #1635
- [IO] Pdf File close after opening by @justinjosephmkj in #1624
- [bug] exclude scores if rot and eval straight by @felixdittrich92 in #1639
- Fixed assume_straight_pages for custom models by @Fabioomega in #1681
Improvements
- [docs] documentation for changing predictors batch sizes by @felixdittrich92 in #1514
- feat: ✨ torch fast_tiny checkpoint by @odulcy-mindee in #1518
- [models] Add benchmark fast_tiny and reparameterize by default by @felixdittrich92 in #1519
- feat: ✨ PT fast base checkpoint by @odulcy-mindee in #1526
- feat: ✨ PT fast small checkpoint by @odulcy-mindee in #1529
- [API] update api for multi file and pdf support by @felixdittrich92 in #1522
- [feature] Add word crop general orientation to output by @felixdittrich92 in #1546
- feat: ✨ torch mobilenet_v3_small_orientation chkpt by @odulcy-mindee in #1557
- [metrics] speed up polygon iou (for --rotation) by keeping balanced memory footprint by @felixdittrich92 in #1561
- [orientation] augment angle while training by @felixdittrich92 in #1567
- [orientation] Part 1: Add page orientation predictor by @felixdittrich92 in #1566
- feat: ✨ torch mobilenet_v3_small_crop_orientation by @odulcy-mindee in #1571
- feat: ✨ improve mobilenet_v3_page_orientation checkpoint by @odulcy-mindee in #1573
- [transforms] Add RandomResize (like ZoomOut) by @felixdittrich92 in #1574
- [references] Update detection augmentations by @felixdittrich92 in #1577
- [TF] add fast models and benchmarks by @felixdittrich92 in #1583
- [transforms] small random resize improvement by @felixdittrich92 in #1584
- [docs] Add onnx inference doc by @felixdittrich92 in #1601
- feat: ✨ torch db_mobilenet_v3_large checkpoint by @odulcy-mindee in #1632
- [detection] move padding removal directly to detection by @felixdittrich92 in #1627
- feat: ✨ tf mobilenet_v3_small_page_orientation checkpoint by @odulcy-mindee in #1636
- [builder] Add objectness scores by @felixdittrich92 in #1625
- [orientation] page orientation improvements by @felixdittrich92 in #1553
- [Datasets] Add hindi & bangla vocabs by @felixT2K in #1687
Miscellaneous
- [misc] apply 0.8.1 post release modifications by @felixdittrich92 in #1498
- Replace unidecode with text-unidecode. by @jonatankawalek in #1509
- [misc] update dev deps by @felixdittrich92 in #1510
- [benchmark] fast base pytorch by @felixdittrich92 in #1523
- [benchmark] fast small benchmark by @felixdittrich92 in #1527
- [Misc] drop py 3.8 support by @felixdittrich92 in #1457
- [CI] update CI actions by @felixdittrich92 in #1558
- Exclude deps & split into optional parts by @felixdittrich92 in #1551
- [references] remove missed parts of old obj det by @felixdittrich92 in #1568
- [tests/onnx] Add onnx and model out check by @felixdittrich92 in #1569
- [Fix] Pin py3.11 for MacOS latest / update publish version checks by @felixdittrich92 in #1503
- [build] Finally to py 3.9 by @felixdittrich92 in #1647
New Contributors
- @jonatankawalek made their first contribution in #1509
- @helpmefindaname made their first contribution in #1598
- @justinjosephmkj made their first contribution in #1624
- @Fabioomega made their first contribution in #1681
Full Changelog: v0.8.1...v0.9.0
v0.8.1
Note: doctr 0.8.1 requires either TensorFlow >= 2.11.0 or PyTorch >= 1.12.0.
What's Changed
- Fixed conda recipe and CI jobs for conda and pypi releases
- Fixed some broken links
- Pre-Release: FAST text detection model from FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation -> Checkpoints will be provided with the next release
v0.8.0
Note: doctr 0.8.0 requires either TensorFlow >= 2.11.0 or PyTorch >= 1.12.0.
What's Changed
Breaking Changes 🛠
- db_resnet50_rotation (PyTorch) and linknet_resnet18_rotation (TensorFlow) are removed (all models can handle rotated documents now)
- .show(doc) changed to .show()
New features
- All models have pretrained checkpoints now by @odulcy-mindee
- All detection models were retrained on rotated samples by @odulcy-mindee
- Improved orientation detection for documents rotated between -90 and 90 degrees by @felixdittrich92
- Conda deployment job & recipe added by @frgfm
- Official docTR docker images are added by @odulcy-mindee => docker-images
- New benchmarks and documentation improvements by @felixdittrich92
- WildReceipt dataset added by @HamzaGbada
- EarlyStopping callback added to all training scripts by @SkaarFacee
- Hook mechanism added to ocr_predictor to manipulate the detection predictions in the middle of the pipeline to your needs by @felixdittrich92
from doctr.models import ocr_predictor
class CustomHook:
    def __call__(self, loc_preds):
        # Manipulate the location predictions here
        # 1. The output structure needs to be the same as the input location predictions
        # 2. Be aware that the coordinates are relative and need to be between 0 and 1
        return loc_preds
my_hook = CustomHook()
predictor = ocr_predictor(pretrained=True)
# Add a hook in the middle of the pipeline
predictor.add_hook(my_hook)
# You can also add multiple hooks which will be executed sequentially
for hook in [my_hook, my_hook, my_hook]:
    predictor.add_hook(hook)
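To make the two constraints on hooks concrete (same output structure, relative coordinates between 0 and 1), here is a toy sketch that runs without docTR. ToyPredictor is a hypothetical stand-in that only mimics add_hook and sequential execution; plain nested lists stand in for the real location predictions, whose actual type and layout are not described in these notes.

```python
class ClampHook:
    """Clamp every relative coordinate into [0, 1]; the nesting is left unchanged."""

    def __call__(self, loc_preds):
        return [
            [[min(max(v, 0.0), 1.0) for v in box] for box in page]
            for page in loc_preds
        ]


class ToyPredictor:
    """Hypothetical stand-in: stores hooks and applies them in insertion order."""

    def __init__(self):
        self.hooks = []

    def add_hook(self, hook):
        self.hooks.append(hook)

    def run_hooks(self, loc_preds):
        for hook in self.hooks:
            loc_preds = hook(loc_preds)
        return loc_preds


toy = ToyPredictor()
toy.add_hook(ClampHook())
# One page with one box, assumed here to be (xmin, ymin, xmax, ymax).
print(toy.run_hooks([[[-0.1, 0.2, 1.3, 0.9]]]))  # [[[0.0, 0.2, 1.0, 0.9]]]
```

The same ClampHook instance could be passed to a real predictor's add_hook only after adapting it to the actual prediction type.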
What's Changed
Breaking Changes 🛠
- [prototype] compute orientation on segmentation map by @felixdittrich92 in #1336
New Features
- feat: ✨ Official docker images for docTR by @odulcy-mindee in #1322
- Add wildreceipt dataset by @HamzaGbada in #1359
- Added early stopping feature by @SkaarFacee in #1397
- [PT / TF] Add TextNet - FAST backbone by @felixdittrich92 in #1425
- feat: Adds conda recipe & corresponding CI jobs by @frgfm in #1414
- [prototype] Extend detection result customization by @felixdittrich92 in #1449
Bug Fixes
- [FIX] antialising in PreProcessor by @felixdittrich92 in #1324
- [Fix] prob computation for parseq and vitstr models by @felixdittrich92 in #1327
- [FIX] clip overflowing probs by @felixdittrich92 in #1335
- [Fix] PT - convert BF16 tensor to float before calling .numpy() by @chunyuan-w in #1342
- [Fix] Prob comp in vitstr and parseq for empty words by @felixT2K in #1345
- [Fix] TF - add bf16 numpy dtype conversion by @felixT2K in #1346
- [Fix] fix growing mem usage pytorch crnn by @felixdittrich92 in #1357
- [Fix] tf augmentations by @felixT2K in #1360
- Fix broken weasyprint link by @simonw in #1367
- feat: ✨ use tqdm instead of fastprogress in reference scripts by @odulcy-mindee in #1389
- [FIX] Fix mypy errors by @felixdittrich92 in #1419
- [FIX] db loss TF and PT also for training with rotated samples by @felixdittrich92 in #1396
- [FIX] Dice loss computation in both backends by @felixdittrich92 in #1442
- [FIX] Fix streamlit demo by @felixdittrich92 in #1447
- [Fix / Misc] Fix conda CI build and publish job and update actions by @felixdittrich92 in #1453
- [Fix] Catch Divide by zero explicit by @felixdittrich92 in #1471
Improvements
- feat: ✨ PT ViTSTR Small Checkpoint by @odulcy-mindee in #1319
- feat: ✨ PT Parseq Checkpoint by @odulcy-mindee in #1320
- [scripts] Add backbone freeze for recognition scripts and update augmentations also for DDP script by @felixdittrich92 in #1328
- [PyTorch] replace no_grad with inference_mode by @felixdittrich92 in #1323
- [transforms] update random apply to work also with targets by @felixdittrich92 in #1333
- [TF] unify detection augmentations by @felixdittrich92 in #1351
- feat: ✨ PT SAR Resnet31 Checkpoint by @odulcy-mindee in #1362
- feat: ✨ PT ViTSTR Base checkpoint by @odulcy-mindee in #1361
- TF change antialias to true by @felixT2K in #1348
- feat: ✨ PT Linknet Resnet18 Checkpoint by @odulcy-mindee in #1387
- [demo] remove limitation and update demo by @felixdittrich92 in #1390
- feat: ✨ PT Linknet Resnet50 Checkpoint by @odulcy-mindee in #1391
- feat: ✨ PT Linknet Resnet 34 Checkpoint by @odulcy-mindee in #1393
- [Fixes / docs] Add more vocabs / Fix Style / HF hub / API Dep by @felixdittrich92 in #1412
- fix: 🐛 add sqlite dependency by @odulcy-mindee in #1421
- feat: ✨ new TF Linknet Resnet checkpoints by @odulcy-mindee in #1424
- feat: ✨ PT db_resnet34 checkpoint by @odulcy-mindee in #1433
- [references] TF / PT crop & document orientation classifier train scripts by @felixdittrich92 in #1432
- [PT] remove submodule from textnet arch by @felixdittrich92 in #1436
- [references] Add poly scheduler for detection training by @felixdittrich92 in #1444
- [references] Add interval saving for detection trainings by @felixdittrich92 in #1454
- feat: ✨ PT db_resnet50 checkpoint by @odulcy-mindee in #1465
- read labels in utf-8 and log input string on vocab error by @eikaramba in #1479
- feat: ✨ tf db_resnet50 checkpoint by @odulcy-mindee in #1480
- feat: ✨ TF db mobilenet v3 large new ckpt by @odulcy-mindee in #1483
- [Docs] extend doc with DocumentBuilder options by @felixdittrich92 in #1486
- feat: ✨ TF db mobilenet v3 large new ckpt by @odulcy-mindee in #1487
Miscellaneous
- chore: apply post release modifications v0.7.0 by @felixdittrich92 in #1309
- docs: ✏️ fix images on pypi by @odulcy-mindee in #1310
- Update Dockerfile (GPU Support, Workdir, Permissions) by @ffalkenberg in #1313
- [misc] rename helper function for bf16 to float32 casting by @felixdittrich92 in #1347
- hebrew letters by @uriva in #1355
- docs: ✏️ add WILDRECEIPT in docs and fix README.md by @odulcy-mindee in #1363
- [misc] increase to 0.8.0 and temp pin onnx by @felixT2K in #1365
- [Fix] Typo in README.md by @eltociear in #1374
- Relax Pillow and OpenCV version bounds. by @nh2 in #1373
- [misc & build] replace isort pydocstyle and black with ruff by @felixdittrich92 in #1379
- [Misc] rename char classifiation scripts and dependency pin by @felixdittrich92 in #1469
- [Docs] add PyTorch / TensorFlow benchmarks by @felixdittrich92 in #1321
- [misc] rename channel by @felixdittrich92 in #1488
New Contributors
- @ffalkenberg made their first contribution in #1313
- @chunyuan-w made their first contribution in #1342
- @uriva made their first contribution in #1355
- @simonw made their first contribution in #1367
- @nh2 made their first contribution in #1373
- @SkaarFacee made their first contribution in #1397
- @eikaramba made their first contribution in #1479
Full Changelog: v0.7.0...v0.8.0
v0.7.0
Note: doctr 0.7.0 requires either TensorFlow >= 2.11.0 or PyTorch >= 1.12.0.
Note: We will release the missing PyTorch checkpoints with 0.7.1
What's Changed
Breaking Changes 🛠
- We changed the preserve_aspect_ratio parameter to True by default in #1279
=> To restore the old behaviour you can pass preserve_aspect_ratio=False to the predictor instance
New features
- Feat: Make detection training and inference Multiclass by @aminemindee in #1097
- Now all TensorFlow models have pretrained weights by @odulcy-mindee
- The docs were updated and corresponding model benchmarks were added by @felixdittrich92
- Two new recognition models were added (ViTSTR and PARSeq) in both frameworks by @felixdittrich92 @nikokks
Addition of the KIE predictor
The KIE predictor is a more flexible predictor compared to OCR as your detection model can detect multiple classes in a document. For example, you can have a detection model to detect just dates and addresses in a document.
The KIE predictor makes it possible to use a detector with multiple classes together with a recognition model, with the whole pipeline already set up for you.
from doctr.io import DocumentFile
from doctr.models import kie_predictor
# Model
model = kie_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
# PDF
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Analyze
result = model(doc)
predictions = result.pages[0].predictions
for class_name in predictions.keys():
    list_predictions = predictions[class_name]
    for prediction in list_predictions:
        print(f"Prediction for {class_name}: {prediction}")
The KIE predictor results per page are in a dictionary format, with each key representing a class name and its value being the list of predictions for that class.
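The per-page dictionary layout can be illustrated without running a model. In this sketch plain strings stand in for the real prediction objects, and both the helper name and the sample data are hypothetical.

```python
def flatten_predictions(predictions):
    """Turn {class_name: [pred, ...]} into a flat list of (class_name, pred) pairs."""
    return [
        (class_name, pred)
        for class_name, preds in predictions.items()
        for pred in preds
    ]


# Hypothetical page result: strings stand in for the real prediction objects.
page_predictions = {"date": ["2023-01-31"], "address": ["1 Main St", "Paris"]}
for class_name, pred in flatten_predictions(page_predictions):
    print(f"{class_name}: {pred}")
```

Flattening like this is convenient when the results are exported to a table with one row per detected field.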
What's Changed
Breaking Changes 🛠
- Feat: Make detection training and inference Multiclass by @aminemindee in #1097
New Features
- feat: ✨ PyTorch Recognition Model Multi-GPU support by @odulcy-mindee in #1164
- [Feat] Add PARSeq model TF and PT by @nikokks in #1205
- [Feat] Predictor precision PT backend by @felixdittrich92 in #1204
- feat: ✨ ClearML support for TensorFlow by @odulcy-mindee in #1257
Bug Fixes
- fix classification model cuda move by @odulcy-mindee in #1125
- fix: 🔧 docker api use GitHub repository by @odulcy-mindee in #1148
- Error in unpacking archive of SROIE dataset by @HamzaGbada in #1178
- [Fix] remove autogen version.py fix docs build and fix version identifier by @felixT2K in #1180
- [FIX] Error in unpacking archive of CORD dataset by @HamzaGbada in #1190
- chore(deps-dev): update docutils requirement from <0.20 to <0.21 by @dependabot in #1198
- speed up VIT models and fix patch size by @felixdittrich92 in #1219
- [Fix] PARSeq pytorch fixes by @felixdittrich92 in #1227
- [Fix] PARSeq tensorflow fixes by @felixdittrich92 in #1228
- [fix/chore] fix bug in tf det eval script / update dep version specifier by @felixdittrich92 in #1232
- fix: 🐛 fix bug when training object detection by @aminemindee in #1254
- [Fix] fix obj det train and suppress endless warning prints by @felixdittrich92 in #1267
- [Fix] add ignore keys if classes differ - KIE training by @felixdittrich92 in #1271
- change the way model is saved in ddp by @venkatapathy in #1289
Improvements
- Improve pypdfium2 integration again by @mara004 in #1096
- [build] replaces flake8 with ruff by @felixT2K in #1179
- [datasets] Add IIIT HWS dataset by @felixT2K in #1199
- feat: ✨ TF linknet_resnet18 checkpoint by @odulcy-mindee in #1231
- [tests/bug] improve tests and fix a minor bug by @felixdittrich92 in #1229
- [PyTorch] update transforms pytorch (classification / det / rec) by @felixdittrich92 in #1253
- [docs] custom model load by @felixdittrich92 in #1263
- feat: ✨ TF ViTSTR Small checkpoint by @odulcy-mindee in #1273
- [predictor] aspect ratio true by default by @felixdittrich92 in #1279
- feat: ✨ TF SAR Resnet31 checkpoint by @odulcy-mindee in #1281
Miscellaneous
- chore: apply post release modifications v0.6.0 by @felixdittrich92 in #1081
- chore: dev version downgrade from 0.7.0 to 0.6.1 by @felixdittrich92 in #1082
- chore(deps-dev): update black requirement from <23.0,>=22.1 to >=22.1,<24.0 by @dependabot in #1140
- chore(deps-dev): update docutils requirement from <0.18 to <0.20 by @dependabot in #1101
- docs: Minor typo fix by @khanfarhan10 in #1150
- Update utils.py by @weiwangmeta in #1177
- [tests/TF/build] enable missing classification onnx tests and set tensorflow lower bound to 2.11 by @felixT2K in #1182
- [build] update pytorch dependency by @felixT2K in #1188
- [build] drop py3.6/3.7 support and update CI default to py3.8/3.9 by @felixT2K in #1184
- [CI] change old cache action and skip TF classification onnx export temporarily by @felixT2K in #1201
- [Fix] add missing mean/std defaults, add missing weight init for sar by @felixT2K in #1212
- [classification] vit and magc_resnet checkpoints by @felixdittrich92 in #1221
- [tests] update test cases by @felixT2K in #1233
- chore: apply PIL major changes and increase min version specifier by @felixT2K in #1237
- [chore]: Pypdfium2 compatibility fix by @felixT2K in #1239
- [chore]: Replace tensorflow_addons by @felixdittrich92 in #1252
- [style] Fix markdown style warnings by @felixdittrich92 in #1260
- [docs] update export page to ONNX by @felixdittrich92 in #1261
- [PyPi] Fix image display by @felixdittrich92 in #1268
- [chore] increase version and update maintainers by @felixT2K in #1264
- [demo] update models list for Tf / PT backend by @felixdittrich92 in #1280
- [chore] update to new torchvision API in models as well by @felixT2K in #1291
- [chore]: clean dependencies by @felixT2K in #1287
- feat: ✨ TF Parseq checkpoint by @odulcy-mindee in #1305
- feat: ✨ TF ViTSTR Base checkpoint by @odulcy-mindee in #1306
- [docs] update benchmark page by @felixdittrich92 in #1234
New Contributors
- @dependabot made their first contribution in #1140
- @eltociear made their first contribution in #1119
- @khanfarhan10 made their first contribution in #1150
- @weiwangmeta made their first contribution in #1177
- @HamzaGbada made their first contribution in #1178
- @felixT2K made their first contribution in #1180
- @nikokks made their first contribution in #1205
- @odulcy made their first contribution in #1246
- @venkatapathy made their first contribution in #1289
Full Changelog: v0.6.0...v0.7.0
v0.6.0
Highlights of the release:
Note: doctr 0.6.0 requires either TensorFlow >= 2.9.0 or PyTorch >= 1.8.0.
Full integration with Huggingface Hub (docTR meets Huggingface)
- Loading from hub:
from doctr.io import DocumentFile
from doctr.models import ocr_predictor, from_hub
image = DocumentFile.from_images(['data/example.jpg'])
# Load a custom detection model from huggingface hub
det_model = from_hub('Felix92/doctr-torch-db-mobilenet-v3-large')
# Load a custom recognition model from huggingface hub
reco_model = from_hub('Felix92/doctr-torch-crnn-mobilenet-v3-large-french')
# You can easily plug these models into the OCR predictor
predictor = ocr_predictor(det_arch=det_model, reco_arch=reco_model)
result = predictor(image)
- Pushing to the hub:
from doctr.models import recognition, login_to_hub, push_to_hf_hub
login_to_hub()
my_awesome_model = recognition.crnn_mobilenet_v3_large(pretrained=True)
push_to_hf_hub(my_awesome_model, model_name='doctr-crnn-mobilenet-v3-large-french-v1', task='recognition', arch='crnn_mobilenet_v3_large')
Documentation: https://mindee.github.io/doctr/using_doctr/sharing_models.html
Predefined datasets can be used also for recognition task
from doctr.datasets import CORD
# Crop boxes as is (can contain irregular)
train_set = CORD(train=True, download=True, recognition_task=True)
# Crop rotated boxes (always regular)
train_set = CORD(train=True, download=True, use_polygons=True, recognition_task=True)
img, target = train_set[0]
Documentation: https://mindee.github.io/doctr/using_doctr/using_datasets.html
New models (both frameworks)
- classification: VisionTransformer (ViT)
- recognition: Vision Transformer for Scene Text Recognition (ViTSTR)
Bug fixes recognition models
- MASTER and SAR architectures are now operational in both frameworks (TensorFlow and PyTorch)
ONNX support (experimental)
- All models can now be exported into ONNX format (only TF mobilenet left for 0.7.0)
NOTE: full production pipeline with ONNX / build is planned for 0.7.0 (the models can be only exported up to the logits without any post processing included)
Further features
- our demo is now also PyTorch compatible, thanks to @odulcy-mindee
- it is now possible to detect the language of the extracted text, thanks to @aminemindee
What's Changed
Breaking Changes 🛠
- feat: ✨ allow beam width > 1 in the CRNN postprocessor by @khalidMindee in #630
- [Fix] TensorFlow SAR_Resnet31 implementation by @felixdittrich92 in #925
New Features
- [onnx] classification models export by @felixdittrich92 in #830
- feat: Added Vietnamese entry in VOCAB by @calibretaliation in #878
- feat: Added Czech to the set of vocabularies in datasets/vocabs.py by @Xargonus in #885
- feat: Add ability to upload PT/TF models to Huggingface Hub by @felixdittrich92 in #881
- [feature][tf/pt] integrate from_hub for all tasks by @felixdittrich92 in #892
- [feature] Part 2 from use datasets for recognition by @felixdittrich92 in #891
- [datasets] Add MJSynth (Synth90K) by @felixdittrich92 in #827
- [docu]: add documentation for datasets by @felixdittrich92 in #905
- add a Slack Community badge by @fharper in #936
- Feat/add language detection by @aminemindee in #1023
- add ViT as classification model TF and PT by @felixdittrich92 in #1050
- [models] add ViTSTR TF and PT and update ViT to work as backbone by @felixdittrich92 in #1055
Bug Fixes
- [PyTorch][references] fix pretrained with different vocabs by @felixdittrich92 in #874
- [classification] Fix cfgs by @felixdittrich92 in #883
- docs: Fixed typo in installation instructions by @frgfm in #901
- [Fix] imgur5k test by @felixdittrich92 in #903
- fix: Fixed load_pretrained_params in PyTorch when ignoring keys by @frgfm in #902
- [Fix]: Documentation add missing in vocabs and correct tab in sharing models by @felixdittrich92 in #904
- Fix links in readme by @jsn5 in #937
- [Fix] PyTorch MASTER implementation by @felixdittrich92 in #941
- [Fix] MJSynth dataset: filter corrupted or missing images by @felixdittrich92 in #956
- [Fix] SVT dataset: clip box values and add shape and label check by @felixdittrich92 in #955
- [Fix] Tensorflow MASTER implementation by @felixdittrich92 in #949
- [FIX] MASTER AMP and onnxruntime issue with master PT by @felixdittrich92 in #986
- pytest-api test: fix ping server step by @odulcy-mindee in #997
- docs/index: fix two minor typos by @mara004 in #1002
- Fix orientation details export by @aminemindee in #1022
- Changed return type of multithread_exec to iterator by @mtvch in #1019
- [datasets] Fix recognition parts of SynthText and IMGUR5K by @felixdittrich92 in #1038
- [Fix] rotation classifier input move to model device by @felixdittrich92 in #1039
- [models] Vit: fix intermediate size scale and unify TF to PT by @felixdittrich92 in #1063
Improvements
- chore: Applied post release modifications v0.5.1 by @felixdittrich92 in #870
- [refactor][fix]: Part1 from use datasets for recognition task by @felixdittrich92 in #889
- ci: Add swagger ping in API CI job by @frgfm in #906
- [docs] Add naming conventions for upload models to hf hub by @felixdittrich92 in #921
- docs: Improved error message of encode_string by @frgfm in #929
- [Refactor] PyTorch SAR_Resnet31 make it ONNX exportable (again) by @felixdittrich92 in #930
- Add support page in README by @jonathanMindee in #946
- [references] Add eval recognition and update eval detection scripts by @felixdittrich92 in #933
- update pypdfium2 dep and improve code quality by @felixdittrich92 in #953
- docs: Moved need help section after code snippet by @frgfm in #959
- chore: Updated TF requirements to fix grouped convolutions on CPU by @frgfm in #963
- style: Fixed mypy and moved tool configs to pyproject.toml by @frgfm in #966
- Updating the readme by @Atomme1 in #938
- Update docs in using_doctr by @odulcy-mindee in #993
- feat: add a basic example of text detection by @ianardee in #999
- Add pytorch demo by @odulcy-mindee in #1008
- [build] move requirements to pyproject.toml by @felixdittrich92 in #1031
- Migrate static data from github to monitoring middleware. by @marvinmindee in #1033
- Changes needed to be able to use doctr on AWS Lambda by @mtvch in #1017
- [Fix] unify recognition dataset parts return signature by @felixdittrich92 in #1041
- Updated README.md for custom fonts by @carl-krikorian in #1051
- [refactor] detection script by @felixdittrich92 in #1060
- [models] ViT add checkpoints and some rework to use pretrained ViT backbone in ViTSTR by @felixdittrich92 in #1072
- upgrade pypdfium2 by @felixdittrich92 in #1075
- ViTSTR disable pretrained backbone by default by @felixdittrich92 in #1080
Miscellaneous
- [Refactor] commit tags by @felixdittrich92 in #871
- Update io/pdf.py to new pypdfium2 API by @mara004 in #944
- docs: Documentation the reason for keras version specifier by @frgfm in #958
- [datasets] update IC / SROIE / FUNSD / CORD by @felixdittrich92 in #983
- [datasets] revert whitespace filtering and fix svhn reco by @felixdittrich92 in #987
- fix: update tensorflow-addons to match tensorflow version by @ianardee in #998
- move transformers implementation to modules by @felixdittr...
v0.5.1
This minor release includes: documentation improvements thanks to @felixdittrich92, bug fixes, rotation support extended to the TensorFlow backend, a switch from PyMuPDF to pypdfium2, and a nice integration with the Hugging Face Hub thanks to @fg-mindee!
Note: doctr 0.5.0 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.
Highlights
Improvement of the documentation
The documentation has been improved with a new theme and illustrations, and the docstrings have been completed and expanded.
Rotated text detection extended to Tensorflow backend
We provide weights for the linknet_resnet18_rotation model, which has been deeply modified:
- we implemented a new loss (based on Dice Loss and Focal Loss);
- we changed the computation of the targets so that polygons are shrunk the same way as in DBNet, which greatly improves the precision of the segmenter;
- we trained the model while preserving the aspect ratio of the images.
All these improvements led to much better results, and the pretrained model is now very robust.
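To make the loss description more concrete, here is a minimal NumPy sketch of combining a Dice loss with a focal loss on a segmentation map. It is an illustration only, not the docTR implementation: the function names, the `alpha` and `gamma` defaults, and the equal weighting of the two terms are all assumptions.

```python
import numpy as np


def dice_loss(prob, target, eps=1e-7):
    """Dice loss: 1 - 2|P.T| / (|P| + |T|), computed on soft probabilities."""
    inter = (prob * target).sum()
    return 1.0 - (2.0 * inter + eps) / (prob.sum() + target.sum() + eps)


def focal_loss(prob, target, alpha=0.5, gamma=2.0, eps=1e-7):
    """Binary focal loss: cross-entropy down-weighted on easy pixels."""
    prob = np.clip(prob, eps, 1.0 - eps)
    # p_t is the probability assigned to the true class of each pixel
    p_t = np.where(target == 1, prob, 1.0 - prob)
    alpha_t = np.where(target == 1, alpha, 1.0 - alpha)
    return float((-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)).mean())


def combined_loss(prob, target):
    # Assumption: both terms are weighted equally
    return dice_loss(prob, target) + focal_loss(prob, target)


target = np.array([[0, 1], [1, 0]], dtype=np.float32)
good = np.array([[0.1, 0.9], [0.8, 0.2]], dtype=np.float32)
bad = 1.0 - good
# A prediction close to the target should score a lower loss than its inverse
print(combined_loss(good, target) < combined_loss(bad, target))
```

The Dice term rewards overlap between prediction and target regardless of how many background pixels there are, while the focal term keeps a per-pixel signal that concentrates on hard pixels.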
Preserving the aspect ratio in the detection task
You can now choose to preserve the aspect ratio in the detection_predictor:
>>> from doctr.models import detection_predictor
>>> predictor = detection_predictor('db_resnet50_rotation', pretrained=True, assume_straight_pages=False, preserve_aspect_ratio=True)
This option can also be activated in the high level end-to-end predictor:
>>> from doctr.models import ocr_predictor
>>> model = ocr_predictor('linknet_resnet18_rotation', pretrained=True, assume_straight_pages=False, preserve_aspect_ratio=True)
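As a rough illustration of what preserving the aspect ratio involves, the sketch below computes the size an image is scaled to and the padding needed to reach a fixed target shape. The helper name `resize_keep_ratio_shape` and the choice to report padding as a single (height, width) remainder are assumptions made for this example; docTR's own resizing logic may differ.

```python
def resize_keep_ratio_shape(height, width, target_h, target_w):
    """Return ((new_h, new_w), (pad_h, pad_w)) for an aspect-ratio-preserving resize."""
    # Use the largest scale at which both sides still fit in the target
    scale = min(target_h / height, target_w / width)
    new_h = min(target_h, max(1, round(height * scale)))
    new_w = min(target_w, max(1, round(width * scale)))
    # The remaining space has to be filled with padding
    return (new_h, new_w), (target_h - new_h, target_w - new_w)


# A 600x1200 landscape page resized into a 1024x1024 input
print(resize_keep_ratio_shape(600, 1200, 1024, 1024))
```

Without this option, the page would be stretched to the full target shape, which distorts the text on non-square documents.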
Integration within the HuggingFace Hub
The artefact detection model is now available on the HuggingFace Hub, which is amazing!
In docTR, you can now use the .from_hub() method, so that these two snippets are equivalent:
# Pretrained
from doctr.models.obj_detection import fasterrcnn_mobilenet_v3_large_fpn
model = fasterrcnn_mobilenet_v3_large_fpn(pretrained=True)
and:
# HF Hub
from doctr.models.obj_detection.factory import from_hub
model = from_hub("mindee/fasterrcnn_mobilenet_v3_large_fpn")
Breaking changes
Replacing the PyMuPDF dependency with pypdfium2, which is license compatible
We replaced the PyMuPDF dependency with pypdfium2 because of a license-compatibility issue, so we lose the extraction of words and objects from source PDFs, which was done with PyMuPDF. It wasn't used in any model, so it is not a big issue, and we will work on re-integrating such a feature in the future.
Full changelog
What's Changed
Breaking Changes 🛠
- fix: polygon orientation + line aggregation by @charlesmindee in #801
- refactor: Switched from PyMuPDF to pypdfium2 by @fg-mindee in #829
New Features
- feat: Added RandomHorizontalFLip in TF by @SiddhantBahuguna in #779
- Imgur5k dataset integration by @felixdittrich92 in #785
- feat: Added support of GPU for predictors in PyTorch by @fg-mindee in #808
- Add SynthWordGenerator to text reco training scripts by @felixdittrich92 in #825
- fix: Fixed some ResNet architecture imprecisions by @fg-mindee in #828
- feat: Added shadow augmentation for all backends by @fg-mindee in #811
- feat: Added loading method for PyTorch artefact detection models from HF Hub by @fg-mindee in #836
- feat: add rotated linknet_resnet18 tensorflow ckpts by @charlesmindee in #817
Bug Fixes
- fix: Fixed rotation of img + target by @fg-mindee in #784
- fix: show sample when batch size is 1 by @charlesmindee in #787
- ci: Fixed PR label check job by @fg-mindee in #792
- ci: Fixed typo in the script ref by @fg-mindee in #794
- [datasets] fix description by @felixdittrich92 in #795
- fix: linknet target computation by @charlesmindee in #803
- ci: Fixed issue templates by @fg-mindee in #806
- fix: Reverted mistake in demo by @fg-mindee in #810
- Restore remap boxes by @Rob192 in #812
- fix: Fixed SAR model for training and inference in PyTorch by @fg-mindee in #831
- fix: Fixed expand_line for horizontal & vertical cases by @fg-mindee in #842
- fix: Fixes inplace target modifications for AbstractDatasets by @fg-mindee in #848
- fix: Fixed landing page and title underlines by @fg-mindee in #860
- docs: Fixed HTML title by @fg-mindee in #864
Improvements
- docs: Updated headers of python files by @fg-mindee in #781
- [datasets] unify np_dtype and fix comments by @felixdittrich92 in #782
- fix: Clip in rotation transform + eval_straight mode for training by @charlesmindee in #786
- refactor: Avoids instantiating orientation predictor when unnecessary by @fg-mindee in #809
- feat: add straight-eval arg in evaluate script by @charlesmindee in #793
- feat: add dice loss in linknet by @charlesmindee in #816
- feat: add shrinked target in linknet + dilation in postprocessing by @charlesmindee in #822
- feat: replace bce by focal loss in linknet loss by @charlesmindee in #824
- docs: add rotation in docs by @charlesmindee in #846
- feat: add aspect ratio for ocr predictor by @charlesmindee in #835
- feat: add target to resize transform for aspect ratio training (detection task) by @charlesmindee in #823
- update bug report ticket with Active backend field by @felixdittrich92 in #853
- Theme + css #1 by @felixdittrich92 in #856
- docs: Adds illustration in the docstrings of doctr.datasets by @felixdittrich92 in #857
- docs: Updated docstrings of io, transforms & utils by @felixdittrich92 in #859
- docs: Updated folder hierarchy of doc source and nootbooks to rst file by @felixdittrich92 in #862
- Doc models #5 by @felixdittrich92 in #861
- fix: linknet hyperparameters postprocessing + demo for rotation model by @charlesmindee in #865
Miscellaneous
- chore: Applied post release modifications by @fg-mindee in #780
- Switch to new pypdfium2 API by @mara004 in #845
New Contributors
Full Changelog: v0.5.0...v0.5.1
v0.5.0: Skew-aware OCR & extended model/dataset zoo
This release adds support of rotated documents, and extends both the model & dataset zoos.
Note: doctr 0.5.0 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.
Highlights
🙃 😃 Rotation-aware text detection 🙃 😃
It's no secret: this release's focus was to bring the same level of performance to rotated documents!
docTR is meant to be your best tool for seamless document processing, and it couldn't do without supporting a very natural & common augmentation of input documents. This large project was subdivided into three parts:
Straightening pages before text detection
Developing a heuristic-based method to estimate the page skew, and rotate it before forwarding it to any deep learning model. Our thanks to @Rob192 for his contribution on this part 🙏
This behaviour can be enabled to avoid retraining the text detection models. However, the heuristics approach has its limits in terms of robustness.
Text detection training with rotated images
The core of this project was to enable our text detection models to produce non-degraded heatmaps & localization candidates when processing a rotated page.
Crop orientation resolution
Finally, once the localization candidates have been extracted, there is no guarantee that a given candidate will read from left to right. In order to remove this doubt, a lightweight image orientation classifier was added to refine the crops that will be sent to text recognition!
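The sketch below illustrates the idea with NumPy: given a crop and the orientation class predicted by a classifier, rotate the crop back so that it reads from left to right. The class-to-angle mapping (class k meaning a rotation of k * 90 degrees counter-clockwise) and the function name are assumptions for this example, not the actual docTR interface.

```python
import numpy as np


def straighten_crop(crop, orientation_class):
    """Undo a rotation of orientation_class * 90 degrees (counter-clockwise)."""
    # np.rot90 rotates counter-clockwise for positive k, so use -k to undo it
    return np.rot90(crop, k=-orientation_class)


crop = np.arange(6).reshape(2, 3)        # an upright 2x3 "crop"
rotated = np.rot90(crop, k=1)            # simulate a crop rotated by 90 degrees
restored = straighten_crop(rotated, 1)   # the classifier is assumed to predict class 1
print(np.array_equal(restored, crop))
```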
🦓 A wider pretrained classification model zoo 🦓
The stability of trainings in deep learning for complex tasks has mostly been helped by leveraging transfer learning. As such, OCR tasks usually require a backbone as a feature extractor. For this reason, all checkpoints of classification models in both PyTorch & TensorFlow have been updated 🚀
Those were trained using our synthetic character classification dataset, for more details cf. Character classification training
🖼️ New public datasets join the fray
Thanks to @felixdittrich92, the list of supported datasets has considerably grown 🥳
This includes widely popular datasets used for benchmarks on OCR-related tasks, you can find the full list over here 👉 #587
Synthetic text recognition dataset
Additionally, we followed up on the existing CharGenerator by introducing WordGenerator:
- generates an image of a word whose length is randomly sampled within a specified range, with characters randomly sampled from the specified vocab.
- you can even pass a list of fonts so that each word font family is randomly picked among them
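The sampling behaviour described in the two points above can be sketched in plain Python. This only covers how a word and a font might be drawn, not the image rendering, and the helper names and placeholder font file names are assumptions rather than part of the docTR API.

```python
import random
import string


def sample_word(vocab, min_chars, max_chars, rng=random):
    """Draw a word whose length and characters are both sampled at random."""
    length = rng.randint(min_chars, max_chars)  # inclusive on both ends
    return "".join(rng.choice(vocab) for _ in range(length))


def sample_font(fonts, rng=random):
    """Pick one font family per word from a user-supplied list."""
    return rng.choice(fonts)


rng = random.Random(0)
vocab = string.ascii_lowercase + string.digits
# Font file names below are placeholders
fonts = ["FontA.ttf", "FontB.ttf"]
for _ in range(3):
    print(sample_word(vocab, 1, 8, rng), sample_font(fonts, rng))
```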
Below are some samples using font_size=32:
📑 New notebooks
Two new notebooks have made their way into the documentation:
- producing searchable PDFs from docTR analysis results
- introduction to document artefact detection (QR code, bar codes, ID pictures, etc.) with docTR
Breaking changes
Revamp of classification models
With the retraining of all classification backbones, several changes have been introduced:
- Model naming: linknet16 --> linknet_resnet18
- Architecture changes: all classification backbones are available with a classification head now.
Enforcing relative coordinates in datasets
In order to unify our data pipelines, we forced the conversion to relative coordinates on all datasets!
0.4.1:
>>> from doctr.datasets import FUNSD
>>> ds = FUNSD(train=True, download=True)
>>> img, target = ds[0]
>>> print(target['boxes'].dtype, target['boxes'].max())
(dtype('int64'), 862)
0.5.0:
>>> from doctr.datasets import FUNSD
>>> ds = FUNSD(train=True, download=True)
>>> img, target = ds[0]
>>> print(target['boxes'].dtype, target['boxes'].max())
(dtype('float32'), 0.98341835)
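If your own code still carries absolute pixel coordinates, a conversion along the following lines brings them into the relative [0, 1] range. This is a minimal sketch: the (xmin, ymin, xmax, ymax) box layout and the `to_relative` helper are assumptions for illustration, not a docTR function.

```python
import numpy as np


def to_relative(boxes, img_height, img_width):
    """Convert absolute (xmin, ymin, xmax, ymax) pixel boxes to the [0, 1] range."""
    rel = boxes.astype(np.float32)
    rel[:, [0, 2]] /= img_width   # x coordinates
    rel[:, [1, 3]] /= img_height  # y coordinates
    return rel.clip(0, 1)


abs_boxes = np.array([[50, 100, 250, 300]], dtype=np.int64)
print(to_relative(abs_boxes, img_height=400, img_width=500))
```

Relative coordinates stay valid when the image is resized, which is what makes them convenient for a unified data pipeline.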
Full changelog
Breaking Changes 🛠
- refacto: 🔧 postprocessing with rotated boxes by @charlesmindee in #641
- refactor: Refactored LinkNet by @fg-mindee in #733
- refactor: Renamed DataLoader arg "workers" into "num_workers" by @fg-mindee in #737
- refactor: Unified return_preds flags across all tasks by @fg-mindee in #741
- refactor: Introduces img + target transforms in Datasets by @fg-mindee in #750
- refactor: refactoring rotated boxes by @charlesmindee in #731
- refactor: Enforced relative coordinates for all dataset geometries by @fg-mindee in #775
New Features
- SynthText dataset integration by @felixdittrich92 in #624
- [notebooks] add export_as_pdfa notebook by @felixdittrich92 in #650
- ICDAR2003 dataset integration by @felixdittrich92 in #653
- feat: Implements erosion & dilation in PyTorch & TF by @fg-mindee in #669
- Rotate page by @Rob192 in #488
- feat: Added option to use AMP with TF scripts by @fg-mindee in #682
- feat: Added support of FasterRCNN for PyTorch by @fg-mindee in #691
- ICDAR2013 dataset integration by @felixdittrich92 in #662
- feat: Added LR finder option in PyTorch training scripts by @fg-mindee in #703
- feat: Added line reading for source PDFs by @fg-mindee in #707
- feat: Added plot_samples support to visualize the images along with the targets by @SiddhantBahuguna in #704
- SVHN dataset integration by @felixdittrich92 in #634
- feat: Added checkpoint for obj_detection by @SiddhantBahuguna in #713
- feat: add classification module for crop orientation by @charlesmindee in #721
- feat: Added inference+post processing script for artefact detection by @SiddhantBahuguna in #728
- feat: Added latency evaluation scripts for all tasks by @fg-mindee in #746
- docs: Added colab link in the Read me for artefact detection by @SiddhantBahuguna in #755
- feat: Added LR Finder for TensorFlow scripts by @fg-mindee in #747
- feat: Added latency evaluation & benchmark for image classification by @fg-mindee in #757
- feat: Adds GaussianBlur, random font for CharGenerator and improves training scripts by @fg-mindee in #758
- feat: Added WordGenerator dataset by @fg-mindee in #760
- feat: Added dedicated evaluation scripts for text detection by @fg-mindee in #761
- feat: Refactored & retrained all classification models by @fg-mindee in #763
- feat: add rotated ckpts for pytorch DBNet + fix line resolution for rotated pages by @charlesmindee in #743
- feat: Added torchvision photometric augmentations in artefact detection training by @SiddhantBahuguna in #764
- feat: Added random noise augmentation to object detection by @SiddhantBahuguna in #654
- feat: add rotation option to both detection training scripts by @charlesmindee in #765
- feat: Added ChannelShuffle transformation and fixes RandomCrop by @fg-mindee in #768
- feat: Added Gaussian Noise implementation in Tensorflow by @SiddhantBahuguna in #771
- feat: Added Random Horizontal Flip augmentation by @SiddhantBahuguna in #773
- ci: Added release helper actions by @fg-mindee in #776
Bug Fixes
- docs: Fixed documentation build by @fg-mindee in #644
- fix: 🐛 bug canvas dtype for threshold target by @charlesmindee in #645
- fix: 🐛 assume_straight_pages in predictor by @charlesmindee in #647
- ci: Fixed silent isort failure by @fg-mindee in #655
- fix: Fixed W&B config log by @fg-mindee in #656
- fix: Updates Makefile to match CI by @fg-mindee in #661
- docs: Fixed typo in the docstrings of metrics by @fg-mindee in #664
- fix: rot...
v0.4.1: Enables AMP training and adds support of artefact object detection
This patch release brings the support of AMP for PyTorch training to docTR along with artefact object detection.
Note: doctr 0.4.1 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.
Highlights
Automatic Mixed Precision (AMP) ⚡
Training scripts with the PyTorch back-end now benefit from AMP to reduce the RAM footprint and potentially increase the maximum batch size! This comes in especially handy for text detection, which requires high spatial resolution inputs!
Artefact detection 🛸
Document understanding goes beyond textual elements, as information can be encoded in other visual forms. For this reason, we have extended the range of supported tasks by adding object detection. This will be focused on non-textual elements in documents, including QR codes, barcodes, ID pictures, and logos.
Here are some early results:
This release comes with a training & validation set, DocArtefacts, and a reference training script. Keep an eye out for the models we will be releasing in the next release!
Get more of docTR with Colab tutorials 📖
You've been waiting for it: from now on, we will regularly be adding new tutorials for docTR in the form of Jupyter notebooks that you can open and run locally or on Google Colab, for instance!
Check the new page in the documentation to have an updated list of all our community notebooks: https://mindee.github.io/doctr/latest/notebooks.html
Breaking changes
Deprecated support of FP16 for datasets
Float-precision can be leveraged in deep learning to decrease the RAM footprint of trainings. The common data type float32 has a lower resolution counterpart, float16, which is usually only supported on GPU for common deep learning operations. Initially, we were planning to make all our operations available in both to reduce memory footprint in the end.
However, with the latest development of Deep Learning frameworks, and their Automatic Mixed Precision mechanism, this isn't required anymore and only adds more constraints on the development side. We thus deprecated this feature from our datasets and predictors:
0.4.0:
>>> from doctr.datasets import FUNSD
>>> ds = FUNSD(train=True, download=True, fp16=True)
>>> print(getattr(ds, "fp16"))
True
0.4.1:
>>> from doctr.datasets import FUNSD
>>> ds = FUNSD(train=True, download=True)
>>> print(getattr(ds, "fp16"))
None
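The memory versus resolution trade-off between float32 and float16 mentioned in this section can be checked quickly with NumPy. The array shape below is arbitrary and only serves as an illustration.

```python
import numpy as np

x32 = np.ones((1024, 1024), dtype=np.float32)
x16 = x32.astype(np.float16)

# float16 uses half the memory of float32
print(x32.nbytes // x16.nbytes)

# ...at the cost of resolution: machine epsilon is much larger in float16
print(np.finfo(np.float16).eps > np.finfo(np.float32).eps)
```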
Detailed changes
New features
- Adds Arabic to supported vocabs in #514 (@mzeidhassan)
- Adds XML export method to DocumentBuilder in #544 (@felixdittrich92)
- Adds flags to control the behaviour with rotated elements in #551 (@charlesmindee)
- Adds unittest to ensure headers are correct in #556 (@fg-mindee)
- Adds isort ordering & dedicated CI check in #557 (@fg-mindee)
- Adds IIIT-5K to supported datasets in #589 (@felixdittrich92)
- Adds support of AMP to all PyTorch training scripts in #604 (@fg-mindee)
- Adds DocArtefacts dataset for object detection on non-textual elements in #583 (@SiddhantBahuguna)
- Speeds up CTC decoding in PyTorch by x10 in #633 (@fg-mindee)
- Added train script for artefact detection in #593 (@SiddhantBahuguna)
- Added GPU support for classification and improve memory pinning in #629 (@fg-mindee)
- Added an object detection metric in #628 (@fg-mindee)
- Split DocArtefacts into subsets and updated its class mapping in #601 (@fg-mindee)
- Added README specific for the API with route examples in #612 (@fg-mindee)
- Added SVT dataset integration in #620 (@felixdittrich92)
- Added links to tutorial notebooks in the documentation in #619 (@fg-mindee)
- Added new architectures to model selection in demo in #600 (@fg-mindee)
- Add det/reco_predictor arch in OCRPredictor.__repr__ in #595 (@RBMindee)
- Improves coverage by adding missing unittests in #545 (@fg-mindee)
- Resolve both lines and blocks by default when building a doc in #548 (@charlesmindee)
- Relocated test/ to tests/ and made contribution process easier in #598 (@fg-mindee)
- Fixed Makefile by converting spaces to tabs in #615 (@fg-mindee)
- Updated flake8 config to spot unused imports & undefined variables in #623 (@fg-mindee)
- Adds 2 new rotation flags in the ocr_predictor in #632 (@charlesmindee)
Bug fixes
- Fixed evaluation script clipping issue in #522 (@charlesmindee)
- Fixed API template issues with new httpx version in #535 (@fg-mindee)
- Fixed TransformerDecoder for PyTorch 1.10 in #539 (@fg-mindee)
- Fixed a bug in resolve_lines in #537 (@charlesmindee)
- Fixed target computation in MASTER model (PyTorch backend) in #546 (@charlesmindee)
- Fixed portuguese entry in VOCAB in #571 (@fmobrj)
- Fixed header check typo in #557 (@fg-mindee)
- Fixed keras version constraint in #579 (@fg-mindee)
- Updated streamlit version in demo app in #611 (@charlesmindee)
- Updated environment collection script in #575 (@fg-mindee)
- Removed console print in builder in #566 (@fg-mindee)
- Fixed docstring and export as xml dim bug in #586 (@felixdittrich92)
- Fixed README instruction for page synthesis in #590 (@fg-mindee)
- Adds missing console log and removed Tensorboard in #626 (@fg-mindee)
- Fixed docstrings of datasets in #603 (@felixdittrich92)
- Fixed documentation build requirements in #549 (@fg-mindee)
Improvements
- Applied post release modifications in #520 (@fg-mindee)
- Updated benchmark entry of crnn_mobilenet_v3 small in #523 (@charlesmindee)
- Updated perf crnn_mobilenet_v3_large performances in doc (TF) in #526 (@charlesmindee)
- Added automatic detection of rotated bbox in training utils in #534 (@fg-mindee)
- Cleaned rotation transforms in #536 (@fg-mindee)
- Updated library name spelling in #541 (@fg-mindee)
- Updates README of detection training in #542 (@K-for-Code)
- Updated package index in #543 (@fg-mindee)
- Updated README in #555 (@fg-mindee)
- Updated CONTRIBUTING and issue templates in #560 (@fg-mindee)
- Removed unused imports and prevents XML attacks in #582 (@fg-mindee)
- Updated references to demo in README in #599 (@fg-mindee)
- Updated readme and help in analyze.py in #596 (@RBMindee)
- Specified that the API template only supports images for now in #609 (@fg-mindee)
- Updated command to install tf/pytorch build in #614 (@charlesmindee)
- Added checkpoint format to gitignore in #613 (@fg-mindee)
- Specified comment in SAR about symbol encoding in #617 (@fg-mindee)
- Drops support of np.float16 in #627 (@fg-mindee)
New Contributors
Our thanks & warm welcome to the following persons for their first contributions: @mzeidhassan @K-for-Code @felixdittrich92 @SiddhantBahuguna @RBMindee @thentgesMindee 🙏
Full Changelog: v0.4.0...v0.4.1
v0.4.0: Full support of PyTorch and a growing pretrained model zoo
This release brings the support of PyTorch out of beta, makes text recognition more robust, and provides light architectures for complex tasks.
Note: doctr 0.4.0 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.
Highlights
No more width limitation for text recognition
Some documents, such as French ID cards, include very long strings that can be challenging to transcribe:
This release enables a smart split/merge strategy for wide crops to avoid performance drops. Previously, the whole crop was analyzed at once. Now, it is split into reasonably sized crops, inference is performed on them as a batch, and the predictions are then merged together.
The following snippet:
from doctr.io import DocumentFile
from doctr.models import ocr_predictor
doc = DocumentFile.from_images('path/to/img.png')
predictor = ocr_predictor(pretrained=True)
print(predictor(doc).pages[0])
used to yield:
Page(
  dimensions=(447, 640)
  (blocks): [Block(
    (lines): [Line(
      (words): [
        Word(value='1XXXXXX', confidence=0.0023),
        Word(value='1XXXX', confidence=0.0018),
      ]
    )]
    (artefacts): []
  )]
)
and now yields:
Page(
  dimensions=(447, 640)
  (blocks): [Block(
    (lines): [Line(
      (words): [
        Word(value='IDFRABERTHIER<<<<<<<<<<<<<<<<<<<<<<', confidence=0.49),
        Word(value='8806923102858CORINNE<<<<<<<6512068F6', confidence=0.22),
      ]
    )]
    (artefacts): []
  )]
)
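A simplified sketch of such a split/merge strategy is shown below. It is not the docTR implementation: the `max_ratio` and `overlap` values, the function names, and the string-overlap merging rule are assumptions chosen to illustrate the principle.

```python
import numpy as np


def split_wide_crop(crop, max_ratio=4.0, overlap=0.25):
    """Split a (H, W, C) crop into overlapping chunks of width <= max_ratio * H."""
    height, width = crop.shape[:2]
    chunk_w = int(max_ratio * height)
    if width <= chunk_w:
        return [crop]
    step = max(1, int(chunk_w * (1 - overlap)))
    # Regular starts, plus a final chunk aligned with the right edge
    starts = list(range(0, width - chunk_w, step)) + [width - chunk_w]
    return [crop[:, s:s + chunk_w] for s in starts]


def merge_strings(left, right):
    """Merge two predictions, dropping the longest suffix/prefix overlap."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    return left + right


crop = np.zeros((32, 400, 3), dtype=np.uint8)
print([c.shape for c in split_wide_crop(crop)])
print(merge_strings("IDFRABERTH", "ERTHIER<<"))
```

The overlap between neighbouring chunks is what lets the merge step detect which characters were read twice, so that they are not duplicated in the final string.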
Framework specific predictors
PyTorch support is no longer in beta, so we unified the interface to make switching from one deep learning backend to another seamless 🙌 Predictors are designed to be the recommended interface for inference with your models!
0.3.1 (TensorFlow):
>>> from doctr.models import detection_predictor
>>> predictor = detection_predictor(pretrained=True)
>>> out = predictor(doc, training=False)
0.3.1 (PyTorch):
>>> from doctr.models import detection_predictor
>>> import torch
>>> predictor = detection_predictor(pretrained=True)
>>> predictor.model.eval()
>>> with torch.no_grad(): out = predictor(doc)
0.4.0:
>>> from doctr.models import detection_predictor
>>> predictor = detection_predictor(pretrained=True)
>>> out = predictor(doc)
An ever-growing model zoo 🦓
As PyTorch goes out of beta, we have bridged the gap between PyTorch & TensorFlow pretrained models' availability. Additionally, by leveraging our integration of light backbones, this release comes with lighter architectures for text detection and text recognition:
- db_mobilenet_v3_large
- crnn_mobilenet_v3_small
- crnn_mobilenet_v3_large
The full list of supported architectures is available 👉 here
Demo live on HuggingFace Spaces
If you have enjoyed the Streamlit demo, but prefer not to run it on your own hardware, feel free to check out the online version on HuggingFace Spaces:
Courtesy of @osanseviero for deploying it, and HuggingFace for hosting & serving 🙏
Breaking changes
Deprecated crnn_resnet31 & sar_vgg16_bn
After reviewing backbone compatibility and re-assessing whether all combinations should be trained, docTR is focusing on reproducing the architectures as their papers' authors intended, or improving upon them. As such, we have deprecated the following recognition models (that had no pretrained params): crnn_resnet31, sar_vgg16_bn.
Deprecated models.export
Since doctr.models.export was specific to TensorFlow and it didn't bring much more value than TensorFlow tutorials, we added instructions in the documentation and deprecated the submodule.
New features
Datasets
Resources to access data in efficient ways
- Added entry in vocabs for Portuguese #464 (@fmobrj), English, Spanish & German #467 (@fg-mindee), ancient Greek #500 (@fg-mindee)
IO
Features to manipulate input & outputs
- Added .synthesize method to Page and Document #472 (@fg-mindee)
Models
Deep learning model building and inference
- Add dynamic crop splitting for wide inputs to recognition models #465 (@charlesmindee)
- Added MobileNets with rectangular pooling #483 (@fg-mindee)
- Added pretrained params for db_mobilenet_v3_large #485 #487, crnn_vgg16_bn #487, db_resnet50 #489, crnn_mobilenet_v3_small & crnn_mobilenet_v3_large #517 #516 (@charlesmindee)
Utils
Utility features relevant to the library use cases.
- Added automatic font resolution function #472 (@fg-mindee)
Transforms
Data transformations operations
- Added RandomCrop transformation #448 (@charlesmindee)
Test
Verifications of the package well-being before release
- Added a unittest for RandomCrop #448 (@charlesmindee)
- Added a unittest for crop split/merge in recognition models #465 (@charlesmindee)
- Added unittests for PyTorch OCR model zoo #499 (@fg-mindee)
Documentation
Online resources for potential users
- Added entry for RandomCrop #448 (@charlesmindee)
- Added explanations about model export / compression #463 (@fg-mindee)
- Added benchmark entry for db_mobilenet_v3_large #485 in the documentation (@charlesmindee)
- Added badge with hyperlink to HuggingFace Spaces demo #501 (@osanseviero)
References
Reference training scripts
- Added option to select vocab in the training of character classification and text recognition #502 (@fg-mindee)
Others
Other tools and implementations
- Added CI job to validate the demo, the evaluation script and the environment collection scripts #456 (@fg-mindee), the character classification training script #457 (@fg-mindee), the analysis & evaluation scripts in PyTorch #458 (@fg-mindee), the text recognition scripts #469 (@fg-mindee), the text detection scripts #491 (@fg-mindee)
- Added support of PyTorch for the analysis & evaluation scripts #458 (@fg-mindee)
Bug fixes
Datasets
- Fixed submodule import #451 (@fg-mindee)
- Added missing characters in French vocab #467 (@fg-mindee)
Models
- Fixed PyTorch preprocessor shape resolution #453 (@charlesmindee)
- Fixed Tensor cropping for channels_first format #458 #461 (@fg-mindee)
- Replaced recognition models' MobileNet backbones by their rectangular pooling counterparts #483 (@fg-mindee)
- Fixed crop extraction for PyTorch tensors #484 (@charlesmindee)
- Fixed crop filtering on multi-page inference #497 (@fg-mindee)
Transforms
- Fixed rounding errors in RandomCrop #473 (@fg-mindee)
Utils
- Fixed page synthesis for characters outside of latin-1 #496 (@fg-mindee)
Documentation
- Fixed READMEs of training scripts #504 #491 (@fg-mindee)
References
- Fixed the requirements of the training scripts #494 #491 (@fg-mindee)
Others
- Fixed the requirements of the streamlit demo #492 (@osanseviero), the API template #494 (@fg-mindee)
Improvements
Datasets
- Merged DocDataset & OCRDataset #474 (@charlesmindee)
- Updated DetectionDataset label format #491 (@fg-mindee)
Models
- Deprecated doctr.models.export #463 (@fg-mindee)
- Deprecated crnn_resnet31 & sar_vgg16_bn recognition models #468 (@fg-mindee)
- Relocated DocumentBuilder to doctr.models.builder, split predictor into framework-specific objects #481 (@fg-mindee)
- Added more robust argument checks in DocumentBuilder & refactored crop preparation and result processing in ocr predictors #497 (@fg-mindee)
- Reflected changes of detection target formats on detection models #491 (@fg-mindee)
Utils
- Improved page synthesis with dynamic font size #472 (@fg-mindee)
Documentation
- Updated README badge & added release-specific documentation index #451 (@fg-mindee)
- Added logo in README & documentation #459 (@charlesmindee)
- Updated hyperlink to documentation in the README #462 (@fg-mindee)
- Updated vocab description in the documentation #467 (@fg-mindee)
- Added favicon in the documentation #466 (@fg-mindee)
- Removed benchmark entry of deprecated models #468 (@fg-mindee)
- Updated README of the text recognition training script #469 (@fg-mindee)
- Updated performance benchmark with crop splitting #471 (@charlesmindee)
- Added page synthesis example in README #472 (@fg-mindee)
- Made copyright mention dynamic, improved the landing & installation pages in the documentation #475 (@fg-mindee)
- Restructured the documentation #519 (@fg-mindee)
Tests
- Removed legacy unittests of doctr.models.export #463 (@fg-mindee)
- Removed unittests for deprecated models #468 (@fg-mindee)
- Updated unittests with the new doctr.utils.font submodule #472 (@fg-mindee)
- Reflected changes from predictor refactor #481 (@fg-mindee)
- Extended unittest of crop extraction #484 (@charlesmindee)
- Reflected changes from predictor crop preparation improvement #497 (@fg-mindee)
- Reflect changes from detection target format #491 (@fg-mindee)
References
- Reflected changes of detection dataset target format #491 (@fg-mindee)
Others
- Specified import of file_utils #447 (@zalakbhalani)
- Updated package version #451 (@fg-mindee)
- Updated PIL version constraint to fix vulnerability #460 (@fg-mindee)
- Updated model selection in the demo #468 (@fg-mindee)
- Removed some MacOS CI jobs that were slowing down PR checks #470 (@fg-mindee)
- Reflected page synthesis ...