feat: add rotation in docs
charlesmindee committed Mar 9, 2022
1 parent 9b31588 commit fdaa811
Showing 1 changed file with 34 additions and 0 deletions.
34 changes: 34 additions & 0 deletions docs/source/using_models.rst
@@ -17,6 +17,7 @@ Text Detection

The task consists of localizing textual elements in a given image.
While those text elements can represent many things, in docTR, we will consider uninterrupted character sequences (words). Additionally, the localization can take several forms: from straight bounding boxes (delimited by the 2D coordinates of the top-left and bottom-right corner), to polygons, or binary segmentation (flagging which pixels belong to this element, and which don't).
Our latest detection models work with rotated and skewed documents!
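
To make the difference between the localization forms concrete, here is a minimal sketch in plain Python. The helper names `straight_box_to_polygon` and `rotate_polygon` are hypothetical and are not part of docTR; they only illustrate how a straight box (two corners) relates to a 4-point polygon, which is the form needed once a word is rotated.

```python
import math


def straight_box_to_polygon(xmin, ymin, xmax, ymax):
    """Expand a straight box (top-left, bottom-right) into its 4 corners.

    Hypothetical helper for illustration only, not a docTR function.
    Corners are listed as top-left, top-right, bottom-right, bottom-left.
    """
    return [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]


def rotate_polygon(polygon, angle_deg, center):
    """Rotate every point of a polygon around `center` by `angle_deg`.

    Uses the standard 2D rotation matrix. In image coordinates, where the
    y axis points down, a positive angle appears clockwise on screen.
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = center
    return [
        (
            cx + (x - cx) * cos_t - (y - cy) * sin_t,
            cy + (x - cx) * sin_t + (y - cy) * cos_t,
        )
        for x, y in polygon
    ]


# A straight word box, then the same box rotated by 15 degrees around its center
straight = straight_box_to_polygon(0.1, 0.2, 0.5, 0.4)
rotated = rotate_polygon(straight, 15, center=(0.3, 0.3))
```

Two corners are enough to describe the straight box, while the rotated version can only be described by its four corners (or by a pixel mask).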

Available architectures
^^^^^^^^^^^^^^^^^^^^^^^
@@ -27,6 +28,10 @@ The following architectures are currently supported:
* `db_resnet50 <models.html#doctr.models.detection.db_resnet50>`_
* `db_mobilenet_v3_large <models.html#doctr.models.detection.db_mobilenet_v3_large>`_

We also provide two models that work with any kind of rotated document:

* `linknet_resnet18_rotation <models.html#doctr.models.detection.linknet_resnet18_rotation>`_
* `db_resnet50_rotation <models.html#doctr.models.detection.db_resnet50_rotation>`_

For a comprehensive comparison, we have compiled a detailed benchmark on publicly available datasets:


@@ -60,6 +65,19 @@ Detection predictors
>>> dummy_img = (255 * np.random.rand(800, 600, 3)).astype(np.uint8)
>>> out = model([dummy_img])

The predictor accepts the following boolean arguments:

* `assume_straight_pages`: set it to True if you work with straight documents only; the predictor will then fit straight bounding boxes to the text areas.
* `preserve_aspect_ratio`: set it to True to preserve the aspect ratio of your documents when they are resized before being sent to the model.
* `symmetric_pad`: if you choose to preserve the aspect ratio, set it to True to pad the image symmetrically rather than at the bottom and right only.

For instance, this snippet will instantiate a detection predictor able to detect text on rotated documents while preserving the aspect ratio:

>>> from doctr.models import detection_predictor
>>> predictor = detection_predictor('db_resnet50_rotation', pretrained=True, assume_straight_pages=False, preserve_aspect_ratio=True)

NB: for the moment, `db_resnet50_rotation` is pretrained in PyTorch only and `linknet_resnet18_rotation` in TensorFlow only.
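
The effect of `preserve_aspect_ratio` and `symmetric_pad` can be sketched with simple arithmetic. The function below is a hypothetical illustration, not docTR's actual preprocessing code: it assumes the image is scaled by a single factor so that it fits inside the target size, and that the remaining space is filled with padding either at the bottom and right only or split evenly on both sides.

```python
def resize_and_pad_geometry(img_h, img_w, target_h, target_w, symmetric_pad=False):
    """Compute the resized shape and padding for an aspect-preserving resize.

    Hypothetical helper for illustration only, not a docTR function.
    Returns ((new_h, new_w), (top, bottom, left, right)).
    """
    # A single scale factor keeps the aspect ratio unchanged
    scale = min(target_h / img_h, target_w / img_w)
    new_h = min(target_h, round(img_h * scale))
    new_w = min(target_w, round(img_w * scale))

    # Space left to fill so that the output matches the target size
    pad_h, pad_w = target_h - new_h, target_w - new_w

    if symmetric_pad:
        # Split the padding evenly between both sides
        top, left = pad_h // 2, pad_w // 2
    else:
        # Keep the image in the top-left corner, pad bottom and right only
        top, left = 0, 0
    bottom, right = pad_h - top, pad_w - left
    return (new_h, new_w), (top, bottom, left, right)


# An 800 x 600 page resized to an assumed 1024 x 1024 model input
shape, pad = resize_and_pad_geometry(800, 600, 1024, 1024)
shape_sym, pad_sym = resize_and_pad_geometry(800, 600, 1024, 1024, symmetric_pad=True)
```

In this example the page becomes 1024 x 768 in both cases; without symmetric padding the 256 missing columns are all added on the right, while with it they are split into 128 on each side. The 1024 x 1024 target is an assumption chosen for the example.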


Text Recognition
----------------
@@ -228,6 +246,22 @@ Those architectures involve one stage of text detection, and one stage of text r
>>> out = model([input_page])


You can pass specific boolean arguments into the predictor:

* `assume_straight_pages`
* `preserve_aspect_ratio`
* `symmetric_pad`

These three are passed directly to the detection predictor, as described in the detection section above.

* `export_as_straight_boxes`: if you work with rotated and skewed documents but still want to export straight bounding boxes rather than polygons, set it to True.

For instance, this snippet instantiates an end-to-end ocr_predictor that works with rotated documents, preserves the aspect ratio of the documents, and returns polygons:

    >>> from doctr.models import ocr_predictor
>>> model = ocr_predictor('linknet_resnet18_rotation', pretrained=True, assume_straight_pages=False, preserve_aspect_ratio=True)
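
As an illustration of what exporting straight boxes instead of polygons means geometrically, the sketch below computes the smallest axis-aligned box that encloses a 4-point polygon. The helper `polygon_to_straight_box` is hypothetical; this is one plausible reading of the conversion and is not taken from docTR's source.

```python
def polygon_to_straight_box(polygon):
    """Return the smallest straight box enclosing a polygon.

    Hypothetical helper for illustration only, not a docTR function.
    `polygon` is a sequence of (x, y) points; the result is
    ((xmin, ymin), (xmax, ymax)), i.e. top-left and bottom-right corners.
    """
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys)), (max(xs), max(ys))


# A rotated word given by its 4 corners, in relative coordinates
rotated_word = [(0.2, 0.1), (0.6, 0.3), (0.5, 0.5), (0.1, 0.3)]
box = polygon_to_straight_box(rotated_word)
```

The enclosing box is simpler to handle downstream, but it covers more area than the polygon and loses the orientation of the word.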


What should I do with the output?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
