Add SynthWordGenerator to text reco training scripts #825

Merged: 19 commits merged on Feb 22, 2022

@felixdittrich92 (Contributor) commented Feb 18, 2022

After some feedback, this PR now only integrates the WordGenerator into the training scripts 😄

Any feedback is welcome 😃

codecov bot commented Feb 18, 2022

Codecov Report

Merging #825 (8403853) into main (41237e9) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main     #825   +/-   ##
=======================================
  Coverage   95.97%   95.97%           
=======================================
  Files         131      131           
  Lines        4988     4988           
=======================================
  Hits         4787     4787           
  Misses        201      201           
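
As a quick sanity check on the figures above: Codecov computes coverage as hits / (hits + partials + misses). Because Hits + Misses equals Lines here (4787 + 201 = 4988), this minimal Python sketch assumes zero partial lines and reproduces the 95.97% figure. The `coverage_percent` helper is hypothetical and only illustrates the arithmetic.

```python
def coverage_percent(hits: int, misses: int, partials: int = 0) -> float:
    """Codecov-style coverage: share of tracked lines that were hit, in percent."""
    tracked = hits + partials + misses
    return 100 * hits / tracked


# Figures from the coverage diff above (identical on main and #825).
lines, hits, misses = 4988, 4787, 201
assert hits + misses == lines  # consistent with zero partial lines

print(f"{coverage_percent(hits, misses):.2f}%")  # prints 95.97%
```
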
Flag        Coverage Δ
unittests   95.97% <ø> (ø)

Flags with carried forward coverage won't be shown.

Impacted Files                      Coverage Δ
doctr/transforms/modules/base.py    94.59% <0.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 41237e9...8403853.

@felixdittrich92 changed the title from "[WIP] Improve synth word generation + fix german vocab" to "Improve synth word generation + fix german vocab" on Feb 18, 2022
@fg-mindee self-assigned this on Feb 18, 2022
@fg-mindee added the "module: datasets", "topic: character classification" and "topic: text recognition" labels on Feb 18, 2022
@fg-mindee (Contributor) left a comment

Thanks a lot Felix!

Would you mind making this PR only about fixing the German vocab, please?
As for your suggestion to replace the character with a double S, I think it's a dangerous idea: so far, one vocab element = one theoretical character = one computer character, and this would break that rule.
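
To illustrate the rule described above, here is a minimal Python sketch of character-level label encoding, assuming the character in question is the German ß. The vocab string and the `encode`/`decode` helpers are hypothetical and for illustration only; they are not doctr's implementation. Because encoding walks the word one character at a time, a two-character element such as "ss" can never be matched, while "ß" itself becomes unencodable.

```python
from typing import List, Sequence

# Hypothetical vocab for illustration only (not doctr's actual German vocab).
# Every element is exactly one character.
VOCAB = "abcdefghijklmnopqrstuvwxyzäöüß"


def encode(word: str, vocab: Sequence[str]) -> List[int]:
    """Map each character of `word` to the index of the matching vocab element."""
    return [vocab.index(char) for char in word]


def decode(indices: List[int], vocab: Sequence[str]) -> str:
    """Inverse of `encode`: concatenate the vocab elements back into a string."""
    return "".join(vocab[i] for i in indices)


# With one element per character, encoding is lossless and yields one label per character.
word = "straße"
labels = encode(word, VOCAB)
assert len(labels) == len(word)
assert decode(labels, VOCAB) == word

# Hypothetical "broken" vocab: "ß" replaced by the two-character element "ss".
broken_vocab = [char for char in VOCAB if char != "ß"] + ["ss"]

# Character-wise lookup resolves "strasse" to single "s" elements, so "ss" is never used...
assert broken_vocab.index("ss") not in encode("strasse", broken_vocab)

# ...and words written with "ß" can no longer be encoded at all.
try:
    encode(word, broken_vocab)
except ValueError:
    print("'ß' is missing from the vocab")
```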

doctr/datasets/generator/base.py (outdated, resolved)
doctr/datasets/generator/pytorch.py (outdated, resolved)
doctr/datasets/vocabs.py (outdated, resolved)
@fg-mindee added the "type: bug" label and removed the "topic: text recognition" and "topic: character classification" labels on Feb 18, 2022
@fg-mindee added this to the 0.5.1 milestone on Feb 18, 2022
@fg-mindee mentioned this pull request on Feb 18, 2022 (from an item listing 9 tasks)
@felixdittrich92 changed the title from "Improve synth word generation + fix german vocab" to "[WIP] Add SynthWordGenerator to text reco training scripts" on Feb 18, 2022
@felixdittrich92 marked this pull request as draft on February 18, 2022, 21:50

@fg-mindee (Contributor) left a comment

Thanks! For now, it would be good to use the same set of transformations as the other recognition tasks (cf. my comments). Let me know what you think 👌

doctr/transforms/functional/tensorflow.py (outdated, resolved)
references/recognition/train_tensorflow.py (outdated, resolved)
references/recognition/train_tensorflow.py (outdated, resolved)
@felixdittrich92 marked this pull request as ready for review on February 20, 2022, 22:06

@fg-mindee (Contributor) left a comment

Thanks for the edits! Still a few things to address and we'll be good to go!

references/recognition/README.md (outdated, resolved)
references/recognition/train_pytorch.py (outdated, resolved)
references/recognition/train_pytorch.py (resolved)
references/recognition/train_pytorch.py (outdated, resolved)
references/recognition/train_pytorch.py (outdated, resolved)
@felixdittrich92 changed the title from "[WIP] Add SynthWordGenerator to text reco training scripts" to "Add SynthWordGenerator to text reco training scripts" on Feb 21, 2022

@fg-mindee (Contributor) left a comment

Thanks Felix!

Final adjustments to make (my comments also apply to the TensorFlow script) :)

references/recognition/train_pytorch.py (outdated, resolved)
references/recognition/train_pytorch.py (outdated, resolved)
references/recognition/train_pytorch.py (outdated, resolved)
references/recognition/train_pytorch.py (outdated, resolved)
references/recognition/train_pytorch.py (outdated, resolved)
@fg-mindee added the "ext: references", "framework: pytorch", "framework: tensorflow", "topic: text recognition" and "type: new feature" labels and removed the "type: bug" and "module: datasets" labels on Feb 22, 2022

@fg-mindee (Contributor) left a comment

Thanks Felix 🙏

@fg-mindee merged commit 51dc49b into mindee:main on Feb 22, 2022
@felixdittrich92 deleted the reco-ref branch on February 22, 2022, 11:22
Labels: "ext: references", "framework: pytorch", "framework: tensorflow", "topic: text recognition", "type: new feature"
2 participants