Add HOCRDocPreprocessor and HocrVisualParser #519
Codecov Report
@@            Coverage Diff             @@
##           master     #519      +/-   ##
==========================================
+ Coverage   85.81%   86.03%   +0.22%
==========================================
  Files          90       92       +2
  Lines        4582     4769     +187
  Branches      852      896      +44
==========================================
+ Hits         3932     4103     +171
- Misses        467      475       +8
- Partials      183      191       +8
Can we get all conflicts resolved first? Thanks!
This is a sentence. "This" is another sentence. is split into the following two sentences: This is a sentence. " This" is another sentence.
I rebased on the master branch and resolved the conflicts. Thanks.
Just a few nits.
@lukehsiao thank you so much for reviewing such a big PR and for your comments.
Thanks!
I think @senwu wanted to take a look too, so I'll wait for him. But LGTM.
LGTM
Let's plan to have a tutorial about this PR. This is a really awesome improvement!!! Thoughts? @HiromuHota @lukehsiao
I'll definitely update the existing tutorials.
That works as well. We need to show two things: 1) some basics about what's in it and how to use it, and 2) an end-to-end run with high quality.
An update: I've started replacing html with hocr (html -> pdfy -> pdf -> pdftotree -> hocr) in the wikipedia tutorial.
Description of the problems or issues
Is your pull request related to a problem? Please describe.
This is the second patch that follows #518.
Does your pull request fix any issue?
N/A.
Description of the proposed changes
Add HOCRDocPreprocessor and HocrVisualParser.
Test plan
I added a few real hOCR example files.
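For readers unfamiliar with hOCR, here is a minimal sketch of the kind of information such files carry: each recognized word is an element with class `ocrx_word` whose `title` attribute holds a `bbox x0 y0 x1 y1` bounding box. The sample markup, `WordBoxParser`, and `extract_words` below are hypothetical illustrations written with the Python standard library only; they are not the test files or the parser classes added in this PR.

```python
import re
from html.parser import HTMLParser

# Hypothetical, hand-written hOCR fragment (not one of the PR's example files).
SAMPLE_HOCR = """
<div class='ocr_page' title='bbox 0 0 600 800'>
  <span class='ocr_line' title='bbox 10 20 300 40'>
    <span class='ocrx_word' title='bbox 10 20 60 40; x_wconf 96'>Hello</span>
    <span class='ocrx_word' title='bbox 70 20 140 40; x_wconf 91'>world</span>
  </span>
</div>
"""

# The hOCR "bbox" property: left, top, right, bottom in pixels.
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class WordBoxParser(HTMLParser):
    """Collect (text, bbox) pairs for every ocrx_word element."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # bbox of the word element currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "ocrx_word" in (attrs.get("class") or "").split():
            match = BBOX_RE.search(attrs.get("title") or "")
            self._bbox = tuple(map(int, match.groups())) if match else None

    def handle_data(self, data):
        # Only keep text that directly follows an ocrx_word start tag.
        if self._bbox is not None and data.strip():
            self.words.append((data.strip(), self._bbox))
            self._bbox = None


def extract_words(hocr):
    """Return a list of (word, (x0, y0, x1, y1)) tuples from an hOCR string."""
    parser = WordBoxParser()
    parser.feed(hocr)
    parser.close()
    return parser.words


if __name__ == "__main__":
    for word, bbox in extract_words(SAMPLE_HOCR):
        print(word, bbox)
```

Real hOCR output may nest extra markup inside word elements or omit the bounding box; this sketch skips words without a `bbox` and does not handle nested inline tags.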
Checklist