RecursionError occurs while processing #18

Open
ChillarAnand opened this issue Oct 10, 2017 · 5 comments
@ChillarAnand
Contributor

  File "/home/chillaranand/projects/ocr/banti_telugu_ocr/banti/linegraph.py", line 47, in process_node
    self.process_node(chld_id)
  File "/home/chillaranand/projects/ocr/banti_telugu_ocr/banti/linegraph.py", line 47, in process_node
    self.process_node(chld_id)
  File "/home/chillaranand/projects/ocr/banti_telugu_ocr/banti/linegraph.py", line 43, in process_node
    logd("Processing in {}".format(idx))
  File "/usr/lib/python3.5/logging/__init__.py", line 1266, in debug
    if self.isEnabledFor(DEBUG):
  File "/usr/lib/python3.5/logging/__init__.py", line 1519, in isEnabledFor
    if self.manager.disable >= level:
RecursionError: maximum recursion depth exceeded in comparison
@rakeshvar
Collaborator

I think this is from the image having too many broken components.
Can you send me the image?
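
For context, the traceback shows process_node calling itself once per child, so a long chain of components can exceed Python's default recursion limit. One way to sidestep that, sketched here under assumptions, is to walk the graph with an explicit stack. The `children` mapping and the function name are hypothetical and are not taken from banti/linegraph.py.

```python
def process_iteratively(children, root):
    """Visit every node reachable from root without using recursion.

    children: hypothetical dict mapping a node id to a list of child ids.
    Returns the ids in depth-first (pre-order) visiting order.
    """
    order = []
    seen = set()
    stack = [root]
    while stack:
        idx = stack.pop()
        if idx in seen:
            continue
        seen.add(idx)
        order.append(idx)
        # Push children reversed so they are popped in their original order.
        stack.extend(reversed(children.get(idx, [])))
    return order


# A chain far deeper than the default recursion limit is handled fine.
chain = {i: [i + 1] for i in range(5000)}
visited = process_iteratively(chain, 0)
```

This only removes the depth limit; it does not address why the page produces so many components in the first place.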

rakeshvar added the bug label Oct 11, 2017
@ChillarAnand
Contributor Author

ChillarAnand commented Oct 14, 2017

Image quality looks bad. https://www.dropbox.com/s/xdg2w0o6kn4t4hp/page.png?dl=0

@rakeshvar
Collaborator

The image quality is OK. The program is unable to estimate the line height properly; it is being thrown off by the huge empty space around the text. It works on a tightly cropped version of the image. I thought there was code to detect this; maybe it is in Chamanti OCR.
When you get such an error, use the following command to see how segmentation is working.

python3 tests/page_test.py sample_images/purugulu_crop.png 

@rakeshvar
Collaborator

rakeshvar commented Oct 15, 2017

There is a function in the class Line in banti/page.py called sanity_check() which checks for such bad cases. It is not currently being used to throw an error. I can fix this. But it remains to be seen why we are not able to segment this page properly.
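
As a rough illustration of what failing loudly could look like, here is a minimal sketch. It is not the existing sanity_check() in banti/page.py: the function name, the exception class, and the `max_ratio` threshold of 4.0 are all assumptions made for the example. The idea is to compare the estimated line height with the typical height of the connected components in that line and raise if the line is implausibly tall.

```python
from statistics import median


class SegmentationError(ValueError):
    """Hypothetical error raised when a detected line looks implausible."""


def check_line(line_height, component_heights, max_ratio=4.0):
    """Raise if a line is far taller than its typical component.

    line_height: estimated height of the detected line, in pixels.
    component_heights: heights of the connected components in the line.
    max_ratio: assumed threshold; it would need tuning on real pages.
    """
    if not component_heights:
        raise SegmentationError("line has no components")
    typical = median(component_heights)
    ratio = line_height / typical
    if ratio > max_ratio:
        raise SegmentationError(
            "line height {} is {:.1f}x the typical component height {}; "
            "several text lines may have been merged".format(
                line_height, ratio, typical))
    return ratio
```

A caller could catch SegmentationError at page level and decide whether to re-run line detection or skip the line.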

@rakeshvar
Collaborator

After reviewing the Fourier transforms etc., here is what I found.

  1. There is a bug: we need to be able to detect when multiple lines are being detected as one. The open question is whether we should then reprocess the page or just ignore that line.
  2. In this case, however, the failure to detect lines properly is caused by the huge gap above and below the text. Our program thinks we are in a scenario where there is just one line of text with proportionate space above and below. To avoid cases like these, we could auto-crop images before processing (in banti or separately).
  3. We could use the sanity check to see if a line has too many words stacked in it. If it does, we will need to split it or go back to page level and reprocess.
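
The auto-crop idea in point 2 could be sketched as below. This is an assumption-laden example, not code from banti: it takes the page as a list of rows of grayscale values with dark text on a light background, and the `threshold` and `margin` defaults are arbitrary. It uses plain lists so it runs without dependencies; the same logic carries over to array libraries.

```python
def autocrop(rows, threshold=128, margin=2):
    """Trim blank borders from a grayscale page.

    rows: list of equal-length lists of pixel values (0 = black, 255 = white).
    threshold: pixels darker than this count as ink (assumed value).
    margin: number of blank pixels to keep around the ink on each side.
    """
    ink_rows = [r for r, row in enumerate(rows)
                if any(p < threshold for p in row)]
    if not ink_rows:
        return rows  # blank page, nothing to crop
    width = len(rows[0])
    ink_cols = [c for c in range(width)
                if any(row[c] < threshold for row in rows)]
    top = max(ink_rows[0] - margin, 0)
    bottom = min(ink_rows[-1] + 1 + margin, len(rows))
    left = max(ink_cols[0] - margin, 0)
    right = min(ink_cols[-1] + 1 + margin, width)
    return [row[left:right] for row in rows[top:bottom]]


# A 100x80 white page with a 10x40 block of ink in the middle.
page = [[255] * 80 for _ in range(100)]
for r in range(40, 50):
    for c in range(20, 60):
        page[r][c] = 0
cropped = autocrop(page)
```

Scanner noise or specks near the page edge would defeat a bounding box this simple, so a real version would likely need to ignore tiny components first.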
