Fixing 2812 #2818

Merged
merged 2 commits into from
Nov 19, 2023

Conversation

JorjMcKie
Collaborator

This fixes #2812.
We did not correctly handle tables on rotated pages; for details, see the issue.
The problem has been addressed by correctly setting the page dimensions in the clip used for vector graphics detection and by using native PyMuPDF code for cell text extraction.
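
To illustrate why the clip dimensions matter on rotated pages, here is a minimal sketch. It is not the code from this PR: `Rect`, `visible_page_rect`, and `default_clip` are hypothetical names used only for illustration. It assumes that rotating a page by 90 or 270 degrees swaps the width and height of the visible page rectangle, so a clip derived from the unrotated dimensions would not cover a landscape page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rect:
    """Hypothetical axis-aligned rectangle (not PyMuPDF's Rect class)."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def width(self) -> float:
        return self.x1 - self.x0

    @property
    def height(self) -> float:
        return self.y1 - self.y0


def visible_page_rect(unrotated: Rect, rotation: int) -> Rect:
    """Return the page rectangle as displayed, anchored at (0, 0)."""
    rotation %= 360
    if rotation not in (0, 90, 180, 270):
        raise ValueError("page rotation must be a multiple of 90")
    if rotation in (90, 270):
        # Quarter turns swap width and height.
        return Rect(0, 0, unrotated.height, unrotated.width)
    return Rect(0, 0, unrotated.width, unrotated.height)


def default_clip(unrotated: Rect, rotation: int,
                 clip: Optional[Rect] = None) -> Rect:
    """Use the caller's clip if given, else the full visible page."""
    if clip is not None:
        return clip
    return visible_page_rect(unrotated, rotation)


# A US Letter page (612 x 792 points) rotated by 90 degrees.
letter = Rect(0, 0, 612, 792)
landscape_clip = default_clip(letter, 90)
print(landscape_clip.width, landscape_clip.height)
```

With the unrotated dimensions, the clip would be 612 points wide and would cut off part of a 792-point-wide landscape page, which is one way table content could be missed.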

Tables on pages with a rotation other than 0 were not detected and extracted correctly.
This was due to an incorrectly set clip parameter and to pdfplumber's issues dealing with characters extracted by PyMuPDF.
We did not properly support tables on rotated pages, for a number of reasons.
This fix now correctly handles the clip area used to look for tables and replaces cell text extraction with original PyMuPDF code.
@JorjMcKie JorjMcKie merged commit 6a1f25c into main Nov 19, 2023
2 checks passed
@JorjMcKie JorjMcKie deleted the Fixing-2812 branch November 19, 2023 15:36
@github-actions github-actions bot locked and limited conversation to collaborators Nov 19, 2023
Development

Successfully merging this pull request may close these issues.

find_tables on landscape page generates reversed text