How to extract a table with invisible lines #123
Have you tried cropping to just the table you want to extract before running the extraction?
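For anyone trying this, here is a minimal sketch of cropping before extraction. It assumes you already know the table's box as `(x0, top, x1, bottom)` in PDF points; `clamp_bbox` and `extract_cropped_table` are hypothetical helper names. The box is clipped to `page.bbox` first because recent pdfplumber versions can reject a crop box that extends past the page.

```python
def clamp_bbox(bbox, bounds):
    """Clip an (x0, top, x1, bottom) box so it stays inside `bounds`."""
    x0, top, x1, bottom = bbox
    bx0, btop, bx1, bbottom = bounds
    return (max(x0, bx0), max(top, btop), min(x1, bx1), min(bottom, bbottom))


def extract_cropped_table(pdf_path, page_index, bbox):
    """Crop one page to the table region, then extract a single table from it."""
    import pdfplumber  # imported here so clamp_bbox works without pdfplumber installed

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_index]
        region = page.crop(clamp_bbox(bbox, page.bbox))
        # A list of rows (each a list of cell strings), or None if no table is found.
        return region.extract_table()
```

Cropping first means the table finder only sees words and edges inside the region, so surrounding text cannot be mistaken for extra rows or columns.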
Yes, I tried. Extracting the table works in the vertical direction. Is there any OCR capability in pdfplumber? I also tried Camelot, which uses it, but it has a bug when handling Chinese text. I'm confused!
pdfplumber "works best on machine-generated, rather than scanned, PDFs", and it has no OCR capability. If you can share the actual PDF you are trying to extract a table from, that would help with debugging the issue.
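Since pdfplumber only reads text that is embedded in the PDF, a quick way to tell whether a page needs OCR first is to check whether it has any characters at all. This is a heuristic sketch, not a pdfplumber feature: `has_text_layer` and `pages_needing_ocr` are hypothetical names, and a scanned page is assumed to show up as an empty `page.chars` list.

```python
def has_text_layer(page, min_chars=1):
    """Return True if the page exposes at least `min_chars` embedded characters.

    A scanned page is usually just an image, so pdfplumber finds no characters
    on it; in that case the page would need OCR before any table extraction.
    """
    return len(page.chars) >= min_chars


def pages_needing_ocr(pdf_path):
    """List the 0-based indexes of pages that appear to have no text layer."""
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        return [i for i, page in enumerate(pdf.pages) if not has_text_layer(page)]
```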
430027-北科光大-2017年年度报告.pdf (2017 annual report)
If you don't mind using an alpha version, you can switch to 0.6-alpha and pass `snap_y_tolerance` when calling `page.extract_tables()`. See the 0.6-alpha documentation and #51 for details.
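A rough sketch of what that call could look like, using the table-settings key names from current pdfplumber releases (the 0.6-alpha names may differ slightly). The `"text"` strategies infer cell boundaries from how words line up, which is what a table without drawn lines needs; the value `5` is an assumed starting point to tune, and `INVISIBLE_LINE_SETTINGS` is a hypothetical name.

```python
# Hypothetical starting values; tune them against your own PDF.
INVISIBLE_LINE_SETTINGS = {
    # No ruling lines are drawn, so infer column and row edges from word alignment.
    "vertical_strategy": "text",
    "horizontal_strategy": "text",
    # Horizontal edges within 5 points of each other are snapped to one y position,
    # so slightly misaligned rows share a single boundary.
    "snap_y_tolerance": 5,
}


def extract_invisible_tables(pdf_path):
    """Return one list of tables per page, found with the settings above."""
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        return [page.extract_tables(INVISIBLE_LINE_SETTINGS) for page in pdf.pages]
```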
Thanks @luoqygit and @OisinMoran! Cleaning up old issues. Feel free to reopen, @cqluohong, if you'd like to continue the discussion.
Hi, I'm also having trouble extracting tables with invisible lines. I can't crop the page to the table's exact coordinates because I'm running the program on multiple PDFs, and the table can appear in different positions. I tried it with these table_settings:
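One possible workaround when the table moves between files is to locate it from a word that always appears just above it (a section heading, for example) and crop relative to that word. This is a sketch under assumptions: the anchor text, the fixed `height` of the crop, and the helper names are all hypothetical. The word dicts are the kind returned by `page.extract_words()`, and a substring match is used because CJK text without spaces is often grouped into longer "words".

```python
def bbox_below_anchor(words, anchor_text, page_bbox, height=200):
    """Return a full-width crop box starting at the first word containing `anchor_text`.

    The box runs `height` points down from the anchor's top edge, clipped to the
    page. Returns None if no word on the page contains the anchor text.
    """
    x0, page_top, x1, page_bottom = page_bbox
    for word in words:
        if anchor_text in word["text"]:
            top = max(word["top"], page_top)
            return (x0, top, x1, min(top + height, page_bottom))
    return None


def extract_table_near(pdf_path, anchor_text, height=200):
    """Extract one table below the anchor on every page where the anchor appears."""
    import pdfplumber

    settings = {"vertical_strategy": "text", "horizontal_strategy": "text"}
    results = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            bbox = bbox_below_anchor(page.extract_words(), anchor_text, page.bbox, height)
            if bbox is not None:
                results.append(page.crop(bbox).extract_table(settings))
    return results
```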
In the table below, each row has 4 columns, but I can't get the correct result when I extract it with pdfplumber.
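When the number of columns is known (4 here), one option is to stop pdfplumber from guessing column edges and supply them yourself through `explicit_vertical_lines`, keeping the `"text"` strategy for rows. This sketch assumes you can find one header word per column (for example with `page.extract_words()`); `column_boundaries` and `four_column_settings` are hypothetical names, and placing each boundary halfway between neighbouring headers is an assumed rule.

```python
def column_boundaries(header_words, left, right):
    """Compute x positions separating columns, given one header word per column.

    `header_words` are dicts with "x0" and "x1" keys. The outer boundaries are
    `left` and `right`; each inner boundary sits halfway between the end of one
    header and the start of the next.
    """
    headers = sorted(header_words, key=lambda w: w["x0"])
    inner = [(a["x1"] + b["x0"]) / 2 for a, b in zip(headers, headers[1:])]
    return [left] + inner + [right]


def four_column_settings(header_words, left, right):
    """Build table settings that force exactly 4 columns."""
    bounds = column_boundaries(header_words, left, right)
    if len(bounds) != 5:
        raise ValueError(f"expected 4 header words, got {len(bounds) - 1}")
    return {
        "vertical_strategy": "explicit",  # use only the x positions given below
        "explicit_vertical_lines": bounds,
        "horizontal_strategy": "text",  # still infer rows from word alignment
    }
```

The result would then be passed as, for example, `page.extract_table(four_column_settings(headers, page.bbox[0], page.bbox[2]))`. Because the column edges are fixed, every extracted row should have exactly 4 cells, even where a cell is empty.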