After packaging a pdfplumber program into an exe with py2exe, the content of some PDFs is not recognized #615
Comments
Hi @StruggleYang, and thanks for your interest in this library. I'm not very familiar with …
Hi @jsvine, thanks for getting back to me on this issue. Like you, I suspected a log encoding problem, so I tried redirecting the log to stdout and to a file, but the problem stayed the same. To rule out other dependencies, I wrote a dedicated test script that uses only pdfplumber. Here is the new test code:

```python
# coding:utf-8
import os
import time

import pdfplumber

if __name__ == '__main__':
    print("hello")
    user_path = os.path.expanduser('~')
    print(user_path)
    with pdfplumber.open(os.path.join(user_path, "working", "pdftest", "test.pdf")) as pdf:
        for page in pdf.pages:
            all_content = page.extract_text(x_tolerance=0, y_tolerance=0)
            tables = page.extract_table()
            print(page.page_number)
            print(tables)
            print(all_content)
    print("end")
    # Keep the console window open so the packaged exe's output can be read and compared
    time.sleep(10)
```
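Since a log encoding problem was suspected, one way to rule it in or out is to print a few runtime facts from both the source run and the packaged exe and compare them side by side. This is a minimal sketch, not part of the original report: `describe_runtime` is a hypothetical helper, it needs Python 3.8+ for `importlib.metadata`, and it assumes the PyPI distribution names `pdfplumber` and `pdfminer.six`.

```python
import importlib.metadata
import locale
import sys


def describe_runtime(dists=("pdfplumber", "pdfminer.six")):
    """Collect facts that often differ between a source run and a frozen exe."""
    info = {
        # Freezers such as py2exe usually set sys.frozen; a plain interpreter does not.
        "frozen": getattr(sys, "frozen", False),
        "executable": sys.executable,
        # sys.stdout can be None in a windowed (non-console) exe, so fall back to None.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "preferred_encoding": locale.getpreferredencoding(False),
        "filesystem_encoding": sys.getfilesystemencoding(),
    }
    for name in dists:
        try:
            info[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            # A packager may drop *.dist-info metadata even when the code itself is bundled.
            info[name] = None
    return info


if __name__ == "__main__":
    for key, value in describe_runtime().items():
        print(f"{key}: {value}")
```

If the library versions match and only the encodings differ, the console output is the likely suspect; if the versions show up as `None` only in the exe, the packager probably left out package metadata, which is worth keeping in mind when comparing builds.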
I have the full test code, the packaging script, and a PDF that reproduces the problem. If you can help, I can email them to you after packaging; because the document contains private information, I can't share it publicly. If there is no way forward, I will have to look for alternatives, but that would mean starting over.
Thank you for sharing those details. One other thing to test: do you see the same problems when using …? Another way to get closer to the answer: what is the result of …
Thank you for your reply; I will run the tests you suggested and report the results. However, it is worth mentioning that I just changed the packaging method to … Perhaps because the packaging tools work differently, some dependency gets left out during packaging. I will compare their differences when I have time. After running your tests I may close this issue, because the problem may not be in this repository (though I think more testing is needed to pin down the real cause). Maybe I will go to …
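One hypothesis worth testing, offered as an assumption rather than a confirmed cause: pdfminer.six, which pdfplumber builds on, loads CMap data used for CJK fonts from `*.pickle.gz` files in a `cmap` directory inside the installed `pdfminer` package. If a packager bundles the Python modules but not those data files, text in such fonts may come out garbled or missing only in the packaged build. The sketch below checks whether that directory is present and non-empty; `cmap_files` and `pdfminer_cmap_dir` are hypothetical helper names.

```python
import importlib.util
import os


def cmap_files(cmap_dir):
    """Return the sorted *.pickle.gz file names in cmap_dir (empty if the directory is missing)."""
    if not os.path.isdir(cmap_dir):
        # e.g. the package was placed in a zip archive, or its data files were not copied
        return []
    return sorted(n for n in os.listdir(cmap_dir) if n.endswith(".pickle.gz"))


def pdfminer_cmap_dir():
    """Best guess at pdfminer's bundled CMap directory, or None if pdfminer is not importable."""
    spec = importlib.util.find_spec("pdfminer")
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at pdfminer/__init__.py; the data is assumed to sit next to it.
    return os.path.join(os.path.dirname(spec.origin), "cmap")


if __name__ == "__main__":
    cmap_dir = pdfminer_cmap_dir()
    print("cmap dir:", cmap_dir)
    if cmap_dir is not None:
        print("CMap files found:", len(cmap_files(cmap_dir)))
```

Running this both from source and from the exe gives a quick comparison: a non-zero count from source and zero from the exe would support the missing-data hypothesis, while equal counts would point elsewhere.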
Closing this issue due to inactivity and factors that appear to be outside of …
Describe the bug
Code to reproduce the problem
My code is written like most pdfplumber programs.
The code itself runs fine from source; the problem is that its behavior is inconsistent after packaging.
PDF file
I'm sorry.
The PDFs I am processing contain private information, so I can't easily share them, but if a maintainer follows up on this issue, I'd be willing to email one.
Expected behavior
For the same file, running from source and running the packaged exe should produce the same results.
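To make "the same results" concrete without sharing private text, one option is to reduce each page's extracted text to a short hash in both runs and compare the two lists. This is a minimal sketch; `page_fingerprints` and `differing_pages` are hypothetical helpers, and the input is assumed to be the list of `page.extract_text()` results, which may be `None` for pages without text in some pdfplumber versions.

```python
import hashlib


def page_fingerprints(texts):
    """Map each page's extracted text to a short SHA-256 digest (None is treated as empty)."""
    return [
        hashlib.sha256((text or "").encode("utf-8")).hexdigest()[:16]
        for text in texts
    ]


def differing_pages(run_a, run_b):
    """Return 1-based page numbers whose fingerprints differ between two runs."""
    pages = []
    for number in range(1, max(len(run_a), len(run_b)) + 1):
        # A page missing from one run counts as a difference.
        a = run_a[number - 1] if number <= len(run_a) else None
        b = run_b[number - 1] if number <= len(run_b) else None
        if a != b:
            pages.append(number)
    return pages
```

Each run could save its fingerprint list to a file opened with `encoding="utf-8"` (for example via `json.dump`), and the two lists could then be passed to `differing_pages`; only the hashes, not the document text, would need to leave the machine.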
Actual behavior
As mentioned above, for some files the recognition results differ between running the source code and running the packaged exe. Notably, everything works normally when packaged with py2app; only py2exe has this problem.
Screenshots
Environment
Additional context
I also tried it both on my personal Windows 10 computer and in a Windows 10 virtual machine running on macOS; the problem occurred in both.