object of type 'PSKeyword' has no len() #617
Comments
I have the same issue for some PDFs (which also unfortunately contain sensitive information).
For the PDFs, the issue appears to be that some of the … A quick and dirty hack, which fixes the issue for me, is to insert the following two lines into …
This solution is evidently not ideal, as it does not answer why there is a … I will endeavour to find out why.
A seemingly more proper fix than the hack above is to add in the case … Namely, at line 309 of …
I am not familiar enough with the PDF specification to explain why this edge case is required. However, this fixes the issue for me and seems more proper than the previous fix proposed above.
* Ignore null characters in PSBaseParser

  Beforehand, null characters were encoded as PSKeyword tokens. This caused issue #617, as pdfdevice.py would attempt to decode the null character PSKeyword, when it expects a byte string, as opposed to a PSKeyword, causing pdfminer.six to crash. As null characters are superfluous within PSBaseParser, ignore them.

* Update CHANGELOG.md

Co-authored-by: Pieter Marsman
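To illustrate the failure mode the commit message describes, here is a minimal, self-contained sketch. It is not pdfminer.six code: `Keyword`, `tokenize`, and `total_length` are hypothetical stand-ins, and the toy tokenizer is an assumption made only to show how a stray null byte can become a keyword token that a later `len()` call cannot handle.

```python
class Keyword:
    """Hypothetical stand-in for a keyword token; it deliberately has no __len__."""

    def __init__(self, name: bytes) -> None:
        self.name = name

    def __repr__(self) -> str:
        return f"Keyword({self.name!r})"


def tokenize(data: bytes, skip_nulls: bool) -> list:
    """Toy tokenizer: '(...)' chunks become byte strings, anything else a Keyword."""
    if skip_nulls:
        # Treat null bytes like whitespace so they never become tokens.
        data = data.replace(b"\x00", b" ")
    tokens = []
    for raw in data.split():  # bytes.split() does not split on b"\x00"
        if raw.startswith(b"(") and raw.endswith(b")"):
            tokens.append(raw[1:-1])
        else:
            tokens.append(Keyword(raw))
    return tokens


def total_length(tokens: list) -> int:
    """Consumer that assumes every token is a byte string."""
    return sum(len(token) for token in tokens)


sample = b"(Hello) \x00 (World)"

try:
    total_length(tokenize(sample, skip_nulls=False))
except TypeError as exc:
    print("without the fix:", exc)

print("with the fix:", total_length(tokenize(sample, skip_nulls=True)))
```

In this sketch the consumer is left unchanged and the null byte is dropped at the tokenizer, which mirrors the approach the commit message describes rather than the earlier workaround in the consumer.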
Bug report
I'm using paperless-ng, a document archiving system, which uses pdfminer under the hood to extract information from the pdfs added to it. I have filed a bug with the author already (=> jonaswinkler/paperless-ng#981), but seeing that this is a pdfminer issue he sent me here.
This is the exception and stack trace:
The version of pdfminer.six used in paperless is 20201018.
Unfortunately, the PDFs contain sensitive information, so I'm not comfortable with sharing an example publicly. I hope the stack trace allows locating the issue and handling the exception gracefully.
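Until a fix is released, a caller could guard the extraction step so that one problematic PDF does not abort a whole import. The following is only a sketch under assumptions: `safe_extract` is a hypothetical helper, and `extractor` stands for whatever function the calling application uses to get text out of a PDF; neither is part of pdfminer.six or paperless-ng.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def safe_extract(extractor: Callable[[str], str], path: str) -> Optional[str]:
    """Run a text extractor and return None instead of crashing on a TypeError."""
    try:
        return extractor(path)
    except TypeError as exc:
        # The error reported in this issue is a TypeError raised during parsing.
        logger.warning("Text extraction failed for %s: %s", path, exc)
        return None
```

Catching only `TypeError` keeps unrelated failures, such as a missing file, visible to the caller.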