Decrypting pdf owner password fails - treats /O value as unicode #557
Comments
I tried to reproduce it with the following, but that worked fine. Is the issue gone?
Code:
from PyPDF2 import PdfReader

with open("output.pdf", "rb") as f:
    reader = PdfReader(f)
    reader.decrypt("any_password")
Tried it here, and it does now seem to run without reporting an error. However, I was a bit suspicious, since on debugging through, the Debugging down to Eg. in my test case, plaintext ended up here as
(Note: the type is a This has codepoints beyond 256 (eg. That seems like it's definitely going to mangle the decryption: it has only XORed the low byte of a Unicode code point that we've incorrectly content-sniffed from some binary data, then translated that to UTF-8. The way it processes things has changed so that it no longer gives an error, but I think it's actually worse now, in that it's silently mangling data and reporting success.
That said, I'm not too familiar with owner passwords or how you'd even test that they're correctly decrypted (I think they may just be client-enforced anyway). And from what I can see, PyPDF2 doesn't actually seem to be validating the password: you can give it the completely wrong password and decrypt will still not raise an error, so it may be a bit of a moot point unless that's actually checked at some point. (By comparison, qpdf will fail with anything except the correct password.)
(For reference, I also tried saving the decrypted file with PdfWriter and comparing against qpdf --decrypt. The file does end up different in size etc. from qpdf's version, but that may not be unexpected given differences in implementation. It does appear valid and readable, and "qpdf --check" reports it's valid and unencrypted. However, this is also true when you give completely the wrong password, so potentially it's just stripping out the encrypted section and everything it does inside decrypt is completely irrelevant for owner passwords anyway.)
In any case, I still think the simplest fix is something like changing
To:
This covers the cases where encrypt["/O"] happens to be wrongly autodetected as a Unicode string instead of binary bytes. But given that nothing done here seems to matter to the end result, it may not be too relevant unless that changes.
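To illustrate the kind of mangling described above, here is a minimal sketch. It is not PyPDF2's code: `xor_bytes` and `xor_text_low_byte` are hypothetical helpers, a fixed repeating XOR key stands in for the RC4 keystream, and UTF-8 is assumed as the wrongly sniffed encoding purely for illustration. The point is only that XORing the low byte of each code point and then encoding to UTF-8 gives a different result from XORing the raw bytes.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR raw bytes with a repeating key (stand-in for a real keystream)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def xor_text_low_byte(text: str, key: bytes) -> bytes:
    """Mimic the faulty path: XOR only the low byte of each code point,
    rebuild a str from the results, then encode it as UTF-8."""
    mangled = "".join(
        chr((ord(ch) & 0xFF) ^ key[i % len(key)]) for i, ch in enumerate(text)
    )
    return mangled.encode("utf-8")


# Hypothetical binary value that happens to also be valid UTF-8 text.
raw = "é€".encode("utf-8")      # 5 bytes: c3 a9 e2 82 ac
key = b"\x5a"                   # assumed one-byte key, illustration only

as_text = raw.decode("utf-8")   # wrongly sniffed as text: 2 code points
correct = xor_bytes(raw, key)   # what a bytes-based cipher would produce
wrong = xor_text_low_byte(as_text, key)

print(len(raw), len(as_text))   # byte count versus character count
print(correct.hex())
print(wrong.hex())
print(correct == wrong)
```

Even the lengths disagree here, so no later step could recover the intended bytes from the mangled output.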
@MartinThoma this was fixed by #749
This came up from a question on learnpython that looks to be due to a bug in PyPDF2 on python 3. When decrypting an owner password, RC4_encrypt gets passed the value of encrypt["/O"] as a TextStringObject, so it treats it as unicode when decrypting, meaning it can end up trying to treat multibyte characters as bytes, resulting in a UnicodeDecodeError.
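For context, RC4 is defined over bytes, which is why passing it text is a problem. The following is a minimal, self-contained sketch of RC4 on `bytes`; the function name `rc4` is an assumption for illustration and this is not PyPDF2's `RC4_encrypt` implementation.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4 over bytes; encryption and decryption are the same operation."""
    # Key-scheduling algorithm (KSA): permute the state using the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]

    # Pseudo-random generation algorithm (PRGA): XOR keystream with input.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


ciphertext = rc4(b"Key", b"Plaintext")
print(ciphertext.hex())
print(rc4(b"Key", ciphertext))  # applying RC4 again restores the input
```

Because every step indexes and XORs values in the range 0 to 255, the input has to be the raw bytes of the "/O" entry rather than a decoded string.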
User passwords seem to work fine, presumably since that path uses the .original_bytes value of the object. Changing the owner-password code to use this instead, i.e.
real_O = encrypt["/O"].getObject().original_bytes
resolves it for the owner case too.
Steps to reproduce:
Created a rev3 encrypted PDF with:
Then tried to decrypt with:
Results in:
(Edit): Actually, looking further, it seems this bug is only triggered when the "/O" value is incorrectly interpreted as Unicode, which depends on the password being used. This seems to be down to createStringObject trying to interpret the value as text if it possibly can. If the password just happens to generate a block with no sub-24 control codes (which for a 32-byte string will happen around 4% of the time), it'll interpret it as Unicode and return a TextStringObject instead of a ByteStringObject.
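The "around 4%" figure can be checked with a quick calculation. This sketch assumes each of the 32 bytes is independent and uniformly distributed over 0 to 255, and that "sub-24" means byte values 0 through 23; both are assumptions about the estimate, not something taken from the PyPDF2 source.

```python
control_codes = 24   # byte values 0..23 that would rule out the text guess
length = 32          # length of the "/O" string in bytes

# Probability that none of the 32 bytes falls in the control-code range.
p_no_control = ((256 - control_codes) / 256) ** length

print(f"{p_no_control:.1%}")
```

Under those assumptions the result is a little over 4%, consistent with the estimate above.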