PDF files are binary files. According to the documentation of buffer.toString, the returned string may not accurately represent the data if the buffer does not contain a valid UTF-8 byte sequence:
If encoding is 'utf8' and a byte sequence in the input is not valid UTF-8, then each invalid byte is replaced with the replacement character U+FFFD.
Calling buffer.toString('utf8') is safe, and preserves the original bytes, only if the buffer already contains valid UTF-8. PDF files are not UTF-8 text.
When you convert the resulting string back to a Buffer, it is impossible to restore the original binary data from the string.
Version
14.14.0
Platform
Linux matt 4.15.0-163-generic #171-Ubuntu SMP Fri Nov 5 11:55:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Subsystem
No response
What steps will reproduce the bug?
How often does it reproduce? Is there a required condition?
Always with PDF files
What is the expected behavior?
A Buffer holding a PDF file should be convertible to a UTF-8 string and then back into a Buffer identical to the initial one.
What do you see instead?
The two buffers, the one before conversion and the one after, are different.
Additional information
The PDF file can be downloaded from this URL.
After downloading it, rename it to "file.pdf" in order to run the provided code snippet.