My understanding of the structure of PDF files is that there is no way to guarantee the correct encoding of text extracted from a PDF in anything other than Unicode. Not only can an encoding be defined for each individual text block within a single PDF file, but the file can also contain embedded fonts. Can we just do Encoding(out) <- "UTF-8" and remove the argument?
This seems really challenging given the quirkiness of the PDF format, but it is the big issue left to implement from the rOpenSci onboarding.