Unable to extract text from pdf using pdfplumber #1268

OIDeveloper · 2025-01-31T05:52:03Z

Hi Team,
We are running into an issue while processing the attached PDF with pdfplumber. Instead of the actual text, the extraction returns "cid" values. Could you please advise on how we can resolve this?
I am attaching the PDF along with the text extracted by pdfplumber.

example.pdf

jsvine · 2025-02-09T22:00:50Z

Maintainer

Hi @OIDeveloper, the (cid:#) values appear because the PDF is missing a mapping from its font's characters to Unicode. If you're curious, you can read more here:
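
In case it helps anyone landing here, below is a minimal sketch for spotting the (cid:#) placeholders in extracted text, so that affected pages can be flagged for another approach (OCR is a common fallback, for example). It is not part of pdfplumber: the helper names find_cid_codes, strip_cid_tokens, and pages_with_cid_tokens are made up for illustration, and the sketch assumes the placeholders always take the literal form "(cid:" followed by digits and ")". The only pdfplumber calls used are pdfplumber.open, pdf.pages, page.extract_text, and page.page_number.

```python
import re

# Assumption: unmapped glyphs show up literally as "(cid:<digits>)".
CID_PATTERN = re.compile(r"\(cid:(\d+)\)")


def find_cid_codes(text):
    """Return the numeric codes of all (cid:N) placeholders in text."""
    return [int(code) for code in CID_PATTERN.findall(text)]


def strip_cid_tokens(text):
    """Remove (cid:N) placeholders; this does NOT recover the real characters."""
    return CID_PATTERN.sub("", text)


def pages_with_cid_tokens(pdf_path):
    """Return the 1-based page numbers whose extracted text contains placeholders."""
    import pdfplumber  # third-party; imported here so the helpers above work without it

    flagged = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            text = page.extract_text() or ""  # guard against an empty result
            if find_cid_codes(text):
                flagged.append(page.page_number)
    return flagged
```

Calling pages_with_cid_tokens("example.pdf") would list the pages where the font mapping is missing. Stripping the tokens only tidies the output; the underlying characters cannot be recovered from the placeholders alone.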
