This repository has been archived by the owner on Jan 6, 2025. It is now read-only.

Problem with "carriage return" inside a cell #192

Closed
aborruso opened this issue Nov 3, 2018 · 2 comments


aborruso commented Nov 3, 2018

Hi,
first of all thank you for this great tool.

I have a PDF with some "carriage return" characters inside cells (I'm attaching it).


If I run camelot -f csv -o output.csv lattice input.pdf, the cells in the output do not contain any "carriage return". The output is therefore partially unusable: for example, I get [email protected]@fondazionelavoro.it instead of [email protected]\[email protected], and then it is difficult to find a way to split the cell content.

I'm using camelot-py-0.3.1. Is there a parameter to pass to solve this kind of problem?

Thank you

vinayak-mehta (Contributor) commented

Hey @aborruso, thanks for the report. Currently, each text line is stripped of its newlines before it is assigned to a cell, which can be undesirable in cases such as the one above. I'm labeling the current behavior as a bug, since everything should be extracted from a PDF as is. Expect a fix soon!
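
To make the difference concrete, here is a minimal sketch of the two behaviors. It is not Camelot's actual code: the function names `assign_stripped` and `assign_preserved` and the sample addresses are hypothetical, and it assumes each text line arrives with a trailing newline.

```python
# Hypothetical illustration only; this is not Camelot's implementation.

def assign_stripped(text_lines):
    """Mimic the reported behavior: strip newlines from each line, then join."""
    return "".join(line.strip("\n") for line in text_lines)


def assign_preserved(text_lines):
    """Keep the newlines between lines; drop only the trailing one."""
    return "".join(text_lines).rstrip("\n")


# Made-up sample data standing in for two lines inside one PDF cell.
lines = ["first@example.com\n", "second@example.com\n"]

stripped = assign_stripped(lines)    # the two addresses run together
preserved = assign_preserved(lines)  # the addresses stay separated by "\n"
addresses = preserved.split("\n")    # so the cell can be split again
```

With the newline kept, the cell content splits back into its original lines, which is not reliably possible once the lines have been run together.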

vinayak-mehta (Contributor) commented

@aborruso This is fixed on master now. After #229, this will be more configurable.
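
For anyone post-processing the CSV once newlines are kept, a rough sketch using only Python's standard `csv` module might look like the following. The cell value and column contents are made up for illustration, and the in-memory buffer stands in for output.csv; nothing here calls Camelot itself.

```python
import csv
import io

# Made-up cell value standing in for a PDF cell that contains a line break.
cell = "first@example.com\nsecond@example.com"

# Write one row to an in-memory CSV; the writer quotes the embedded newline.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerow(["Name", cell])

# Read it back: the quoted field comes back as a single cell, newline intact.
buffer.seek(0)
row = next(csv.reader(buffer))

# Split the cell content on the preserved newline.
addresses = row[1].split("\n")
```

The point of the sketch is that a newline inside a quoted CSV field survives a write/read round trip, so the cell can be split afterwards.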
