preserveLineBreaks but only if they are in pairs? #156

Closed
tansaku opened this issue May 8, 2018 · 2 comments

tansaku commented May 8, 2018

Apologies if this is a silly feature request, but we're working on converting some PDF and Word documents. The PDF files seem to have line breaks in the middle of sentences (where the text hits the edge of the page), while the Word files do not. Both types of files have double (paired) line breaks between paragraphs.

We'd really like to preserve the paragraphs, but eliminate the single line breaks. Would it make sense to have another option like preserveMultipleLineBreaks or something?

tansaku commented May 8, 2018

I've managed to achieve the effect that we would like by inserting the following:

text = text.replace(/(^|[^\n])\n(?!\n)/g, '$1');

prior to the following line in extract.js

text = text.replace( WHITELIST_PRESERVE_LINEBREAKS, ' ' );

so maybe a STRIP_ONLY_SINGLE_LINEBREAKS option to have this run before the WHITELIST_PRESERVE_LINEBREAKS operation ...
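
For illustration, here is a minimal standalone sketch of what that replacement does to a string. The helper name stripSingleLineBreaks and the sample text are hypothetical and not part of extract.js. Note that the newline is deleted rather than replaced with a space, so this assumes wrapped lines end with trailing whitespace; without it, the words on either side of the break are joined.

```javascript
// Hypothetical helper for illustration only; not part of textract's extract.js.
function stripSingleLineBreaks(text) {
  // Match a newline that is not preceded by a newline (group 1 captures the
  // preceding character, or the start of the string) and is not followed by
  // another newline. Replacing with '$1' keeps that character and drops the
  // lone newline, so runs of two or more newlines are left untouched.
  return text.replace(/(^|[^\n])\n(?!\n)/g, '$1');
}

// A line wrapped at the page edge (with trailing space), then a paragraph break.
const sample = 'This sentence was wrapped \nat the page edge.\n\nSecond paragraph.';
const result = stripSingleLineBreaks(sample);
console.log(JSON.stringify(result));
// prints "This sentence was wrapped at the page edge.\n\nSecond paragraph."

// Without trailing whitespace the two words are joined.
console.log(stripSingleLineBreaks('foo\nbar'));
// prints foobar
```

If the source text has no trailing space before the wrap, replacing with '$1 ' instead of '$1' would keep the words apart, at the cost of a possible double space elsewhere.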

I could pop in a pull request if this made sense ...?

@dbashford
Owner

This isn't an outrageous ask. I'm probably going to run through another batch of tickets soon and get 2.3 out the door, so I can toss this in.
