For me, the preserve newline behaviour isn't quite working as I expected (tested with the docx extractor).
I have text like this in a docx file:
2 downlighters; door to hall.
Hall
Double glazed window to front;
With preserveLineBreaks I get this output:
2 downlighters; door to hall. Hall
Double glazed window to front;
Logging intermediate values to the console shows that the newlines are present in the extracted text as expected, but they get stripped out afterwards.
Looking at how preserveLineBreaks is implemented, I see it's a big, hairy regex, so I'm not sure what it is doing at first glance. From my naive point of view, it would be nicer to get the raw text output; if I need to filter further, I can decide that for myself. Alternatively, if there were a 'clean' function as a configuration option, I could use it to override the default behaviour.
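To make the request concrete, here is a rough sketch of the kind of 'clean' step I have in mind. It is only an illustration: `cleanKeepingLineBreaks` is a hypothetical name, not an existing option or function of the extractor, and it assumes the raw extracted text is available as a plain string. It tidies spaces and tabs within each line but never touches the line breaks themselves.

```typescript
// Hypothetical helper, not part of the extractor library: collapse runs of
// spaces/tabs inside each line while keeping every line break from the raw text.
function cleanKeepingLineBreaks(raw: string): string {
  return raw
    .replace(/\r\n?/g, "\n") // normalise CRLF and lone CR line endings to LF
    .split("\n")
    .map((line) => line.replace(/[ \t]+/g, " ").trim())
    .join("\n");
}

// The sample text from the docx file above, as a raw string.
const raw = "2 downlighters; door to hall.\nHall\nDouble glazed window to front;";
console.log(cleanKeepingLineBreaks(raw));
// prints:
// 2 downlighters; door to hall.
// Hall
// Double glazed window to front;
```

With something like this passed in as an override, "Hall" would stay on its own line instead of being joined to the previous one.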