Properly Break Long Lines with Long and Short Words and breaknonspaceingroup/breakline Options #26

Closed
12357-314 opened this issue Aug 17, 2024 · 6 comments

12357-314 commented Aug 17, 2024

Issue:

If there is a long line where the first "word" is longer than the line length and there is a short word at the end of the line, the short word will be put on its own line instead of going after the split long word. Is this intended behavior?

Inputs:

markdown.md

---
header-includes:
 -  \usepackage{fvextra}
 -  \DefineVerbatimEnvironment{Highlighting}{Verbatim}{
     breaknonspaceingroup,breaklines,breakanywhere,commandchars=\\\{\}}

---

```bash
sha512sum <<<"string"
346254a332b819b42186a57d6f31fc8fdb9ce81ec89c613f7331a434e73a8acfc7c53db66dc44d8dee65cccd6ea6a911e2638a2e5eed1ee2a4527325d8a29d41  -
```

markdown.tex

\documentclass{article}
\usepackage{fvextra}

\DefineVerbatimEnvironment{Highlighting}{Verbatim}
  {breaknonspaceingroup,breaklines,breakanywhere,commandchars=\\\{\}}

\begin{document}
    \begin{Highlighting}[]
        sha512sum <<<"string"
        346254a332b819b42186a57d6f31fc8fdb9ce81ec89c613f7331a434e73a8acfc7c53db66dc44d8dee65cccd6ea6a911e2638a2e5eed1ee2a4527325d8a29d41  -
    \end{Highlighting}
\end{document}

Commands:

pandoc markdown.md -o markdown-md.pdf
pdflatex -jobname=markdown-tex markdown.tex

Outputs:

Markdown PDF Output:
image

Latex PDF Output:
image

Both the Markdown and LaTeX files produce similar PDFs; I included both to show that this probably isn't a Pandoc issue. This issue is very similar to #11, except that using the breaknonspaceingroup option does not seem to have the intended effect on the output.

Expected Results:

sha512sum <<<"string"
346254a332b819b42186a57d6f31fc8fdb9ce81ec89c613f7331a434e73a8acf ⌋
 ↪ c7c53db66dc44d8dee65cccd6ea6a911e2638a2e5eed1ee2a4527325d8a2 ⌋
 ↪ 9d41  -

The line at the end with the single dash - character should not be split onto its own line. It should remain with the previous line because the previous line is not too long. Is there an option that I have missed in the documentation which provides this behavior?

Software Versions:

Pandoc Version:

pandoc --version
pandoc 2.17.1.1
Compiled with pandoc-types 1.22.2.1, texmath 0.12.4, skylighting 0.12.3.1,
citeproc 0.6.0.1, ipynb 0.2
User data directory: /home/user/.local/share/pandoc
Copyright (C) 2006-2022 John MacFarlane. Web:  https://pandoc.org
This is free software; see the source for copying conditions. There is no
warranty, not even for merchantability or fitness for a particular purpose.

fvextra version:

\ProvidesPackage{fvextra}
    [2024/05/16 v1.7.0 fvextra - extensions and patches for fancyvrb]

fvextra was downloaded from the GitHub releases because the version provided by Debian was a little dated (1.5) and I wanted to see if that was the issue, but it was not. The log files produced by latex and pandoc indicate that the fvextra file being used is the one in my home directory and not the one in the system directory.

OS:
Linux Mint Debian Edition 6 (Faye)

Please let me know if there is any other information I should provide.

Thank you

muzimuzhi commented Aug 18, 2024

I found that this is only reproducible with the initial settings showspaces=false, breakcollapsespaces=true.

  • breaklines + breakanywhere/breakbefore/breakafter => reproducible
  • setting showspaces=true or breakcollapsespaces=false in addition => not reproducible

The state of the breakcollapsespaces option is only used in the macro \FV@DefFVSpace, when defining \FV@Space as one of its variants, in the code branch where showspaces is false and breaklines is true.

fvextra/fvextra/fvextra.sty

Lines 1609 to 1630 in 6717547

\def\FV@DefFVSpace{%
\ifbool{FV@showspaces}%
{\ifbool{FV@breaklines}%
{\ifcsname FV@BreakBefore@Token\FV@SpaceCatTen\endcsname
\def\FV@Space{\FV@SpaceColor{\FancyVerbSpace}}%
\else\ifcsname FV@BreakAfter@Token\FV@SpaceCatTen\endcsname
\def\FV@Space{\FV@SpaceColor{\FancyVerbSpace}}%
\else
\def\FV@Space{\FV@SpaceColor{\FancyVerbSpace}\FancyVerbSpaceBreak}%
\fi\fi}%
{\def\FV@Space{\FV@SpaceColor{\FancyVerbSpace}}}}%
{\ifbool{FV@breaklines}%
{\ifcsname FV@BreakBefore@Token\FV@SpaceCatTen\endcsname
\def\FV@Space{\mbox{\FV@SpaceCatTen}}%
\else\ifcsname FV@BreakAfter@Token\FV@SpaceCatTen\endcsname
\def\FV@Space{\mbox{\FV@SpaceCatTen}}%
\else
\ifbool{FV@breakcollapsespaces}%
{\def\FV@Space{\FV@SpaceCatTen}}%
{\def\FV@Space{\mbox{\FV@SpaceCatTen}\FancyVerbSpaceBreak}}%
\fi\fi}%
{\def\FV@Space{\FV@SpaceCatTen}}}}%
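
A minimal document for checking this observation might look like the following. It is only a sketch: it assumes fvextra v1.7.0 as reported above, reuses the hash line from the original report, and compares the initial settings (showspaces=false, breakcollapsespaces=true) against showspaces=true and breakcollapsespaces=false. Other fvextra versions may behave differently.

```latex
\documentclass{article}
\usepackage{fvextra}

\begin{document}

% Case 1: initial settings (showspaces=false, breakcollapsespaces=true).
\begin{Verbatim}[breaklines,breakanywhere]
346254a332b819b42186a57d6f31fc8fdb9ce81ec89c613f7331a434e73a8acfc7c53db66dc44d8dee65cccd6ea6a911e2638a2e5eed1ee2a4527325d8a29d41  -
\end{Verbatim}

% Case 2: visible spaces, everything else unchanged.
\begin{Verbatim}[breaklines,breakanywhere,showspaces=true]
346254a332b819b42186a57d6f31fc8fdb9ce81ec89c613f7331a434e73a8acfc7c53db66dc44d8dee65cccd6ea6a911e2638a2e5eed1ee2a4527325d8a29d41  -
\end{Verbatim}

% Case 3: regular spaces, but not collapsed at line breaks.
\begin{Verbatim}[breaklines,breakanywhere,breakcollapsespaces=false]
346254a332b819b42186a57d6f31fc8fdb9ce81ec89c613f7331a434e73a8acfc7c53db66dc44d8dee65cccd6ea6a911e2638a2e5eed1ee2a4527325d8a29d41  -
\end{Verbatim}

\end{document}
```

Comparing where the trailing - lands in each case shows whether the behavior depends on which definition of \FV@Space is selected.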

12357-314 (author) commented

@muzimuzhi, thank you for your response and for pointing to the breakcollapsespaces option as the issue. I see where my error is now: I was not reading the documentation for breakcollapsespaces correctly. I still don't know if I'm reading it correctly, but, regardless, the issue has been fixed. Thank you!

Solution:

The following options were necessary to display the output as expected:

  • The breaklines option is false by default and must obviously be set to true to add the line breaks.

  • The breakanywhere option is false by default, and must be set to true to insert a line break inside a "word" instead of just at spaces. However, the documentation says this option does not work with Pandoc's highlighting unless it is used along with the breaknonspaceingroup option.

  • The breaknonspaceingroup option is set to false by default, and must be set to true for the breakanywhere option to have effect for the reason given before.

  • The breakcollapsespaces option is true by default, and must be set to false to prevent the behavior reported above. I previously misread it as meaning that multiple spaces are collapsed when breaking lines, so that broken lines have no leading or trailing spaces. Perhaps it should instead be read as: when true, a line that is too long is broken at a space and that space is removed from the text; when false, those spaces are preserved and the line is not broken at the space? The relevant documentation for breakcollapsespaces is quoted below.

    When true (default), a line break within a run of regular spaces
    (showspaces=false) replaces all spaces with a single break, and the wrapped
    line after the break starts with a non-space character. When false, a line
    break within a run of regular spaces preserves all spaces, and the wrapped line
    after the break may start with one or more spaces. This causes regular spaces
    to behave exactly like the visible spaces produced with showspaces; both give
    identical line breaks, with the only difference being the appearance of spaces.
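
One way to see the documented difference directly is a line in which every gap is a run of spaces, so that any break at a space necessarily falls inside a run. The following is a sketch, not taken from the fvextra documentation; the sample text is arbitrary and only needs to be longer than the text width.

```latex
\documentclass{article}
\usepackage{fvextra}

\begin{document}

% Default: breakcollapsespaces=true.
% Per the documentation, the run of spaces at a break is replaced by the break,
% so each wrapped line should start with a non-space character.
\begin{Verbatim}[breaklines]
alpha   beta   gamma   delta   epsilon   zeta   eta   theta   iota   kappa   lambda   mu   nu   xi   omicron   pi   rho   sigma   tau   upsilon
\end{Verbatim}

% breakcollapsespaces=false.
% Per the documentation, all spaces are preserved,
% so a wrapped line may start with one or more spaces.
\begin{Verbatim}[breaklines,breakcollapsespaces=false]
alpha   beta   gamma   delta   epsilon   zeta   eta   theta   iota   kappa   lambda   mu   nu   xi   omicron   pi   rho   sigma   tau   upsilon
\end{Verbatim}

\end{document}
```

Where exactly the breaks fall depends on the font and text width, so the two outputs should be compared side by side.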

Revision:

markdown.md:

---
header-includes:
 -  \usepackage{fvextra}
 -  \DefineVerbatimEnvironment{Highlighting}{Verbatim}{
     breaklines=true,
     breakanywhere=true,
     breaknonspaceingroup=true,
     breakcollapsespaces=false,
     commandchars=\\\{\}}
---

```bash
sha512sum <<<"string"
346254a332b819b42186a57d6f31fc8fdb9ce81ec89c613f7331a434e73a8acfc7c53db66dc44d8dee65cccd6ea6a911e2638a2e5eed1ee2a4527325d8a29d41  -
```

Command:

pandoc markdown.md -o markdown-md.pdf

markdown-md.pdf:
image
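
For anyone reproducing this without Pandoc, the same settings can presumably be applied in a standalone LaTeX file. This sketch mirrors the markdown.tex from the original report with the revised options. The content is not indented, because leading spaces are significant in verbatim text.

```latex
\documentclass{article}
\usepackage{fvextra}

% Same option set as the revised Pandoc header above.
\DefineVerbatimEnvironment{Highlighting}{Verbatim}{
  breaklines=true,
  breakanywhere=true,
  breaknonspaceingroup=true,
  breakcollapsespaces=false,
  commandchars=\\\{\}}

\begin{document}
\begin{Highlighting}[]
sha512sum <<<"string"
346254a332b819b42186a57d6f31fc8fdb9ce81ec89c613f7331a434e73a8acfc7c53db66dc44d8dee65cccd6ea6a911e2638a2e5eed1ee2a4527325d8a29d41  -
\end{Highlighting}
\end{document}
```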

Thanks again!

muzimuzhi commented

I still don't get why there would be line breaks at the spaces right before the trailing - in your example. Maybe we can keep this issue open and wait for a canonical answer from the maintainer.

12357-314 (author) commented

Maybe the reason is that the hash is processed separately from the line as a whole. Because the hash doesn't fit on a line by itself, it is split as a token and then the dash is also split off because it didn't fit on the line either initially. I've been thinking about it and this behavior might make more sense if you're trying to visually separate the breaks at spaces and breaks that split words.

---
header-includes:
 -  \usepackage{fvextra}
 -  \DefineVerbatimEnvironment{Highlighting}{Verbatim}{
     breaklines=true,
     breakanywhere=true,
     breaknonspaceingroup=true,
     commandchars=\\\{\}}
---

```bash
sha512sum <<<"string"
346254a332b819b42186a57d6f31fc8fdb9ce81ec89c613f7331a434e73a8acfc7c53db66dc44d8dee65cccd6ea6a911e2638a2e5eed1ee2a4527325d8a29d41  -
```

```bash
3.1415926535897932384626433832795028841971693993751058209749445923078164062820 3.1415926535897932384626433832795028841971693993751058209749445923078164062820 3.1415926535897932384626433832795028841971693993751058209749445923078164062820 3.1415926535897932384626433832795028841971693993751058209749445923078164062820
```

```bash
346254a332b819b42186a57d6f31fc8fdb9ce81ec89c613f7331a434e73a8acfc7c53db66dc44d8dee65cccd6ea6a911e2638a2e5eed1ee2a4527325d8a29d41  - 346254a332b819b42186a57d6f31fc8fdb9ce81ec89c613f7331a434e73a8acfc7c53db66dc44d8dee65cccd6ea6a911e2638a2e5eed1ee2a4527325d8a29d41 - 346254a332b819b42186a57d6f31fc8fdb9ce81ec89c613f7331a434e73a8acfc7c53db66dc44d8dee65cccd6ea6a911e2638a2e5eed1ee2a4527325d8a29d41 - 
```

image

The second example there shows more clearly when a break is being put in between words vs when a break splits words in general, and that might be useful, but the last example doesn't make a lot of sense to me because it seems like the last line is treated differently than the other ones.

I'll reopen the issue to see if this is something that needs to be addressed further.

12357-314 reopened this Aug 19, 2024

gpoore (owner) commented Aug 20, 2024

This is due to how LaTeX handles hyphenation.

With breakcollapsespaces=true, breaks at spaces follow LaTeX's standard approach. With breakcollapsespaces=false, spaces are treated as non-breaking with a trailing \discretionary, so they act like a regular character followed by optional hyphenation. Meanwhile, breakanywhere=true inserts \discretionary between pairs of non-space characters, which amounts to optional hyphenation.

Fixing this involves modifying hyphenation penalties/demerits. Reference: https://tex.stackexchange.com/a/51264/10742. One of the interesting things about the original example of the hash followed by the hyphen is that if you replace - with a sequence of hyphens - - - - -, only the last one is put on a line by itself. Setting \finalhyphendemerits=0 fixes this. There is a second potential issue, which doesn't appear in the original example, but does appear in something like this (and also in the longer examples above):

\begin{Verbatim}[breaklines,breakanywhere]
sha512sum  <<<"string"346254a332b819b42186a57d6f31fc8fdb9ce81ec89c613f7331a434e73a8acfc7c53db66dc44d8dee65cccd6ea6a911e2638a2e5eed1ee2a4527325d8a29d41  <<<"string"346254a332b819b42186a57d6f31fc8fdb9ce81ec89c613f7331a434e73a8acfc7c53db66dc44d8dee65cccd6ea6a911e2638a2e5eed1ee2a4527325d8a29d41  <<<"string"346254a332b819b42186a57d6f31fc8fdb9ce81ec89c613f7331a434e73a8acfc7c53db66dc44d8dee65cccd6ea6a911e2638a2e5eed1ee2a4527325d8a29d415cccd6ea6a911e2638a2e5eed1ee2a   <<<"string"346254a332b819b42186a57d6f31fc8fdb9ce81ec89c613f7331a434e73a8acfc7c53db66dc44d8dee65cccd6ea6a911e2638a2e5eed1ee2a4527325d8a29d41
\end{Verbatim}

Breaks at spaces have much higher precedence than breakanywhere, so there are several largely empty lines. This can be fixed by setting \linepenalty=10000 (in which case \finalhyphendemerits=0 seems to become irrelevant).

You can experiment with this by using \AtBeginEnvironment{Verbatim}{\finalhyphendemerits=0\linepenalty=10000\relax}.
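
A complete test document for this experiment could look like the sketch below. It assumes an fvextra version from before the fix (such as v1.7.0) and uses the hash followed by a sequence of hyphens, as described above. etoolbox is loaded explicitly for \AtBeginEnvironment in case it is not otherwise available.

```latex
\documentclass{article}
\usepackage{etoolbox}  % provides \AtBeginEnvironment
\usepackage{fvextra}

% Experimental settings from the comment above.
% Comment this line out to compare against the unmodified behavior.
\AtBeginEnvironment{Verbatim}{\finalhyphendemerits=0\linepenalty=10000\relax}

\begin{document}

\begin{Verbatim}[breaklines,breakanywhere]
346254a332b819b42186a57d6f31fc8fdb9ce81ec89c613f7331a434e73a8acfc7c53db66dc44d8dee65cccd6ea6a911e2638a2e5eed1ee2a4527325d8a29d41  - - - - -
\end{Verbatim}

\end{document}
```

Removing \linepenalty=10000 from the hook and keeping only \finalhyphendemerits=0 tests the two settings separately.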

I think \finalhyphendemerits=0 should probably be added as a permanent hardcoded setting, and then perhaps there should be a new setting like breakanywherepreferspaces which leaves everything as-is when true, but when set false invokes \linepenalty=10000.

gpoore added a commit that referenced this issue Aug 22, 2024
… broken line with breakbefore, breakafter, or breakanywhere (#26); added option breakpreferspaces (#26)
gpoore (owner) commented Aug 22, 2024

The unnecessary line break is now fixed in the development version on GitHub. You can download the new fvextra.sty to use the fixed version immediately, or a new release will be on CTAN within a few days.

I also added a new option breakpreferspaces (default true) that determines whether breaks are preferentially inserted at spaces. When this is false, all possible break locations are treated equally.
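
A sketch of how the new option might be used is shown below. It assumes an fvextra version that already includes the commit referenced above; the sample line is a shortened variant of the earlier example.

```latex
\documentclass{article}
\usepackage{fvextra}

\begin{document}

% Default: breakpreferspaces=true, so breaks are preferentially inserted at spaces.
\begin{Verbatim}[breaklines,breakanywhere]
sha512sum  <<<"string"346254a332b819b42186a57d6f31fc8fdb9ce81ec89c613f7331a434e73a8acfc7c53db66dc44d8dee65cccd6ea6a911e2638a2e5eed1ee2a4527325d8a29d41  <<<"string"346254a332b819b42186a57d6f31fc8fdb9ce81ec89c613f7331a434e73a8acfc7c53db66dc44d8dee65cccd6ea6a911e2638a2e5eed1ee2a4527325d8a29d41
\end{Verbatim}

% breakpreferspaces=false: all possible break locations are treated equally.
\begin{Verbatim}[breaklines,breakanywhere,breakpreferspaces=false]
sha512sum  <<<"string"346254a332b819b42186a57d6f31fc8fdb9ce81ec89c613f7331a434e73a8acfc7c53db66dc44d8dee65cccd6ea6a911e2638a2e5eed1ee2a4527325d8a29d41  <<<"string"346254a332b819b42186a57d6f31fc8fdb9ce81ec89c613f7331a434e73a8acfc7c53db66dc44d8dee65cccd6ea6a911e2638a2e5eed1ee2a4527325d8a29d41
\end{Verbatim}

\end{document}
```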

gpoore closed this as completed Aug 28, 2024