Don't replace ASCII minus sign with U+2212 #1

Closed
silverwind opened this issue Feb 7, 2018 · 12 comments

Comments

@silverwind

Suggesting not to transform - (ASCII minus sign) into − (U+2212) in the rendered HTML, as most programs do not support this symbol as a prefix for flags.

Ref: nodejs/node#18580

@Alhadis
Owner

Alhadis commented Feb 7, 2018

Haha. 😀 Man, where to begin...

Short explanation

I'm simply using the exact characters Troff outputs for UTF8-aware terminals. If I changed minus signs (U+2212) to hyphen-minuses (U+002D), it would only be because Groff/Mandoc agreed to change their character mappings too.

I remember reading some talk on the Groff mailing list about doing such a thing, but I'm not sure where they stand. :) I'll just point to earlier discussions and let you read for yourself how much of an issue this really is. ;)

Who'd have thought punctuation could be so hard, eh? 😁 You don't even wanna know the history behind the single-quote key. Suffice to say Adobe screwed up royally there... But that's why you'll commonly see quoting ``like this''

@Alhadis
Owner

Alhadis commented Feb 7, 2018

They were even stylised to look like curvy quotes:
[image: 20180208_080509]

But let's face it, `this' `looks' `silly'. 

@silverwind
Author

Lol, that's quite the discussion there

> I'm simply using the exact characters Troff outputs for UTF8-aware terminals

Might not be that simple. On my Unicode-aware terminal (iTerm2) with locale en_US.UTF-8, those \- still seem to render as ASCII when using man.

When I run /usr/bin/groff -Wall -mtty-char -Tascii -mandoc -c ./doc/node.1 (I took the command from /etc/man.conf), I also get ASCII minuses, though that may be because of the -Tascii argument.

> `this' `looks' `silly'

Totally, I cringe every time I see this.

@Alhadis
Owner

Alhadis commented Feb 7, 2018

> When I run /usr/bin/groff -Wall -mtty-char -Tascii -mandoc -c ./doc/node.1 (I took the command from /etc/man.conf), I also get ASCII minuses.

That's the same pipeline which generates the output of man XYZ whenever it's used to format Roff source for terminal display. And yes, it's the -Tascii argument that's limiting character support. Try replacing that command with this:

/usr/bin/groff -Tutf8 -mandoc -Wall -c ./doc/node.1

@Alhadis
Owner

Alhadis commented Feb 7, 2018

I forgot to mention, too... troff (and therefore nroff) won't automagically detect the environment's locale each time it runs. You tell it which encoding to use through the output device you'd like the Roff document to be formatted for.

Here output device isn't strictly referring to physical printing hardware. It can mean a driver that interfaces with the actual typesetting equipment (usually software), which leads to printing or rendering the fully typeset result.

So in other words, you need to think of "Plain ascii text" and "Plain utf8 text" as though they were two entirely unrelated or unconnected document formats. Checking environment variables and deciding all ascii jobs should now be interpreted as utf8 jobs because LC_ALL="en_AU.UTF-8" really isn't something a program should be doing.

Graphical devices: used by Typesetter Roff (troff)

-T argument   Description
dvi           TeX DVI format
html, xhtml   HTML and XHTML output
lbp           Canon CAPSL printers (LBP-4 and LBP-8 series laser printers)
lj4           HP LaserJet4 compatible (or other PCL5 compatible) printers
ps            PostScript output
pdf           Portable Document Format (PDF) output

Terminals and cell-based character displays; used by New-roff (nroff)

-T argument   Description
ascii         7bit ASCII
cp1047        Latin-1 character set for EBCDIC hosts
latin1        ISO 8859-1
utf8          Unicode character set in UTF-8 encoding. This mode has the most
              useful fonts for TTY display, so it is the best mode for TTY

Devices for the X windowing system to preview documents at various resolutions. Used by gxditview which is a viewing program. None of these are really that useful.

-T argument   Description
X75           75dpi resolution, 10pt document base font
X75-12        75dpi resolution, 12pt document base font
X100          100dpi resolution, 10pt document base font
X100-12       100dpi resolution, 12pt document base font
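
Since the output device is something you pick rather than something troff infers, any wrapper around groff has to pass -T explicitly. Here's a rough sketch of that idea in Python. The names `groff_command` and `TERMINAL_DEVICES` are made up for illustration and aren't part of groff or Roff.js; the flags are simply the ones from the command quoted earlier in this thread.

```python
# Terminal (nroff-style) devices from the list above.
TERMINAL_DEVICES = {"ascii", "cp1047", "latin1", "utf8"}


def groff_command(source, device="utf8"):
    """Build a groff argument list with the output device stated explicitly."""
    if device not in TERMINAL_DEVICES:
        raise ValueError(f"not a terminal device: {device!r}")
    # Same flags as the command quoted earlier in the thread; only -T varies.
    return ["groff", f"-T{device}", "-mandoc", "-Wall", "-c", source]


if __name__ == "__main__":
    # The caller decides the device; nothing here inspects LC_ALL or LANG.
    print(" ".join(groff_command("./doc/node.1", "ascii")))
    print(" ".join(groff_command("./doc/node.1", "utf8")))
```

The argument list could be handed to `subprocess.run` on a machine with groff installed. The point is only that "ascii" and "utf8" are two separate requests made by the caller.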

@silverwind
Author

If I set -Tutf8, these flags break in a funny way:

     −‐v8‐options
             Print V8 command‐line options.

@Alhadis
Owner

Alhadis commented Feb 9, 2018

@silverwind That's because the first dash in --v8-options is U+2212, not U+002D. The latter is what you get when you press the dash key on your keyboard; the former is a mathematical "minus sign". The distinction is primarily typographical.
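
To see which characters are actually in a pasted flag, something like the following sketch can help. It assumes the broken output pasted earlier contains U+2212 followed by U+2010, which is how it appears in this thread; `describe_dashes` is a hypothetical helper name.

```python
import unicodedata


def describe_dashes(text):
    """Return (code point, Unicode name) for each non-alphanumeric character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch))
        for ch in text
        if not ch.isalnum()
    ]


if __name__ == "__main__":
    # Escapes are used so the characters cannot be silently normalised.
    broken = "\u2212\u2010v8\u2010options"  # as pasted earlier in the thread
    plain = "--v8-options"                  # what you would type in a shell
    for label, text in (("broken", broken), ("plain", plain)):
        print(label, describe_dashes(text))
```

The two strings look nearly identical on screen, but only the second is made of U+002D (HYPHEN-MINUS), which is what option parsers expect.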

What command are you invoking to obtain that? I can't reproduce it:

$ groff -Tutf8 -mandoc doc/node.1 | less
     --v8-options
             Print V8 command-line options.

$ mandoc -Tutf8 -a doc/node.1
     --v8-options
             Print V8 command-line options.

$ groff -Tutf8 -mandoc -Z doc/node.1 | ~/Labs/Roff.js/bin/html-tty | less
     <b>--v8-options</b>
             Print V8 command-line options.

@silverwind
Author

$ groff -Wall -mtty-char -Tutf8 -mandoc -c ./doc/node.1 | grep v8
     node [−‐v8‐options]
             V8 Inspector integration allows attaching Chrome DevTools and
             Process V8 profiler output generated using the V8 option −‐prof.
     −‐v8‐options
             Print V8 command‐line options.
             Note: V8 options allow words to be separated by both dashes (‐)
     −‐v8‐pool‐size num
             Set V8’s thread pool size which will be used to allocate back‐
             ground jobs.  If set to 0 then V8 will choose an appropriate size
             the value provided is larger than V8’s maximum, then the largest
$ groff --version
GNU groff version 1.19.2
Copyright (C) 2004 Free Software Foundation, Inc.
GNU groff comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of groff and its subprograms
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING.

called subprograms:

GNU troff (groff) version 1.19.2
GNU grops (groff) version 1.19.2

This is the version of groff that comes preinstalled on macOS. I also tried on Linux on version 1.22.3 where I also get consistent ASCII, so it's probably a bug in the macOS version.

@silverwind
Author

Yep, it's a bug. I installed 1.22.3 from Homebrew and it works. Maybe something to consider when writing those manpages, as the typical macOS user will be using the system's groff.
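
For anyone who wants to check which groff they're dealing with before trusting -Tutf8 output, here's a small sketch that parses the `groff --version` banner shown above. The `KNOWN_GOOD` threshold is an assumption: 1.22.3 is just the version reported to work in this thread, and the behaviour may well have changed in an earlier release. The function names are hypothetical.

```python
import re

# Assumption: 1.22.3 is "known good" only because it is the version reported
# to work in this thread; the change may have landed in an earlier release.
KNOWN_GOOD = (1, 22, 3)


def parse_groff_version(output):
    """Extract (major, minor, patch) from `groff --version` output."""
    match = re.search(r"GNU groff version (\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError("no groff version string found")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def is_known_good(output):
    """True if the reported version is at least the one that worked here."""
    return parse_groff_version(output) >= KNOWN_GOOD


if __name__ == "__main__":
    sample = "GNU groff version 1.19.2\nCopyright (C) 2004 Free Software Foundation, Inc."
    print(parse_groff_version(sample), is_known_good(sample))
```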

@Alhadis
Owner

Alhadis commented Feb 9, 2018

Ouch. I forgot macOS uses an antique version of Groff. You'll notice this issue disappears if you upgrade to the latest version:

brew install groff

What you're witnessing is actually a known issue that's plagued Groff for a loooong time. Only in more recent years did Groff start replacing the "UTF-8 minus" with the "ASCII dash" when formatting pages for terminal display.
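
If you're stuck with output from an older Groff, the same substitution can be approximated as a post-processing step. This is only a sketch of the idea, not how Groff does it internally; the two code points are the ones seen in the output pasted in this thread, and `ascii_dashes` is a made-up name.

```python
# Code points taken from the output pasted in this thread, mapped to ASCII "-".
DASH_MAP = str.maketrans({
    "\u2212": "-",  # MINUS SIGN
    "\u2010": "-",  # HYPHEN
})


def ascii_dashes(text):
    """Replace Unicode minus signs and hyphens with ASCII hyphen-minus."""
    return text.translate(DASH_MAP)


if __name__ == "__main__":
    print(ascii_dashes("\u2212\u2010v8\u2010pool\u2010size num"))
```

Note that this also flattens hyphens in ordinary prose (such as a hyphenated line break), which is harmless for terminal display but is a typographical loss elsewhere.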

I believe macOS's man command uses -Tascii for formatting output, though. Did you modify man.conf to use -Tutf8?

@silverwind
Author

Yes, it uses ASCII, so it should be fine:

$ grep "/groff" /etc/man.conf
TROFF		/usr/bin/groff -Tps -mandoc -c
NROFF		/usr/bin/groff -Wall -mtty-char -Tascii -mandoc -c
JNROFF		/usr/bin/groff -Tnippon -mandocj -c
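
The same check can be scripted. Below is a minimal sketch that pulls the -T device out of the NROFF line, assuming the whitespace-separated layout shown in the grep output above; `nroff_device` is a hypothetical helper, not part of man or groff.

```python
def nroff_device(man_conf_text):
    """Return the -T device from the NROFF line of a man.conf, or None."""
    for line in man_conf_text.splitlines():
        fields = line.split()
        # Match the NROFF directive exactly, so TROFF and JNROFF are skipped.
        if fields and fields[0] == "NROFF":
            for arg in fields[1:]:
                if arg.startswith("-T") and len(arg) > 2:
                    return arg[2:]
    return None


if __name__ == "__main__":
    sample = (
        "TROFF\t\t/usr/bin/groff -Tps -mandoc -c\n"
        "NROFF\t\t/usr/bin/groff -Wall -mtty-char -Tascii -mandoc -c\n"
        "JNROFF\t\t/usr/bin/groff -Tnippon -mandocj -c\n"
    )
    print(nroff_device(sample))
```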

@Alhadis
Owner

Alhadis commented Feb 9, 2018

Good to hear. 😉

Going to close this issue as nodejs/node#18559 is probably the better place to discuss this.

Alhadis closed this as completed Feb 9, 2018