
\in\not vs \notin #39814

Closed
OliverEvans96 opened this issue Feb 24, 2021 · 11 comments · Fixed by #41870
Labels
docsystem (The documentation building system), unicode (Related to unicode characters and encodings)

Comments

@OliverEvans96

OliverEvans96 commented Feb 24, 2021

Hello,

I've found a discrepancy between two ways to write ∉ at the REPL in Julia v1.6.0-beta1 on Arch Linux.

Both are recognized at the help?> prompt, where the hint says to type them the same way, but only \notin has methods; \in\not is undefined.

Here's a summary of the differences:

| Typed | Displayed | Raw bytes (Vector{UInt8}) | help?> typing hint | Number of methods |
|---|---|---|---|---|
| \notin<tab> | ∉ | [0xe2, 0x88, 0x89] | "∉" can be typed by \in<tab>\not<tab> | 2 |
| \in<tab>\not<tab> | ∉ | [0xe2, 0x88, 0x88, 0xcc, 0xb8] | "∉" can be typed by \in<tab>\not<tab> | 0 |
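To make the byte columns concrete, here is a small Julia sketch that builds both forms from explicit code points (U+2209 for \notin; U+2208 followed by U+0338 COMBINING LONG SOLIDUS OVERLAY for \in\not) and compares them. The variable names are illustrative only.

```julia
# What \notin<tab> inserts: a single precomposed code point, U+2209 NOT AN ELEMENT OF.
direct = "\u2209"
# What \in<tab>\not<tab> produces: U+2208 ELEMENT OF + U+0338 COMBINING LONG SOLIDUS OVERLAY.
composite = "\u2208\u0338"

# UTF-8 bytes of each form (3 bytes vs. 5 bytes).
println(collect(codeunits(direct)))
println(collect(codeunits(composite)))

# Number of code points in each form (1 vs. 2).
println(length(direct), " vs. ", length(composite))

# A plain string comparison sees two different code point sequences.
println(direct == composite)
```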

I'm curious whether others observe the same issue on other systems or Julia versions.

Cheers,
Oliver

@JeffBezanson
Member

These are considered different names under the Unicode normalization that we use. Is there a normalization form that equates them?
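For reference, a sketch of what standard Unicode canonical normalization does with these two forms: NFC composes ∈ + U+0338 into ∉, and NFD decomposes ∉ into ∈ + U+0338, so they compare equal once both sides are put into the same form. This only demonstrates Unicode.normalize from the Unicode standard library; whether the parser applies it to these operator names is a separate question the thread does not settle, and canon_equal is a hypothetical helper.

```julia
using Unicode

direct    = "\u2209"        # ∉ as a single code point
composite = "\u2208\u0338"  # ∈ followed by U+0338 COMBINING LONG SOLIDUS OVERLAY

# NFC composes canonical sequences; NFD decomposes them.
nfc_of_composite = Unicode.normalize(composite, :NFC)
nfd_of_direct    = Unicode.normalize(direct, :NFD)

println(nfc_of_composite == direct)   # NFC turns the composite into ∉
println(nfd_of_direct == composite)   # NFD turns ∉ into the composite

# Hypothetical helper: compare after normalizing both sides to the same form.
canon_equal(a, b) = Unicode.normalize(a, :NFC) == Unicode.normalize(b, :NFC)
println(canon_equal(direct, composite))
```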

@mbauman
Member

mbauman commented Feb 24, 2021

Either way, it looks like our help hint is wrong for \notin. I'm guessing that's how you stumbled across this issue?

@OliverEvans96
Author

Yes, I suppose it was. I didn't know how to type that character, so I pasted it into help?> and got bad advice.

@OliverEvans96
Author

All of the above also seems to apply to other "not" symbols, e.g.:

help?>  #\ne
"≠" can be typed by =\not<tab>

help?>  #\napprox
"≉" can be typed by \approx<tab>\not<tab>
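The same pattern appears to hold for these other negated symbols: each has a canonical decomposition into its base symbol plus U+0338, which is presumably why their hints are spelled as the base symbol followed by \not<tab>. A minimal Julia check, assuming the three code points named in the comments (the negated dictionary is illustrative):

```julia
using Unicode

# Negated symbol => base symbol; each negated form should decompose to base + U+0338.
negated = Dict(
    "\u2209" => "\u2208",  # ∉ (\notin)   => ∈ (\in)
    "\u2260" => "=",       # ≠ (\ne)      => =
    "\u2249" => "\u2248",  # ≉ (\napprox) => ≈ (\approx)
)

for (sym, base) in negated
    decomposed = Unicode.normalize(sym, :NFD)
    # Report whether the NFD form is exactly the base symbol followed by U+0338.
    println(sym, " => ", repr(decomposed), " (base + U+0338: ", decomposed == base * "\u0338", ")")
end
```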

@clason

clason commented Feb 24, 2021

Hmm, it works here (Julia 1.6.0-rc1 as well as 1.7.0-DEV.606, macOS Intel, iTerm2): \in<tab>\not<tab> gives ∉ (and similarly for =\not<tab>).

The difference between the two methods is that the first directly inserts the character, while the second needs to delete the \in you've inserted first. Maybe your terminal can't handle the necessary terminal control codes? Which terminal are you using?

@clason

clason commented Feb 24, 2021

@OliverEvans96 ☝️

(I forgot that editing doesn't notify people...)

@OliverEvans96
Author

I'm using gnome-terminal.

@clason

clason commented Feb 24, 2021

Can you test a different one (kitty, alacritty, pangoterm)?

@mbauman
Member

mbauman commented Feb 24, 2021

Hunh, that's odd. Here's what I see on 1.6.0-rc1 and 1.7.0-DEV.606, macOS/x86/iTerm2 Build 3.4.4:

help?>  # \notin
"∉" can be typed by \in<tab>\not<tab>

search: ∉

  ∉(item, collection) -> Bool
...
julia>  # \in\not
ERROR: UndefVarError: ∉ not defined

On 1.5 the hints match how I wrote it.

@clason

clason commented Feb 24, 2021

Oh, I also get the UndefVarError; I assumed that was intentional. (I was testing in the standard REPL, not help mode.) This indicates that these are two different Unicode sequences (a single character vs. a composite one, and not all terminals handle the latter).
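A minimal sketch of the single vs. composite difference as a terminal sees it: the composite form is two code points, but the combining overlay has zero display width, so both forms should occupy one column. This only illustrates why a terminal's handling of combining characters matters here; the thread does not establish that gnome-terminal mishandles this particular case.

```julia
direct    = "\u2209"        # ∉: one code point
composite = "\u2208\u0338"  # ∈ + combining overlay: two code points

# length counts code points; textwidth counts terminal columns.
println("direct:    ", length(direct), " code point(s), width ", textwidth(direct))
println("composite: ", length(composite), " code point(s), width ", textwidth(composite))

# The combining character itself takes no column of its own.
println("U+0338 width: ", textwidth('\u0338'))
```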

@mbauman
Member

mbauman commented Feb 24, 2021

Aha, I bet the hints changed in #36382, where we use NFD normalization to look up the LaTeX completion. Watch out: NFD is apparently what GitHub uses, too. Julia source code uses NFC. NFD normalizes \notin to the combining characters; NFC does not.

julia> codeunits(REPL.REPLCompletions.latex_symbols["\\notin"])
3-element Base.CodeUnits{UInt8,String}:
 0xe2
 0x88
 0x89

julia> codeunits(Unicode.normalize(String([0xe2, 0x88, 0x89]), :NFC))
3-element Base.CodeUnits{UInt8,String}:
 0xe2
 0x88
 0x89

julia> codeunits(Unicode.normalize(String([0xe2, 0x88, 0x89]), :NFD))
5-element Base.CodeUnits{UInt8,String}:
 0xe2
 0x88
 0x88
 0xcc
 0xb8
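A hypothetical toy model of how this could produce the observed hint. The NAMES table and hint function are invented for illustration and are not the REPL's actual code: if a pasted symbol is normalized to NFD before a lookup keyed on single code points, ∉ misses its own entry and gets spelled character by character, while an NFC lookup keeps \notin.

```julia
using Unicode

# Hypothetical toy table (NOT the REPL's real one): symbol => LaTeX name,
# stored as the single code points the completions insert.
const NAMES = Dict("\u2208" => "\\in", "\u2209" => "\\notin", "\u0338" => "\\not")

# Hypothetical hint: use the whole string if it is a key, else spell it char by char.
function hint(sym::AbstractString)
    haskey(NAMES, sym) && return NAMES[sym] * "<tab>"
    return join(haskey(NAMES, string(c)) ? NAMES[string(c)] * "<tab>" : string(c) for c in sym)
end

pasted = "\u2209"                        # ∉
hint(pasted)                             # returns "\\notin<tab>"
hint(Unicode.normalize(pasted, :NFD))    # returns "\\in<tab>\\not<tab>"
hint(Unicode.normalize(pasted, :NFC))    # returns "\\notin<tab>"
```

In this toy, looking up the NFC form returns \notin for both the precomposed and the composite input.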

mbauman added the docsystem (The documentation building system) and unicode (Related to unicode characters and encodings) labels on Feb 24, 2021