Some Unicode characters like ß not working in terminal #19763

Closed
RobIsHere opened this issue Feb 2, 2017 · 13 comments
Assignees
Labels
*duplicate Issue identified as a duplicate of another issue(s)

Comments

@RobIsHere

RobIsHere commented Feb 2, 2017

  • VSCode Version: 1.9.0
  • OS Version: Win 10

Steps to Reproduce:

  1. Open the terminal
  2. Type bash at the prompt
  3. Type ß on a German keyboard

The cursor jumps four places forward when typing ß in bash.exe launched from PowerShell, and cleaning up the mess with backspace is out of sync: it deletes only two characters and then behaves as if it were back at the prompt, although it is not.

@Tyriar
Member

Tyriar commented Feb 3, 2017

Hi @RobIsHere, some questions:

  • Can you post a screenshot of the broken state (immediately after typing ß)?
  • Can you post your settings.json file?
  • Was this happening in v1.8.1?

@Tyriar Tyriar self-assigned this Feb 3, 2017
@Tyriar Tyriar added l10n-platform Localization platform issues (not wrong translations) terminal General terminal issues that don't fall under another label info-needed Issue requires more information from poster labels Feb 3, 2017
@RobIsHere
Author

Of course. I hope this helps:

  1. The character actually typed is ß, which ends up in the terminal as this A-with-dash followed by a space.
  2. The second screen shows what happens when I type ß three times. Notice the three spaces at the end.
  3. The third screen shows what happens when I press backspace to delete everything. Some characters remain.
  4. The fourth screen shows other valid Unicode characters like öÄü (German umlauts, which we need to use very often). For some of these characters, another symbol appears where ß produces a space.
    (ÄÖÜß => space, äöü => some symbol instead)

This is with 1.8.1. In 1.9.0 it was the same. (I cannot upgrade now for the screenshots because of the blocked terminal settings, sorry!)

screen

screen2

screen3

screen4

@Tyriar
Member

Tyriar commented Feb 3, 2017

I think this is probably a duplicate of #19440. You use AltGr to enter these characters, right?

@RobIsHere
Author

No, Tyriar. The keys are typed without any modifiers, just Shift for the uppercase ÖÄÜ.
They are used in German quite often.

You can see them here, near the zero ("0") key and to the right of the "L" key:
https://en.wikipedia.org/wiki/File:KB_Germany.svg

@Tyriar
Member

Tyriar commented Feb 3, 2017

Thanks for clarifying 😃

@Tyriar Tyriar added bug Issue identified by VS Code Team member as probable bug upstream Issue identified as 'upstream' component related (exists outside of VS Code) and removed info-needed Issue requires more information from poster labels Feb 3, 2017
@rmunn
Contributor

rmunn commented Mar 15, 2017

This looks like the classic "UTF-8 interpreted as CP1252" bug. For example, ä (U+00E4 LATIN SMALL LETTER A WITH DIAERESIS), encoded in UTF-8, is the two-byte sequence 0xC3 0xA4. If that byte sequence is interpreted as CP1252, it will show up as Ã (U+00C3 LATIN CAPITAL LETTER A WITH TILDE) followed by ¤ (U+00A4 CURRENCY SIGN).
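
As a rough illustration of that misreading, here is a minimal Python sketch. It is not taken from VS Code, PowerShell, or winpty, and the helper name `misread` is made up for this example. It encodes a character as UTF-8 and then decodes the same bytes with the wrong codepage:

```python
def misread(text: str, wrong_codec: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode those bytes with the wrong codec."""
    return text.encode("utf-8").decode(wrong_codec)


for ch in ("ä", "ß"):
    # Show the UTF-8 bytes next to what a CP1252 reader would display.
    utf8_hex = ch.encode("utf-8").hex(" ")
    print(f"{ch!r}: UTF-8 bytes {utf8_hex} read as CP1252 -> {misread(ch)!r}")
```

For ß the second byte is 0x9F, which CP1252 maps to Ÿ. Under ISO-8859-1 that byte is a non-printing control character, which might explain a blank cell after the Ã, though that is a guess about the screenshots rather than something confirmed in this thread.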

I would bet that PowerShell is using the current Windows codepage for its input, instead of UTF-8. @RobIsHere, what happens if you run chcp 65001 to select the insufficiently-documented Windows UTF-8 codepage first, and then run Bash?

@rmunn
Contributor

rmunn commented Mar 15, 2017

BTW, any time you see Â (U+00C2) or Ã (U+00C3) followed by other accented characters, the first thing you should think is "UTF-8 wrongly interpreted as Latin-1 / ISO-8859-1 / CP-1252". 99% of the time, Â and Ã are dead giveaways for that particular, far-too-widespread, bug. (I'll spare you the rant on why Microsoft should really have made CP65001 the default in Windows 10.)
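
That rule of thumb can be turned into a small check. The following is a hedged Python sketch, not an established library API: `looks_like_mojibake` and `try_repair` are hypothetical names, and the repair assumes the text really was UTF-8 misread as CP1252, which will not hold for every string containing Â or Ã.

```python
def looks_like_mojibake(text: str) -> bool:
    """Heuristic from the comment above: Â or Ã hints at misread UTF-8."""
    return "\u00c2" in text or "\u00c3" in text


def try_repair(text: str) -> str:
    """Undo a UTF-8-read-as-CP1252 mix-up; return the input if that fails."""
    if not looks_like_mojibake(text):
        return text
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not produced by this particular mix-up.
        return text


print(try_repair("\u00c3\u00a4"))  # the two characters Ã¤ become ä
```

The fallback matters: legitimate text can contain Ã (Portuguese "NÃO", for instance), in which case the round trip raises and the input is returned unchanged.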

@be5invis
Contributor

be5invis commented Mar 15, 2017

@rmunn Because programs written in the 1990s would crash.
Trust me, there are a lot of them. For some of them we cannot even find the source code.
For example, there is a plugin used by the US Government that Office still needs to support; its developer has gone bankrupt and the source code is lost. Millions of US Gov offices are still using it.

Cherish your life, stay away from C++.

@be5invis
Contributor

PPS: The default console encoding (codepage) for the English language is cp437.
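
To see how much the chosen codepage changes the result, the sketch below decodes the UTF-8 bytes of ß under a few codepages. It is an illustrative Python snippet only; `decode_as` is a hypothetical helper, and nothing here says which codepage the reporter's console was actually using.

```python
UTF8_SHARP_S = "ß".encode("utf-8")  # the two bytes 0xC3 0x9F


def decode_as(codepage: str) -> str:
    """Decode the UTF-8 bytes of ß using the given codepage."""
    return UTF8_SHARP_S.decode(codepage)


for codepage in ("cp437", "cp1252", "utf-8"):
    # Same bytes, different interpretation per codepage.
    print(f"{codepage:>6}: {decode_as(codepage)!r}")
```

Under cp437 the two bytes land on a box-drawing character and ƒ, under cp1252 on Ã and Ÿ, and only UTF-8 yields a single ß.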

@RobIsHere
Author

screen3

@RobIsHere
Author

In this scenario ß works as expected.

```
"terminal.integrated.shell.windows": "ssh.exe",
"terminal.integrated.shellArgs.windows": [
    "root@server",
    "-t",
    "cd /root/projects; bash --login"
],
```

@rprichard

Are you using WSL bash or Cygwin bash? I don't have a German keyboard, so I tried copying and pasting U+00DF (ß) into a VSCode terminal pane, and it worked. FWIW, I know that winpty doesn't handle "special characters" the same way as the normal console. AFAICT, if a key isn't on my keyboard, it's not possible for me to type it into WSL bash using the normal console. If I try to paste it, nothing happens.

Here's my first attempt with VSCode:

PS C:\Users\rprichard> echo ${env:LANG}
en_US.UTF-8
PS C:\Users\rprichard> bash
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

rprichard@DESKTOP-RT2N8VC:/mnt/c/Users/rprichard$ echo $LANG
en_US.UTF-8
rprichard@DESKTOP-RT2N8VC:/mnt/c/Users/rprichard$ uname -a
Linux DESKTOP-RT2N8VC 4.4.0-43-Microsoft #1-Microsoft Wed Dec 31 14:42:53 PST 2014 x86_64 x86_64 x86_64 GNU/Linux
rprichard@DESKTOP-RT2N8VC:/mnt/c/Users/rprichard$ echo ß | od -tx1
0000000 c3 9f 0a
0000003
rprichard@DESKTOP-RT2N8VC:/mnt/c/Users/rprichard$ showkey -a

Press any keys - Ctrl-D will terminate this program

ß       195 0303 0xc3
        159 0237 0x9f
^D        4 0004 0x04

I believe winpty decodes UTF-8 from VSCode and then calls WriteConsoleInputW with a single KEY_EVENT INPUT_RECORD with a 0x00DF Event.KeyEvent.uChar.UnicodeChar. C3 9F is the UTF-8 representation for U+00DF. Something (the WSL subsystem?) converts the single UTF-16 INPUT_RECORD back into UTF-8.
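
The round trip described above can be mimicked with plain codec calls. This is only a Python sketch of the byte-level conversions, under the assumption that the pipeline behaves as guessed in the comment; it does not call winpty or the Windows console API, and `showkey_line` is a hypothetical helper that imitates the `showkey -a` columns.

```python
def showkey_line(byte: int) -> str:
    """Format one byte as decimal, octal, and hex, like `showkey -a` does."""
    return f"{byte:3d} {byte:04o} 0x{byte:02x}"


raw = b"\xc3\x9f"                      # UTF-8 bytes sent by the terminal
char = raw.decode("utf-8")             # step 1: decode UTF-8 to a character
code_unit = ord(char)                  # the single UTF-16 code unit, 0x00DF
utf16 = char.encode("utf-16-le")       # how that code unit is stored
back = utf16.decode("utf-16-le").encode("utf-8")  # step 2: back to UTF-8

print(f"U+{code_unit:04X}")
for byte in back:
    print(showkey_line(byte))
```

If both conversions are done correctly the bytes come back unchanged, which matches the `c3 9f` seen in the `od` and `showkey` output above.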

@Tyriar
Copy link
Member

Tyriar commented Apr 21, 2017

Duplicate #19665

@Tyriar Tyriar closed this as completed Apr 21, 2017
@Tyriar Tyriar added *duplicate Issue identified as a duplicate of another issue(s) and removed bug Issue identified by VS Code Team member as probable bug l10n-platform Localization platform issues (not wrong translations) terminal General terminal issues that don't fall under another label upstream Issue identified as 'upstream' component related (exists outside of VS Code) labels Apr 21, 2017
@vscodebot vscodebot bot locked and limited conversation to collaborators Nov 18, 2017