Some kind of unicode regression #4704

djdv · 2020-02-23T23:41:33Z

Environment

Windows build number: 10.0.18363.0
Windows Terminal version (if applicable): 4420950337ccdd7bb5bb66d84538d2ea4195c101
Powershell 7-rc3

Steps to reproduce

After 4420950, characters like 🏠 appear to get corrupted.

I'm using this symbol in my Powershell prompt and it has stopped working as expected.
7d6738c:

4420950:

$script:pathDelim = [IO.Path]::DirectorySeparatorChar.ToString()
(@($pwd.Path.Split($pathDelim) | select-object -skip 1) -join  $pathDelim).replace([environment]::UserName, "🏠

The text was updated successfully, but these errors were encountered:

DHowett-MSFT · 2020-02-24T08:04:26Z

Yeah, this is fairly serious. 😄

Thanks!

I bet it's part of the conpty change in 4420950.

function things {
write-host "🏠"
write-host "a🏠"
write-host "🏠b"
write-host "c🏠d"
}

/cc @miniksa

miniksa · 2020-02-26T17:42:31Z

function things {
write-host "`u{1f3e0}"
write-host "a`u{1f3e0}"
write-host "`u{1f3e0}b"
write-host "c`u{1f3e0}d"
}

miniksa · 2020-02-26T17:44:54Z

SURROGATE PAIRS!

shakes fist into air

…moji rendering. #4704

…moji rendering. #4704 (#4731) ## Summary of the Pull Request - Surrogate pairs are being split in half with the run splitting check. ## References - Related to #4708 but not going to fix it. ## PR Checklist * [x] Closes #4704 * [x] I work here. * [x] I am a core contributor. ## Detailed Description of the Pull Request / Additional comments - The adjustment of the run heights in the correction function reports back a text index and a scaling factor. However, I didn't remember at the time that the text is being stored as UTF-16. So the index given can be pointing to the high surrogate of a pair. Thus adding 1 to split "after" the text character, then backing up by 1 isn't valid in if the index given was for a high surrogate. The quick fix is to advance by two if it's a high surrogate and one otherwise. ## Validation Steps Performed - Used the sample code from #4704 to print the house emoji in various situations into the buffer.

…ffects emoji rendering. #4704 (#4731)" This reverts commit d7ea526.

djdv · 2020-03-01T21:45:33Z

@DHowett-MSFT , @miniksa
Thanks for the render fix!
However, this is still presenting an issue.
If I set glyphs like that to be part of the prompt, I'm not able to actually use the shell (input seems to do nothing).
Rendering them seems fine however.
Example: https://youtu.be/5Dswxk6XdTs

EDIT: spoke too soon
This doesn't happen under #4747 (ddadcc5) but does in current master (4608fd0 ).

miniksa · 2020-03-02T17:43:56Z

@DHowett-MSFT Dustin Howett FTE , @miniksa Michael Niksa FTE
Thanks for the render fix!
However, this is still presenting an issue.
If I set glyphs like that to be part of the prompt, I'm not able to actually use the shell (input seems to do nothing).
Rendering them seems fine however.
Example: youtu.be/5Dswxk6XdTs

EDIT: spoke too soon
This doesn't happen under #4747 (ddadcc5) but does in current master (4608fd0 ).

OK, great. I'm trying to bring #4747 in ASAP, so hopefully this will be fully resolved for you then.

@DHowett-MSFT

## Summary of the Pull Request - Improves the correction of the scaling and spacing that is applied to glyphs if they are too large or too small for the number of columns that the text buffer is expecting ## References - Supersedes #4438 Co-authored-by: Mili (Yi) Zhang <[email protected]> - Related to #4704 (#4731) ## PR Checklist * [x] Closes #696 * [x] Closes #4375 * [x] Closes #4708 * [x] Closes a crash that @DHowett-MSFT complained about with `"x" * ($Host.UI.RawUI.BufferSize.Width - 1) + "`u{241b}"` * [x] Eliminates an exception getting thrown with the U+1F3E0 emoji in `_CorrectGlyphRun` * [x] Corrects several graphical issues that occurred after #4731 was merged to master (weird repeats and splits of runs) * [x] I work here. * [x] Tested manually versus given scenarios. * [x] Documentation written into comments in the code. * [x] I'm a core contributor. ## Detailed Description of the Pull Request / Additional comments - The `_CorrectGlyphRun` function now walks through and uses the `_glyphClusters` map to determine the text span and glyph span for each cluster so it can be considered as a single unit for scaling. - The total number of columns expected across the entire cluster text/glyph unit is considered for the available spacing for drawing - The total glyph advances are summed to see how much space they will take - If more space than necessary to draw, all glyphs in the cluster are offset into the center and the extra space is padded onto the advance of the last glyph in the range. - If less space than necessary to draw, the entire cluster is marked for shrinking as a single unit by providing the initial text index and length (that is back-mapped out of the glyph run) up to the parent function so it can use the `_SetCurrentRun` and `_SplitCurrentRun` existing functions (which operate on text) to split the run into pieces and only scale the one glyph cluster, not things next to it as well. - The scale factor chosen for shrinking is now based on the proportion of the advances instead of going through some font math wizardry - The parent that calls the run splitting functions now checks to not attempt to split off text after the cluster if it's already at the end. This was @DHowett-MSFT's crash. - The split run function has been corrected to fix the `glyphStart` position of the back half (it failed to `+=` instead of `=` which resulted in duplicated text, sometimes). - Surrogate pair emoji were not allocating an appropriate number of `_textClusterColumns`. The constructor has been updated such that the trailing half of surrogate pairs gets a 0 column width (as the lead is marked appropriately by the `GetColumns()` function). This was the exception thrown. - The `_glyphScaleCorrections` array stored up over the calls to `_CorrectGlyphRun` now uses a struct `ScaleCorrection` as we're up to 3 values. - The `ScaleCorrection` values are named to clearly indicate they're in relation to the original text span, not the glyph spans. - The values that are used to construct `ScaleCorrection`s within `_CorrectGlyphRun` have been double checked and corrected to not accidentally use glyph index/counts when text index/counts are what's required. ## Validation Steps Performed - Tested the utf82.txt file from one of the linked bugs. Looked specifically at Burmese through Thai to ensure restoration (for the most part) of the behavior - Ensured that U+1f3e0 emoji (🏠) continues to draw correctly - Checked Fixedsys Excelsior font to ensure it's not shrinking the line with its ligatures - Checked ligatureness of Cascadia Code font - Checked combining characters U+0300-U+0304 with a capital A

ghost added Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting Needs-Tag-Fix Doesn't match tag requirements labels Feb 23, 2020

ghost removed the Needs-Tag-Fix Doesn't match tag requirements label Feb 24, 2020

DHowett-MSFT added this to the Terminal v1.0 milestone Feb 24, 2020

DHowett-MSFT assigned miniksa Feb 24, 2020

zadjii-msft mentioned this issue Feb 24, 2020

Shaping and display of certain language is completely incorrect. #4708

Closed

DHowett-MSFT added the Priority-1 A description (P1) label Feb 25, 2020

miniksa added a commit that referenced this issue Feb 26, 2020

Don't split surrogate pairs when breaking runs for scaling. Affects e…

f892200

…moji rendering. #4704

miniksa mentioned this issue Feb 26, 2020

Avoid splitting surrogate pairs when breaking runs for scaling #4731

Merged

3 tasks

ghost added the In-PR This issue has a related PR label Feb 26, 2020

ghost closed this as completed in #4731 Feb 26, 2020

ghost added Resolution-Fix-Committed Fix is checked in, but it might be 3-4 weeks until a release. and removed In-PR This issue has a related PR labels Feb 26, 2020

DHowett-MSFT pushed a commit that referenced this issue Feb 28, 2020

Revert "Don't split surrogate pairs when breaking runs for scaling. A…

8fb90dc

…ffects emoji rendering. #4704 (#4731)" This reverts commit d7ea526.

miniksa mentioned this issue Feb 28, 2020

Improve glyph scaling correction #4747

Merged

10 tasks

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Some kind of unicode regression #4704

Some kind of unicode regression #4704

djdv commented Feb 23, 2020 •

edited

Loading

DHowett-MSFT commented Feb 24, 2020

miniksa commented Feb 26, 2020

miniksa commented Feb 26, 2020

djdv commented Mar 1, 2020 •

edited

Loading

miniksa commented Mar 2, 2020

Some kind of unicode regression #4704

Some kind of unicode regression #4704

Comments

djdv commented Feb 23, 2020 • edited Loading

Environment

Steps to reproduce

DHowett-MSFT commented Feb 24, 2020

miniksa commented Feb 26, 2020

miniksa commented Feb 26, 2020

djdv commented Mar 1, 2020 • edited Loading

miniksa commented Mar 2, 2020

djdv commented Feb 23, 2020 •

edited

Loading

djdv commented Mar 1, 2020 •

edited

Loading