-
Notifications
You must be signed in to change notification settings - Fork 167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Recent Windows 10 ReadConsoleOutput change breaks non-ASCII text #105
Comments
@miniksa for visibility. I believe he changed some stuff in that API to make it consistent with ConhostV1. |
There's a fix that didn't make RS2 RTM. I'm looking into what we can do about it. |
It's probably easy for winpty to detect the broken e.g. winpty could call
This workaround would add a lot of complexity, so I'd prefer some other solution if possible. API note: Aside: winpty may also need to stop using such tiny fonts. |
As a workaround, reading lines one by one seems to fix strange line break output. diff --git a/src/agent/Win32ConsoleBuffer.cc b/src/agent/Win32ConsoleBuffer.cc
index ed93f40..dc4efce 100755
--- a/src/agent/Win32ConsoleBuffer.cc
+++ b/src/agent/Win32ConsoleBuffer.cc
@@ -157,8 +157,18 @@ void Win32ConsoleBuffer::setCursorPosition(const Coord &coord) {
void Win32ConsoleBuffer::read(const SmallRect &rect, CHAR_INFO *data) {
// TODO: error handling
SmallRect tmp(rect);
- if (!ReadConsoleOutputW(m_conout, data, rect.size(), Coord(), &tmp) &&
- isTracingEnabled()) {
+ CHAR_INFO *buffer = data;
+ Coord bufferSize = Coord(rect.width(), 1);
+ BOOL success = TRUE;
+ for (SHORT y = rect.Top; y <= rect.Bottom; y++, buffer += rect.width()) {
+ tmp.Top = y;
+ tmp.Bottom = y;
+ if (!ReadConsoleOutputW(m_conout, buffer, bufferSize, Coord(), &tmp)) {
+ success = FALSE;
+ break;
+ }
+ }
+ if (!success && isTracingEnabled()) {
StringBuilder sb(256);
auto outStruct = [&](const SMALL_RECT &sr) {
sb << "{L=" << sr.Left << ",T=" << sr.Top |
I think a better way to accomplish this is to set ( winpty/src/agent/LargeConsoleRead.cc Line 50 in 4978cf9
I'd expect that change to help mitigate the situation -- actually, if it works, it's probably a good idea. I'm not sure what is expected to appear at the end of the Line 369 in 4978cf9
CHAR_INFO values from the Terminal::sendLine width.
Edit: fix "followed by two NULs that the terminal ignores" -- it could be one NUL or any number of NULs. |
@miniksa Can you confirm that when this issue (i.e. the double-column |
This workaround would reduce the likelihood of a successful @shirosaki FWIW, that change would break Scraper::findSyncMarker, which assumes it can read an entire screen buffer column (3000 lines) efficiently and atomically. |
@rprichard The problem is that ReadConsoleOutput* is going to give you different information depending on whether the original text was written with WriteConsoleOutputA, WriteConsoleOutputW, WriteFile, WriteConsoleA, or WriteConsoleW AND whether or not a Raster font or TrueType font is used at the time. I can't definitively tell you what the appropriate pattern for double-byte characters will be nor what the trailing fields will be filled with. The definitive solution would be to make a new API that is correct all the time or, my personal preference, to build a PTY mechanism directly into Windows and deprecate all the arguably terrible Windows Console APIs. But those solutions will take time. In the mean time, as @zadjii-msft alluded to above, I recently wrote a massive test and ensured that the v2 console and the v1 console should be exactly the same and follow the below absolutely terrible too-many-dimensional matrix created from decades of bugs that are now preserved for compatibility. You are welcome to use this as a basis to try to figure out the right thing to provide through WinPty. Hopefully this will evolve into an MSDN article and/or blog post one day, time permitting (cough @bitcrazed cough). There also might be some differences between what is listed below (the v1 behavior and the now-fixed v2 behavior) and what you see in some builds of Windows. That's because it got broken at some point and then fixed again at another point. This test is now in place and should be applicable going forward for all v2 consoles, but I'm not certain which builds the fix is in and which it is not in. I just know that this is what we should program to and fix anything that isn't working this way. Also fair warning: I want to deprecate/remove Raster fonts from v2 in a future edition of the console, so please don't build too deeply on top of those. They don't scale for High DPI, they don't work for multilingual text, and they're just plain bad. I would recommend that you choose one of these patterns that gives you the information that you need and uses a TrueType font selected (or accounts for the both font potentials using ScenariosReading with ReadConsoleOutput*WriteConsoleOutput* + ReadConsoleOutput*
Other writes + ReadConsoleOutput*Other writes = CRT write (printf, etc.) A or W -OR- WriteConsoleOutputCharacter* -OR- WriteConsole*
Reading with ReadConsoleOutputCharacter*WriteConsoleOutputW + ReadConsoleOutputCharacter*
Other writes + ReadConsoleOutputCharacter*Other writes = CRT write (printf, etc.) A or W -OR- WriteConsoleOutputCharacter* -OR- WriteConsole* -OR- WriteConsoleOutputA
PatternsEach table below shows what you would get if you had written to the console with the following string and settings:
The
The The Pattern 1
Pattern 2
Pattern 3
Pattern 4
Pattern 5
Pattern 6
Pattern 7
Pattern 8
Pattern 9
Pattern 10
Pattern 11
Pattern 12
Pattern 13
Pattern 14
|
Just a quick note as confirmation: We then made a copy of it to port it back from the Windows Insider builds to the Creators Update as MSFT: 11721571. I've heard that the Creators Update KB fix has apparently just gone live in KB4020102 (OS build number 15063.332). |
@miniksa Thanks for your detailed reply, and for fixing the issue in Windows. It wasn't completely clear to me, but I think in the above patterns, for winpty always currently reads using I think winpty could accommodate a raster font by converting the Patterns 1 and 2 seem like misuses of the console API. Pattern 1 is placing wide characters into single cells, which will tend to cause alignment problems (e.g. an 80 column line with 160 columns of text in it). If my assumption above about write regions is correct, then pattern 2 is providing extraneous The 0xFFFF invalid character in patterns 4 and 8 is weird; I'm guessing it's there for backwards compatibility? winpty doesn't currently handle it, but I guess it should. It currently expects the trailing and leading wide char to equal. FWIW, winpty also tries to recognize UTF-16 surrogate pairs, though I'm not sure that was a good idea in retrospect. I wonder what happens with the 65001 codepage combined with If you want to document more hairiness, I'm sure there are edge cases involving I/O to a single cell of a two-cell character. I think I just updated my 15063 VM tonight, and the KB4020102 update automatically installed, which bumped the (Note to self: the patterns describe the fixed state, not the buggy state.) |
@rprichard Apologies. I should have specified more about how the test writes. For all writes, we're starting with a cleared out buffer (all set to space characters Then from there, For the CRT write tests, we clear the buffer the same way and set the cursor to 0,0 again. For writing W versions, we call For the Then regarding raster fonts, you are correct. If the character doesn't exist in the currently selected raster font, it will generally convert it into the default character One thing to note that I think you may have misunderstood: all of these patterns are what you will see when attempting to read back 16 characters no matter what write mechanism was used. The write mechanisms vary as specified above. But reading back 16 items from the Read APIs will result in these patterns. To that end, Patterns 1 and 2 are representing a Read back, not what was specified on Write. The write buffer did indeed look like Pattern 3 (but 12 long instead of 16) for writing the W version of the text with the The This sort of organic development without context over time is also how we ended up with some patterns like 8 and 9 with an A byte stomped on top of a W character in the CHAR_INFO structure on read... In the last few years, I rearranged this so the buffer internally is always stored as Unicode text and it is translated as necessary on the way in/out through the APIs and when being given to GDI. It also always uses the The windows console doesn't support UTF-16 surrogate pairs right now. I want to do that in the future, but it's more accurate to say we support UCS-2 than to say we support UTF-16. UTF-8 (codepage 65001 on the A APIs) is also not officially supported. It works some times and on some of the APIs, but it's not complete and there are gaps/holes. I would expect it to work in strange and interesting ways. Officially, the console supports 2 byte UCS-2 through the W versions of the API. On the A API, we support code pages that are 1 byte for the "Western" world and we support 4 specific 2 byte codepages for "CJK" regions: 932, 936, 949, and 950. Anything else was never officially implemented or supported and your mileage may vary significantly. Do you have a specific example of what you mean by your single cell I/O of a two-cell character? I can log a bug/task internally to investigate, test, and further document that. This is basically all happening on-demand as we discover scenarios/problems. So if you have a specific scenario/problem, please let me know! OK cool. If you don't need to implement a workaround, even better. Hopefully this still provides some good insight into what's happening and why and how for future reference. Thanks for your patience and cooperation! |
@miniksa Sorry, I forgot about this.
I was thinking of things like:
IIRC, in scenario 1, the console will pretend to read a space (U+0020) character. In scenario 2, I think it will replace the other half of the two-column character(s) with a space. I wouldn't be surprised if the answer is more complicated. :-P For scenario 3, I'm guessing the last column is replaced with a space, and the character is wrapped around to the next line? What if the screen buffer is only 1 column wide (which also implies a gigantic font)? For synchronization purposes, winpty issues a single |
A change to the console in new versions of Windows 10 (e.g. 15048, but not 15014), breaks winpty by effectively shifting cells at the start of a line to the end of the previous line when certain characters appear, in certain fonts.
This winpty issue caused this downstream issue in VSCode, microsoft/vscode#19665.
Follow these links for more details:
The text was updated successfully, but these errors were encountered: