-
Notifications
You must be signed in to change notification settings - Fork 8.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add initial support for VT DCS sequences #6328
Add initial support for VT DCS sequences #6328
Conversation
Based on my previous PRs and the fact that English is not even my native language, it should be pretty obvious that I'm not capable of producing write-ups like @j4james or @miniksa does . But I'm still gonna give it a try. So bear with me. Yeah, yeah I know what you think. I'm supposed to be the performance dude and @j4james is the VT hero. So here's the story. One day I was randomly browsing issues & PRs in the WT repo like every "normal" people would do. I suddenly came cross this sixel thing. I was like immediately shouting internally "that is awesome!" . Given that the performance of WT is now so much better than v0.6 (it even beats iTerm2 in some very unfair competitions conducted by me), I think it may be time for me to move on to some other area. So here we are. Let's do this. BTW I'm writing this whole thing on my Macbook Pro. Just wanna add more irony. Not because I'm experiencing hardware problems on my own PC. |
New misspellings found, please review:
To accept these changes, run the following commands
✏️ Contributor please read thisBy default the command suggestion will generate a file named based on your commit. That's generally ok as long as you add the file to your commit. Someone can reorganize it later.
If the listed items are:
See the 🔬 You can test your commits without appending to a PR by creating a new branch with that extra change and pushing it to your fork. The :check-spelling action will run in response to your push -- it doesn't require an open pull request. By using such a branch, you can limit the number of typos your peers see you make. 😉
|
New misspellings found, please review:
To accept these changes, run the following commands
✏️ Contributor please read thisBy default the command suggestion will generate a file named based on your commit. That's generally ok as long as you add the file to your commit. Someone can reorganize it later.
If the listed items are:
See the 🔬 You can test your commits without appending to a PR by creating a new branch with that extra change and pushing it to your fork. The :check-spelling action will run in response to your push -- it doesn't require an open pull request. By using such a branch, you can limit the number of typos your peers see you make. 😉
|
src/terminal/parser/stateMachine.cpp
Outdated
{ | ||
_trace.TraceOnAction(L"DcsPassThrough"); | ||
_trace.TraceOnExecute(wch); | ||
// TODO: actually do something |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably the TODOest TODO I've ever written in my life...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// TODO: actually do something | |
// TODO:GH#448 actually support sixel graphics |
I'm fine with this TODO comment here, let's just link it back up to the right thread 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a TODO
that deserves a followup github item? If so, file it and change it to // TODO GH#9999: Send the passthrough sequence to the engine
If it's a todo we should do now, we should do it before we merge this PR. 😄
@@ -284,6 +325,12 @@ class Microsoft::Console::VirtualTerminal::OutputEngineTest final | |||
VERIFY_ARE_EQUAL(mach._state, StateMachine::VTStates::CsiParam); | |||
mach.ProcessCharacter(L'J'); | |||
VERIFY_ARE_EQUAL(mach._state, StateMachine::VTStates::Ground); | |||
|
|||
VERIFY_ARE_EQUAL(mach._parameters.size(), 4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I honestly do not understand why the actual parameter values are not even checked in this case. I mean that's sorta the whole point of this case, to get the correct parameters.
src/terminal/parser/ascii.hpp
Outdated
@@ -41,5 +41,6 @@ namespace Microsoft::Console::VirtualTerminal | |||
US = 0x1F, // Unit Separator | |||
SPC = 0x20, // Space, first printable character | |||
DEL = 0x7F, // Delete | |||
ST = 0x9c, // String terminator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Why was this added to the AsciiChars
enum? It's not technically ASCII is it? And neither CSI or DCS are defined here. It just seems a bit unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm..I added it here without be aware that it's not ASCII. That being said, should we have a place where CSI
, DCS
and ST
is enum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel strongly about it either way. If the only place those characters are currently used are in methods like _isDcsIndicator
and _isStringTerminator
then they're kind of self-documenting anyway.
src/terminal/parser/stateMachine.cpp
Outdated
_trace.TraceOnEvent(L"DcsEntry"); | ||
if (_isC0Code(wch)) | ||
{ | ||
_ActionExecute(wch); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is meant to be _ActionIgnore
if the vt100.net state diagram is to be believed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing out there a diagram out there. For those of you who're interested, it's here. It helped me a lot towards a more accurate implementation.
src/terminal/parser/stateMachine.cpp
Outdated
else if (_isDcsTerminationInitiator(wch)) | ||
{ | ||
_EnterDcsTermination(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again I don't think we want this here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is still needed if we want strictly follow that ESC \
is the same as ST
in the VT spec here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any time you get an ESC
(except in the OscString
and DcsPassThrough
states), that's going to abort the current sequence and start parsing a new sequence. In the case of ESC \
, that new sequence will be dispatched via the ActionEscDispatch
and ultimately ignored, but the outcome is still that the original DCS
sequence is terminated.
The only time we need to specifically check for ESC \
is in the pass-through state, because that tells us that the sequence has terminated successfully, and we need to actually do something with the data. We don't want to do that when we're in the ignore state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inside ActionEscDispatch
, \
is not recognized as a valid terminator, thus it return false
to indicate the dispatch is a failure. Is there anything we can and should do about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. Maybe we just need to recognise it as a valid op and return true without doing anything.
src/terminal/parser/stateMachine.cpp
Outdated
@@ -1292,10 +1686,12 @@ void StateMachine::ProcessCharacter(const wchar_t wch) | |||
_ActionExecute(wch); | |||
_EnterGround(); | |||
} | |||
else if (_isEscape(wch) && _state != VTStates::OscString) | |||
else if (_isEscape(wch) && _state != VTStates::OscString && _state != VTStates::DcsPassThrough && _state != VTStates::DcsIntermediate && _state != VTStates::DcsIgnore) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed above, I think this should only apply to the DcsPassThrough
state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to me DcsTermination
is also needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My first thought was that you were right, but now that I've had a chance to do some testing, I think that's probably wrong. Consider this test case:
printf "\ePq\e\e[41mRED\e[m\n"
We've started a DCS sequence, and got into the pass-through state. We receive one escape, which could be the start of an ST, but it's then followed by a second escape. At that point we're clearly aborting the DCS, but what happens next?
Going by the state diagram, I'd expect an escape to start an escape sequence from anywhere, so that second escape is just the start of a CSI which sets the background color to red (and that's what XTerm does). For that to work we can't include DcsTermination
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm missing something but I don't see a lot of test cases for complicated escape sequences like these. Is it generally a good practice to write test cases for possibly malformed, garbled or malicious sequences? That's of course not the scope of this PR specifically. Just wondering if it's a possible direction to improve VT tests.
By the way originally I thought DCS was just some non-essential part of the VT spec. Then I saw the state diagram and DCS is taking up almost half the space of entire diagram. I begin to understand what kind of a rabbit hole I'm falling into... |
New misspellings found, please review:
To accept these changes, run the following commands
✏️ Contributor please read thisBy default the command suggestion will generate a file named based on your commit. That's generally ok as long as you add the file to your commit. Someone can reorganize it later.
If the listed items are:
See the 🔬 You can test your commits without appending to a PR by creating a new branch with that extra change and pushing it to your fork. The :check-spelling action will run in response to your push -- it doesn't require an open pull request. By using such a branch, you can limit the number of typos your peers see you make. 😉
|
} | ||
else | ||
{ | ||
_EventEscape(wch); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realised that we probably also need a call to _EnterEscape
here, before the call to _EventEscape
. If the escape wasn't followed by \
, then we're essentially cancelling the current sequence and started a new one, and we need to reset any parameters and intermediates that we may already have accumulated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It occurred to me that the OSC
sequences should be doing the same thing as this, but I see that the _EventOscTermination
handler doesn't care what follows the escape - it just dispatches the sequence anyway. That's definitely a bug. Not that that's your problem - just noting that we need to raise an issue for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I observed a lot of similarity between OSC and DCS. Perhaps the logic of those two should be more unified in some way. I created _isIntermediateInvalid
and _isParameterInvalid
to make things a little bit generic than before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that'll become more important if/when we do APC and PM, but DCS has a few differences from the others, so it may not be as easy to share things. What you've done so far with the generic methods is at least a step in the right direction, and we can always do more refactoring later if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way why don't we support C1Osc? Is there a reason or it should not be allowed to enter from ground? From the state graph, OSC, APC & PM can all be entered from ground.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't looked at this side of things in much detail, and I think there is some debate about the use of C1 controls with UTF-8, but if we're supporting some of them, then I would have thought we should be supporting all of them (and that includes the likes of IND
, NEL
, HTS
, etc.).
I was actually wondering whether this was something we could handle as a sort of preprocessing step that converts all of the C1 controls into their two character 7-bit equivalent. That way we can drop all the extra code that's checking for 90 and 9B introducers, and 9C terminators, because those conditions would now be handled as regular 7-bit escapes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's a good idea, to convert C1 controls to their 7 bit equivalent. I currently do not have enough experience with VT to file a PR but I'm getting there. Have learned a lot throughout the review period of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chester, would you mind turning this thread into a followup issue? It seems like we should fix OSC Termination to handle ESC more correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we should. I plan to do that next. Filed #7317 for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we should. I plan to do that next. Filed #7317 for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Soft block on the TODO comments and some code change suggestions.
Otherwise, Chester, this looks excellent. Thanks so much for doing it. James, thanks for your guidance & review. 😄
// Routine Description: | ||
// - Determines if a character is a C1 DCS (Device Control Strings) | ||
// This is a single-character way to start a control sequence, as opposed to "ESC P". | ||
// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add a mention of the comment above _isC1Csi
for how we handle values in different encodings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(it's sufficient to say "See the comment above _isC1Csi for more information on how this is impacted by codepages.")
src/terminal/parser/stateMachine.cpp
Outdated
} | ||
|
||
// Routine Description: | ||
// - Determines if a character should be initiate the end of a DCS sequence. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// - Determines if a character should be initiate the end of a DCS sequence. | |
// - Determines if a character should initiate the end of a DCS sequence. |
src/terminal/parser/stateMachine.cpp
Outdated
{ | ||
_trace.TraceOnAction(L"DcsPassThrough"); | ||
_trace.TraceOnExecute(wch); | ||
// TODO: actually do something |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a TODO
that deserves a followup github item? If so, file it and change it to // TODO GH#9999: Send the passthrough sequence to the engine
If it's a todo we should do now, we should do it before we merge this PR. 😄
// Routine Description: | ||
// - Moves the state machine into the DcsEntry state. | ||
// This state is entered: | ||
// 1. When the DcsEntry character is seen after an Escape entry (only from the Escape state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// 1. When the DcsEntry character is seen after an Escape entry (only from the Escape state) | |
// 1. When the DcsEntry character is seen after an Escape entry (only from the Escape state) | |
// 2. When we see a C1 DCS introducer from the Ground state |
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it's right. If we add this, CSI should also have this. I plan to refactor this by fixing #7313, so I feel it's OK to leave this to that PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it's right. If we add this, CSI should also have this. I plan to refactor this by fixing #7313, so I feel it's OK to leave this to that PR.
src/terminal/parser/stateMachine.cpp
Outdated
// 4. Store parameter data | ||
// 5. Collect Intermediate characters | ||
// 6. Pass through everything else | ||
// DCS sequences are structurally almost the same as CSI sequences, just with a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// DCS sequences are structurally almost the same as CSI sequences, just with a | |
// DCS sequences are structurally almost the same as CSI sequences, just with an |
void StateMachine::_EventDcsIgnore(const wchar_t wch) noexcept | ||
{ | ||
_trace.TraceOnEvent(L"DcsIgnore"); | ||
if (_isStringTerminator(wch)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we also need to handle a non-ST exit from DCS? If you get into a DCSIgnore state should you be able to recover with ESC \
? (EDIT: no. see below.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That may have to be another state (DcsIgnoreTerminator) because we'd need to return to DcsIgnore if we got ESC <anything>
where anything
was not \
.
Feel free to correct me if I'm wrong :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just so I understand, DcsIgnore is "ignore all contents until ST", right?
I know that CsiIgnore is the opposite of this (ignore only specific characters, then drop to ground)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OKAY, I understand now. When we're in DcsIgnore esc is handled normally. We get an ESC \
and it's handled at ProcessCharacter:1684 because we're not in Osc/DcsPass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah James and I had the same dicussion somewhere else. I guess this is OK.
} | ||
else | ||
{ | ||
_EventEscape(wch); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chester, would you mind turning this thread into a followup issue? It seems like we should fix OSC Termination to handle ESC more correctly.
@@ -1292,10 +1681,12 @@ void StateMachine::ProcessCharacter(const wchar_t wch) | |||
_ActionExecute(wch); | |||
_EnterGround(); | |||
} | |||
else if (_isEscape(wch) && _state != VTStates::OscString) | |||
else if (_isEscape(wch) && _state != VTStates::OscString && _state != VTStates::DcsPassThrough) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need a helper, _canStateBeTerminatedWithEscape()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That will be in my following PR where the code structure is changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I started this review a few weeks ago, so some of the comments might already be out-of-date, but I think I'm cool with this in general.
Thanks for your patience!
src/terminal/parser/stateMachine.cpp
Outdated
{ | ||
_trace.TraceOnAction(L"DcsPassThrough"); | ||
_trace.TraceOnExecute(wch); | ||
// TODO: actually do something |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// TODO: actually do something | |
// TODO:GH#448 actually support sixel graphics |
I'm fine with this TODO comment here, let's just link it back up to the right thread 👍
src/terminal/parser/stateMachine.cpp
Outdated
_trace.TraceOnEvent(L"DcsPassThrough"); | ||
if (_isStringTerminator(wch)) | ||
{ | ||
// TODO: The Dcs sequence has successfully terminated. This is where we'd be dispatching the DCS command. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably link this up to the aforementioned "do something" issue
@msftbot merge this in 4 minutes |
Hello @DHowett! Because you've given me some instructions on how to help merge this pull request, I'll be modifying my merge approach. Here's how I understand your requirements for merging this pull request:
If this doesn't seem right to you, you can tell me to cancel these instructions and use the auto-merge policy that has been configured for this repository. Try telling me "forget everything I just told you". |
This will probably fail to build because they've revved the version of VS on the agents, but I will keep an eye out. Chester, thanks for doing this. It's great! 😄 |
Admin merge override as the only delta between working and non-working was a comment + our build agents got upgraded to a horked version of VS. |
🎉 Handy links: |
C1 control characters are now first converted to their 7 bit equivalent. This allows us to unify the logic of C1 and C0 escape handling. This also adds support for SOS/PM/APC string. * Unify the logic for C1 and C0 escape handling by converting C1 to C0 beforehand. This adds support for various C1 characters, including IND(8/4), NEL(8/5), HTS(8/8), RI(8/13), SS2(8/14), SS3(8/15), OSC(9/13), etc. * Add support for SOS/PM/APC escape sequences. Fixes #7032 * Use "Variable Length String" logic to unify the string termination handling of OSC, DCS and SOS/PM/APC. This fixes an issue where OSC action is successfully dispatched even when terminated with non-ST character. Introduced by #6328, the DCS PassThrough is spared from this issue. This PR puts them together and add test cases for them. References: https://vt100.net/docs/vt510-rm/chapter4.html https://vt100.net/emu/dec_ansi_parser Closes #7032 Closes #7317
This PR introduces a mechanism via which DCS data strings can be passed through directly to the dispatch method that will be handling them, so the data can be processed as it is received, rather than being buffered in the state machine. This also simplifies the way string termination is handled, so it now more closely matches the behaviour of the original DEC terminals. * Initial support for DCS sequences was introduced in PR #6328. * Handling of DCS (and other) C1 controls was added in PR #7340. * This is a prerequisite for Sixel (#448) and Soft Font (#9164) support. The way this now works, a `DCS` sequence is dispatched as soon as the final character of the `VTID` is received. Based on that ID, the `OutputStateMachineEngine` should forward the call to the corresponding dispatch method, and its the responsibility of that method to return an appropriate handler function for the sequence. From then on, the `StateMachine` will pass on all of the remaining bytes in the data string to the handler function. When a data string is terminated (with `CAN`, `SUB`, or `ESC`), the `StateMachine` will pass on one final `ESC` character to let the handler know that the sequence is finished. The handler can also end a sequence prematurely by returning false, and then all remaining data bytes will be ignored. Note that once a `DCS` sequence has been dispatched, it's not possible to abort the data string. Both `CAN` and `SUB` are considered valid forms of termination, and an `ESC` doesn't necessarily have to be followed by a `\` for the string terminator. This is because the data string is typically processed as it's received. For example, when outputting a Sixel image, you wouldn't erase the parts that had already been displayed if the data string is terminated early. With this new way of handling the string termination, I was also able to simplify some of the `StateMachine` processing, and get rid of a few states that are no longer necessary. These changes don't apply to the `OSC` sequences, though, since we're more likely to want to match the XTerm behavior for those cases (which requires a valid `ST` control for the sequence to be accepted). ## Validation Steps Performed For the unit tests, I've had to make a few changes to some of the `OutputEngineTests` to account for the updated `StateMachine` processing. I've also added a new `StateMachineTest` to confirm that the data strings are correctly passed through to the string handler under all forms of termination. To test whether the framework is actually usable, I've been working on DRCS Soft Font support branched off of this PR, and haven't encountered any problems. To test the throughput speed, I also hacked together a basic Sixel parser, and that seemed to perform reasonably well. Closes #7316
Those are terrific references. I gather this request, after much back-and-forth, has been pulled and merged. I have some questions to aid my understanding. I guess I should ask somewhere else and reference this pull/feature request. |
As the title suggests, this commit adds initial support for the VT DCS
sequences. The parameters are parsed but not yet used. The pass through
data is yet to be handled. This effectively fixes #120 by making Sixel
graphics sequences ignored instead of printed.
Tests added.
References #448
Closes #120