Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize parsing of OSC_STRING to minimize string concatenation. #1822

Merged
merged 5 commits into from
Dec 11, 2018
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 15 additions & 9 deletions src/EscapeSequenceParser.ts
Original file line number Diff line number Diff line change
Expand Up @@ -67,7 +67,8 @@ const PRINTABLES = r(0x20, 0x7f);
const EXECUTABLES = r(0x00, 0x18);
EXECUTABLES.push(0x19);
EXECUTABLES.push.apply(EXECUTABLES, r(0x1c, 0x20));
const DEFAULT_TRANSITION = ParserAction.ERROR << 4 | ParserState.GROUND;
// Pseudo-character placeholder for printable non-ascii characters.
const NON_ASCII_PRINTABLE = 0xA0;

/**
* VT500 compatible transition table.
Expand All @@ -79,10 +80,10 @@ export const VT500_TRANSITION_TABLE = (function (): TransitionTable {
const states: number[] = r(ParserState.GROUND, ParserState.DCS_PASSTHROUGH + 1);
let state: any;

// table with default transition [any] --> DEFAULT_TRANSITION
// table with default transition
for (state in states) {
// NOTE: table lookup is capped at 0xa0 in parse to keep the table small
for (let code = 0; code < 160; ++code) {
for (let code = 0; code <= NON_ASCII_PRINTABLE; ++code) {
table.add(code, state, ParserAction.ERROR, ParserState.GROUND);
}
}
Expand Down Expand Up @@ -184,6 +185,7 @@ export const VT500_TRANSITION_TABLE = (function (): TransitionTable {
table.addMany(PRINTABLES, ParserState.DCS_PASSTHROUGH, ParserAction.DCS_PUT, ParserState.DCS_PASSTHROUGH);
table.add(0x7f, ParserState.DCS_PASSTHROUGH, ParserAction.IGNORE, ParserState.DCS_PASSTHROUGH);
table.addMany([0x1b, 0x9c], ParserState.DCS_PASSTHROUGH, ParserAction.DCS_UNHOOK, ParserState.GROUND);
table.add(NON_ASCII_PRINTABLE, ParserState.OSC_STRING, ParserAction.OSC_PUT, ParserState.OSC_STRING);
return table;
})();

Expand Down Expand Up @@ -391,7 +393,7 @@ export class EscapeSequenceParser extends Disposable implements IEscapeSequenceP
}

// normal transition & action lookup
transition = (code < 0xa0) ? (table[currentState << 8 | code]) : DEFAULT_TRANSITION;
transition = table[currentState << 8 | (code < 0xa0 ? code : NON_ASCII_PRINTABLE)];
switch (transition >> 4) {
case ParserAction.PRINT:
print = (~print) ? print : i;
Expand Down Expand Up @@ -423,10 +425,6 @@ export class EscapeSequenceParser extends Disposable implements IEscapeSequenceP
case ParserState.GROUND:
print = (~print) ? print : i;
break;
case ParserState.OSC_STRING:
osc += String.fromCharCode(code);
transition |= ParserState.OSC_STRING;
break;
case ParserState.CSI_IGNORE:
transition |= ParserState.CSI_IGNORE;
break;
Expand Down Expand Up @@ -517,7 +515,15 @@ export class EscapeSequenceParser extends Disposable implements IEscapeSequenceP
osc = '';
break;
case ParserAction.OSC_PUT:
osc += data.charAt(i);
for (let j = i + 1; ; j++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we simplify the check by doing an including check against printables (range check? and maybe unicode if we have to cover those)? This would avoid the table lookup which is kinda costly.
Note an including check does not have to cover all chars, we might get away with the most common. All others will still be handled by the switch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this:

    case ParserAction.OSC_PUT:
      for (let j = i + 1; ; j++) {
        if (j >= l
            || (code = data.charCodeAt(j)) <= 0x20
            || (code >= 0x7f && code <= 0x9f)) {
          osc += data.substring(i, j);
          i = j - 1;
          break;
        }
      }
      break;

Copy link
Member

@jerch jerch Dec 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait this is not quite correct, put is defined as 0x20 - 0x7f, seems 0x20 and 0x7f should be added to the string. I wonder about 0x7f (DEL) though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely should be < 0x20 rather than <= 0x20.

Less sure about code >= 0x7f vs code > 0x7f. Clearly 0x7F isn't "printable". One might argue that while it should be allowed in an OSC string it is sufficiently dubious that we don't want to hardwire it the code, but instead have it be controlled by the transition table. But as long as we don't have a mechanism for overriding the transition table (and are not planning one) we might as well go with the more efficient code > 0x7f. I guess.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think we should simply add it for now (S.E.P. for the parser 😄). DEC VTs handled this differently for the GROUND state as decribed here https://vt100.net/emu/dec_ansi_parser#STGRO.

The transition table can be overridden as ctor argument. But this is currently not exposed, we had no reason so far to do so. In theory the VT models would need slightly different tables (DEC changed alot for >VT300), the table currently resembles VT500 generation (thus not perfectly VT100 compliant).

if (j >= l
|| (code = data.charCodeAt(j)) < 0x20
|| (code > 0x7f && code <= 0x9f)) {
osc += data.substring(i, j);
i = j - 1;
break;
}
}
break;
case ParserAction.OSC_END:
if (osc && code !== 0x18 && code !== 0x1a) {
Expand Down