Optimize parsing of OSC_STRING to minimize string concatenation. #1822

PerBothner · 2018-12-09T21:41:04Z

This avoids a string concatenation for each character of OSC data.
This updated version of the patch (compared to an earlier suggestion in #1813 (comment)) also handles the case when the OSC command is split across multiple calls to parse, which is more likely for really long requests.
(The handling of characters above 0x9f is a bit clunky - I think it would be better to use a magic fake character code representing characters above 0x9f, and look up that character in the table - however, that is a bigger and less local change.)
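
To make the description concrete, here is a minimal sketch of the idea, not the code from this PR. `OscAccumulator` and its members are hypothetical names, and stopping at any code below 0x20 is a simplification. It shows one `substring` per run replacing one concatenation per character, and the partial payload surviving when a sequence is split across two calls.

```typescript
// Hypothetical helper, for illustration only: collects an OSC payload
// across several chunks, appending one substring per chunk instead of
// one character at a time.
class OscAccumulator {
  private osc = '';
  public concatCount = 0; // number of string concatenations performed

  // Consume payload characters from `data`, starting at `start`.
  // Returns the index of the first character that is not payload
  // (simplified here to any code below 0x20), or data.length if the
  // chunk ended before a terminator was seen.
  public put(data: string, start: number): number {
    let j = start;
    while (j < data.length && data.charCodeAt(j) >= 0x20) {
      j++;
    }
    if (j > start) {
      this.osc += data.substring(start, j);
      this.concatCount++;
    }
    return j;
  }

  // Hand out the collected payload and reset for the next sequence.
  public finish(): string {
    const result = this.osc;
    this.osc = '';
    return result;
  }
}

// The payload is split across two chunks; BEL (0x07) terminates it.
const acc = new OscAccumulator();
const first = '0;a very long win';
const second = 'dow title\x07rest';
const end1 = acc.put(first, 0);
const end2 = acc.put(second, 0);
const payload = acc.finish();
```

Here the 26-character payload costs two concatenations, one per chunk, where a per-character loop would need 26.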

jerch

Sweet thx, did a quick benchmark - OSC now runs 25% faster. More below.

jerch · 2018-12-09T23:53:06Z

src/EscapeSequenceParser.ts

+      transition = (code < 0xa0
+        ? (table[currentState << 8 | code])
+        : currentState === ParserState.OSC_STRING
+        ? (ParserAction.OSC_PUT << 4) | ParserState.OSC_STRING


Hmm - can OSC contain higher chars at all? vt100.net does not cover this as it does not cover unicode at all.
If we have to handle unicode here maybe extend the error state? Note that I am not very fond of the way the parser handles unicode atm, maybe we even find a better way than misusing the error state.

can OSC contain higher chars at all? vt100.net does not cover this as it does not cover unicode at all.

More relevant is what xterm does. The code is a bit hard to read, but I think we're safe treating at least codes above 0xFF as printable. Codes between 0xA0 and 0xFF are probably ok too, though xterm's parse tables are indexed by 8-bit (Latin-1?) codes, not 7-bit ASCII. (Other modern well-maintained terminal emulators such as gnome-terminal are probably also worth looking at.)

(I assume you know xterm.js does a pretty poor job with vttest, so if we're concerned about compatibility there are higher priorities.)

A quick hack for characters above 0x9f is to use some suitable printable character instead:

table[currentState << 8 | (code < 0xa0 ? code : 97)]

Well the error state treats any code >0x9f as printable but limits printables to certain states. This could be extended by the OSC_STRING state.
About vttest - yeah, we have several issues open regarding this (e.g. #1434, which is a really wicked one). We also have test cases that compare the buffer state with xterm (https://github.com/xtermjs/xterm.js/tree/master/fixtures/escape_sequence_files); many are still disabled since xterm.js does not do the right thing yet. Basically vttest compliance boils down to InputHandler still doing things slightly differently.

Perhaps a special pseudo-character for code above 0x9f:

const NON_ASCII_PRINTABLE = 0xA0;
table.add(NON_ASCII_PRINTABLE, ParserState.OSC_STRING, ParserAction.OSC_PUT, ParserState.OSC_STRING);

and then:

table[currentState << 8 | (code < 0xa0 ? code : NON_ASCII_PRINTABLE)]

Yepp I like that idea 👍
Maybe we can even apply this to the other unicode aware states later on, would greatly simplify the error state (and restore its original purpose lol).
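
The pseudo-character idea above can be sketched in isolation as follows. This is a simplified stand-in, not the parser's real table: the enums `State` and `Action`, the helper `add`, and the function `lookup` are hypothetical, and only the `action << 4 | state` packing and the `state << 8 | code` indexing are taken from the snippets in this thread.

```typescript
// Hypothetical, simplified stand-ins for the parser's enums.
enum State { GROUND = 0, OSC_STRING = 1 }
enum Action { IGNORE = 0, PRINT = 1, OSC_PUT = 2 }

// Single slot that stands in for every code above 0x9f.
const NON_ASCII_PRINTABLE = 0xa0;

// One row of 256 slots per state; each slot packs action << 4 | nextState.
const table = new Uint8Array(2 << 8);

function add(code: number, state: State, action: Action, next: State): void {
  table[(state << 8) | code] = (action << 4) | next;
}

// Printable ASCII stays in OSC_STRING, and so does the pseudo-character.
for (let code = 0x20; code <= 0x7f; code++) {
  add(code, State.OSC_STRING, Action.OSC_PUT, State.OSC_STRING);
}
add(NON_ASCII_PRINTABLE, State.OSC_STRING, Action.OSC_PUT, State.OSC_STRING);

// Every code above 0x9f is folded onto the pseudo-character before the
// lookup, so no state-specific branch is needed at the call site.
function lookup(state: State, code: number): { action: Action; next: State } {
  const t = table[(state << 8) | (code < 0xa0 ? code : NON_ASCII_PRINTABLE)];
  return { action: t >> 4, next: t & 15 };
}

const euro = lookup(State.OSC_STRING, '€'.charCodeAt(0)); // 0x20ac uses the 0xa0 slot
const bell = lookup(State.OSC_STRING, 0x07);              // never added, so slot is 0
```

The appeal of this design is that other unicode-aware states could opt in with one more `add` call each, instead of more special cases in `parse`.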

jerch · 2018-12-09T23:55:51Z

src/EscapeSequenceParser.ts

@@ -517,7 +517,16 @@ export class EscapeSequenceParser extends Disposable implements IEscapeSequenceP
          osc = '';
          break;
        case ParserAction.OSC_PUT:
-          osc += data.charAt(i);
+          for (let j = i + 1; ; j++) {


Could we simplify the check by doing an including check against printables (range check? and maybe unicode if we have to cover those)? This would avoid the table lookup which is kinda costly.
Note an including check does not have to cover all chars, we might get away with the most common. All others will still be handled by the switch.

How about this:

case ParserAction.OSC_PUT:
  for (let j = i + 1; ; j++) {
    if (j >= l || (code = data.charCodeAt(j)) <= 0x20 || (code >= 0x7f && code <= 0x9f)) {
      osc += data.substring(i, j);
      i = j - 1;
      break;
    }
  }
  break;

Oh wait this is not quite correct, put is defined as 0x20 - 0x7f, seems 0x20 and 0x7f should be added to the string. I wonder about 0x7f (DEL) though.

Definitely should be < 0x20 rather than <= 0x20.

Less sure about code >= 0x7f vs code > 0x7f. Clearly 0x7F isn't "printable". One might argue that while it should be allowed in an OSC string it is sufficiently dubious that we don't want to hardwire it in the code, but instead have it be controlled by the transition table. But as long as we don't have a mechanism for overriding the transition table (and are not planning one) we might as well go with the more efficient code > 0x7f. I guess.

Yeah I think we should simply add it for now (S.E.P. for the parser 😄). DEC VTs handled this differently for the GROUND state as described here https://vt100.net/emu/dec_ansi_parser#STGRO.

The transition table can be overridden as a ctor argument. But this is currently not exposed, we had no reason so far to do so. In theory the VT models would need slightly different tables (DEC changed a lot for >VT300), the table currently resembles VT500 generation (thus not perfectly VT100 compliant).
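
The effect of the two boundary choices discussed above can be checked with a small sketch. The helper names `stopsProposed`, `stopsRevised` and `runLength` are hypothetical. Unlike the loop in the diff, which starts at `i + 1` because the character at `i` is already known to be an OSC_PUT, this helper tests the character at `start` as well.

```typescript
// Stop condition as first proposed: also stops at space (0x20) and DEL (0x7f).
const stopsProposed = (code: number): boolean =>
  code <= 0x20 || (code >= 0x7f && code <= 0x9f);

// Stop condition after the review: space and DEL stay inside the run.
const stopsRevised = (code: number): boolean =>
  code < 0x20 || (code > 0x7f && code <= 0x9f);

// Length of the run of characters, starting at `start`, that an inner loop
// with the given stop condition would append with a single substring() call.
function runLength(data: string, start: number, stops: (code: number) => boolean): number {
  let j = start;
  while (j < data.length && !stops(data.charCodeAt(j))) {
    j++;
  }
  return j - start;
}

const title = 'my window title\x07';
const proposedRun = runLength(title, 0, stopsProposed); // 2: ends at the first space
const revisedRun = runLength(title, 0, stopsRevised);   // 15: ends at BEL

const withDel = 'a\x7fb\x07';
const proposedDel = runLength(withDel, 0, stopsProposed); // 1: ends at DEL
const revisedDel = runLength(withDel, 0, stopsRevised);   // 3: DEL is part of the run
```

With the first condition a title containing spaces is cut into one run per word, so the number of concatenations grows with the word count; with the revised condition the whole title is one run.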

jerch · 2018-12-10T00:04:14Z

(The handling of characters above 0x9f is a bit clunky - I think it would be better to use a magic fake character code representing characters above 0x9f, and look up that character in the table - however, that is a bigger and less local change.)

I am not very happy with the way the parser handles higher chars atm in the error state. Feel free to come up with a better approach.

Also I think doing the inner looping in the switch case is a better approach than the shortcuts I added for PRINT and CSI_PARAM at the top of the loop (the latter doesn't even gain much). Maybe we should refactor this as well (it would also cut a lot of instructions from other states, which have to test against print before doing their stuff). If you want to tackle this, feel free to do so in another PR.

Btw you can test the parser perf with https://github.com/xtermjs/xterm-benchmark, just clone it next to the xterm.js folder and run:

#> cd xterm-benchmark
#> npm install
#> npm run tsc
#> node lib/cli.js lib/xterm_perfcases/parser.js

npm run tsc might throw errors, the test will still work though.

jerch · 2018-12-10T00:27:12Z

Few more thoughts on OSC in general:
When I wrote the parser I had a hard time deciding whether OSC should be handled like PRINT and DCS (dispatching subchunks), but decided to go with a single dispatch approach and thus doing the concat in the parser. Main reason for this was that the OSC commands were rather short when I implemented it (initially >10 years ago lol). Since then ppl started to use OSC for more advanced stuff with bigger payloads. So I am not sure anymore whether this is suitable or whether OSC should get a more advanced handling similar to DCS. Also the finalization in OSC_END with the number parsing is kind of a hack on the parser and not officially part of it.
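
For comparison, a chunked dispatch along the lines mentioned above might look like the following sketch. The interface `ChunkedOscHandler` and the class `LengthCounter` are hypothetical and do not exist in xterm.js; the shape is only loosely modelled on a handler that receives its payload in pieces.

```typescript
// Hypothetical interface: the parser would call start() once, put() for
// every run of payload characters, and end() when the sequence terminates.
interface ChunkedOscHandler {
  start(): void;
  put(data: string, start: number, end: number): void;
  end(success: boolean): void;
}

// Example handler that only measures the payload and never builds the
// full string, which is where chunked dispatch helps with big payloads.
class LengthCounter implements ChunkedOscHandler {
  public length = 0;
  public finished = false;

  public start(): void {
    this.length = 0;
    this.finished = false;
  }

  public put(data: string, start: number, end: number): void {
    // Clamp to the chunk so a bad `end` cannot inflate the count.
    this.length += Math.min(end, data.length) - start;
  }

  public end(success: boolean): void {
    this.finished = success;
  }
}

// A payload arriving in two chunks, as a parser might deliver it.
const chunkA = '52;c;aGVsbG8g';
const chunkB = 'd29ybGQ=';
const handler = new LengthCounter();
handler.start();
handler.put(chunkA, 0, chunkA.length);
handler.put(chunkB, 0, chunkB.length);
handler.end(true);
```

With this shape the parser itself never concatenates, at the cost of every handler having to deal with partial data.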

PerBothner · 2018-12-10T03:05:41Z

I added a commit with the changes we discussed.

jerch

👍 LGTM, just a minor fix needed, see below. Gonna test it tom.

jerch · 2018-12-10T03:16:14Z

src/EscapeSequenceParser.ts

@@ -79,7 +80,7 @@ export const VT500_TRANSITION_TABLE = (function (): TransitionTable {
  const states: number[] = r(ParserState.GROUND, ParserState.DCS_PASSTHROUGH + 1);
  let state: any;

-  // table with default transition [any] --> DEFAULT_TRANSITION
+  // table with default transition
  for (state in states) {
    // NOTE: table lookup is capped at 0xa0 in parse to keep the table small
    for (let code = 0; code < 160; ++code) {


Changing the transition table access in parse to code <= 0xa0 would need to include 0xa0 here as well, otherwise the other unicode aware states will be handled with IGNORE/GROUND transition.
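
The off-by-one described here can be reproduced with a small sketch. The constants and `buildTable` are hypothetical and simplified. It assumes that IGNORE and GROUND are both encoded as 0 and that the default transition is ERROR/GROUND, neither of which is confirmed in this thread.

```typescript
// Hypothetical, simplified values. IGNORE and GROUND are both 0 here,
// which is what an untouched Uint8Array slot decodes to.
const IGNORE = 0;
const ERROR = 1;
const GROUND = 0;
const OSC_STRING = 1;
const STATE_COUNT = 2;

// Fill every state's row with the default transition for codes
// 0 .. codeLimit - 1; slots at or beyond codeLimit stay 0.
function buildTable(codeLimit: number): Uint8Array {
  const table = new Uint8Array(STATE_COUNT << 8);
  for (let state = 0; state < STATE_COUNT; state++) {
    for (let code = 0; code < codeLimit; code++) {
      table[(state << 8) | code] = (ERROR << 4) | GROUND;
    }
  }
  return table;
}

// Slot that parse would read for code 0xa0 in OSC_STRING.
const slot = (OSC_STRING << 8) | 0xa0;

const tooShort = buildTable(160)[slot]; // loop stops at 0x9f, slot is never written
const fixed = buildTable(161)[slot];    // loop includes 0xa0

const tooShortIsIgnoreGround = (tooShort >> 4) === IGNORE && (tooShort & 15) === GROUND;
const fixedIsErrorGround = (fixed >> 4) === ERROR && (fixed & 15) === GROUND;
```

So once parse is allowed to index the table at 0xa0, the default fill has to cover that slot too, or any state without an explicit entry there silently drops the character.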

jerch · 2018-12-10T03:31:31Z

@PerBothner Created an issue #1823 to track the ideas we discussed above for other parser parts.

Tyriar · 2018-12-10T15:35:31Z

A lot of tests are broken:

2018-12-10T03:06:24.3283958Z   57 failing
2018-12-10T03:06:24.3286165Z 
2018-12-10T03:06:24.3289851Z   1) Buffer stringIndexToBufferIndex combining é in a sentence:
2018-12-10T03:06:24.3290434Z      AssertionError: expected 'Sitting in the café drinking coffee.' to equal 'Sitting in the cafe drinking coffee.'
2018-12-10T03:06:24.3290789Z   
2018-12-10T03:06:24.3290965Z 
2018-12-10T03:06:24.3292229Z   2) Buffer stringIndexToBufferIndex multiline combining é:
2018-12-10T03:06:24.3292790Z      AssertionError: expected 'ééééééééééééééé' to equal 'eeeeeeeeeeeeeee'

https://dev.azure.com/xtermjs/xterm.js/_build/results?buildId=877

jerch · 2018-12-10T15:38:21Z

@Tyriar Should be fixed now.

jerch · 2018-12-10T19:16:55Z

@PerBothner Thx a lot for the PR, works like a charm and throughput went from 18 to 27 MB/s 👍. Esp. bigger payloads will benefit from this (only tested with a 2 char payload string).

Tyriar

Thanks @PerBothner 👍

Optimize parsing of OSC_STRING to minimize string concatenation.

1c4a71f

PerBothner mentioned this pull request Dec 9, 2018

hooks for custom control sequences #1813

Closed

jerch requested changes Dec 9, 2018

Further tweaks to optimize parsing of OSC_STRING.

2cb293c

jerch approved these changes Dec 10, 2018

jerch mentioned this pull request Dec 10, 2018

cleanup/enhancement of parser #1823

Closed

11 tasks

fix default transition for 0xa0

2d8f420

jerch closed this Dec 10, 2018

jerch reopened this Dec 10, 2018

jerch added type/enhancement Features or improvements to existing features area/performance area/parser labels Dec 10, 2018

jerch added this to the 3.10.0 milestone Dec 10, 2018

jerch and others added 2 commits December 10, 2018 20:17

Merge branch 'master' into optimize-osc-parse

6a2e6a5

Merge branch 'master' into optimize-osc-parse

bbbfc04

Tyriar approved these changes Dec 11, 2018

Tyriar merged commit a758f8c into xtermjs:master Dec 11, 2018
