Core: `Token.stringify` will now call `util.encode` #1844

RunDevelopment · 2019-03-28T23:12:29Z

This PR changes the behavior of Token.stringify so that the function will now encode strings itself rather than relying on pre-encoded token streams.

This also simplified util.encode because it doesn't need to be able to handle token streams anymore.

It's also more efficient like this because encode doesn't have to create a deep copy of the token stream.

Btw. No language or plugin uses encode, so this should be a purely internal change.
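To make the change concrete, here is a minimal sketch of the two approaches. It is not Prism's actual source: `Token`, `encodeText`, `encodeStream`, `stringifyPreEncoded` and `stringify` are simplified, hypothetical stand-ins, and the assumption is that the old flow encoded a deep copy of the token stream before stringifying it.

```javascript
// Simplified stand-ins for illustration only; NOT Prism's real implementation.
function Token(type, content) {
	this.type = type;
	this.content = content; // a string, a Token, or an array of strings/Tokens
}

// Encodes a plain string only (the simplified role of encode after the change).
function encodeText(text) {
	return text.replace(/&/g, '&amp;').replace(/</g, '&lt;');
}

// Assumed old flow, step 1: build an encoded deep copy of the token stream.
function encodeStream(tokens) {
	if (tokens instanceof Token) {
		return new Token(tokens.type, encodeStream(tokens.content));
	}
	if (Array.isArray(tokens)) {
		return tokens.map(encodeStream);
	}
	return encodeText(tokens);
}

// Assumed old flow, step 2: stringify input that is already encoded.
function stringifyPreEncoded(o) {
	if (typeof o === 'string') return o;
	if (Array.isArray(o)) return o.map(stringifyPreEncoded).join('');
	return '<span class="token ' + o.type + '">' + stringifyPreEncoded(o.content) + '</span>';
}

// New flow: stringify encodes each string itself, so no copy is needed.
function stringify(o) {
	if (typeof o === 'string') return encodeText(o);
	if (Array.isArray(o)) return o.map(stringify).join('');
	return '<span class="token ' + o.type + '">' + stringify(o.content) + '</span>';
}

var stream = ['a ', new Token('operator', '<'), ' b ', new Token('operator', '&&'), ' c'];
var before = stringifyPreEncoded(encodeStream(stream));
var after = stringify(stream);

console.log(before === after); // true
console.log(after);
// a <span class="token operator">&lt;</span> b <span class="token operator">&amp;&amp;</span> c
```

Both flows give the same markup in this sketch. The difference is that callers of the new `stringify` must pass unencoded token streams, which is why the change can break code that relied on the old behavior.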

ExE-Boss

Maybe get rid of util.encode(…) altogether by moving the string replacement code into Token.stringify(…).

components/prism-core.js

Golmote

LGTM!

Golmote · 2019-04-22T19:12:25Z

Hmm, on second thought, Token is accessible from the outside, isn't it? This could be considered a BC-breaking change, even though Token was probably undocumented.

ExE-Boss · 2019-04-22T19:18:44Z

The method is marked as private in the documentation: #1782.

Golmote · 2019-04-22T19:55:38Z

Which is a good thing, but this doc is not merged yet, so we have to account for what the users have been given so far, which is mainly this page of the website... where Prism.Token is mentioned (but its API is not described).
I just don't wanna repeat the situation we had a bit ago, when Prism broke for people because they were relying on unspecified behavior. Better safe than sorry.

ExE-Boss · 2019-04-22T20:00:39Z

Unfortunately, @types/prismjs had documented it, albeit very wrongly.

The current version has a warning about it being private.

ExE-Boss

I’d say let’s merge this.

RunDevelopment · 2019-05-25T23:33:54Z

I'm actually with @Golmote on this one. This is technically a breaking change. Let's merge this after JSDoc is through, so people can know how the API changed.

(PrismJS v2.0 when?)

RunDevelopment · 2020-09-18T16:32:06Z

@mAAdhaTTah @Golmote
You guys think we can merge this now? Public doc is available, so I don't feel bad changing purely internal functions.

I also recently read an article where this was mentioned. So I made a little benchmark: with a simpler escapeHTML function, Prism.highlight was 1.13x faster on average. 13% is pretty impressive because Prism.highlight also does the whole tokenization plus the stringification.

The simplified function used:
(This version can be used to escape attribute values as well.)

var ATTR_REGEX = /[&<"]/;
/**
 * This is a fast HTML escaping function.
 *
 * All credit goes to Vladimir Kutepov (@frenzzy on GitHub).
 *
 * @param {string} html
 * @returns {string}
 */
function escapeHTML(html) {
	var match = ATTR_REGEX.exec(html);
	if (!match) return html;
	var index = 0;
	var lastIndex = 0;
	var out = '';
	var escape = '';
	for (index = match.index; index < html.length; index++) {
		switch (html.charCodeAt(index)) {
			case 34: // "
				escape = '&quot;';
				break;
			case 38: // &
				escape = '&amp;';
				break;
			case 60: // <
				escape = '&lt;';
				break;
			default:
				continue;
		}
		if (lastIndex !== index) out += html.substring(lastIndex, index);
		lastIndex = index + 1;
		out += escape;
	}
	return lastIndex !== index ? out + html.substring(lastIndex, index) : out;
}
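For comparison, the same three replacements can be written as chained `replace` calls. This is only a reference sketch: the name `escapeHTMLSimple` is hypothetical and not part of Prism. It should produce the same output as the function above, which makes it handy for sanity-checking a faster version, and it shows why escaping `"` matters for attribute values.

```javascript
// Hypothetical reference implementation; slower, but easy to verify by eye.
function escapeHTMLSimple(html) {
	return html
		.replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
		.replace(/</g, '&lt;')
		.replace(/"/g, '&quot;');
}

// Escaping the double quote keeps the value inside a double-quoted attribute.
var value = 'a < b && say "hi"';
var attr = '<span title="' + escapeHTMLSimple(value) + '"></span>';

console.log(attr);
// <span title="a &lt; b &amp;&amp; say &quot;hi&quot;"></span>
```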

Benchmark log:

Found 9 cases with 27 files in total.
Test 2 candidates with Prism.highlight
Estimated duration: 2m 42s

------------------------------------------------------------

c

  https://raw.githubusercontent.com/git/git/master/mergesort.c (1 kB)
  | local              0.17ms ±  2%   44smp
  | PrismJS@master     0.21ms ±  0%   50smp 1.25x
  https://raw.githubusercontent.com/git/git/master/mergesort.h (1 kB)
  | local              0.03ms ±  1%   48smp
  | PrismJS@master     0.04ms ±  0%   54smp 1.19x
  https://raw.githubusercontent.com/git/git/master/remote.c (58 kB)
  | local              6.62ms ±  1%   49smp
  | PrismJS@master     9.53ms ±  1%   44smp 1.44x
  https://raw.githubusercontent.com/git/git/master/remote.h (10 kB)
  | local              0.60ms ±  0%   51smp
  | PrismJS@master     0.77ms ±  0%   54smp 1.28x

------------------------------------------------------------

css

  ../../assets/style.css (7 kB)
  | local              1.34ms ±  1%   53smp 1.07x
  | PrismJS@master     1.26ms ±  0%   52smp

------------------------------------------------------------

css!+css-extras (css)

  ../../assets/style.css (7 kB)
  | local              1.54ms ±  1%   52smp
  | PrismJS@master     2.20ms ±  0%   50smp 1.43x

------------------------------------------------------------

javascript

  ../../assets/utopia.js (11 kB)
  | local              1.93ms ±  1%   52smp
  | PrismJS@master     2.24ms ±  0%   53smp 1.16x
  ../../components.json (27 kB)
  | local              5.48ms ±  0%   48smp
  | PrismJS@master     5.77ms ±  0%   50smp 1.05x
  ../../package-lock.json (206 kB)
  | local             37.60ms ±  2%   25smp 1.01x
  | PrismJS@master    37.41ms ±  5%   27smp
  https://cdnjs.cloudflare.com/ajax/libs/prism/1.20.0/prism.js (29 kB)
  | local              5.33ms ±  1%   49smp
  | PrismJS@master     6.28ms ±  1%   46smp 1.18x
  https://cdnjs.cloudflare.com/ajax/libs/prism/1.20.0/prism.min.js (14 kB)
  | local              4.64ms ±  1%   48smp
  | PrismJS@master     5.22ms ±  0%   50smp 1.13x
  https://code.jquery.com/jquery-3.4.1.js (274 kB)
  | local             96.38ms ±  4%   16smp
  | PrismJS@master   103.66ms ±  5%   15smp 1.08x
  https://code.jquery.com/jquery-3.4.1.min.js (86 kB)
  | local             69.70ms ±  8%   19smp
  | PrismJS@master    71.89ms ±  5%   21smp 1.03x

------------------------------------------------------------

json

  ../../components.json (27 kB)
  | local              2.65ms ±  1%   50smp
  | PrismJS@master     3.36ms ±  1%   49smp 1.27x
  ../../package-lock.json (206 kB)
  | local             28.48ms ±  3%   34smp 1.19x
  | PrismJS@master    23.99ms ±  2%   31smp

------------------------------------------------------------

markup

  ../../download.html (4 kB)
  | local              0.37ms ±  1%   51smp
  | PrismJS@master     0.41ms ±  0%   52smp 1.13x
  ../../index.html (19 kB)
  | local              2.03ms ±  0%   52smp
  | PrismJS@master     2.31ms ±  0%   50smp 1.14x
  https://github.com/PrismJS/prism (193 kB)
  | local             34.04ms ±  5%   28smp
  | PrismJS@master    41.18ms ±  4%   24smp 1.21x

------------------------------------------------------------

markup!+css+javascript (markup)

  ../../download.html (4 kB)
  | local              0.72ms ±  1%   49smp 1.03x
  | PrismJS@master     0.70ms ±  0%   53smp
  ../../index.html (19 kB)
  | local              2.93ms ±  2%   46smp
  | PrismJS@master     3.19ms ±  1%   48smp 1.09x
  https://github.com/PrismJS/prism (193 kB)
  | local             53.13ms ±  6%   19smp 1.17x
  | PrismJS@master    45.59ms ±  4%   21smp

------------------------------------------------------------

ruby

  https://raw.githubusercontent.com/rails/rails/master/actionview/lib/action_view/base.rb (12 kB)
  | local              0.58ms ±  0%   53smp
  | PrismJS@master     0.68ms ±  0%   54smp 1.18x
  https://raw.githubusercontent.com/rails/rails/master/actionview/lib/action_view/layouts.rb (16 kB)
  | local              0.66ms ±  0%   53smp
  | PrismJS@master     0.78ms ±  0%   54smp 1.17x
  https://raw.githubusercontent.com/rails/rails/master/actionview/lib/action_view/template.rb (14 kB)
  | local              0.79ms ±  0%   53smp
  | PrismJS@master     0.92ms ±  0%   53smp 1.17x

------------------------------------------------------------

rust

  https://raw.githubusercontent.com/rust-lang/regex/master/src/compile.rs (42 kB)
  | local              6.55ms ±  0%   48smp
  | PrismJS@master     7.87ms ±  1%   45smp 1.20x
  https://raw.githubusercontent.com/rust-lang/regex/master/src/lib.rs (28 kB)
  | local              0.59ms ±  0%   52smp
  | PrismJS@master     0.60ms ±  0%   55smp 1.01x
  https://raw.githubusercontent.com/rust-lang/regex/master/src/utf8.rs (9 kB)
  | local              1.47ms ±  0%   54smp
  | PrismJS@master     1.66ms ±  0%   55smp 1.12x

------------------------------------------------------------

summary
                  best  worst  relative
  local             22      5     1.00x
  PrismJS@master     5     22     1.13x

JaKXz

If this rebases and passes tests I think we can release it as v1.26? the benchmarks look promising :)

RunDevelopment · 2021-12-15T11:51:55Z

> I think we can release it as v1.26?

I tagged this as v2.0 because this is potentially a breaking change to some people. As mentioned above, @types/prismjs does document Token.stringify, so we have to assume that some people rely on its current behavior. (Yes, I changed my stance on this one.)

I also don't see any urgency to merge this. This PR doesn't block anything, nor is the improvement felt by our users.

> the benchmarks look promising :)

That benchmark isn't for this change but for the faster HTML escape function. Despite removing a deep copy of every token stream, this PR doesn't actually improve performance.

RunDevelopment · 2022-09-02T10:03:18Z

Implemented in v2.

RunDevelopment added 2 commits March 29, 2019 00:04

Token.stringify will now call encode

08636b4

Strict equals

6ada653

RunDevelopment added enhancement needs review labels Mar 28, 2019

ExE-Boss reviewed Mar 31, 2019

Golmote approved these changes Apr 22, 2019

ExE-Boss mentioned this pull request May 25, 2019

Diff: Added support for syntax highlighting inside diffs #1889

Merged

ExE-Boss approved these changes May 25, 2019

RunDevelopment mentioned this pull request May 25, 2019

Add a util.decode function #1910

Open

RunDevelopment added the core label Aug 26, 2019

mAAdhaTTah added this to the 2.0 milestone Sep 3, 2019

RunDevelopment mentioned this pull request Aug 11, 2020

HTML entities not escaped correctly when highlighting HTML code. #2516

Closed

RunDevelopment removed this from the 2.0 milestone Sep 18, 2020

Merge branch 'master' into stringify-calls-encode

09f799f

RunDevelopment mentioned this pull request Nov 21, 2020

[Request] Faster HTML escaping? highlightjs/highlight.js#2885

Closed

RunDevelopment added this to the 2.0 milestone May 4, 2021

JaKXz approved these changes Dec 14, 2021

RunDevelopment closed this Sep 2, 2022

RunDevelopment deleted the stringify-calls-encode branch September 2, 2022 10:03
