+= to concat strings is heavily underperforming #67
I would guess v8 is optimizing the code and executing it at compile time (the same program with a console.log is similarly fast). QuickJS runs it on my machine in 1.071 seconds. An equivalent Ruby program executes in 1.209 seconds, so I think this is more a testament to how good v8 is than to quickjs being slow: execution speeds are comparable to other non-JIT interpreters.
So this is kind of interesting. At first I thought this is because QuickJS lacks the unique-reference in-place concatenation optimisation. As it turns out, not quite: the optimisation is already there, but a number of implementation flaws (and some design flaws) in QuickJS prevent it from being actually useful, at least in interpreted code. The problems are as follows:
Good analysis. Our conclusion at the time was that it was not worth continuing on this track because there would still be corner cases giving a large slowdown. Instead, a rope-based string representation may be a simpler solution.
On 8/11/21 1:40 PM, Felix S. wrote:
So this is kind of interesting. At first I thought this is because QuickJS lacks the [unique-reference in-place concatenation optimisation](https://blog.ganssle.io/articles/2019/11/string-concat.html). As it turns out, not quite: the optimisation is already there, but a number of implementation flaws (and some design flaws) in QuickJS prevent it from being actually useful, at least in interpreted code.

The problems are as follows:
**No opcode for in-place concatenation of lexically-scoped variables**

QuickJS compiles JavaScript functions into bytecode, which is then interpreted by a stack-based virtual machine. The central performance bottleneck is obviously `result += str1 + str2;`. This is the unoptimised bytecode this statement compiles to:

```
;; result += str1 + str2;
42: get_loc_check 0: result
    get_var str1
    get_var str2
    add
    add
    dup
    put_loc_check 0: result
    drop
```
The `get_loc_check` opcode puts the value of a lexically-scoped variable on the stack, while `put_loc_check` removes a value from the stack and puts it in a lexically-scoped variable. The equivalents for function-scoped (`var`) variables are `get_loc` and `put_loc`. Values are generally reference-counted (with cycle detection): putting a value on the stack increments its reference count, removing it decrements it. As such, the string cannot be concatenated in place, because there are two references to it: one from the stack, the other from the local variable. We would like to have an opcode which can update the variable in place, without incrementing the reference count as an intermediate step. Such an opcode exists for `var` variables: its name is `add_loc`. There is no equivalent `add_loc_check` for `let` variables.
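A minimal sketch of the distinction described above (hedged: in plain JavaScript the two functions are equivalent, and the function names are hypothetical; the difference only matters to an interpreter that, as described here, has an `add_loc` fast path for `var` accumulators but no `add_loc_check` for `let`):

```javascript
// Hypothetical micro-benchmark shapes: an interpreter without an
// add_loc_check opcode may only take the in-place concatenation
// fast path for the `var` accumulator, not the `let` one.
function concatWithLet(parts) {
  let result = '';                 // get_loc_check / put_loc_check path
  for (const p of parts) result += p;
  return result;
}

function concatWithVar(parts) {
  var result = '';                 // potentially eligible for add_loc
  for (var i = 0; i < parts.length; i++) result += parts[i];
  return result;
}

console.log(concatWithLet(['a', 'b', 'c']) === concatWithVar(['a', 'b', 'c'])); // true
```

Both produce identical strings; only the opcode stream the engine sees differs.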
**Simplistic optimiser performs peephole analysis only**

We can avoid the above flaw by declaring `result` with the `var` keyword. But all this does is change `get_loc_check` and `put_loc_check` into `get_loc` and `put_loc`. An optimisation step is necessary to transform the opcode stream to use `add_loc`. QuickJS does, in fact, have an optimiser. It is able to simplify the `dup` / `put_loc` / `drop` sequence into just a `put_loc`. To be able to use `add_loc`, in full generality it ought to match on `get_loc` (X) / {Y} / `add` / `put_loc` (X) and transform it to {Y} / `add_loc` (X), for {Y} an arbitrary sequence of opcodes that touches neither the local variable (X) itself nor the stack slot in which its value is put by `get_loc`. The optimiser is nowhere near that sophisticated; it can only perform bounded-lookahead pattern matching against the opcode stream. There is no data flow analysis, just dumb substitution of fixed opcode patterns.
We can avoid this flaw by putting the concatenation of `str1` and `str2` in an intermediate variable. Adding a variable to another variable generates an opcode sequence `get_loc` (X) / `get_loc` (Y) / `add` / `put_loc` (X) that QuickJS optimises into `get_loc` (Y) / `add_loc` (X):

```
;; var tmp = str1 + str2;
get_var str1
get_var str2
add
put_loc2 2: tmp
;; result += tmp;
get_loc2 2: tmp
add_loc 0: result
```
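In source terms, the rewrite just described can be sketched as follows (hedged: the function names are hypothetical, and whether the second shape is actually faster depends on the QuickJS version in use):

```javascript
// Sketch of the workaround: routing the right-hand side through a
// temporary `var` so a simple peephole optimiser sees exactly the
// get_loc / add / put_loc pattern it knows how to fuse into add_loc.
function slowShape(str1, str2, n) {
  var result = '';
  for (var i = 0; i < n; i++) {
    result += str1 + str2;      // compound RHS: pattern not matched
  }
  return result;
}

function fastShape(str1, str2, n) {
  var result = '';
  for (var i = 0; i < n; i++) {
    var tmp = str1 + str2;      // put_loc tmp
    result += tmp;              // candidate for add_loc result
  }
  return result;
}

console.log(slowShape('x', 'y', 3) === fastShape('x', 'y', 3)); // true
```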
**Spurious *internal* reference-count increments**

Using the `add_loc` opcode avoids putting the string on the stack, so you would think that it would avoid incrementing the reference count. Except that the opcode's implementation internally increments the reference count anyway:

```c
op1 = JS_ConcatString(ctx, JS_DupValue(ctx, *pv), op1);
if (JS_IsException(op1))
    goto exception;
set_value(ctx, pv, op1);
```
While the call to `JS_ConcatString` is active, there are two references to the string: one from the temporary (created by `JS_DupValue`), the other from the variable. `JS_ConcatString` drops the former, `set_value` drops the latter. We can address this with a kludge:

```c
op1 = JS_ConcatString(ctx, *pv, op1);
if (JS_IsException(op1))
    goto exception;
*pv = op1;
```

Kind of ugly, but it gets the job done. Now the reference count stays at 1 the whole time.
**No memory allocation amortisation**

Now the punchline: even if the reference count is 1, this may still not be enough to reuse the allocation of one of the strings.

```c
str = js_malloc_rt(rt, sizeof(JSString) + (max_len << is_wide_char) + 1 - is_wide_char);
```
When allocating memory for strings, QuickJS requests no more space than required to store the string. The allocator is allowed to provide more storage; when QuickJS detects this, the allocation is reused. But far too often there simply isn't enough room, and the opportunity never arises. This triggers the slow path, where a new string is allocated and the contents are copied manually: not even `js_realloc` is used to reuse the old object's data.

By adding `js_realloc` on top of the above patches, I managed to get execution time on my machine to drop from nearly 6 seconds to about 300 ms. (Your performance may vary: I compile QuickJS either with TinyCC or with ASan and UBSan enabled, either of which may slow things down.)
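For reference, a hypothetical reconstruction of the kind of micro-benchmark being measured in this thread (the original script is not shown here, so the function name, strings, and iteration count are all assumptions):

```javascript
// Hypothetical reconstruction of the benchmark shape under
// discussion: repeated `result += str1 + str2` in a loop, the
// statement identified above as the bottleneck.
function bench(iterations) {
  var str1 = 'hello, ';
  var str2 = 'world! ';
  var result = '';
  for (var i = 0; i < iterations; i++) {
    result += str1 + str2;
  }
  return result.length;
}

console.log(bench(1000)); // 14000
```

Without in-place reuse of the accumulator's allocation, each iteration copies the entire string so far, giving the quadratic behaviour observed.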
I've been in awe of the excellent performance of quickjs; I really appreciate when performance emerges from simplicity and lightness, as opposed to complex layers upon layers of JIT and runtime analysis. The only issue I'm running into is the fact that a good number of JS programmers assume that string concatenation is optimized. An example of this is the official Pug parser: a 1,000-line Pug file takes over 55 seconds to parse in quickjs versus ~100 ms in Node/V8. Consider this a "+1" for this issue, I suppose! I love quickjs and I'd love to be able to trust that it will work on any codebase, but until someone has time to implement the rope-based string representation I'll have to restrict my usage to programs that won't exhibit problematic slowdowns. Thanks again for quickjs!
Doing tests, I found that using joinStrings.js:
Hello @fstirlitz, I didn't see your changes in your fork at https://github.com/fstirlitz/quickjs; could you please attach or copy and paste your
@bellard was this fixed in the 2024.Jan release?
Tested with the Hermes engine. They do optimize this case. I agree the rope-based string approach may be a correct direction for general string optimization. The optimization is undoubtedly required, for example:

```js
let longText;
...
// To append a newline to the end of long text, we have to duplicate the whole text.
// The time complexity is O(n) for appending one character, which is unacceptable!
longText += '\n';
```
```js
// The below example takes 66531ms on my PC!
// Hermes finishes in 142ms
// And I think quickjs will behave better with a similar optimization
let str = '';
for (let i = 0; i < 1000_000; i++) {
    str += 'a';
}
```

But do we need a general string optimization, and does that perform better in practice? Here are my thoughts:
So in my opinion:
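One way to picture the amortisation being asked for above, as a userland sketch (a hypothetical `StringBuilder` helper, not part of quickjs or Hermes): appends go into a chunk list in O(1) amortised time, and the flat string is built only once, on demand.

```javascript
// Hypothetical builder illustrating amortised appends: each append
// pushes a chunk in O(1); toString() flattens once and caches the
// result, so repeated reads do not re-copy.
class StringBuilder {
  constructor() { this.chunks = []; }
  append(s) { this.chunks.push(s); return this; }
  toString() {
    // flatten lazily; subsequent calls reuse the flattened form
    if (this.chunks.length > 1) this.chunks = [this.chunks.join('')];
    return this.chunks[0] ?? '';
  }
}

const b = new StringBuilder();
for (let i = 0; i < 5; i++) b.append('a');
console.log(b.toString()); // aaaaa
```

An engine-level fix would do something similar internally (over-allocating or chunking), so that `str += 'a'` in a loop stops being quadratic.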
I just learned that V8 doesn't adopt rope strings either. Maybe we can implement a similar optimization in quickjs too. Here are my brief thoughts.
Four string formats in total. Let's discuss the operations on these formats:

**Type 1**

Slice: copy the slice range to a new string.

**Type 2**

Slice:

**Type 3**

Concat: remains Type 4
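Since the four proposed formats are only named above, here is a generic sketch of the concat-node (rope) idea they build on (a hypothetical `Rope` class, for illustration only): concatenation allocates a small tree node in O(1), and the flat string is materialised only when actually needed.

```javascript
// Minimal rope sketch: concat is O(1) node creation instead of an
// O(n) copy; toString() walks the tree to materialise the result.
class Rope {
  constructor(left, right) { this.left = left; this.right = right; }
  static concat(a, b) { return new Rope(a, b); }   // O(1), no copying
  toString() {
    const flat = (x) => typeof x === 'string' ? x : x.toString();
    return flat(this.left) + flat(this.right);
  }
}

let r = 'hello';
for (const part of [' ', 'rope', ' ', 'world']) r = Rope.concat(r, part);
console.log(r.toString()); // hello rope world
```

A real implementation would also rebalance the tree and flatten eagerly below some size threshold, which is where most of the complexity (and the maintenance concern raised below) comes from.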
Thank you for your summary. Do string slices really deserve optimizing? I am not sure real-world code would benefit from such an optimisation, and adding more string types is somewhat painful and error-prone.
Thanks for the reply. I think slicing does have some weight in string operations. I think you are right that it may introduce more difficulty and complexity in the implementation. But I think they can be done progressively; we can do it more simply at first, for example before reading. If simple optimizations are not good enough, we can progressively introduce deeper optimizations. Actually, I'm implementing them on my own when I have free time. It's a rough implementation, but I will update the benchmark comparison here after I finish. We will see if it is worth it.
Using the concat method or an array push, this operation takes around 0.015-0.030 seconds. If you want to test, you can run the following command:

Notice qjs takes a few seconds to run the code.
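The two alternatives mentioned can be sketched as follows (a hypothetical reconstruction; the original joinStrings.js and the exact command are not shown in this thread):

```javascript
// Sketch of the fast alternatives: collect pieces in an array and
// join once, or pass all pieces to a single concat() call. Both
// avoid the repeated `+=` reallocation pattern.
function viaArrayPush(n) {
  const parts = [];
  for (let i = 0; i < n; i++) parts.push('a');
  return parts.join('');                 // single allocation at the end
}

function viaConcat(pieces) {
  return ''.concat(...pieces);           // one call, many arguments
}

console.log(viaArrayPush(4));            // aaaa
console.log(viaConcat(['a', 'b', 'c'])); // abc
```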