Performance drop using cloneDeep of lodash #471

karol-bisztyga · 2021-03-17T11:03:47Z

Bug Description

There is a significant performance drop using cloneDeep of lodash, it takes approximately 25-45 times longer to execute when running with Hermes on android.

I have run gradle clean and confirmed this bug does not occur with JSC

Hermes version: hermes-engine: 0.5.1
React Native version (if any): 0.63.4
OS version (if any): Android 10 x86 Pixel 3 simulator
Platform (most likely one of arm64-v8a, armeabi-v7a, x86, x86_64): x86_64

Minimal reproducing repository

https://github.com/karol-bisztyga/hermes-repro
There are 2 commits:

init - use it to test without Hermes
with hermes - use it to test with Hermes

For reproducing this I used a randomly generated JSON file, which is here, it is a 1MB file(just FYI, you don't have to worry about this, it's hardcoded in the repro already).

Tests results

Running sample code from the repro provided I got following results(that stand for times of execution of doing JSON.parse, clondeDeep and both combined):

- no hermes:
{ parse: [ 32, 11, 16, 14, 17, 15, 14, 17, 14, 12 ],
      clone: [ 218, 209, 114, 122, 117, 114, 117, 123, 111, 118 ],
      parseClone: [ 235, 137, 141, 132, 125, 142, 131, 131, 141, 136 ] }
- with hermes:
{ parse: [ 24, 18, 22, 17, 17, 13, 18, 13, 18, 12 ],
      clone: [ 5667, 5318, 5500, 5395, 5576, 5760, 5553, 5349, 5669, 5455 ],
      parseClone: [ 5405, 5388, 5787, 5452, 5426, 5360, 5381, 5364, 5455, 5390 ] }

The Expected Behavior

Similar time of execution of clondeDeep with and without Hermes

Note

I didn't know how to categorize it so I went for the Bug, sorry if that's not the best way, I couldn't find a category for the performance drop.

The text was updated successfully, but these errors were encountered:

dulinriley · 2021-03-18T02:42:55Z

I think "enhancement" combined with "performance" is a better set of tags.
"bug" is reserved for Hermes producing incorrect output or crashing, which doesn't seem to be the case here (assuming the values you list are times, not the output).

karol-bisztyga · 2021-03-18T06:48:51Z

Those are times of execution, I'll update the description as it might've been misleading, thanks for clarifying.

Summary: Assuming that the `cloneDeep` function has been introduced in order to avoid memory leaks/inefficient memory usage, an investigation checking whether it still occurs without cloning has been performed. Apparently this is not an issue anymore. The main reason for removing this function is our will to switch to Hermes. Using `cloneDeep` with this engine, a significant performance drop can be observed, the problem is described [here](facebook/hermes#471). Test Plan: Test results are presented in [this](https://phabricator.ashoat.com/D884#21096) and two following comments. We also ran a test logging in to the comm app with an account that has a lot of data to be fetched and parsed(and, up to this point, clonned). Reviewers: palys-swm, ashoat Reviewed By: palys-swm, ashoat Subscribers: ashoat, KatPo, zrebcu411, Adrian, atul, subnub Differential Revision: https://phabricator.ashoat.com/D916

andreialecu · 2021-04-08T16:04:00Z

Sorry to hijack this thread, but I believe this could be related.

It appears that polyfilling Intl is extremely slow as well using Hermes. I forked @karol-bisztyga's repro to show an issue with formatting dates.

Hermes is 3 times slower than JSC, and it makes working with i18n/timezoned dates almost impossible. It takes around 5ms to format one single date(!).

Copying my comment from: #23 (comment)

Sample benchmark using luxon and @formatjs on Android via Hermes:

  for (let y = 0; y < 5; y++) {
    const start = Date.now();
    for (let i = 0; i < 100; i++) {
      DateTime.local().toLocaleString();
    }
    console.log(Date.now() - start);
  }

Output:

LOG  423
LOG  450
LOG  575
LOG  450
LOG  421

I ran the same benchmark using JSC instead of Hermes on Android and the times are 150-170ms per 100 runs.

So it seems to be a Hermes performance issue as well, probably same root cause as #471

Here's a repro on both engines:
https://github.com/andreialecu/hermes-repro/tree/jsc-intl
https://github.com/andreialecu/hermes-repro/tree/hermes-intl

The same benchmark on an iOS emulator (iOS has native Intl so it does not need a polyfill) using JavaScriptCore logs:

 LOG  8
 LOG  0
 LOG  1
 LOG  1
 LOG  0

dulinriley · 2021-04-08T16:53:32Z

@andreialecu thanks for letting us know, but we don't know for certain yet what the root cause of the performance issue is with the original example from the post. So it's hard to say if they're related or not yet.

We're currently trying to isolate the particular lines of code that take the most time. If you'd like to know this for your own app, you can use https://reactnative.dev/docs/next/profile-hermes to get a profile of the JS that is run. That might help us figure out the root cause of slow performance

andreialecu · 2021-04-08T17:07:31Z

Hey @dulinriley, thanks for the reply.

I actually used the hermes profiler to track down the issue to formatjs's polyfills.

A profile session on https://github.com/andreialecu/hermes-repro/tree/hermes-intl would likely help troubleshoot this on your end.

dulinriley · 2021-04-10T01:05:55Z

Ok so we spent some time profiling this. There are two main functions with lodash taking most of the time (> 70%):

assocIndexOf: this looks through a stack to find a key match. It's sort of like a hand-rolled Map object with a specific ordering
eq: This is literally just x === y || (x !== x && y !== y)

Given the latter is fast everywhere, we find that it's actually a function calling issue. Inlining the latter gives about a 40% speedup.

The former is used by a stack of circular references in cloneDeep. Basically it's about 4500 deep by the end of the clone, and it never has a circular reference so there's no match. It would be much faster to use a Map or Set for this, not sure why lodash doesn't use one.

We're not sure yet how to speed up searching through the medium size (4500) array of key/value pairs, nor are we sure why v8 is faster than us (in jitless mode).

What we can do is work on our inliner, since the eq function should be trivially inlinable even statically: it can't throw, has no strictness changes, no arguments usage, and no captured variables. We don't do it today because we only do this for functions called exactly once.

We still don't have any fixes ready to land yet, but in the meantime if this is blocking you, you can try copying the cloneDeep implementation and remove the stack checks. That brings the runtime on my machine from 500 ms down to 50 ms. It works fine for non-circular objects like a JSON object.

avp · 2021-04-10T02:36:00Z

I spent some more time and I think I have a more detailed diagnosis.

lodash has a failover mechanism which allows it to turn ListCache into a Map specifically to avoid the bad case of iterating over all the entries in the Array. It triggers within the stackSet call.

So, what prevents that mechanism from triggering?

  function stackSet(key, value) {
    var data = this.__data__;
    if (data instanceof ListCache) {
      var pairs = data.__data__;
      if (!Map || (pairs.length < LARGE_ARRAY_SIZE - 1)) {
        pairs.push([key, value]);
        this.size = ++data.size;
        return this;
      }
      data = this.__data__ = new MapCache(pairs);
    }
    data.set(key, value);
    this.size = data.size;
    return this;
  }

Notice that we have a check for !Map in addition to the size check on LARGE_ARRAY_SIZE. Other engines are populating Map, while Hermes is not.

Here's how Map is populated:

      Map = getNative(root, 'Map'),

getNative eventually calls into baseIsNative:

  function baseIsNative(value) {
    if (!isObject(value) || isMasked(value)) {
      return false;
    }
    var pattern = isFunction(value) ? reIsNative : reIsHostCtor;
    return pattern.test(toSource(value));
  }

Here it effectively calls toString on value (which here is the Hermes-implemented native Map), and runs a pattern match on it. The pattern is going to be reIsNative (because Map is indeed a function).

Thus we come to reIsNative:

  /** Used to detect if a method is native. */
  var reIsNative = RegExp('^' +
    funcToString.call(hasOwnProperty).replace(reRegExpChar, '\\$&')
    .replace(/hasOwnProperty|(function).*?(?=\\\()| for .+?(?=\\\])/g, '$1.*?') + '$'
  );

The regexp which is being matched against is calling toString on Object.hasOwnProperty in particular, and replacing the word hasOwnProperty with some wildcards.

In other engines it ends up looking like this:

/^function.*?\(\) \{\n    \[native code\]\n\}$/

In Hermes, however, we show how many arguments the function's .length property expects:

/^function.*?\(a0\) \{ \[native code\] \}$/

This means that we fail the check for Map (which expects no arguments, and therefore does not match the regexp) and lodash never gets to use the Map implementation which Hermes provides.

I changed the line that sets up Map to simply set it unconditionally and the regression was mostly mitigated.

We'll look into what's necessary to fix this.

tmikov · 2021-04-10T23:56:53Z

So, ideally this should be fixed with a patch to Lodash?

andreialecu · 2021-04-11T10:58:27Z

It seems Hermes could still be up to 3x slower than jsc in some tasks. The jsc vs Hermes benchmark I linked above shows a big performance gap between the two.

It could be because of inlining, so I'm hopeful things can improve.

tmikov · 2021-04-12T02:14:00Z

@andreialecu can you please file a separate issue? The performance of Intl is most likely not related to this.

andreialecu · 2021-04-12T07:10:42Z

Well, it's not exactly about the performance of Intl, because that doesn't exist in Hermes. It's about the performance of some polyfills, so it's a JS based implementation that is slow.

I have opened #495 however. The findings in this thread related to function inlining are probably relevant.

Huxpro · 2021-04-15T05:27:28Z

Hey folks, we've changed Hermes to print any NativeFunction without formal parameters to fix this. You should get a perf boost on RN 0.65 with next Hermes release.

Ideally, this should be a fix on Lodash. However, the reality seems to suggest that this practice might be too common that it's not worthy for us to diverge with all other engines for the sake of practicality.

We extracted the benchmark from @karol-bisztyga's original one to make it easier to run on engines directly, and I was able to confirm that this change (582a529) improved Hermes performance on the benchmark from 787 to 43, which is in the same ballpark with other engines when interpreting (numbers are the median of 3 runs on my mac pro):

v8 --jitless: 30
jsc --useJIT=false: 44
qjs: 60

karol-bisztyga · 2021-04-15T08:13:17Z

Thanks everyone for the attention, research, and solution 🙌 This is amazing! 🚀

karol-bisztyga added the bug Something isn't working label Mar 17, 2021

dulinriley added enhancement New feature or request performance and removed bug Something isn't working labels Mar 18, 2021

andreialecu mentioned this issue Apr 8, 2021

Intl Support #23

Closed

Huxpro mentioned this issue Apr 10, 2021

[Normative] NativeFunction toString is too permissive vs. the reality tc39/ecma262#2381

Open

andreialecu mentioned this issue Apr 12, 2021

3x Performance drop vs JSC in some tasks #495

Open

facebook-github-bot closed this as completed in 582a529 Apr 14, 2021

Huxpro mentioned this issue Apr 30, 2021

Road to v0.65 Discussion react-native-community/releases#223

Closed

38 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance drop using cloneDeep of lodash #471

Performance drop using cloneDeep of lodash #471

karol-bisztyga commented Mar 17, 2021 •

edited

Loading

dulinriley commented Mar 18, 2021

karol-bisztyga commented Mar 18, 2021

andreialecu commented Apr 8, 2021

dulinriley commented Apr 8, 2021

andreialecu commented Apr 8, 2021

dulinriley commented Apr 10, 2021

avp commented Apr 10, 2021 •

edited

Loading

tmikov commented Apr 10, 2021

andreialecu commented Apr 11, 2021

tmikov commented Apr 12, 2021

andreialecu commented Apr 12, 2021

Huxpro commented Apr 15, 2021 •

edited

Loading

karol-bisztyga commented Apr 15, 2021

Performance drop using cloneDeep of lodash #471

Performance drop using cloneDeep of lodash #471

Comments

karol-bisztyga commented Mar 17, 2021 • edited Loading

Bug Description

Minimal reproducing repository

Tests results

The Expected Behavior

Note

dulinriley commented Mar 18, 2021

karol-bisztyga commented Mar 18, 2021

andreialecu commented Apr 8, 2021

dulinriley commented Apr 8, 2021

andreialecu commented Apr 8, 2021

dulinriley commented Apr 10, 2021

avp commented Apr 10, 2021 • edited Loading

tmikov commented Apr 10, 2021

andreialecu commented Apr 11, 2021

tmikov commented Apr 12, 2021

andreialecu commented Apr 12, 2021

Huxpro commented Apr 15, 2021 • edited Loading

karol-bisztyga commented Apr 15, 2021

karol-bisztyga commented Mar 17, 2021 •

edited

Loading

avp commented Apr 10, 2021 •

edited

Loading

Huxpro commented Apr 15, 2021 •

edited

Loading