# Use heap allocated buffer instead of char collection in ecma_strings #678

Closed

@dbatyai (Member) commented Oct 20, 2015

Remove the char collection from ecma_strings and use a heap buffer instead, in which we also store the string's size and length.

Results (measured on rpi2):

| Benchmark | RSS: before -> after (% change, + is better) | Perf: before -> after (% change, + is better) |
|---|---|---|
| 3d-cube.js | 136 -> 136 (0) | 3.728 -> 3.71 (0.4828) |
| 3d-raytrace.js | 304 -> 264 (13.1579) | 5.918 -> 5.554 (6.1507) |
| access-binary-trees.js | 88 -> 92 (-4.545) | 2.72 -> 2.72 (0) |
| access-fannkuch.js | 52 -> 52 (0) | 10.03 -> 9.94 (0.8973) |
| access-nbody.js | 68 -> 68 (0) | 4.65 -> 4.63 (0.4301) |
| bitops-3bit-bits-in-byte.js | 40 -> 40 (0) | 3.18 -> 3.13 (1.5723) |
| bitops-bits-in-byte.js | 40 -> 40 (0) | 4.35 -> 4.36 (-0.23) |
| bitops-bitwise-and.js | 36 -> 36 (0) | 4.26 -> 4.26 (0) |
| controlflow-recursive.js | 224 -> 224 (0) | 3.23 -> 3.17 (1.8576) |
| crypto-aes.js | 156 -> 152 (2.5641) | 5.96 -> 5.71 (4.1946) |
| crypto-md5.js | 216 -> 212 (1.8519) | 33.57 -> 10.53 (68.6327) |
| crypto-sha1.js | 156 -> 152 (2.5641) | 15.1 -> 5.71 (62.1854) |
| date-format-xparb.js | 104 -> 96 (7.6923) | 2.03 -> 1.82 (10.3448) |
| math-cordic.js | 48 -> 48 (0) | 4.25 -> 4.23 (0.4706) |
| math-partial-sums.js | 40 -> 40 (0) | 2.55 -> 2.5 (1.9608) |
| math-spectral-norm.js | 52 -> 52 (0) | 3.22 -> 3.18 (1.2422) |
| string-base64.js | 192 -> 164 (14.5833) | 235.22 -> 38.6 (83.5898) |
| string-fasta.js | 60 -> 60 (0) | 5.54 -> 4.72 (14.8014) |
Geometric mean: RSS reduction 2.2246%, speed-up 21.6495%.

@egavrin (Contributor) commented Oct 20, 2015

3d-raytrace.js fails for me. Please check your measurements.

```
egavrin@ubuntu:~/jr$ ./jr-native-str ../jsperf/sunspider-1.0.2/3d-raytrace.js
egavrin@ubuntu:~/jr$ echo $?
1
```

Measurements on Raspberry Pi 2:

```
$ clear; setarch linux32 -R ../tools/run-perf-test.sh ./jr-master ./jr-native-str 5 5000 ../jsperf/sunspider-1.0.2/
```
| Benchmark | RSS: before -> after (% change, + is better) | Perf: before -> after (% change, + is better) |
|---|---|---|
| 3d-cube.js | 132 -> 136 (-3.030) | 3.676 -> 3.652 (0.653) |
| 3d-morph.js | | |
| 3d-raytrace.js | | |
| access-binary-trees.js | 88 -> 88 (0.000) | 2.712 -> 2.724 (-0.442) |
| access-fannkuch.js | 40 -> 40 (0.000) | 9.824 -> 9.772 (0.529) |
| access-nbody.js | 64 -> 64 (0.000) | 4.608 -> 4.604 (0.087) |
| access-nsieve.js | | |
| bitops-3bit-bits-in-byte.js | 36 -> 36 (0.000) | 3.272 -> 3.236 (1.100) |
| bitops-bits-in-byte.js | 36 -> 36 (0.000) | 4.448 -> 4.452 (-0.090) |
| bitops-bitwise-and.js | 32 -> 32 (0.000) | 4.288 -> 4.164 (2.892) |
| bitops-nsieve-bits.js | | |
| controlflow-recursive.js | 224 -> 224 (0.000) | 3.196 -> 3.172 (0.751) |
| crypto-aes.js | 156 -> 148 (5.128) | 5.928 -> 5.68 (4.184) |
| crypto-md5.js | 208 -> 200 (3.846) | 31.132 -> 10.072 (67.647) |
| crypto-sha1.js | 148 -> 148 (0.000) | 13.9 -> 5.54 (60.144) |
| date-format-tofte.js | | |
| date-format-xparb.js | 100 -> 88 (12.000) | 2.012 -> 1.808 (10.139) |
| math-cordic.js | 48 -> 48 (0.000) | 4.852 -> 4.836 (0.330) |
| math-partial-sums.js | 40 -> 40 (0.000) | 2.544 -> 2.616 (-2.830) |
| math-spectral-norm.js | 52 -> 52 (0.000) | 3.172 -> 3.176 (-0.126) |
| regexp-dna.js | | |
| string-base64.js | 208 -> 200 (3.846) | 216.588 -> 34.792 (83.936) |
| string-fasta.js | 56 -> 52 (7.143) | 5.52 -> 4.692 (15.000) |
| string-tagcloud.js | | |
| string-unpack-code.js | | |
| string-validate-input.js | | |
Geometric mean: RSS reduction 1.7676%, speed-up 21.9761%.

This patch greatly improves performance, nice work!
Allocating strings as contiguous native arrays increases fragmentation, though: the engine won't be able to allocate a new string if there is not enough contiguous heap space. Since our primary concern is reducing memory consumption and fragmentation, I'd propose to mix both approaches and implement strings on top of the recordset, which effectively manages non-contiguous memory (a code sketch follows the diagrams below).

Heap state before allocation of a string:

```
+---------------------------------------+
|-------|       |-------|       |-------|
|-------|       |-------|       |-------|
+---------------------------------------+
```

Current implementation (chunked char collection):

```
              +-----+-----+
          +---+     |     +---+
          |   +-----+-----+   |
         Fits.               Fits.
          |                   |
+---------v-------------------v---------+
|-------|       |-------|       |-------|
|-------|       |-------|       |-------|
+---------------------------------------+
```

PR's implementation (single contiguous buffer):

```
              +-----------+
          +---+           +---+
          |   +-----------+   |
    Does not fit.        Does not fit.
          |                   |
+---------v-------------------v---------+
|-------|       |-------|       |-------|
|-------|       |-------|       |-------|
+---------------------------------------+
```
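To make the trade-off concrete, here is a minimal sketch of recordset-style chunked storage (hypothetical names and sizes, not JerryScript's actual API): each allocation only needs one chunk-sized contiguous block, so the string can reuse small gaps that a single large buffer could not.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical chunked string: data is split across fixed-size chunks
 * linked by pointers, so no single large contiguous block is needed. */
#define CHUNK_DATA_SIZE 12

typedef struct chunk
{
  struct chunk *next;            /* next chunk of this string, or NULL */
  uint8_t data[CHUNK_DATA_SIZE]; /* a slice of the string's bytes */
} chunk_t;

/* Store 'size' bytes into a chain of chunks obtained from 'alloc_chunk'
 * (assumed to return NULL when no chunk-sized block is free). */
static chunk_t *
store_chunked (const uint8_t *bytes, size_t size, chunk_t *(*alloc_chunk) (void))
{
  chunk_t *head = NULL;
  chunk_t **tail = &head;

  while (size > 0)
  {
    chunk_t *c = alloc_chunk ();
    if (c == NULL)
    {
      return NULL; /* not even a single chunk-sized gap is available */
    }
    size_t part = size < CHUNK_DATA_SIZE ? size : CHUNK_DATA_SIZE;
    memcpy (c->data, bytes, part);
    c->next = NULL;
    *tail = c;
    tail = &c->next;
    bytes += part;
    size -= part;
  }
  return head;
}
```

The price is the per-chunk pointer overhead and the loss of O(1) access into the string's bytes, which is exactly the overhead this PR removes.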

@egavrin (Contributor) commented Oct 20, 2015

```js
a = [];

for (i = 0; i < 3000; i++)
{
   a[i] = i + '.';
}
```

Master: OK
PR: OUT_OF_MEMORY
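(A rough estimate of why this runs out of memory, assuming chunk usage scales linearly with the iteration count: the mem-stats in the next comment show a peak of 1386 64-byte heap chunks for 1000 such strings, i.e. about 87 KB; three times that is roughly 260 KB, which exceeds the entire 258048-byte heap.)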

@egavrin (Contributor) commented Oct 20, 2015

Mem stats for the following code:

```js
a = [];

for (i = 0; i < 1000; i++)
{
   a[i] = i + '.';
}
```
```
./jerry.master --mem-stats ./t.js ; echo $?
Heap stats:
   Heap size = 258048 bytes
   Chunk size = 64 bytes
   Allocated chunks count = 0
   Allocated = 0 bytes
   Waste = 0 bytes
   Peak allocated chunks count = 636
   Peak allocated = 40632 bytes
   Peak waste = 205 bytes

Pools stats:
  Chunk size: 8
   Pools: 0
   Allocated chunks: 0
   Free chunks: 0
   Peak pools: 631
   Peak allocated chunks: 5041
```

```
./jerry.new --mem-stats ./t.js ; echo $?
Heap stats:
   Heap size = 258048 bytes
   Chunk size = 64 bytes
   Allocated chunks count = 0
   Allocated = 0 bytes
   Waste = 0 bytes
   Peak allocated chunks count = 1386
   Peak allocated = 33469 bytes
   Peak waste = 55235 bytes

Pools stats:
  Chunk size: 8
   Pools: 0
   Allocated chunks: 0
   Free chunks: 0
   Peak pools: 380
   Peak allocated chunks: 3039
```

The most important difference is the following: peak waste now exceeds the peak allocated memory.

```
                                        master -> new
   Peak allocated chunks count =           636 -> 1386
   Peak allocated =                40632 bytes -> 33469 bytes
   Peak waste =                      205 bytes -> 55235 bytes
```
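(Per string, the numbers add up, assuming each of the ~1000 short strings from the loop occupies one 64-byte chunk: a value like "123." needs only a few bytes of character data plus a small header, so roughly 55 of the 64 bytes in the chunk remain unused, and 1000 × ~55 bytes ≈ 55 KB matches the 55235-byte peak waste.)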

@ruben-ayrapetyan (Contributor) commented

@dbatyai, could you please describe the causes of the performance and memory consumption changes, and also try to figure out ways to prevent fragmentation and reduce the amount of wasted space?

Maybe improving the "literal storage" and switching to it for storing strings could solve these issues. Could you please share your opinion on this?

@dbatyai (Member, Author) commented Oct 27, 2015

Most of the performance gain comes from storing the length of strings, so we no longer have to iterate over them every time the length is needed.
Using contiguous heap space for storing strings also brought a notable improvement, since the previous char collection had significant overhead in both memory usage (2 bytes of each 8-byte chunk were used by a compressed pointer) and performance (copying characters into and out of the collection).
The least significant part is that string concatenation no longer requires intermediate buffers; instead, it can be done directly inside the space of the new string.
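For illustration, a minimal sketch of such a heap-buffer representation (hypothetical names and field widths, not the PR's actual code): size and length sit in a small header directly in front of the contiguous character data, so both are O(1) reads, and concatenation writes straight into the new buffer.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical heap-buffer string: one allocation holding a small
 * header (byte size and character length) followed by the bytes. */
typedef struct
{
  uint16_t size;   /* size in bytes (an encoded character may take several bytes) */
  uint16_t length; /* length in characters, precomputed once at creation */
  uint8_t data[];  /* character data, contiguous */
} heap_string_t;

/* Concatenation writes directly into the new buffer: no intermediate copy. */
static heap_string_t *
concat (const heap_string_t *a, const heap_string_t *b)
{
  heap_string_t *r = malloc (sizeof (heap_string_t) + a->size + b->size);
  if (r == NULL)
  {
    return NULL;
  }
  r->size = (uint16_t) (a->size + b->size);
  r->length = (uint16_t) (a->length + b->length);
  memcpy (r->data, a->data, a->size);
  memcpy (r->data + a->size, b->data, b->size);
  return r;
}
```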

The cause of the large amount of wasted memory is that blocks are allocated as 64-byte chunks, and since short strings don't need that much, the rest remains unused.
Using the literal storage could solve this issue, as it manages this unused space pretty well. The problem is that it currently does not support freeing allocated records, and as far as I can tell, adding that would require serious changes. Also, we could no longer treat strings as contiguous data.

Adding a special case for short strings, and only allocating a pool chunk if they fit into it, could help in the test case mentioned by @egavrin, but there would still be other cases where the wasted space would be large.
On the other hand, using smaller heap chunks would reduce the waste effectively, but that could affect other parts of the engine as well. (A little radical, but with 16-byte heap chunks the memory allocated with the patch applied was less than on current master, even with the additional waste.)

What do you think?

@ruben-ayrapetyan (Contributor) commented

> Most of the performance gain comes from storing the length of strings

The length of the string is already in `unit_number` (see `ecma_new_chars_collection`).

```c
/**
 * Description of a collection's header.
 */
typedef struct
{
  /** Number of elements in the collection */
  ecma_length_t unit_number;

  /** Compressed pointer to first chunk with collection's data */
  mem_cpointer_t first_chunk_cp;

  /** Compressed pointer to last chunk with collection's data */
  mem_cpointer_t last_chunk_cp;
} ecma_collection_header_t;
```

@dbatyai (Member, Author) commented Oct 27, 2015

`unit_number` stored the size of the string in bytes; see ecma-helpers-string.cpp:73.
The actual length, as in the number of Unicode characters, was not stored previously.
Instead, `ecma_get_chars_collection_length` (ecma-helpers-string.cpp:113) was called every time it was requested.
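For contrast, a sketch of what computing the length on demand costs (assuming a UTF-8-style encoding; this is not the engine's actual routine): every query has to scan all bytes of the string.

```c
#include <stddef.h>
#include <stdint.h>

/* Count characters by scanning the bytes each time the length is
 * requested: O(n) per query, which storing the length up front avoids.
 * In UTF-8, continuation bytes look like 10xxxxxx; every other byte
 * starts a new character. */
static uint32_t
count_characters (const uint8_t *bytes, size_t size)
{
  uint32_t length = 0;
  for (size_t i = 0; i < size; i++)
  {
    if ((bytes[i] & 0xC0) != 0x80)
    {
      length++;
    }
  }
  return length;
}
```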

@ruben-ayrapetyan (Contributor) commented

@dbatyai, I see. Thanks. Could we add the length to the beginning of a string's storage?
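Presumably that would look something like the following (a sketch only; the names and the stand-in typedef are hypothetical): the precomputed character length would occupy the first bytes of the collection's data area, ahead of the string bytes themselves.

```c
#include <stdint.h>

typedef uint32_t ecma_length_t; /* stand-in for the engine's own typedef */

/* Hypothetical layout: the character length is stored at the start of
 * the collection's data, so reading it no longer requires a scan. */
typedef struct
{
  ecma_length_t char_length; /* number of characters, computed once on creation */
  /* string bytes follow, continuing into the chained chunks */
} ecma_collection_prefix_t;
```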

@egavrin (Contributor) commented Oct 27, 2015

> The problem is that it currently does not support freeing allocated records, and as far as I can tell, this would require serious changes.

I would say that is a big but needed change, regardless of the string optimization.

> with 16 byte heap chunks the allocated memory with the patch applied was less than the current master

Could you share the exact measurements for SunSpider, using both run-perf-test and mem-stats?
My current understanding is that using 16-byte chunks would decrease performance, so we could instead try increasing the chunk size to 128 bytes.

egavrin assigned dbatyai and unassigned egavrin on Oct 28, 2015
@LaszloLango (Contributor) commented

@dbatyai, what about this PR? The initial results were really promising. Could you update your patch?

@dbatyai (Member, Author) commented Feb 11, 2016

Closing, as this is no longer relevant.

Labels: enhancement