-
Notifications
You must be signed in to change notification settings - Fork 676
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use bit vector to store CESU-8 lookup table, #794
Conversation
This idea seems clever. However, I don't think it is little endian specific. The value is stored in a 32 bit register, and endianness is irrelevant for values in registers. |
RP2:
|
@@ -757,6 +757,17 @@ lit_utf8_string_code_unit_at (const lit_utf8_byte_t *utf8_buf_p, /**< utf-8 stri | |||
return code_unit; | |||
} /* lit_utf8_string_code_unit_at */ | |||
|
|||
/* CESU-8 number of bytes occupied lookup table */ | |||
#ifndef __LITTLE_ENDIAN | |||
const __attribute__ ((aligned (CESU_8_TABLE_MEM_ALIGNMENT))) lit_utf8_byte_t table[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need the alignment attribute?
@zherczeg , yes, you are right. Updated. |
On x86, most of performance gains from 'inline' lit_get_unicode_char_size_by_utf8_first_byte() --- a/jerry-core/lit/lit-strings.cpp -lit_utf8_size_t the performance is still pretty good. OS, ubuntu 15.04, 32 bit
Wed Jan 6 11:57:36 EST 2016 Maybe lit_get_unicode_char_size_by_utf8_first_byte() is not bottleneck on ARM? @egavrin , do you turn LTO on with RPi2? |
It is strange, that this change improves x86 by 4% but it is negligible on ARM. Regardless, I like this optimization, and I have no objection for landing it. However, it would be good to figure out why the difference is so big. Any idea? |
on Raspberry Pi 2 Just inline lit_get_unicode_char_size_by_utf8_first_byte() --- a/jerry-core/lit/lit-strings.cpp -lit_utf8_size_t
ARM benefits from inline too. |
Raspberry Pi 2 Not inline, use bit vector version -+lit_utf8_size_t attribute ((noinline)) return (cesu_8_store >> shift) & 0x3;
It shows bit vector hurts ARM performance a lot. Here is the objdump of bitvector version, 0000dec4 <_Z44lit_get_unicode_char_size_by_utf8_first_byteh.9439>: This is the original version, I am not familiar with ARM, is this ldr damages performance? |
Yes, since it is on the constant pool, and that load probably wastes a whole cache line. I suspect this is an agressive "size" optimization, which hurts performance too much. ARM could produce this constant with two 4 byte long instruction (8 byte altogether), but it only takes 6 byte this way. However I don't think this optimization worth at all. Is this ASM block helps?
|
uint32_t cesu_8_store; does not help. |
How about attr_always_inline_ , and just use bit vector for x86? lit_get_unicode_char_size_by_utf8_first_byte is a small function, not called everywhere. Raspberry Pi 2, run perf.sh 6 times
Sun 22 Nov 21:36:59 UTC 2015 |
It is better just to inline |
LGTM. |
- inline JerryScript-DCO-1.0-Signed-off-by: Xin Hu [email protected]
lgtm |
Merged (7255d64) |
JerryScript-DCO-1.0-Signed-off-by: Xin Hu [email protected]
Run ./tools/run-perf-test.sh 10 times,
OS, ubuntu 15.04, 32 bit
CPU, Intel(R) Celeron(R) CPU N2820 @ 2.13GHz, 2 core
(+ is better)
(+ is better)
Wed Dec 30 20:40:51 EST 2015
Another one for lit_get_unicode_char_size_by_utf8_first_byte.
Use bit vector to store the USEC-8 length table.
Since the value is 0,1,2,3, we can use two bits to represent one table item.
The whole set is 16*2 = 32 bit, that is just using one uint32 to store the USEC-8 table.
This way squeeze 12 bytes comparing to uint8 table version.
The average performance seems fine. crypto-md5 , crypto-sh1.js and string-base64.js are really good, but several others are not that good comparing to #793
Hard to say which one is better.