Eliminating some extra string copies #28

omo · 2015-03-19T13:18:50Z

Oops, I accidentally closed #27 so re-opening this...
@zcbenz WDYT?

Thanks to @hokein for the effort. The test has an excessively long line. Making this benchmark fast will result in faster parsing for some peculiar real-world files.

omo · 2015-03-19T13:22:41Z

For the record: This aims to improve the situation around atom/atom#979

omo · 2015-03-19T13:28:44Z

It hits the API incompatibility between node versions :-/ Looking...

zcbenz · 2015-03-23T02:30:50Z

src/onig-string-context.h

+  bool HasMultibyteCharacters() const;
+  const char* utf8_value() const { return *utf8Value; }
+  size_t utf8_length() const { return utf8Value.length(); }
+  const wchar_t* utf16_value() const { return reinterpret_cast<const wchar_t*>(*utf16Value); }


Since utf16_value is only used on Windows, I think we should define it only on Windows too, so that we can't accidentally use utf16_value on POSIX in the future.

omo · 2015-03-23

Right. Updated this PR to ifdef the utf16 bits out.
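A minimal sketch of what guarding the UTF-16 accessor might look like. This is not the code from the PR: the class name `StringContextSketch`, the `_WIN32` guard, and the use of `std::string`/`std::wstring` in place of the V8 value holders are all assumptions made for illustration. Only the accessor names come from the diff above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the class in the diff. std::string and
// std::wstring replace the V8 value holders used by the real code.
class StringContextSketch {
 public:
  explicit StringContextSketch(const std::string& utf8)
      : utf8_(utf8)
#ifdef _WIN32
        // Naive widening that is only correct for ASCII input; a placeholder
        // for whatever UTF-16 conversion the real code performs.
        , utf16_(utf8.begin(), utf8.end())
#endif
  {}

  const char* utf8_value() const { return utf8_.c_str(); }
  std::size_t utf8_length() const { return utf8_.size(); }

#ifdef _WIN32
  // Declared only on Windows, so any use on POSIX fails at compile time.
  const wchar_t* utf16_value() const { return utf16_.c_str(); }
#endif

 private:
  std::string utf8_;
#ifdef _WIN32
  std::wstring utf16_;
#endif
};
```

With the accessor compiled out on POSIX, an accidental call there becomes a build error rather than a silent extra copy.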

OnigStringContext is an immutable object that encapsulates various values derived from a v8String, including a UTF8 copy, a UTF16 copy and a predicate.

OnigStringContext can be used as a cache of these values to save extra computation, like string copying and memory allocation, when the same V8 string is given to FindNextMatchSync() in subsequent calls.

Speedup:

* large.js sync: 482ms -> 366ms
* large.js async: 5914ms -> 4971ms (These numbers are noisy)
* oneline sync: 66624ms -> 1085ms
* oneline async: N/A (took too long)

Although there is some room for improvement in the async case, this overall seems a good starting point for addressing the slow-long-line problem, where we repeatedly allocated the long string.
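The caching idea described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `DerivedStringSketch` and `DerivedStringCache` are hypothetical names, `std::string` stands in for the V8 string, and the cache recognizes "the same string" by comparing contents, which is a simplification (the source does not show how the real code identifies a repeated V8 string).

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Immutable bundle of values derived once from a source string.
class DerivedStringSketch {
 public:
  explicit DerivedStringSketch(const std::string& source)
      : utf8_(source), has_multibyte_(ComputeHasMultibyte(source)) {}

  const std::string& utf8() const { return utf8_; }
  bool has_multibyte() const { return has_multibyte_; }

 private:
  static bool ComputeHasMultibyte(const std::string& s) {
    for (unsigned char c : s) {
      if (c >= 0x80) return true;  // a non-ASCII byte implies a multibyte sequence
    }
    return false;
  }

  const std::string utf8_;
  const bool has_multibyte_;
};

// One-entry cache: the derived values are rebuilt only when the source changes.
class DerivedStringCache {
 public:
  std::shared_ptr<const DerivedStringSketch> Get(const std::string& source) {
    // Assumption: content comparison stands in for the real identity check.
    if (!cached_ || cached_->utf8() != source) {
      cached_ = std::make_shared<const DerivedStringSketch>(source);
      ++rebuilds_;
    }
    return cached_;
  }

  std::size_t rebuilds() const { return rebuilds_; }

 private:
  std::shared_ptr<const DerivedStringSketch> cached_;
  std::size_t rebuilds_ = 0;
};
```

Repeated calls with the same line reuse one copy and one predicate result, which is the effect the oneline benchmark numbers above reflect: the long line is no longer re-copied on every search.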

zcbenz · 2015-03-23T06:45:44Z

This PR is awesome, thanks!

zcbenz · 2015-03-23T06:48:22Z

[email protected] has been published.

omo · 2015-03-23T15:15:04Z

\o/

This test data is taken from atom/atom#979 (comment), thanks to @hokein's effort. The test has an excessively long line. Making this benchmark fast will result in faster parsing for some peculiar real-world files.

04eb161

omo mentioned this pull request Mar 19, 2015

Eliminating some extra string copies #27

Closed

zcbenz reviewed Mar 23, 2015
zcbenz added a commit that referenced this pull request Mar 23, 2015

Merge pull request #28 from omo/string-context

8b1859d

Eliminating some extra string copies

zcbenz merged commit 8b1859d into atom:master Mar 23, 2015

winstliu mentioned this pull request Apr 18, 2017

first-mate not taking advantage of caching in Oniguruma atom/first-mate#93

Closed
