Skip to content
This repository has been archived by the owner on Dec 15, 2022. It is now read-only.

Eliminating some extra string copies #28

Merged
merged 2 commits into from
Mar 23, 2015
Merged

Eliminating some extra string copies #28

merged 2 commits into from
Mar 23, 2015

Conversation

omo
Copy link
Contributor

@omo omo commented Mar 19, 2015

Oops, I accidentally closed #27 so re-opening this...
@zcbenz WDYT?

thanks for the @hokein's effort.

The test has an excessive long line. Making this benchmark fast
will result faster parsing for some real-world peculiar files.
@omo
Copy link
Contributor Author

omo commented Mar 19, 2015

For the record: This aims to improve the situation around atom/atom#979

@omo
Copy link
Contributor Author

omo commented Mar 19, 2015

It hits the API incompatibility between node versions :-/ Looking...

bool HasMultibyteCharacters() const;
const char* utf8_value() const { return *utf8Value; }
size_t utf8_length() const { return utf8Value.length(); }
const wchar_t* utf16_value() const { return reinterpret_cast<const wchar_t*>(*utf16Value); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since utf16_value is only used on Windows, I think we can only define it on Windows too, in case we accidentally used utf16_value on POSIX in future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Updated this PR to ifdef utf16 bits out.

OnigStringContext is an immutable object that encapsulates
various values which are derived from a v8String, including
a UTF8 copy, a UTF16 copy and a predicate.

OnigStringContext can be used as a cache of these values to
save extra computation like string copying and memory allocation
when the same V8 string is given to FindNextMatchSync() in
subsequent calls.

Speedup:
 * large.js sync:    482ms ->  366ms
 * large.js async:  5914ms -> 4971ms (These numbers are noisy)
 * oneline sync:   66624ms -> 1085ms
 * oneline async: N/A (took too long)

Although there are some rooms for improvement in the async case,
this overall seems good starting point to address slow-long-line
problem, where we've repeatedly allocated the long string.
@zcbenz
Copy link
Contributor

zcbenz commented Mar 23, 2015

This PR is awesome, thanks!

zcbenz added a commit that referenced this pull request Mar 23, 2015
Eliminating some extra string copies
@zcbenz zcbenz merged commit 8b1859d into atom:master Mar 23, 2015
@zcbenz
Copy link
Contributor

zcbenz commented Mar 23, 2015

[email protected] has been published.

@omo
Copy link
Contributor Author

omo commented Mar 23, 2015

\o/

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants