Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regex-dna with &[u8] matching #31

Closed
llogiq opened this issue Apr 22, 2016 · 4 comments · Fixed by #32
Closed

regex-dna with &[u8] matching #31

llogiq opened this issue Apr 22, 2016 · 4 comments · Fixed by #32

Comments

@llogiq
Copy link
Contributor

llogiq commented Apr 22, 2016

I have a gist where I use regex::bytes instead of plain regex. I'd have thought that it should be faster (because we can get rid of the UTF-8 check), but in fact it is slower. @BurntSushi there's probably some reason for that, just a heads-up; if the bytes variant should get faster we can retry the measurements.

On my system:

variant wall user sys
UTF-8 1.52 3.94 0.33
bytes 2.23 4.50 0.35
@BurntSushi
Copy link
Contributor

Yeah, I noticed this a few weeks ago and though it was curious. I actually can't think of any obvious reason why it's slower. My (more extensive) benchmark suite says Regex::new and bytes::Regex::new have comparable performance. I'll look into it.

Another thing you could try is from_utf8_unchecked.

@BurntSushi
Copy link
Contributor

Ahahahahaha. This is because replace_all in Regex uses push_str, which should be a memcpy. On the other hand, bytes::Regex uses extend, which seems to be slower, and therefore I'm guessing doesn't get optimized down to a memcpy. Nice.

@BurntSushi
Copy link
Contributor

OK, once I fixed that in regex, the best of 3 for bytes::Regex was 0.59s and for Regex it was 0.61s. I figured it would be that small. It's barely perceptible and probably in the noise because I not-so-infrequently see both of them get up to 0.7s.

BurntSushi added a commit to rust-lang/regex that referenced this issue Apr 22, 2016
This was using `Vec::extend` to accumulate bytes in a buffer, but this
compiles down to less efficient code than, say, `Vec::extend_from_slice`.
However, that method is newly available as of Rust 1.6, so we do a small
backport to regain performance.

This bug was noticed by @llogiq here: TeXitoi/benchmarksgame-rs#31
In particular, this increases the performance of bytes::Regex two-fold on
that benchmark.
@llogiq
Copy link
Contributor Author

llogiq commented Apr 23, 2016

Ok then I'm going to wait for a new Regex release so I can submit it without complicating our makefile. Thanks @BurntSushi!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants