regex-dna with &[u8] matching #31

llogiq · 2016-04-22T20:52:50Z

I have a gist where I use regex::bytes instead of plain regex. I'd have thought that it should be faster (because we can get rid of the UTF-8 check), but in fact it is slower. @BurntSushi there's probably some reason for that, just a heads-up; if the bytes variant should get faster we can retry the measurements.

On my system:

variant	wall	user	sys
UTF-8	1.52	3.94	0.33
bytes	2.23	4.50	0.35

BurntSushi · 2016-04-22T21:29:47Z

Yeah, I noticed this a few weeks ago and thought it was curious. I actually can't think of any obvious reason why it's slower. My (more extensive) benchmark suite says Regex::new and bytes::Regex::new have comparable performance. I'll look into it.

Another thing you could try is from_utf8_unchecked.
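
For readers unfamiliar with that suggestion, here is a minimal sketch of the difference between the checked and unchecked conversions from `&[u8]` to `&str`, using only the standard library. The helper names `to_str_checked` and `to_str_unchecked` are hypothetical and not taken from the gist; the sketch assumes the input is known to be valid UTF-8 (for example, plain ASCII sequence data).

```rust
/// Checked conversion: scans the whole input to validate UTF-8 before
/// handing back a `&str`.
fn to_str_checked(bytes: &[u8]) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(bytes)
}

/// Unchecked conversion (hypothetical helper): skips validation entirely.
/// The caller must guarantee that `bytes` is valid UTF-8.
fn to_str_unchecked(bytes: &[u8]) -> &str {
    // Debug builds still verify the assumption; release builds skip the scan.
    debug_assert!(std::str::from_utf8(bytes).is_ok());
    // SAFETY: relies on the caller's guarantee stated above. Passing
    // invalid UTF-8 here is undefined behavior.
    unsafe { std::str::from_utf8_unchecked(bytes) }
}
```

The unchecked variant only saves the validation pass, so whether it helps at all depends on how large that pass is relative to the matching work.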

BurntSushi · 2016-04-22T23:00:46Z

Ahahahahaha. This is because replace_all in Regex uses push_str, which should be a memcpy. On the other hand, bytes::Regex uses extend, which seems to be slower, and therefore I'm guessing doesn't get optimized down to a memcpy. Nice.

BurntSushi · 2016-04-22T23:23:04Z

OK, once I fixed that in regex, the best of 3 for bytes::Regex was 0.59s and for Regex it was 0.61s. I figured it would be that small. It's barely perceptible and probably in the noise because I not-so-infrequently see both of them get up to 0.7s.
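
A "best of N" measurement like the one quoted above can be sketched with the standard library alone. This is an illustration, not the harness actually used in this thread; the function name `best_of` is hypothetical.

```rust
use std::time::{Duration, Instant};

/// Run `work` `runs` times and return the shortest wall-clock duration,
/// or `None` if `runs` is zero. (Hypothetical helper for illustration.)
fn best_of<F: FnMut()>(runs: u32, mut work: F) -> Option<Duration> {
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            work();
            start.elapsed()
        })
        .min()
}
```

Taking the minimum filters out some scheduling noise, but as noted above, a gap of a few hundredths of a second can still be within run-to-run variation.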

@llogiq

This was using `Vec::extend` to accumulate bytes in a buffer, but this compiles down to less efficient code than, say, `Vec::extend_from_slice`. However, that method is newly available as of Rust 1.6, so we do a small backport to regain performance. This bug was noticed by @llogiq here: TeXitoi/benchmarksgame-rs#31 In particular, this increases the performance of bytes::Regex two-fold on that benchmark.
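
The two ways of appending bytes contrasted above can be sketched as follows. This is a simplified illustration rather than the actual code in regex; the helper names are hypothetical. Both functions produce the same buffer and differ only in how the copy is expressed.

```rust
/// Append `chunk` through the generic iterator-based `Vec::extend`.
/// (Hypothetical helper; per the discussion above, this was the slower
/// form on the compilers of the time.)
fn append_with_extend(buf: &mut Vec<u8>, chunk: &[u8]) {
    buf.extend(chunk.iter().cloned());
}

/// Append `chunk` as a whole slice with `Vec::extend_from_slice`
/// (available since Rust 1.6), which can be lowered to a bulk copy.
fn append_with_extend_from_slice(buf: &mut Vec<u8>, chunk: &[u8]) {
    buf.extend_from_slice(chunk);
}
```

Whether the two forms still differ in speed depends on the compiler version, so it is worth measuring rather than assuming.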

llogiq · 2016-04-23T05:34:49Z

Ok then I'm going to wait for a new Regex release so I can submit it without complicating our makefile. Thanks @BurntSushi!

BurntSushi mentioned this issue Apr 22, 2016

Fixes a performance bug in bytes::Regex::replace. rust-lang/regex#210

Merged

llogiq mentioned this issue Apr 27, 2016

moved regex_dna to u8 matching #32

Merged

TeXitoi closed this as completed in #32 May 5, 2016
