Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: Improve dfa matching #511

Closed
wants to merge 10 commits into from
Closed

perf: Improve dfa matching #511

wants to merge 10 commits into from

Conversation

Marwes
Copy link
Contributor

@Marwes Marwes commented Sep 7, 2018

Various improvements from profiling RegexSet::matches usage.

 name                old ns/iter    new ns/iter    diff ns/iter   diff %  speedup 
 misc::is_match_set  73 (342 MB/s)  76 (328 MB/s)             3    4.11%   x 0.96 
 misc::matches_set   857 (29 MB/s)  604 (41 MB/s)          -253  -29.52%   x 1.42 

@BurntSushi
Copy link
Member

Interesting! Nice improvements. Can you brainstorm ways of removing the new use of unsafe introduced in this PR? I think I buy your safety argument, but I also think I want to have a higher bar for introducing the use of unsafe. Does this improve specific use cases or do you expect it to have improvements across the board? I would guess the former off the top of my head.

@BurntSushi
Copy link
Member

@Marwes Also, if you're going to continue digging into more performance optimizations outside this PR, I'd like to request that we limit their scope to smaller changes. I am planning a complete rearchitecture of this crate and don't want to push much harder on the existing infrastructure.

@Marwes
Copy link
Contributor Author

Marwes commented Sep 7, 2018

Can you brainstorm ways of removing the new use of unsafe introduced in this PR?

In 1.21 there are impl<'a, T> From<&'a [T]> for Rc<[T]> which would make it possible to use a Rc pointer instead but that's the closest.

The safety considerations is the same for Pin/immovable values. It is fine to hold a reference to a heap allocated value as long as it does not get freed and in this case the heap values only gets freed after map keys.

Does this improve specific use cases or do you expect it to have improvements across the board?

I have no idea how much it improves for other cases but it should always be an improvement as it avoids an allocation and copy for every created state. It wasn't the most significant improvement however and if you expect that StateMap won't be needed after a re-factoring I might as well remove it.

Also, if you're going to continue digging into more performance optimizations outside this PR, I'd like to request that we limit their scope to smaller changes.

I am not planning on doing any further improvements. I have just been doing some experimentation from time to time and these were the changes that actually made a noticeable difference. So I just packed it all up today as something mergeable.

@Marwes
Copy link
Contributor Author

Marwes commented Sep 7, 2018

I am planning a complete rearchitecture of this crate and don't want to push much harder on the existing infrastructure.

I saw this while looking through some issues which is part of why I didn't want to let the branch get stale, hopefully at least some of the changes are agnostic enough to survive it.

@Marwes
Copy link
Contributor Author

Marwes commented Sep 25, 2018

ping @BurntSushi Is any of these changes still interesting? Should I remove the improvement involving unsafe?

@BurntSushi
Copy link
Member

@Marwes Thanks for pinging me on this. I think in order for me to be comfortable with the unsafe, I would need to do a more careful review of the code in addition to a performance analysis myself. Otherwise, it's too hard for me to judge whether it's worth it. So it might be worth getting rid of the unsafe.

@RReverser
Copy link
Contributor

While that particular unsafe usage is mostly safe and is relatively common unless one uses third-party wrapper like owning_ref, I think it might worth replacing it here with Rc (and correspondingly changing State::data).

The cost of increasing non-atomic counter will be pretty much negligible compared to everything else regex already does, and it would still save the memory in the same way unsafe &[u8] does in current code.

@Marwes
Copy link
Contributor Author

Marwes commented Sep 25, 2018

@RReverser I would just replace it with Rc in this case, except that would require regex to do a minimum version bump to 1.21 (from 1.20) so that may not be desirable. (Or it could be Rc<Box<[T]>> I suppose, a double indirection shouldn't be do bad)

In 1.21 there are impl<'a, T> From<&'a [T]> for Rc<[T]> which would make it possible to use a Rc pointer instead but that's the closest.

@RReverser
Copy link
Contributor

@Marwes Hmm, why do you need either &'a or Rc<Box<...>>? You can just change State::data from Box<[u8]> to Rc<[u8]>, use Rc<[u8]> as a key of HashMap too and then use Rc::clone(...) in insert.

@Marwes
Copy link
Contributor Author

Marwes commented Sep 25, 2018

Maybe I am forgetting something but I don't think there is a way to construct a variable sized Rc<[u8]> without that constructor?

@RReverser
Copy link
Contributor

Oh I was going to say you can convert it from Box but you're right, that conversion was also added only in 1.21... I suppose it's up to @BurntSushi then.

@Marwes
Copy link
Contributor Author

Marwes commented Sep 26, 2018

Re-ran the benchmarks on just the unsafe addition and there were only a ~1% speedup so I removed it. Using double indirection + Arc (Rc can't be used as it isn't Send) gave a slight performance degradation instead so that isn't viable either.

@RReverser
Copy link
Contributor

Ah, that's even better news.

@Marwes
Copy link
Contributor Author

Marwes commented Oct 5, 2018

Ping @BurntSushi (unsafe is removed losing only a small bit of the overall improvement)

@RReverser
Copy link
Contributor

RReverser commented Oct 8, 2018

@Marwes Does this still give 2x (?) memory saving after the unsafe removal?

@Marwes
Copy link
Contributor Author

Marwes commented Oct 8, 2018

Does this still give 2x (?) memory saving after the unsafe removal?

No, State { data: Box<[u8]> } is stored twice just as before this PR.

@RReverser
Copy link
Contributor

Hmm, maybe it's worth using Rc then after all? It was kind of nice improvement...

@BurntSushi
Copy link
Member

If the only thing stopping us from the nicer and safe solution is Rust 1.21, then I say we just go with that. I guess I can make the next release a 1.1.0 release and bump the minimum Rust version.

@Marwes
Copy link
Contributor Author

Marwes commented Oct 8, 2018

Actually, Arc/Rc requires two extra usize per state. I have not idea how large each State is, but if they are normally in the 16 bytes range the extra constant overhead might actually make it worse.

@Marwes
Copy link
Contributor Author

Marwes commented Oct 8, 2018

Added back the StateMap addition but with Arc instead of unsafe, bumped the minimum rust version to 1.21 and rebased to get rid of a conflict in .travis.yml.

Actually, Arc/Rc requires two extra usize per state. I have not idea how large each State is, but if they are normally in the 16 bytes range the extra constant overhead might actually make it worse.

Even the small strings in the added benchmarks yielded an average state size of > 30 bytes so it shouldn't be worse in memory use except perhaps in degenerate cases.

@Marwes Marwes force-pushed the bench_set branch 2 times, most recently from d4ad8b3 to e585739 Compare October 8, 2018 14:43
@RReverser
Copy link
Contributor

Is Arc as opposed to Rc really needed there? Isn't owning struct already accessed via thread-safe wrapper? If so, this would just add unnecessary overhead.

@BurntSushi
Copy link
Member

@RReverser You still need a Send bound, which is required by thread_local. I don't know off hand what it would take to drop the Send requirement here. It's been a while since I've sunk my head into thread_local.

@Marwes
Copy link
Contributor Author

Marwes commented Oct 9, 2018

Yeah, it is a bit odd that a thread local value needs Send but it seems like thread_local allows iterating over the the values of all the threads https://amanieu.github.io/thread_local-rs/thread_local/struct.ThreadLocal.html#method.iter_mut.

Also Amanieu/thread_local-rs#3

Failure looks spurious (network connectivity during setup).

@Marwes Marwes force-pushed the bench_set branch 2 times, most recently from a3590f7 to 01241bb Compare October 9, 2018 10:57
@Marwes
Copy link
Contributor Author

Marwes commented Oct 16, 2018

@BurntSushi This ought to be good now.

@Marwes
Copy link
Contributor Author

Marwes commented Oct 29, 2018

Ping @BurntSushi

@BurntSushi
Copy link
Member

@Marwes Yes, I know I'm on the hook for this. It's going to take me some time to get around to merging it.

@bors
Copy link
Contributor

bors commented Nov 6, 2018

☔ The latest upstream changes (presumably a3bccb1) made this pull request unmergeable. Please resolve the merge conflicts.

@BurntSushi
Copy link
Member

OK, I finally got around to looking at this. It took quite some time to review and fix up, but I've put up my revisions here: #540. Thanks @Marwes! I especially appreciate the persistence in finding a way to avoid unsafe. :-)

@Marwes Marwes deleted the bench_set branch December 1, 2018 09:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants