-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce unnecessary Vec allocations and indirections #77
Conversation
c4a996a
to
9f44242
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice to see some performance improvements here! Ultimately, const generics (#19) would go further into this direction.
70194d7
to
038d597
Compare
e2f0650
to
b5c8f04
Compare
b1a798a
to
a010cc0
Compare
I found that it was faster to just recreate stack-allocated arrays in |
885a14f
to
b90b484
Compare
b90b484 removes the explicit bounds checks in Looks like clippy is failing on an unrelated file with a new lint, so I'm going to leave that as is. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've fixed the lints in #80, please rebase and re-run the workflows.
self.pos_decoders = [0x400; 115]; | ||
self.is_match = [0x400; 192]; | ||
self.is_rep = [0x400; 12]; | ||
self.is_rep_g0 = [0x400; 12]; | ||
self.is_rep_g1 = [0x400; 12]; | ||
self.is_rep_g2 = [0x400; 12]; | ||
self.is_rep_0long = [0x400; 192]; | ||
self.state = 0; | ||
self.rep.fill(0); | ||
self.rep = [0; 4]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this brings any readability benefit over fill
. The compiler should be able to optimize both versions of the code regardless.
I see you however observed otherwise:
I found that it was faster to just recreate stack-allocated arrays in DecoderState than bother with filling them. Heap-backed arrays still use fill to keep their allocation.
Can you add a comment here to explain that?
(IMO, this might also be worth filing a bug to the Rust compiler, to understand if fill
being slower is intentional or not a missed optimization somewhere in the compiler.)
b90b484
to
8e08e96
Compare
Addressed nits, and reformatted imports as per #78 (comment) as well. |
583006a
to
58360eb
Compare
58360eb
to
c80e4aa
Compare
086d897
to
444e68c
Compare
Accidentally closed because of a faulty rebase, 444e68c should be properly rebased on current master now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've re-run the benchmarks, which show no significant change. However this is still a valuable cleanup - assuming the Vec2D
implementation is correct with all the needed bounds checks!
* Changed literal_probs array from a Vec<Vec<u16>> to a Vec2D backed by a contiguous allocation * BitTrees in LenDecoder and DecoderState are now stored inline. The actual BitTree data still lives in a Vec but one level of indirection is reduced. * Don't bother with filling stack-allocated DecoderState arrays on reset, and just recreate the arrays dropping the existing ones.
444e68c
to
48b66fa
Compare
Fixed nits in 48b66fa |
Pull Request Overview
Vec<Vec<u16>>
to a Vec2D backed by a contiguousVec<u16>
.slice::fill
. The heap-allocated arrays still usefill
to keep their allocation.This gives a small but free performance boost and reduces overall allocations.
6e1f0d7
After this PR (notice the tighter variance in particular)
Testing Strategy
This pull request was tested by...