-
-
Notifications
You must be signed in to change notification settings - Fork 699
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improvement on the scalar / random bitpacker code. #1781
Conversation
659bf93
to
e0d93c0
Compare
return 0u64; | ||
} | ||
let addr_in_bits = idx * self.num_bits as u32; | ||
let addr_in_bits = idx * self.num_bits; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if dynamic dispatch on a compile time num_bits for all 64 variants would be a benefit for performance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the usage sites, it looks like the compiler will have a hard time inferring a constant, hence I would suspect that it could help. But I suspect it would also need some kind of type registry to provide access to &'static dyn
without boxing?
Finally, if the number of actually used instance is really small, it might be better to use "dictionary dispatch", e.g. have a match
on the bit width and branch into compile-time specializations with known bit widths, c.f. https://lwn.net/Articles/815908/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if what I meant was clear, when opening you'd get a typed version
pub fn new(num_bits: u8) -> Arc<dyn BitUnpackerTrait>{
match num_bits {
1 => Arc::new(BitUnpacker::<1>),
2 => Arc::new(BitUnpacker::<2>),
...
}
}
That would limit inlining and we have a dynamic dispatch, but it would allow optimization in the get
method and reduce the number of cycles there.
https://godbolt.org/z/ch4Wsjaf6
A match
would also be an alternative worth looking into.
With specialized versions, we don't need the padding anymore, since we can read a variable number of bytes, depending on num_bits
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if what I meant was clear, when opening you'd get a typed version
This is exactly how I understood your suggestion. I assumed a registry type data structure to avoid allocating the Arc
/Box
/etc every time and handing out &'static dyn BitUnpackerTrait
instead.
Is there are inherent limit on the possible values of num_bits
? I don't think eagerly producing 255 monomorphisations and matching on num_bits
to dispatch into them is reasonable, but IDK 32 might be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect a regression but who knows.
That's not a bad idea., at least it could help removing the 0 special case if there are no regressions...
and with the bit=1 case we could also use it to build an adhoc column value for booleans.
Added proptesting Added simple benchmark Added assert and comments on the very non trivial hidden contract Remove the need for an extra padding. The last point introduces a small performance regression (~10%).
e0d93c0
to
e3e46e1
Compare
* Improvement on the scalar / random bitpacker code. Added proptesting Added simple benchmark Added assert and comments on the very non trivial hidden contract Remove the need for an extra padding. The last point introduces a small performance regression (~10%). * Fixing unit tests
Added proptesting
Added simple benchmark
Added assert and comments on the very non trivial hidden contract Remove the need for an extra padding.
The last point introduces a small performance regression (~10%).