-
-
Notifications
You must be signed in to change notification settings - Fork 698
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
apply gcd on fastfield as preprocessing #1418
Conversation
2f6daa2
to
4592fc7
Compare
src/fastfield/reader.rs
Outdated
pub fn open_from_id( | ||
bytes: OwnedBytes, | ||
id: u8, | ||
gcd: u64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm so you choose to make gcd somewhat special... Could it have been just another Codec? (well wrapping another codec)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, currently it's its own layer, e.g.
Create: i64 -> convert to u64 -> find/apply gcd -> detect/serialize codec
Read: read u64 from codec -> apply gcd -> convert to i64
When we move it into the codec, every codec needs to be gcd aware.
On the read side, the codecs are already wrapped in FastFieldReaderCodecWrapper
. We could have a separate wrapper like FastFieldReaderCodecGCDWrapper
, but that would be a performance optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I was suggesting moving it as one codec.
- codec1
- codec2
- gcd<C: Codec>
I assumed you would do this because it seemed like the natural solution to have:
- one dynamic dispatch
as opposed to - having two (dynamic dispatch or if statement).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on the read path that is.
Co-authored-by: Paul Masurel <[email protected]>
Codecov Report
@@ Coverage Diff @@
## main #1418 +/- ##
==========================================
+ Coverage 94.22% 94.24% +0.02%
==========================================
Files 236 237 +1
Lines 43778 44239 +461
==========================================
+ Hits 41250 41694 +444
- Misses 2528 2545 +17
Continue to review full report at Codecov.
|
src/fastfield/serializer/gcd.rs
Outdated
break; | ||
} | ||
} | ||
let mut gcd = (num1).gcd(num2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what the gcd crate returns on value 0, but we want gcd(0, n) = n
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with the 0-filter this case is covered
src/fastfield/serializer/gcd.rs
Outdated
} | ||
gcd = gcd.gcd(val); | ||
if gcd == 1 { | ||
return None; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return Some(1);
I think the Option<_>
is mostly nice to return None when there are no element. L/ike min for instance.
src/fastfield/serializer/gcd.rs
Outdated
#[cfg(test)] | ||
mod tests { | ||
use std::collections::HashMap; | ||
use std::path::Path; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add unit test for gcd....
in particular, what happens with
[0] -> Some(1)
[0, 10]-> Some(10)
[10, 0]-> Some(10)
[] -> None
[15,30,5,10] -> 5
[15,16,10] -> 1
[0,5,5,5] -> 5
let mut docs = vec![]; | ||
for i in 1..=num_vals { | ||
let val = i as i64 * 1000i64; | ||
docs.push(doc!(*FIELDI64=>val)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great you had a i64 test.
field_write: &mut CountingWriter<W>, | ||
stats: FastFieldStats, | ||
fastfield_accessor: impl FastFieldDataAccess, | ||
iter1: impl Iterator<Item = u64>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you think of a way to ditch iter1/iter2 stuff as well as the Fn() -> impl Iterator stuff? (can be done in a separate ticket)
Would
impl IntoIterator for &SomeType
work? (the reference is Copy)...
or maybe can We afford materializing a Vec?
No description provided.