Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

apply gcd on fastfield as preprocessing #1418

Merged
merged 9 commits into from
Jul 29, 2022
Merged

apply gcd on fastfield as preprocessing #1418

merged 9 commits into from
Jul 29, 2022

Conversation

PSeitz
Copy link
Contributor

@PSeitz PSeitz commented Jul 20, 2022

No description provided.

@PSeitz PSeitz force-pushed the gcd_encoding branch 2 times, most recently from 2f6daa2 to 4592fc7 Compare July 20, 2022 13:12
src/fastfield/reader.rs Outdated Show resolved Hide resolved
src/fastfield/reader.rs Outdated Show resolved Hide resolved
pub fn open_from_id(
bytes: OwnedBytes,
id: u8,
gcd: u64,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm so you choose to make gcd somewhat special... Could it have been just another Codec? (well wrapping another codec)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, currently it's its own layer, e.g.

Create: i64 -> convert to u64 -> find/apply gcd -> detect/serialize codec
Read: read u64 from codec -> apply gcd -> convert to i64

When we move it into the codec, every codec needs to be gcd aware.

On the read side, the codecs are already wrapped in FastFieldReaderCodecWrapper. We could have a separate wrapper like FastFieldReaderCodecGCDWrapper, but that would be a performance optimization.

Copy link
Collaborator

@fulmicoton fulmicoton Jul 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I was suggesting moving it as one codec.

- codec1
- codec2
- gcd<C: Codec>

I assumed you would do this because it seemed like the natural solution to have:

  • one dynamic dispatch
    as opposed to
  • having two (dynamic dispatch or if statement).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on the read path that is.

src/fastfield/reader.rs Outdated Show resolved Hide resolved
src/fastfield/reader.rs Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Jul 22, 2022

Codecov Report

Merging #1418 (ce8d6b2) into main (23fe73a) will increase coverage by 0.02%.
The diff coverage is 95.87%.

@@            Coverage Diff             @@
##             main    #1418      +/-   ##
==========================================
+ Coverage   94.22%   94.24%   +0.02%     
==========================================
  Files         236      237       +1     
  Lines       43778    44239     +461     
==========================================
+ Hits        41250    41694     +444     
- Misses       2528     2545      +17     
Impacted Files Coverage Δ
fastfield_codecs/src/lib.rs 96.87% <ø> (ø)
src/fastfield/reader.rs 87.80% <81.13%> (-1.28%) ⬇️
src/fastfield/serializer/mod.rs 91.48% <93.84%> (+0.18%) ⬆️
fastfield_codecs/src/bitpacked.rs 100.00% <100.00%> (ø)
fastfield_codecs/src/linearinterpol.rs 99.54% <100.00%> (ø)
fastfield_codecs/src/multilinearinterpol.rs 100.00% <100.00%> (ø)
src/fastfield/gcd.rs 100.00% <100.00%> (ø)
src/fastfield/mod.rs 94.76% <100.00%> (+0.34%) ⬆️
src/fastfield/multivalued/mod.rs 97.05% <100.00%> (+0.04%) ⬆️
src/fastfield/writer.rs 91.08% <100.00%> (+0.03%) ⬆️
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 23fe73a...ce8d6b2. Read the comment docs.

src/fastfield/mod.rs Outdated Show resolved Hide resolved
break;
}
}
let mut gcd = (num1).gcd(num2);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the gcd crate returns on value 0, but we want gcd(0, n) = n

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with the 0-filter this case is covered

}
gcd = gcd.gcd(val);
if gcd == 1 {
return None;
Copy link
Collaborator

@fulmicoton fulmicoton Jul 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return Some(1);

I think the Option<_> is mostly nice to return None when there are no element. L/ike min for instance.

#[cfg(test)]
mod tests {
use std::collections::HashMap;
use std::path::Path;
Copy link
Collaborator

@fulmicoton fulmicoton Jul 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add unit test for gcd....

in particular, what happens with

[0] -> Some(1)
[0, 10]-> Some(10)
[10, 0]-> Some(10)
[] -> None
[15,30,5,10] -> 5
[15,16,10] -> 1
[0,5,5,5] -> 5

src/fastfield/reader.rs Outdated Show resolved Hide resolved
let mut docs = vec![];
for i in 1..=num_vals {
let val = i as i64 * 1000i64;
docs.push(doc!(*FIELDI64=>val));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great you had a i64 test.

field_write: &mut CountingWriter<W>,
stats: FastFieldStats,
fastfield_accessor: impl FastFieldDataAccess,
iter1: impl Iterator<Item = u64>,
Copy link
Collaborator

@fulmicoton fulmicoton Jul 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you think of a way to ditch iter1/iter2 stuff as well as the Fn() -> impl Iterator stuff? (can be done in a separate ticket)

Would

impl IntoIterator for &SomeType

work? (the reference is Copy)...
or maybe can We afford materializing a Vec?

@PSeitz PSeitz merged commit fcc7bd7 into main Jul 29, 2022
@PSeitz PSeitz deleted the gcd_encoding branch July 29, 2022 09:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants