-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-10636: [Rust][Parquet] Switch to Rust Stable by removing specialization in parquet #8698
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? Then could you also rename pull request title in the following format?
See also: |
@@ -95,6 +96,12 @@ impl From<Vec<u32>> for Int96 { | |||
} | |||
} | |||
|
|||
impl fmt::Display for Int96 { | |||
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { | |||
write!(f, "{:?}", self.data()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Maybe as a followup this should print the number rather than its raw bytes?
94f58d0
to
ba7d30b
Compare
I think I have a solution with |
Thanks @GregBowyer for the PR. It's will be awesome if we can get this done. I think before merging this we should get some benchmark results showing the diff before & after. We use to have some benchmark in the old repo but haven't port them to arrow yet. |
I think I pulled all the transmutes out so thats good
I will just pull them in tomorrow and get some baselines, I dont think it will be that hard. I doubt there will be anything here for performance. I mean there are some more branches now, but I suspect the predict effortlessly, and on the flip side we are not chasing things around the I-Cache so I suspect its a wash in the end. However better checked than assumed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked carefully at the first half of this PR and it looks quite good. By my reading this has also removed the use of transmute
Assuming this looks good from the benchmark perspective as mentioned by @sunchao I think this is a great step forward. Thank you @GregBowyer
impl_from_raw!(f32, self => { Err(general_err!("Type cannot be converted to i64")) }); | ||
impl_from_raw!(f64, self => { Err(general_err!("Type cannot be converted to i64")) }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
impl_from_raw!(f32, self => { Err(general_err!("Type cannot be converted to i64")) }); | |
impl_from_raw!(f64, self => { Err(general_err!("Type cannot be converted to i64")) }); | |
impl_from_raw!(f32, self => { Err(general_err!("Type f32 cannot be converted to i64")) }); | |
impl_from_raw!(f64, self => { Err(general_err!("Type f64 cannot be converted to i64")) }); |
Here is a potential contribution to this effort: #8708 (a PR with the bench marks ported -- fyi @GregBowyer ). To run:
|
I do not have time to review this, but having tried this myself once (and failed miserably), I am just leaving a big thank you note to @GregBowyer for this 😍 |
As per benchmarking this with the changes in alamb#2 this is slower than specialisation (I ran benchmarks a lot as its very noisy). I have solutions for the speed in encoding (boolean and int96 are to be solved but shouldn't be hard) I will work on decoding shortly. |
I have been working on this w.r.t performance, I think I have most parts performing better than the original. I am running off clean benchmarks right now to validate. There are a few ancillary tweaks I have made (I will call out in the PR review) that gain some additional performance (I spotted some pipeline hazards in |
I am very excited to see it. Thank you so much @GregBowyer |
6387b54
to
10fd951
Compare
.github/workflows/rust.yml
Outdated
@@ -50,7 +50,10 @@ jobs: | |||
strategy: | |||
fail-fast: false | |||
matrix: | |||
rust: [nightly-2020-11-24] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is still in-flight, but the better strategy here is to run the Parquet tests with cargo +stable test
as we also add the stable toolchain to compile arrow with. You might need to rebase to see that in the dev scripts, as this change was added last week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am ok with that, I am fighting the CI anyhow and any assistance is welcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll submit a PR on your branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the safest option is to revert the CI changes, then we can add parquet to stable tests separately in #8821
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes reverted, I will probably need help getting the small CI parts fixed up
918dcbd
to
f8f9749
Compare
Ok review comments been addressed. I seperated out the benchmark code from this PR and rebased it to match comment style in git. I think when CI can be made to function I am good with the PR, I am thinking of following up with a round of performance tuning in compression and delta_fixed_bin |
Codecov Report
@@ Coverage Diff @@
## master #8698 +/- ##
==========================================
+ Coverage 76.81% 84.28% +7.47%
==========================================
Files 181 194 +13
Lines 40985 47951 +6966
==========================================
+ Hits 31483 40417 +8934
+ Misses 9502 7534 -1968
Continue to review full report at Codecov.
|
Thanks a lot, @GregBowyer ! I think that this is almost done here. Two comments:
|
A small update: the CI runs successfully for everything except Clippy (issue unrelated to this PR) 🎉 🎉🎉 I fielded #8909 that addresses the Clippy warning. I see two options:
Let me know what you prefer, @GregBowyer 😃 |
I suggest merging this PR as is and then putting in the CI change as a follow on PR (this one is already large and long outstanding). Just let us know @GregBowyer |
Remove specialization from parquet-rs. This allows the codebase to be compiled with stable
aeda0f2
to
816b129
Compare
I am ok merging as is and putting a fixup in later. I pushed commits that should not alter the testing submodule |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gave this one more once over and it (still looks good to me). Merging this in
Filed https://issues.apache.org/jira/browse/ARROW-10929 to track migrating CI to stable rust |
I did one last local test run after merging from master to make sure the tests passed. They did. So I'll also send a note over to the dev channel. Again thanks so much for this work @GregBowyer cc @sunchao @andygrove @jorgecarleitao @vertexclique @jhorstmann @Dandandan |
I filed https://issues.apache.org/jira/browse/ARROW-10931 to track follow on work to improve compressors |
Wow, awesome work everyone 💯 |
Update the docs with respect to changes after #8698 Closes #8931 from alamb/alamb/ARROW-10933-update-docs Authored-by: Andrew Lamb <[email protected]> Signed-off-by: Andy Grove <[email protected]>
This is a cherry-pick of https://github.com/jorgecarleitao/arrow/commit/ca66d6d945e265dd2c83464bd80ff1dd7d231f7c by @jorgecarleitao It runs all tests except the simd using `stable` -- The SIMD feature still require nightly rust, but the default features do not (after #8698) Update: It also silences a few clippy lints which start complaining on stable -- I'll comment inline Closes #8930 from alamb/ARROW-10929-stable-ci Lead-authored-by: Andrew Lamb <[email protected]> Co-authored-by: Jorge C. Leitao <[email protected]> Signed-off-by: Jorge C. Leitao <[email protected]>
Update the docs with respect to changes after apache/arrow#8698 Closes #8931 from alamb/alamb/ARROW-10933-update-docs Authored-by: Andrew Lamb <[email protected]> Signed-off-by: Andy Grove <[email protected]>
This is a cherry-pick of https://github.com/jorgecarleitao/arrow/commit/ca66d6d945e265dd2c83464bd80ff1dd7d231f7c by @jorgecarleitao It runs all tests except the simd using `stable` -- The SIMD feature still require nightly rust, but the default features do not (after apache/arrow#8698) Update: It also silences a few clippy lints which start complaining on stable -- I'll comment inline Closes #8930 from alamb/ARROW-10929-stable-ci Lead-authored-by: Andrew Lamb <[email protected]> Co-authored-by: Jorge C. Leitao <[email protected]> Signed-off-by: Jorge C. Leitao <[email protected]>
…lization in parquet https://issues.apache.org/jira/browse/ARROW-10636 This is a very initial attempt at removing the specialization features from the Rust Parquet implementation. The specialisation is too complex to be covered by `min_specialization` and requires a bit of reworking in the crate. Right now the code dispatches in sub-traits and methods on the Parquet type, and uses a combination of trait abuse, macros and transmutes to eliminate the feature. I have broken this up into several commits ranging from the simplest removals (which could probably be taken fairly easily) to the most ugly and complex. I am not stoked on the `transmute` abuse, and I think another take (or follow up) should be taken to remove as many as possible in the code. The general trait for `DataType::T` has been made a private sealed trait to make it impossible to implement external to the Parquet crate, this is intentional as I dont think many of the public interfaces are sensible for end users to be able to implement. # TODO: - [x] Purge the added `std::mem::transmute`s if possible - [x] Refine and rationalise the `unimplemented!` implementations - [x] Performance test? - [x] Rebase & Relabel commits with JIRA number Closes apache#8698 from GregBowyer/remove-rust-specialization Authored-by: Greg Bowyer <[email protected]> Signed-off-by: Andrew Lamb <[email protected]>
Update the docs with respect to changes after apache#8698 Closes apache#8931 from alamb/alamb/ARROW-10933-update-docs Authored-by: Andrew Lamb <[email protected]> Signed-off-by: Andy Grove <[email protected]>
This is a cherry-pick of https://github.com/jorgecarleitao/arrow/commit/ca66d6d945e265dd2c83464bd80ff1dd7d231f7c by @jorgecarleitao It runs all tests except the simd using `stable` -- The SIMD feature still require nightly rust, but the default features do not (after apache#8698) Update: It also silences a few clippy lints which start complaining on stable -- I'll comment inline Closes apache#8930 from alamb/ARROW-10929-stable-ci Lead-authored-by: Andrew Lamb <[email protected]> Co-authored-by: Jorge C. Leitao <[email protected]> Signed-off-by: Jorge C. Leitao <[email protected]>
https://issues.apache.org/jira/browse/ARROW-10636
This is a very initial attempt at removing the specialization features from the Rust Parquet implementation.
The specialisation is too complex to be covered by
min_specialization
and requires a bit of reworking in the crate.Right now the code dispatches in sub-traits and methods on the Parquet type, and uses a combination of trait abuse, macros and transmutes to eliminate the feature.
I have broken this up into several commits ranging from the simplest removals (which could probably be taken fairly easily) to the most ugly and complex.
I am not stoked on the
transmute
abuse, and I think another take (or follow up) should be taken to remove as many as possible in the code.The general trait for
DataType::T
has been made a private sealed trait to make it impossible to implement external to the Parquet crate, this is intentional as I dont think many of the public interfaces are sensible for end users to be able to implement.TODO:
std::mem::transmute
s if possibleunimplemented!
implementations