Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Allow parquet to be compiled without arrow (fix --no-default-features) #731

Merged
merged 4 commits into from
Sep 9, 2021

Conversation

Marwes
Copy link

@Marwes Marwes commented Aug 30, 2021

Resolves #733

--no-default-features is currently broken in the parquet crate due to
arrow being required. With some small tweaks it can be made entirely
optional.

Added some extra steps to catch when --no-default-features does not
work on CI as well.

Are there any user-facing changes?

BREAKING CHANGE

The items exported from parquet's test_common module is now only exported if the test feature is used.

`--no-default-features` is currently broken in the parquet crate due to
arrow being required. With some small tweaks it can be made entirely
optional.

Added some extra steps to catch when `--no-default-features` does not
work on CI as well.
@github-actions github-actions bot added the parquet Changes to the parquet crate label Aug 30, 2021
@alamb alamb changed the title fix: Allow parquet to be compiled without arrow fix: Allow parquet to be compiled without arrow (fix --no-default-features) Aug 31, 2021
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Marwes for the find

I think the CI is failing because, confusingly, cargo test compiles the crate without the test feature and then compiles the test running code with test

/// be a power of 2.
///
/// Copied from the arrow crate to make arrow optional
pub fn round_upto_power_of_2(num: usize, factor: usize) -> usize {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was copied from arrow/src/util/bit_util.rs (which is fine, I am just pointing it out)

@@ -118,6 +118,9 @@ jobs:
cargo run --example dynamic_types
cargo run --example read_csv
cargo run --example read_csv_infer_schema
(cd parquet && cargo check --no-default-features)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 Thank you for adding the tests

@@ -60,6 +60,7 @@ serde_json = { version = "1.0", features = ["preserve_order"] }
[features]
default = ["arrow", "snap", "brotli", "flate2", "lz4", "zstd", "base64"]
cli = ["serde_json", "base64", "clap"]
test = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you call this feature something other than test (test_common perhaps) and add it to the default list the CI will work

diff --git a/parquet/Cargo.toml b/parquet/Cargo.toml
index 830e33505e..7ad02298b7 100644
--- a/parquet/Cargo.toml
+++ b/parquet/Cargo.toml
@@ -58,9 +58,9 @@ arrow = { path = "../arrow", version = "6.0.0-SNAPSHOT" }
 serde_json = { version = "1.0", features = ["preserve_order"] }
 
 [features]
-default = ["arrow", "snap", "brotli", "flate2", "lz4", "zstd", "base64"]
+default = ["arrow", "snap", "brotli", "flate2", "lz4", "zstd", "base64", "test_common"]
 cli = ["serde_json", "base64", "clap"]
-test = []
+test_common = []
 
 [[ bin ]]
 name = "parquet-read"
diff --git a/parquet/src/util/mod.rs b/parquet/src/util/mod.rs
index 2c653ceef9..94787a5b85 100644
--- a/parquet/src/util/mod.rs
+++ b/parquet/src/util/mod.rs
@@ -22,9 +22,9 @@ pub mod bit_util;
 mod bit_packing;
 pub mod cursor;
 pub mod hash_util;
-#[cfg(feature = "test")]
+#[cfg(feature = "test_common")]
 pub(crate) mod test_common;
-#[cfg(feature = "test")]
+#[cfg(feature = "test_common")]
 pub use self::test_common::page_util::{
     DataPageBuilder, DataPageBuilderImpl, InMemoryPageIterator,
 };

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally this should not be part of the default features as it appears (to me at least) to be a module only used for tests, so end users shouldn't ever need the module.

@codecov-commenter
Copy link

codecov-commenter commented Aug 31, 2021

Codecov Report

Merging #731 (ea2151b) into master (deb31a0) will increase coverage by 0.01%.
The diff coverage is 86.66%.

❗ Current head ea2151b differs from pull request most recent head 1b9bafc. Consider uploading reports for the commit 1b9bafc to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master     #731      +/-   ##
==========================================
+ Coverage   82.51%   82.53%   +0.01%     
==========================================
  Files         168      168              
  Lines       47647    47658      +11     
==========================================
+ Hits        39317    39333      +16     
+ Misses       8330     8325       -5     
Impacted Files Coverage Δ
parquet/src/data_type.rs 77.32% <ø> (ø)
parquet/src/schema/printer.rs 74.65% <83.33%> (+1.95%) ⬆️
parquet/src/util/bit_util.rs 93.18% <100.00%> (+0.03%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update deb31a0...1b9bafc. Read the comment docs.

Copy link
Contributor

@nevi-me nevi-me left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reviewed the change, and am happy with it

@jhorstmann
Copy link
Contributor

I just ran into this issue myself, would be great to get this merged.

The get_test_path function in test_common/file_util depends on arrow::util::test_util::parquet_test_data(), so maybe the test_common feature needs to include the arrow feature. Or if that is the only arrow usage than that method could maybe also be copied as it's just 4 lines.

@jhorstmann
Copy link
Contributor

And the arrow feature again depends on base64 for encoding/decoding the arrow schema

@alamb
Copy link
Contributor

alamb commented Sep 9, 2021

Thanks @Marwes . Sorry for the late review + merge. I have been away and am now catching up on PRs

@alamb alamb merged commit 715318e into apache:master Sep 9, 2021
@alamb
Copy link
Contributor

alamb commented Sep 9, 2021

merged @jhorstmann -- sorry for the delay

@alamb
Copy link
Contributor

alamb commented Sep 9, 2021

I don't plan to backport this to arrow 5.4 (it will be included in 6.0) so as to avoid causing downstream projects to break

@Marwes Marwes deleted the default-features branch September 9, 2021 21:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

--no-default-features is broken for parquet
5 participants