-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Structify ConfigOptions (#4517) #4771
Conversation
@@ -236,9 +230,7 @@ impl ParquetExec { | |||
/// Return the value described in [`Self::with_enable_page_index`] | |||
fn enable_page_index(&self, config_options: &ConfigOptions) -> bool { | |||
self.enable_page_index | |||
.or_else(|| config_options.get_bool(OPT_PARQUET_ENABLE_PAGE_INDEX)) | |||
// default to false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We no longer have implicit defaults all over the place 😄
.finish() | ||
/// A type-safe container for [`ConfigExtension`] | ||
#[derive(Debug, Default, Clone)] | ||
pub struct Extensions(BTreeMap<&'static str, ExtensionBox>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This formulation is specifically designed to be able to replace the AnyMap currently on SessionConfig. As an added bonus it no longer requires types to be interior-mutable
pub round_robin_repartition: bool, default = true | ||
|
||
/// Parquet options | ||
pub parquet: ParquetOptions, default = Default::default() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This nesting is a source of non-trivial additional complexity, but I made it work so at least we have support for this in the future.
@@ -1489,6 +1381,15 @@ impl SessionConfig { | |||
} | |||
} | |||
|
|||
impl From<ConfigOptions> for SessionConfig { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This lays the ground-word for eventually removing SessionConfig altogether #3887
self.config_options().get(variable) | ||
// TOOD: Move ConfigOptions into common crate | ||
match variable { | ||
"datafusion.execution.time_zone" => self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nasty hack, I hope to move ConfigOptions
into core in a subsequent PR so that it can be used directly instead of via ContextProvider
@@ -103,9 +103,10 @@ query R | |||
SHOW ALL | |||
---- | |||
datafusion.catalog.create_default_catalog_and_schema true | |||
datafusion.catalog.format NULL | |||
datafusion.catalog.has_header false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This option was missing from the definitions
datafusion.catalog.information_schema true | ||
datafusion.catalog.location NULL | ||
datafusion.catalog.type NULL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type
is a reserved keyword in rust, and the name was pretty overloaded anyway, so I opted to rename this to format
. Amusingly this was already what the local variables were named as
let config = SessionConfig::new() | ||
.with_batch_size(TEST_BATCH_SIZE) | ||
.set_u64( | ||
"datafusion.execution.coalesce_target_batch_size", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was missed in #4757 the new config returns an error for invalid keys and so caught this
self.config_options | ||
.get_bool(OPT_ENABLE_ROUND_ROBIN_REPARTITION) | ||
.unwrap_or_default() | ||
map | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two were added in #4757. They're no longer necessary and so I just opted to remove them
}; | ||
let mut config = ConfigOptions::new(); | ||
for (k, v) in task_props { | ||
let _ = config.set(&k, &v); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shows deserializing from a HashMap, as an added bonus this will now work for all keys and not just a specific subset
Longer-term we probably want to change this signature to accept ConfigOptions
directly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
related to #4758
self.set(key, ScalarValue::Utf8(Some(value.into()))) | ||
} | ||
/// Returns the [`ConfigEntry`] stored within this [`ConfigOptions`] | ||
pub fn entries(&self) -> Vec<ConfigEntry> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the mechanism by which ConfigOptions
can be both serialized and documentation generated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like how the options are now typed / type checked at compile time rather than runtime.
benchmarks/src/bin/h2o.rs
Outdated
@@ -63,9 +64,10 @@ async fn main() -> Result<()> { | |||
|
|||
async fn group_by(opt: &GroupBy) -> Result<()> { | |||
let path = opt.path.to_str().unwrap(); | |||
let config = SessionConfig::from_env().with_batch_size(65535); | |||
let mut config = ConfigOptions::from_env()?; | |||
config.built_in.execution.batch_size = 65535; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is kind of cool that it is a struct format that is checked by the compiler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love it
The created structure works well in my editor.
Thank you @tustvold . This is the one missing
It think we should leave this open at least a full work day after approval (aka don't merge until next Monday)
@@ -1153,7 +1103,7 @@ pub struct SessionConfig { | |||
/// due to `resolve_table_ref` which passes back references) | |||
default_schema: String, | |||
/// Configuration options | |||
config_options: ConfigOptions, | |||
options: ConfigOptions, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👎 for the change to the more overloaded options
name, even though I realize it is more concise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, fustfmt was being obtuse and formatting things over 7 lines and this made it happy. SessionConfig isn't long for this world so I think it is fine
/// replaced by [`config_options`]. | ||
/// | ||
/// [`config_options`]: SessionContext::config_option | ||
pub fn to_props(&self) -> HashMap<String, String> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be related to #4758 from @yahoNanJing
}; | ||
let mut config = ConfigOptions::new(); | ||
for (k, v) in task_props { | ||
let _ = config.set(&k, &v); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
related to #4758
| datafusion.optimizer.repartition_windows | Boolean | true | Should DataFusion repartition data using the partitions keys to execute window functions in parallel using the provided `target_partitions` level | | ||
| datafusion.optimizer.skip_failed_rules | Boolean | true | When set to true, the logical plan optimizer will produce warning messages if any optimization rules produce errors and then proceed to the next rule. When set to false, any rules that produce errors will cause the query to fail. | | ||
| datafusion.optimizer.top_down_join_key_reordering | Boolean | true | When set to true, the physical plan optimizer will run a top down process to reorder the join keys. Defaults to true | | ||
| key | default | description | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The major difference here seems to be the type column is no longer present. Was that intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the fields are now Rust typed not datatype. I could include this, but it seemed unnecessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is probably clear from the names if something is boolean or numeric. I don't feel strongly
use std::collections::BTreeMap; | ||
use std::fmt::Display; | ||
|
||
macro_rules! config_namespace { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
I think some introduction doc comments here would help -- specifically some examples of what this macro is creating would be nice
I plan to merge this tomorrow morning (GMT) unless anyone needs more time to review |
Benchmark runs are scheduled for baseline = 6c81c10 and contender = 21169b9. 21169b9 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #4517
Closes #3500
Closes #4752
Relates to #3887
Rationale for this change
The current approach is:
What changes are included in this PR?
This reworks ConfigOptions to instead be a struct, using a declarative macro to auto-generate serialization logic
Are these changes tested?
Are there any user-facing changes?
Yes