-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unify most of SessionConfig
settings into ConfigOptions
#4492
Conversation
I need to deal with the fact that target_partitions is not a constant (it is a function of the number of cpu cores) |
Actually, I found a better way: cb0f651 |
@@ -410,7 +410,7 @@ async fn get_table( | |||
let options = ListingOptions::new(format) | |||
.with_file_extension(extension) | |||
.with_target_partitions(target_partitions) | |||
.with_collect_stat(ctx.config.collect_statistics); | |||
.with_collect_stat(ctx.config.collect_statistics()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this illustrates the major API change for DataFusion users
default_schema: String, | ||
/// Whether the default catalog and schema should be created automatically |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the change in this PR is to remove these fields and move them to ConfigOptions
@@ -1328,27 +1381,27 @@ impl SessionConfig { | |||
} | |||
map.insert( | |||
TARGET_PARTITIONS.to_owned(), | |||
format!("{}", self.target_partitions), | |||
format!("{}", self.target_partitions()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this whole function should likely be deprecated and instead we should use ConfigOptions directly. However, for this PR I opted to leave it alone (and thus backwards compatible for Ballista) cc @mingmwang
Looking forward to it! |
478c526
to
cb0f651
Compare
# to a known value that is unlikely to be | ||
# the real number of cores on a system | ||
statement ok | ||
SET datafusion.execution.target_partitions=7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clever 👍
FYI @mingmwang |
@alamb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice!
pub const OPT_TARGET_PARTITIONS: &str = "datafusion.execution.target_partitions"; | ||
|
||
/// Configuration option "datafusion.catalog.create_default_catalog_and_schema" | ||
pub const OPT_CREATE_DEFAULT_CATALOG_AND_SCHEMA: &str = | ||
"datafusion.catalog.create_default_catalog_and_schema"; | ||
/// Configuration option "datafusion.catalog.information_schema" | ||
pub const OPT_INFORMATION_SCHEMA: &str = "datafusion.catalog.information_schema"; | ||
|
||
/// Configuration option "datafusion.optimizer.repartition_joins" | ||
pub const OPT_REPARTITION_JOINS: &str = "datafusion.optimizer.repartition_joins"; | ||
|
||
/// Configuration option "datafusion.optimizer.repartition_aggregations" | ||
pub const OPT_REPARTITION_AGGREGATIONS: &str = | ||
"datafusion.optimizer.repartition_aggregations"; | ||
|
||
/// Configuration option "datafusion.optimizer.repartition_windows" | ||
pub const OPT_REPARTITION_WINDOWS: &str = "datafusion.optimizer.repartition_windows"; | ||
|
||
/// Configuration option "datafusion.execuction_collect_statistics" | ||
pub const OPT_COLLECT_STATISTICS: &str = "datafusion.execuction_collect_statistics"; | ||
|
||
/// Configuration option "datafusion.optimizer.filter_null_join_keys" | ||
pub const OPT_FILTER_NULL_JOIN_KEYS: &str = "datafusion.optimizer.filter_null_join_keys"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason we couldn't just make these an enum? Then we could make the set_x
and get_x
methods type safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 that is an excellent idea -- I don't think there is any reason we could not do so. I will file a follow on ticket to do so
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Benchmark runs are scheduled for baseline = 2a754f8 and contender = 740a4fa. 740a4fa is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Towards #3887
Also re #4349
Rationale for this change
see #3887
TLDR is
SET <variable>
syntaxWhat changes are included in this PR?
Move all
SessionConfig
settings other than the schema / catalog names into theConfigOptions
structure,Are these changes tested?
Are there any user-facing changes?
Some fields that were
pub
not must be set via awith_
method or accessed via an accessor. The number of locations in the DataFusion codebase was quite low, which was a nice surprise to meThe mix of HashMap /
rust struct style / builder styles is somewhat of a bummer. I bet some fancy macro magic could unify the two but I don't know the right incantations