-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consolidate batch_size
configuration in ExecutionConfig
, RuntimeConfig
and PhysicalPlanConfig
#1562
Conversation
… to `FileScanConfig` to make it clearer
@@ -246,8 +246,8 @@ impl TryInto<LogicalPlan> for &protobuf::LogicalPlanNode { | |||
.collect::<Result<Vec<_>, _>>()?, | |||
partition_count as usize, | |||
), | |||
PartitionMethod::RoundRobin(batch_size) => { | |||
Partitioning::RoundRobinBatch(batch_size as usize) | |||
PartitionMethod::RoundRobin(partition_count) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
batch_size
here is wrong, it's partition count
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice find
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh god, good find, 4096 partitions is not going to be fun.
@@ -92,8 +92,8 @@ mod roundtrip_tests { | |||
.map_err(BallistaError::DataFusionError)?, | |||
); | |||
|
|||
for batch_size in test_batch_sizes.iter() { | |||
let rr_repartition = Partitioning::RoundRobinBatch(*batch_size); | |||
for partition_count in test_partition_counts.iter() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
@@ -178,7 +178,7 @@ mod tests { | |||
async fn read_limit() -> Result<()> { | |||
let runtime = Arc::new(RuntimeEnv::default()); | |||
let projection = Some(vec![0, 1, 2, 3]); | |||
let exec = get_exec("aggregate_test_100.csv", &projection, 1024, Some(1)).await?; | |||
let exec = get_exec("aggregate_test_100.csv", &projection, Some(1)).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After this PR, most of the magic number 1024
is removed from the project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the idea of putting batch_size
on the RuntimeConfig
is 🏅 👍
@@ -901,14 +898,13 @@ pub struct ExecutionConfig { | |||
/// Should Datafusion parquet reader using the predicate to prune data | |||
parquet_pruning: bool, | |||
/// Runtime configurations such as memory threshold and local disk for spill | |||
pub runtime_config: RuntimeConfig, | |||
pub runtime: RuntimeConfig, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to runtime
since I think the word config is redundant. Consider the usage exec_config.runtime_config.batch_size
for example. Everything in exec_config
is config.
Thanks @yjshen -- I'll check this out over the weekend.
I found #1526 (comment) -- sorry |
ExecutionConfig
, RuntimeConfig
and PhysicalPlanConfig
batch_size
configuration in ExecutionConfig
, RuntimeConfig
and PhysicalPlanConfig
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -246,8 +246,8 @@ impl TryInto<LogicalPlan> for &protobuf::LogicalPlanNode { | |||
.collect::<Result<Vec<_>, _>>()?, | |||
partition_count as usize, | |||
), | |||
PartitionMethod::RoundRobin(batch_size) => { | |||
Partitioning::RoundRobinBatch(batch_size as usize) | |||
PartitionMethod::RoundRobin(partition_count) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice find
@@ -178,7 +178,7 @@ mod tests { | |||
async fn read_limit() -> Result<()> { | |||
let runtime = Arc::new(RuntimeEnv::default()); | |||
let projection = Some(vec![0, 1, 2, 3]); | |||
let exec = get_exec("aggregate_test_100.csv", &projection, 1024, Some(1)).await?; | |||
let exec = get_exec("aggregate_test_100.csv", &projection, Some(1)).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the idea of putting batch_size
on the RuntimeConfig
is 🏅 👍
Thanks @alamb , the main reason I'm not moving I was trying to consolidate more of the configs from the three |
No, |
Which issue does this PR close?
Closes #1565 .
Rationale for this change
batch_size
is only execution runtime related, should consolidate all intoRuntimeConfig
.PhysicalPlanConfig
toFileScanConfig
to make it clear.What changes are included in this PR?
Are there any user-facing changes?