Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

derive(Debug) for Expr #6708

Merged
merged 13 commits into from
Jun 21, 2023
Merged

derive(Debug) for Expr #6708

merged 13 commits into from
Jun 21, 2023

Conversation

parkma99
Copy link
Contributor

Which issue does this PR close?

Closes #6677

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules labels Jun 17, 2023
@github-actions github-actions bot added physical-expr Changes to the physical-expr crates sql SQL Planner labels Jun 19, 2023
@crepererum
Copy link
Contributor

So I had a quick look at the outputs: I think the changes to the sqllogictests (e.g. on=[(Column { name: "t1_id", index: 0 }, Column { name: "t2_id", index: 0 })] => on=[(t1_id@0, t2_id@0)]) are desired. The new output is way easier to read and matches the formatting that we use for other plan formatting. Please update the tests accordingly.

@parkma99
Copy link
Contributor Author

So I had a quick look at the outputs: I think the changes to the sqllogictests (e.g. on=[(Column { name: "t1_id", index: 0 }, Column { name: "t2_id", index: 0 })] => on=[(t1_id@0, t2_id@0)]) are desired. The new output is way easier to read and matches the formatting that we use for other plan formatting. Please update the tests accordingly.

Ok

[SQL] explain select * from hashjoin_datatype_table_t1 t1 right join hashjoin_datatype_table_t2 t2 on t1.c3 = t2.c3
[Diff] (-expected|+actual)
logical_plan
Right Join: CAST(t1.c3 AS Decimal128(10, 2)) = t2.c3
--SubqueryAlias: t1
----TableScan: hashjoin_datatype_table_t1 projection=[c1, c2, c3, c4]
--SubqueryAlias: t2
----TableScan: hashjoin_datatype_table_t2 projection=[c1, c2, c3, c4]
physical_plan
ProjectionExec: expr=[c1@0 as c1, c2@1 as c2, c3@2 as c3, c4@3 as c4, c1@5 as c1, c2@6 as c2, c3@7 as c3, c4@8 as c4]

  • --SortMergeJoin: join_type=Right, on=[(Column { name: "CAST(t1.c3 AS Decimal128(10, 2))", index: 4 }, Column { name: "c3", index: 2 })]
  • ----SortExec: expr=[CAST(t1.c3 AS Decimal128(10, 2))@4 ASC]
  • --SortMergeJoin: join_type=Right, on=[(Cast(Cast { expr: Column(Column { relation: Some(Bare { table: "t1" }), name: "c3" }), data_type: Decimal128(10, 2) })@4, c3@2)]
  • ----SortExec: expr=[Cast(Cast { expr: Column(Column { relation: Some(Bare { table: "t1" }), name: "c3" }), data_type: Decimal128(10, 2) })@4 ASC]
    ------CoalesceBatchesExec: target_batch_size=4096
  • --------RepartitionExec: partitioning=Hash([Column { name: "CAST(t1.c3 AS Decimal128(10, 2))", index: 4 }], 2), input_partitions=2
  • ----------ProjectionExec: expr=[c1@0 as c1, c2@1 as c2, c3@2 as c3, c4@3 as c4, CAST(c3@2 AS Decimal128(10, 2)) as CAST(t1.c3 AS Decimal128(10, 2))]
  • --------RepartitionExec: partitioning=Hash([Cast(Cast { expr: Column(Column { relation: Some(Bare { table: "t1" }), name: "c3" }), data_type: Decimal128(10, 2) })@4], 2), input_partitions=2
  • ----------ProjectionExec: expr=[c1@0 as c1, c2@1 as c2, c3@2 as c3, c4@3 as c4, CAST(c3@2 AS Decimal128(10, 2)) as Cast(Cast { expr: Column(Column { relation: Some(Bare { table: "t1" }), name: "c3" }), data_type: Decimal128(10, 2) })]
    ------------RepartitionExec: partitioning=RoundRobinBatch(2), input_partitions=1
    --------------MemoryExec: partitions=1, partition_sizes=[1]
    ----SortExec: expr=[c3@2 ASC]
    ------CoalesceBatchesExec: target_batch_size=4096
    --------RepartitionExec: partitioning=Hash([c3@2], 2), input_partitions=2
    ----------RepartitionExec: partitioning=RoundRobinBatch(2), input_partitions=1
    ------------MemoryExec: partitions=1, partition_sizes=[1]
    at tests/sqllogictests/test_files/joins.slt:2672

Case 2 I find the reason. I will update right now.

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jun 20, 2023
@@ -1040,7 +1040,7 @@ mod test {
let empty = empty_with_type(DataType::Int64);
let plan = LogicalPlan::Projection(Projection::try_new(vec![expr], empty)?);
let expected =
"Projection: a IN ([CAST(Int32(1) AS Int64), CAST(Int8(4) AS Int64), Int64(8)]) AS a IN (Map { iter: Iter([Int32(1), Int8(4), Int64(8)]) })\
"Projection: a IN ([CAST(Int32(1) AS Int64), CAST(Int8(4) AS Int64), Int64(8)]) AS a IN (Map { iter: Iter([Literal(Int32(1)), Literal(Int8(4)), Literal(Int64(8))]) })\
\n EmptyRelation";
Copy link
Contributor Author

@parkma99 parkma99 Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this test case should not change, but current I failed to past this test case.
When I change the code 0ef94 to use Display I got failed by Projection: a IN ([CAST(Int32(1) AS Int64), CAST(Int8(4) AS Int64), Int64(8)]) missing Alias.

I current could not find a good way to solve it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which alias?

Copy link
Contributor Author

@parkma99 parkma99 Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AS a IN (Map { iter: Iter([Int32(1), Int8(4), Int64(8)]) })\ is missing,I debug find it's a alias.

@parkma99 parkma99 marked this pull request as ready for review June 20, 2023 16:27

let expected = match join_type {
// Should include 3 RepartitionExecs
JoinType::Inner | JoinType::Left | JoinType::LeftSemi | JoinType::LeftAnti => vec![
top_join_plan.as_str(),
join_plan.as_str(),
"RepartitionExec: partitioning=Hash([Column { name: \"a\", index: 0 }], 10), input_partitions=1",
"RepartitionExec: partitioning=Hash([a@0], 10), input_partitions=1",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this certainly looks nicer 👍

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @parkma99 -- this is looking great and is very appreciated.

I can try and help with the formatting tomorrow if you are still having problems

@@ -79,7 +79,7 @@ use std::sync::Arc;
/// assert_eq!(binary_expr.op, Operator::Eq);
/// }
/// ```
#[derive(Clone, PartialEq, Eq, Hash)]
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
write!(f, "{self:?}")
}
#[macro_export]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given how common this functionality is I wonder if we could move this macro (or maybe make a function as I don't see any benefit in making it a macro) in datafusion_util and use it more around the codebase.

Just a thought

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think making a function and moving to datafusin_util is a good option.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can give it a try maybe after this PR is merged

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried #6739 -- it isn't great -- any thoughts appreciated

@parkma99
Copy link
Contributor Author

parkma99 commented Jun 21, 2023

Hello @alamb , I would like to pass all tests in my local machine, but got some error.
My computer is Mac mini m2 with 16+256.
When I run command cargo test in Terminal, it got error with message :


---- fuzz_cases::order_spill_fuzz::test_sort_1k_mem stdout ----
thread 'fuzz_cases::order_spill_fuzz::test_sort_1k_mem' panicked at 'called `Result::unwrap()` on an `Err` value: Execution("Failed to create partition file at \"/var/folders/c7/nfvz06pj3t5dxq4ycycq8czm0000gn/T/.tmpmetJoJ/.tmpAliZCF\": Os { code: 24, kind: Uncategorized, message: \"Too many open files\" }")', datafusion/core/tests/fuzz_cases/order_spill_fuzz.rs:85:63
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    fuzz_cases::order_spill_fuzz::test_sort_1k_mem

test result: FAILED. 14 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 14.88s

error: test failed, to rerun pass `-p datafusion --test fuzz`

When I run command cargo test in VScode Terminal, it got error with message :

thread '<unknown>' has overflowed its stack
fatal runtime error: stack overflow
error: test failed, to rerun pass `-p datafusion-proto --lib`

Caused by:
  process didn't exit successfully: `/Users/m/code/arrow-datafusion/target/debug/deps/datafusion_proto-34d869f2f58ec4e8` (signal: 6, SIGABRT: process abort signal)
```.

My OS version : 

ProductName: macOS
ProductVersion: 13.4
BuildVersion: 22F66

Rust version:`rustc 1.70.0 (90c541806 2023-05-31)`

parkma99 added 10 commits June 21, 2023 10:03

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
@alamb
Copy link
Contributor

alamb commented Jun 21, 2023

When I run command cargo test in Terminal, it got error with message :
message: \"Too many open files\"

That looks like you either have to increase your ulimit to increase the number of open files, or perhaps you can reduce the parallelism of the tests:

$ cargo test -- --test-threads=1

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks awesome @parkma99 -- thank you so much for making it happen

It is a big step forward for DataFusion usability I think

------------AggregateExec: mode=Partial, gby=[cntrycode@0 as cntrycode], aggr=[COUNT(UInt8(1)), SUM(custsale.c_acctbal)]
--------------ProjectionExec: expr=[substr(c_phone@0, 1, 2) as cntrycode, c_acctbal@1 as c_acctbal]
----------------NestedLoopJoinExec: join_type=Inner, filter=BinaryExpr { left: CastExpr { expr: Column { name: "c_acctbal", index: 0 }, cast_type: Decimal128(19, 6), cast_options: CastOptions { safe: false, format_options: FormatOptions { safe: true, null: "", date_format: None, datetime_format: None, timestamp_format: None, timestamp_tz_format: None, time_format: None } } }, op: Gt, right: Column { name: "AVG(customer.c_acctbal)", index: 1 } }
----------------NestedLoopJoinExec: join_type=Inner, filter=CAST(c_acctbal@0 AS Decimal128(19, 6)) > AVG(customer.c_acctbal)@1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is really much nicer 👍

fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
write!(f, "{self:?}")
}
#[macro_export]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can give it a try maybe after this PR is merged

@alamb alamb changed the title Minor: derive(Debug) for Expr derive(Debug) for Expr Jun 21, 2023
@alamb alamb merged commit 1b4153f into apache:main Jun 21, 2023
@parkma99 parkma99 deleted the debug_expr branch June 22, 2023 12:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Changes to the physical-expr crates sql SQL Planner sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Debug impl Expr is lossy
3 participants