move the type coercion out of the optimizer and refactor the optimizer #3582

liukun4515 · 2022-09-22T02:27:24Z

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

While working on PR https://github.com/apache/arrow-datafusion/pull/3396/files#
I got stuck on many type-related issues in the optimizer framework.

In other SQL engines and database systems, data type coercion is done before any optimization.

We currently do it in the physical phase. Now that @andygrove has added the TypeCoercion rule to the optimizer, it can be done in the logical phase.

After https://github.com/apache/arrow-datafusion/pull/3396/files# is merged, we need to do this refactor.

If the data types are not right, all operations and optimizations will run into weird issues.

cc @andygrove @alamb

alamb · 2022-09-22T11:05:11Z

I agree that type coercion doesn't really belong in an "optimizer": it changes the meaning of the exprs (on purpose), whereas the other optimizer passes are supposed to keep the meaning of the plan the same but make it faster in some way.

I recommend changing the code to explicitly run the type coercion logic immediately prior to running any other optimization passes. I also think the type coercion logic should not depend on any other simplification (such as constant folding) -- it should depend only on the types.

Once type coercion is done, we can run the expr simplifier pass to clean up / simplify the expressions.
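
A minimal sketch of that ordering, assuming a toy expression tree rather than DataFusion's real `Expr` and `DataType` (every type and function name below is hypothetical). `coerce` looks only at operand types and inserts widening casts; `simplify` does constant folding and runs afterwards, so it never sees mismatched operand types.

```rust
// Hypothetical, simplified types for illustration; not DataFusion's API.
// Variant order doubles as widening order: Int32 < Int64.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum DataType {
    Int32,
    Int64,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column { name: String, data_type: DataType },
    Literal { value: i64, data_type: DataType },
    Cast { expr: Box<Expr>, to: DataType },
    Add(Box<Expr>, Box<Expr>),
}

fn type_of(expr: &Expr) -> DataType {
    match expr {
        Expr::Column { data_type, .. } | Expr::Literal { data_type, .. } => *data_type,
        Expr::Cast { to, .. } => *to,
        // After coercion both operands agree, so the left one is representative.
        Expr::Add(left, _) => type_of(left),
    }
}

// Wrap `expr` in a cast only when its type differs from the target.
fn cast_to(expr: Expr, to: DataType) -> Expr {
    if type_of(&expr) == to {
        expr
    } else {
        Expr::Cast { expr: Box::new(expr), to }
    }
}

// Type coercion: depends only on the operand types, never on their values.
fn coerce(expr: Expr) -> Expr {
    match expr {
        Expr::Add(left, right) => {
            let (left, right) = (coerce(*left), coerce(*right));
            let common = type_of(&left).max(type_of(&right));
            Expr::Add(Box::new(cast_to(left, common)), Box::new(cast_to(right, common)))
        }
        Expr::Cast { expr, to } => Expr::Cast { expr: Box::new(coerce(*expr)), to },
        other => other,
    }
}

// Simplification (constant folding). Assumes its input is already coerced,
// so the only casts it folds are the widening ones inserted by `coerce`.
fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::Cast { expr, to } => match simplify(*expr) {
            Expr::Literal { value, .. } => Expr::Literal { value, data_type: to },
            other => Expr::Cast { expr: Box::new(other), to },
        },
        Expr::Add(left, right) => match (simplify(*left), simplify(*right)) {
            (Expr::Literal { value: a, data_type }, Expr::Literal { value: b, .. }) => {
                Expr::Literal { value: a + b, data_type }
            }
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        other => other,
    }
}

// Mandatory coercion first, optional clean-up second.
fn coerce_then_simplify(expr: Expr) -> Expr {
    simplify(coerce(expr))
}
```

For an `Int64` column `a` and the `Int32` literals `1` and `2`, `a + (1 + 2)` is first coerced to `a + CAST((1 + 2) AS Int64)`, and the simplifier then folds the right-hand side into a single `Int64` literal.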

Dandandan · 2022-09-22T11:07:56Z

I agree - the type coercion should be a mandatory pass that is done after planning.

liukun4515 · 2022-09-22T12:45:40Z

> I agree that type coercion doesn't really belong in an "optimizer": it changes the meaning of the exprs (on purpose), whereas the other optimizer passes are supposed to keep the meaning of the plan the same but make it faster in some way.
>
> I recommend changing the code to explicitly run the type coercion logic immediately prior to running any other optimization passes. I also think the type coercion logic should not depend on any other simplification (such as constant folding) -- it should depend only on the types.
>
> Once type coercion is done, we can run the expr simplifier pass to clean up / simplify the expressions.

You got it.

andygrove · 2022-09-22T17:22:09Z

Spark has an Analysis phase that runs before the optimizer. Maybe we can learn from their model. Within the analysis phase, they have a number of type coercion rules that they run:

  override def typeCoercionRules: List[Rule[LogicalPlan]] =
    UnpivotCoercion ::
    WidenSetOperationTypes ::
    new CombinedTypeCoercionRule(
      InConversion ::
      PromoteStrings ::
      DecimalPrecision ::
      BooleanEquality ::
      FunctionArgumentConversion ::
      ConcatCoercion ::
      MapZipWithCoercion ::
      EltCoercion ::
      CaseWhenCoercion ::
      IfCoercion ::
      StackCoercion ::
      Division ::
      IntegralDivision ::
      ImplicitTypeCasts ::
      DateTimeOperations ::
      WindowFrameCoercion ::
      StringLiteralCoercion :: Nil) :: Nil
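
For DataFusion, the same split could look roughly like the sketch below. It is only an illustration under stated assumptions: `Plan`, `Rule`, `run_batch` and `plan_query` are hypothetical names, not Spark or DataFusion APIs, and the rewrites are stand-ins that just record that they ran. The point is the shape: a mandatory analyzer batch resolves types first, and optimizer rules can then rely on that.

```rust
// Hypothetical toy model of an analyzer phase that runs before the optimizer.
#[derive(Debug, Default, Clone, PartialEq)]
struct Plan {
    // Set once type coercion has run; real plans carry typed expressions instead.
    types_resolved: bool,
    // Names of the rules applied so far, in order.
    applied: Vec<&'static str>,
}

struct Rule {
    name: &'static str,
    rewrite: fn(Plan) -> Result<Plan, String>,
}

// Analyzer rule: stands in for the real coercion rewrite.
fn type_coercion(mut plan: Plan) -> Result<Plan, String> {
    plan.types_resolved = true;
    plan.applied.push("type_coercion");
    Ok(plan)
}

// Optimizer rule: refuses to run on a plan whose types are not resolved.
fn simplify_expressions(mut plan: Plan) -> Result<Plan, String> {
    if !plan.types_resolved {
        return Err("operand types are not coerced yet".to_string());
    }
    plan.applied.push("simplify_expressions");
    Ok(plan)
}

// Apply every rule in order, stopping at the first failure.
fn run_batch(rules: &[Rule], plan: Plan) -> Result<Plan, String> {
    rules.iter().try_fold(plan, |plan, rule| {
        (rule.rewrite)(plan).map_err(|e| format!("{}: {}", rule.name, e))
    })
}

// Analysis always runs to completion before any optimizer rule sees the plan.
fn plan_query(analyzer: &[Rule], optimizer: &[Rule], plan: Plan) -> Result<Plan, String> {
    let analyzed = run_batch(analyzer, plan)?;
    run_batch(optimizer, analyzed)
}
```

With `type_coercion` in the analyzer batch and `simplify_expressions` in the optimizer batch, `plan_query` applies them in that order; with an empty analyzer batch, the optimizer rule fails immediately instead of working on untyped expressions.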

liukun4515 · 2022-09-23T12:24:49Z

> Spark has an Analysis phase that runs before the optimizer. Maybe we can learn from their model.

Yes, I think we can follow other database or SQL systems.

But it's a long way to go to get there.

My plan is to do some refactoring first and build a good base for the next big refactor, along the lines of Spark or other systems.

alamb · 2022-09-23T12:26:33Z

> My plan is to do some refactoring first and build a good base for the next big refactor, along the lines of Spark or other systems.

Sounds like a great plan to me

alamb · 2022-10-05T18:29:41Z

Here is a PR that contributes to this goal: #3728

liukun4515 · 2022-10-06T07:05:50Z

> Here is a PR that contributes to this goal: #3728

After this PR is merged, I will handle type coercion for the other Expr variants.

liukun4515 · 2022-10-09T02:35:32Z

After type coercion supports all exprs, we can move this rule out of the optimizer and use it as a separate module.

liukun4515 added the enhancement New feature or request label Sep 22, 2022

liukun4515 mentioned this issue Sep 22, 2022

simplify_expressions don't support different data type for binary #3556

Closed

liukun4515 mentioned this issue Sep 22, 2022

simplify between expr should consider the data type #3587

Closed

liukun4515 mentioned this issue Oct 1, 2022

support type coercion for case when expr #3673

Closed

This was referenced Oct 4, 2022

Expose + document the type coercion API publicly #3708

Closed

Consolidate coercion code in datafusion_expr::type_coercion and submodules #3728

Merged

This was referenced Oct 6, 2022

Remove type coercions from ScalarValue and aggregation function code #3705

Merged

Add type coercion rule for concat and concat_ws #3721

Merged

This was referenced Oct 6, 2022

support type coercion for ScalarFunction expr in the logical phase #3731

Closed

move type coercion for agg/agg udf #3752

Closed

alamb mentioned this issue Dec 15, 2022

Don't ignore failed optimizer rules #4615

Closed

liukun4515 mentioned this issue Apr 5, 2023

refactor: move type_coercion to analyzer #5831

Merged

liukun4515 closed this as completed Apr 5, 2023
