
[SPARK-34003][SQL][FOLLOWUP] Avoid pushing modified Char/Varchar sort attributes into aggregate for existing ones #31129

Closed
wants to merge 3 commits into apache:master from yaooqinn:SPARK-34003-F

Conversation

@yaooqinn (Member) commented Jan 11, 2021

What changes were proposed in this pull request?

In 0f8e5dd, we only partially fixed the rule conflict between `PaddingAndLengthCheckForCharVarchar` and `ResolveAggregateFunctions`; an error still occurs for SQL like

```sql
SELECT substr(v, 1, 2), sum(i) FROM t GROUP BY v ORDER BY substr(v, 1, 2)
```

which fails to analyze with:

```
[info]   Failed to analyze query: org.apache.spark.sql.AnalysisException: expression 'spark_catalog.default.t.`v`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;
[info]   Project [substr(v, 1, 2)#100, sum(i)#101L]
[info]   +- Sort [aggOrder#102 ASC NULLS FIRST], true
[info]      +- !Aggregate [v#106], [substr(v#106, 1, 2) AS substr(v, 1, 2)#100, sum(cast(i#98 as bigint)) AS sum(i)#101L, substr(v#103, 1, 2) AS aggOrder#102
[info]         +- SubqueryAlias spark_catalog.default.t
[info]            +- Project [if ((length(v#97) <= 3)) v#97 else if ((length(rtrim(v#97, None)) > 3)) cast(raise_error(concat(input string of length , cast(length(v#97) as string),  exceeds varchar type length limitation: 3)) as string) else rpad(rtrim(v#97, None), 3,  ) AS v#106, i#98]
[info]               +- Relation[v#97,i#98] parquet
[info]
[info]   Project [substr(v, 1, 2)#100, sum(i)#101L]
[info]   +- Sort [aggOrder#102 ASC NULLS FIRST], true
[info]      +- !Aggregate [v#106], [substr(v#106, 1, 2) AS substr(v, 1, 2)#100, sum(cast(i#98 as bigint)) AS sum(i)#101L, substr(v#103, 1, 2) AS aggOrder#102
[info]         +- SubqueryAlias spark_catalog.default.t
[info]            +- Project [if ((length(v#97) <= 3)) v#97 else if ((length(rtrim(v#97, None)) > 3)) cast(raise_error(concat(input string of length , cast(length(v#97) as string),  exceeds varchar type length limitation: 3)) as string) else rpad(rtrim(v#97, None), 3,  ) AS v#106, i#98]
[info]               +- Relation[v#97,i#98] parquet

```

We need to look recursively into children to find char/varchars.

In this PR, we resolve the original `Aggregate` expressions and the candidates in the `SortOrder` together, then use the newly re-resolved `Aggregate` expressions to determine which candidates in the `SortOrder` should be pushed down. Without this change, identical attributes could fail to match, because the expressions returned by `executeSameContext` change once `PaddingAndLengthCheckForCharVarchar` takes effect; with this change, the expressions match correctly.
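
As a toy illustration of the matching idea (this is not the code in `ResolveAggregateFunctions`; the attribute and expressions below are made up for the example), once both sides are resolved against the same padded attribute, an `ORDER BY` candidate that is semantically equal to an existing aggregate expression is reused instead of being pushed into the aggregate again:

```scala
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Literal, Substring}
import org.apache.spark.sql.types.StringType

// Hypothetical example: the same resolved attribute `v` appears on both sides,
// mirroring substr(v, 1, 2) in the SELECT list and in the ORDER BY clause.
val v = AttributeReference("v", StringType)()
val aggExpr   = Substring(v, Literal(1), Literal(2)) // from the aggregate expressions
val candidate = Substring(v, Literal(1), Literal(2)) // from the sort order
// Semantically equal => the candidate is already covered by an existing
// aggregate expression and should not be pushed into the aggregate again.
assert(aggExpr.semanticEquals(candidate))
```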

For the candidates that still do not match, we need to look recursively into the children to find char/varchar attributes, instead of checking only the expression itself.
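
A minimal sketch of that recursive lookup, assuming `CharVarcharUtils.getRawType` can be used to read the char/varchar type that the analyzer records in an attribute's metadata (an illustration, not the exact helper added by the patch):

```scala
import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}
import org.apache.spark.sql.catalyst.util.CharVarcharUtils

// Returns true if the expression, or anything in its subtree, references an
// attribute whose original type was CHAR/VARCHAR. Checking only the top-level
// expression would miss cases like substr(v, 1, 2), where the char/varchar
// attribute sits one level below the expression being pushed.
def referencesCharVarchar(e: Expression): Boolean = e match {
  case a: Attribute => CharVarcharUtils.getRawType(a.metadata).isDefined
  case other => other.children.exists(referencesCharVarchar)
}
```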

Why are the changes needed?

Bugfix.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added new tests.
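
For context, a self-contained reproduction of the failing query from the description looks roughly like this (the table layout, data, and expected result are illustrative and not copied from the tests added by this PR):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("SPARK-34003-repro").master("local[*]").getOrCreate()
spark.sql("CREATE TABLE t (v VARCHAR(3), i INT) USING parquet")
spark.sql("INSERT INTO t VALUES ('ab', 1), ('ab', 2)")
// Before this fix, analysis failed with the AnalysisException shown above;
// with the fix, the query resolves and returns ("ab", 3).
spark.sql("SELECT substr(v, 1, 2), sum(i) FROM t GROUP BY v ORDER BY substr(v, 1, 2)").show()
```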

@yaooqinn yaooqinn changed the title [SPARK-34003][SQL][FOLLOWUP] Avoid pushing modified Char/Varchar sort attributes into aggregate while existing [SPARK-34003][SQL][FOLLOWUP] Avoid pushing modified Char/Varchar sort attributes into aggregate for existing ones Jan 11, 2021

SparkQA commented Jan 11, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38508/


SparkQA commented Jan 11, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38508/


SparkQA commented Jan 11, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38513/


SparkQA commented Jan 11, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38513/

@github-actions github-actions bot added the SQL label Jan 11, 2021

SparkQA commented Jan 11, 2021

Test build #133921 has finished for PR 31129 at commit 236bb18.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jan 11, 2021

Test build #133926 has finished for PR 31129 at commit fe93b53.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jan 12, 2021

Test build #133952 has finished for PR 31129 at commit 3e4ca3f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor) commented:

thanks, merging to master/3.1!

cloud-fan pushed a commit that referenced this pull request Jan 12, 2021
[SPARK-34003][SQL][FOLLOWUP] Avoid pushing modified Char/Varchar sort attributes into aggregate for existing ones

Closes #31129 from yaooqinn/SPARK-34003-F.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 99f8489)
Signed-off-by: Wenchen Fan <[email protected]>
@cloud-fan closed this in 99f8489 Jan 12, 2021