Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GLUTEN-4475][VL] Allow offloading Spark hour function #4495

Merged
merged 1 commit into from
Jan 29, 2024

Conversation

zwangsheng
Copy link
Contributor

What changes were proposed in this pull request?

After Velox support facebookincubator/velox#8237 HOUR function, we should not fail HOUR function with native validation.

Close #4475

How was this patch tested?

Local test

Run Spark Local Shell with Gluten(add this PR)

scala> spark.sql("CREATE TABLE t1(c1 int, c2 timestamp) USING parquet")
res0: org.apache.spark.sql.DataFrame = []

scala> spark.sql("INSERT INTO t1 VALUES(1, NOW())")
res1: org.apache.spark.sql.DataFrame = []

scala> spark.sql("SELECT c1, HOUR(c2) from t1 LIMIT 1").show
+---+--------+
| c1|hour(c2)|
+---+--------+
|  1|      10|
+---+--------+

截屏2024-01-23 18 54 42

Copy link

#4475

@zwangsheng
Copy link
Contributor Author

Hi @PHILO-HE @ulysses-you, fix gluten validate spark HOUR function, PTAL

@ulysses-you
Copy link
Contributor

can we add a test ?

@@ -907,4 +907,9 @@ class TestOperator extends VeloxWholeStageTransformerSuite with AdaptiveSparkPla
|""".stripMargin
)(df => checkFallbackOperators(df, 1))
}

test("Support HOUR function") {
runQueryAndCompare("SELECT l_orderkey, HOUR('2024-01-23 19:27:01') FROM lineitem LIMIT 1")(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your pr.
It would be better to create a table for this test if there is no suitable tpch data for this function, like the below. I'm afraid Spark will optimize literal input with its constantFolding, which meaninglessly makes an alias projection offload to gluten.
https://github.com/oap-project/gluten/blob/main/backends-velox/src/test/scala/io/glutenproject/execution/TestOperator.scala#L441

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your info, created a template table to test HOUR function.

Copy link

Run Gluten Clickhouse CI

4 similar comments
Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

@PHILO-HE
Copy link
Contributor

Hi @zwangsheng, it looks the CI failure is a known issue: "-08:00 not found in timezone database". I once submitted an issue in velox community: facebookincubator/velox#7804. You can remove such test case where "-8:00" is used and leave some comments to clarify.

Copy link

Run Gluten Clickhouse CI

1 similar comment
Copy link

Run Gluten Clickhouse CI

@ulysses-you
Copy link
Contributor

@zwangsheng please resolve conflicts first

Copy link

Run Gluten Clickhouse CI

1 similar comment
Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

@zwangsheng
Copy link
Contributor Author

Hi @PHILO-HE @ulysses-you, failed CI test not related. PTAL again.

PHILO-HE
PHILO-HE previously approved these changes Jan 29, 2024
Copy link
Contributor

@PHILO-HE PHILO-HE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thanks for your work!

@ulysses-you
Copy link
Contributor

It seems the failed tests in ck backend is related.

18:38:27  - Gluten - Hour *** FAILED ***
18:38:27    Incorrect evaluation: hour(2013-11-08 13:10:15, Some(UTC)), actual: 5, expected: 13 (GlutenTestsTrait.scala:287)
18:38:28  <tr><th>print_markdown_GlutenDateExpressionsSuite</th><th colspan=5>Case Count: 11, OffloadGluten Case Count: 10, Unit Count 42, OffloadGluten Unit Count 40</th></tr>
18:38:28  <tr><td>print_markdown_GlutenDateExpressionsSuite</td><td>datetime function current_date</td><td>success</td><td></td><td></td><td></td></tr>
18:38:28  <tr><td>print_markdown_GlutenDateExpressionsSuite</td><td>Gluten - Hour</td><td>error</td><td>fallback</td><td>org.apache.spark.sql.catalyst.expressions.Literal</td><td>class io.glutenproject.execution.ProjectExecTransformer</td></tr>
18:38:28  <tr><td>print_markdown_GlutenDateExpressionsSuite</td><td>date_sub</td><td>success</td><td></td><td></td><td></td></tr>
18:38:28  <tr><td>print_markdown_GlutenDateExpressionsSuite</td><td>datediff</td><td>success</td><td></td><td></td><td></td></tr>
18:38:28  <tr><td>print_markdown_GlutenDateExpressionsSuite</td><td>to_utc_timestamp - invalid time zone id</td><td>success</td><td></td><td></td><td></td></tr>
18:38:28  <tr><td>print_markdown_GlutenDateExpressionsSuite</td><td>datetime function localtimestamp</td><td>success</td><td></td><td></td><td></td></tr>
18:38:28  <tr><td>print_markdown_GlutenDateExpressionsSuite</td><td>from_utc_timestamp - invalid time zone id</td><td>success</td><td></td><td></td><td></td></tr>
18:38:28  <tr><td>print_markdown_GlutenDateExpressionsSuite</td><td>datetime function current_timestamp</td><td>success</td><td></td><td></td><td></td></tr>
18:38:28  <tr><td>print_markdown_GlutenDateExpressionsSuite</td><td>date_add</td><td>success</td><td></td><td></td><td></td></tr>
18:38:28  <tr><td>print_markdown_GlutenDateExpressionsSuite</td><td>Consistent error handling for datetime formatting and parsing functions</td><td>success</td><td></td><td></td><td></td></tr>
18:38:28  <tr><td>print_markdown_GlutenDateExpressionsSuite</td><td>last_day</td><td>success</td><td></td><td></td><td></td></tr>

@PHILO-HE
Copy link
Contributor

@zwangsheng, we should exclude the new test "Gluten - Hour" in ClickHouseTestSettings. Thanks!

Copy link

Run Gluten Clickhouse CI

1 similar comment
Copy link

Run Gluten Clickhouse CI

@PHILO-HE
Copy link
Contributor

I note a new conflict. Could you rebase again? Thanks so much!

Copy link

Run Gluten Clickhouse CI

Copy link
Contributor

@PHILO-HE PHILO-HE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thanks!

@PHILO-HE PHILO-HE changed the title [GLUTEN-4475] Remove fast fail HOUR function in SubstraitToVeloxPlanValidator [GLUTEN-4475][VL] Allow offloading Spark hour function Jan 29, 2024
@PHILO-HE PHILO-HE merged commit f040fe8 into apache:main Jan 29, 2024
20 checks passed
@zwangsheng
Copy link
Contributor Author

Thanks all for your review @ulysses-you @PHILO-HE 😄

@GlutenPerfBot
Copy link
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_4495_time.csv log/native_master_01_28_2024_d041ce584_time.csv difference percentage
q1 32.12 32.99 0.865 102.69%
q2 24.07 23.85 -0.224 99.07%
q3 38.83 37.74 -1.099 97.17%
q4 37.74 38.98 1.237 103.28%
q5 71.67 67.95 -3.721 94.81%
q6 7.36 7.02 -0.340 95.39%
q7 85.51 80.87 -4.645 94.57%
q8 84.15 84.35 0.196 100.23%
q9 125.64 121.55 -4.089 96.75%
q10 43.35 43.06 -0.287 99.34%
q11 20.96 20.27 -0.683 96.74%
q12 29.48 27.38 -2.093 92.90%
q13 46.28 45.45 -0.826 98.22%
q14 15.53 22.94 7.414 147.74%
q15 28.86 28.71 -0.150 99.48%
q16 15.00 14.08 -0.919 93.87%
q17 100.32 102.67 2.357 102.35%
q18 147.39 148.89 1.499 101.02%
q19 14.01 12.47 -1.541 89.00%
q20 26.08 26.11 0.032 100.12%
q21 226.86 226.03 -0.830 99.63%
q22 13.60 13.55 -0.060 99.56%
total 1234.84 1226.93 -7.906 99.36%

@GlutenPerfBot
Copy link
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_master_01_29_2024_time.csv log/native_master_01_28_2024_d041ce584_time.csv difference percentage
q1 33.80 32.99 -0.816 97.59%
q2 25.67 23.85 -1.824 92.89%
q3 38.07 37.74 -0.337 99.11%
q4 40.59 38.98 -1.608 96.04%
q5 69.30 67.95 -1.353 98.05%
q6 7.04 7.02 -0.012 99.83%
q7 85.84 80.87 -4.967 94.21%
q8 86.69 84.35 -2.338 97.30%
q9 121.85 121.55 -0.298 99.76%
q10 43.51 43.06 -0.453 98.96%
q11 19.98 20.27 0.295 101.48%
q12 29.14 27.38 -1.753 93.98%
q13 45.40 45.45 0.048 100.11%
q14 15.65 22.94 7.291 146.57%
q15 29.66 28.71 -0.942 96.82%
q16 14.28 14.08 -0.196 98.63%
q17 102.13 102.67 0.548 100.54%
q18 149.99 148.89 -1.099 99.27%
q19 13.92 12.47 -1.453 89.57%
q20 26.90 26.11 -0.790 97.06%
q21 225.67 226.03 0.361 100.16%
q22 13.48 13.55 0.061 100.45%
total 1238.57 1226.93 -11.637 99.06%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[VL] Remove native validation fail with HOUR function.
4 participants