[CORE] Add schema validation for broadcast exchange #4608

Merged: 2 commits, Feb 2, 2024

zhztheplayer
Member

Follow-up to a comment in #4544. Fixes the following error thrown in a Spark unit test:

Input schema contains unsupported type when convert row to columnar for StructType(StructField(window,StructType(StructField(start,TimestampNTZType,true),StructField(end,TimestampNTZType,true)),false),StructField(othervalue,IntegerType,false)) due to do not support data type: TimestampNTZType
java.lang.UnsupportedOperationException: Input schema contains unsupported type when convert row to columnar for StructType(StructField(window,StructType(StructField(start,TimestampNTZType,true),StructField(end,TimestampNTZType,true)),false),StructField(othervalue,IntegerType,false)) due to do not support data type: TimestampNTZType
	at io.glutenproject.execution.RowToVeloxColumnarExec.$anonfun$doExecuteColumnarInternal$1(RowToVeloxColumnarExec.scala:52)
	at scala.Option.foreach(Option.scala:407)
	at io.glutenproject.execution.RowToVeloxColumnarExec.doExecuteColumnarInternal(RowToVeloxColumnarExec.scala:49)
	at io.glutenproject.execution.RowToColumnarExecBase.doExecuteColumnar(RowToColumnarExecBase.scala:62)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:221)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:232)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:229)
	at org.apache.spark.sql.execution.SparkPlan.executeColumnar(SparkPlan.scala:217)
	at io.glutenproject.backendsapi.velox.SparkPlanExecApiImpl.createBroadcastRelation(SparkPlanExecApiImpl.scala:332)
	at org.apache.spark.sql.execution.ColumnarBroadcastExchangeExec.$anonfun$relationFuture$2(ColumnarBroadcastExchangeExec.scala:79)
	at io.glutenproject.utils.Arm$.withResource(Arm.scala:25)
	at io.glutenproject.metrics.GlutenTimeMetric$.millis(GlutenTimeMetric.scala:37)
	at org.apache.spark.sql.execution.ColumnarBroadcastExchangeExec.$anonfun$relationFuture$1(ColumnarBroadcastExchangeExec.scala:69)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:191)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
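
The failure above comes from a nested struct field whose type the row-to-columnar conversion does not support. A minimal sketch of a recursive schema check follows. It is not the code from this pull request: the data type classes are simplified, hypothetical stand-ins for Spark's (so the example runs without Spark on the classpath), and `findUnsupported`, `canOffload`, and the `unsupported` set are illustrative names and assumptions.

```scala
// Hypothetical, simplified stand-ins for Spark's DataType hierarchy.
sealed trait DataType
case object IntegerType extends DataType
case object TimestampType extends DataType
case object TimestampNTZType extends DataType
final case class StructField(name: String, dataType: DataType, nullable: Boolean)
final case class StructType(fields: Seq[StructField]) extends DataType

object SchemaValidationSketch {
  // Assumption: the set of leaf types the native conversion cannot handle.
  private val unsupported: Set[DataType] = Set(TimestampNTZType)

  // Returns the first unsupported type found, descending into nested structs.
  def findUnsupported(dt: DataType): Option[DataType] = dt match {
    case StructType(fields) =>
      fields.iterator
        .map(f => findUnsupported(f.dataType))
        .collectFirst { case Some(t) => t }
    case leaf if unsupported.contains(leaf) => Some(leaf)
    case _ => None
  }

  // A plan node would be kept on the native path only if its schema validates.
  def canOffload(schema: StructType): Boolean = findUnsupported(schema).isEmpty

  // Same shape as the schema in the error message above.
  val failingSchema: StructType = StructType(Seq(
    StructField("window", StructType(Seq(
      StructField("start", TimestampNTZType, nullable = true),
      StructField("end", TimestampNTZType, nullable = true))), nullable = false),
    StructField("othervalue", IntegerType, nullable = false)))

  def main(args: Array[String]): Unit = {
    println(s"first unsupported type: ${findUnsupported(failingSchema)}")
    println(s"can offload: ${canOffload(failingSchema)}")
  }
}
```

Running the check at planning time, rather than throwing during execution, would let the planner choose a fallback before the broadcast relation is built.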

github-actions bot commented Feb 1, 2024

Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename the commit message and pull request title to the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

github-actions bot commented Feb 1, 2024

Run Gluten Clickhouse CI

@zhztheplayer
Member Author

/Benchmark Velox

Contributor

@PHILO-HE PHILO-HE left a comment

Looks good! Thanks!

@GlutenPerfBot
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

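| query | log/native_4608_time.csv | log/native_master_01_31_2024_f1ed68005_time.csv | difference | percentage |
|-------|-------------------------:|------------------------------------------------:|-----------:|-----------:|
| q1    | 32.58   | 32.75   | 0.179  | 100.55% |
| q2    | 23.78   | 24.25   | 0.467  | 101.96% |
| q3    | 39.65   | 37.43   | -2.216 | 94.41%  |
| q4    | 37.24   | 37.40   | 0.161  | 100.43% |
| q5    | 70.47   | 70.15   | -0.323 | 99.54%  |
| q6    | 7.44    | 7.00    | -0.444 | 94.03%  |
| q7    | 84.80   | 83.66   | -1.140 | 98.66%  |
| q8    | 85.68   | 85.87   | 0.187  | 100.22% |
| q9    | 120.16  | 124.41  | 4.246  | 103.53% |
| q10   | 43.09   | 43.68   | 0.590  | 101.37% |
| q11   | 20.07   | 20.32   | 0.246  | 101.23% |
| q12   | 27.98   | 26.52   | -1.459 | 94.78%  |
| q13   | 44.98   | 45.62   | 0.645  | 101.43% |
| q14   | 17.36   | 19.16   | 1.800  | 110.37% |
| q15   | 27.29   | 28.85   | 1.564  | 105.73% |
| q16   | 14.59   | 14.00   | -0.589 | 95.96%  |
| q17   | 101.59  | 102.02  | 0.430  | 100.42% |
| q18   | 145.42  | 149.84  | 4.414  | 103.04% |
| q19   | 12.48   | 12.62   | 0.134  | 101.08% |
| q20   | 26.31   | 26.35   | 0.045  | 100.17% |
| q21   | 224.28  | 228.32  | 4.043  | 101.80% |
| q22   | 13.60   | 13.50   | -0.094 | 99.31%  |
| total | 1220.84 | 1233.72 | 12.883 | 101.06% |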
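
The report does not say how its last two columns are computed. Judging from the rows, `difference` looks like the master time minus the time from this PR's run, and `percentage` looks like the master time divided by the PR time. The sketch below encodes that reading as an assumption and applies it to the `total` row; the function names are illustrative.

```scala
object PerfReportColumns {
  // Assumption inferred from the rows: difference = baseline - candidate.
  def difference(candidate: Double, baseline: Double): Double =
    baseline - candidate

  // Assumption inferred from the rows: percentage = baseline / candidate * 100.
  def percentage(candidate: Double, baseline: Double): Double =
    baseline / candidate * 100.0

  def main(args: Array[String]): Unit = {
    // "total" row: PR run (candidate) and master run (baseline).
    val candidate = 1220.84
    val baseline = 1233.72
    println(f"difference: ${difference(candidate, baseline)}%.3f")
    println(f"percentage: ${percentage(candidate, baseline)}%.2f%%")
  }
}
```

Under this reading, a positive difference or a percentage above 100% means the PR run was faster than master for that query. The results agree with the reported values only up to rounding, since the report presumably works from unrounded timings.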

@zhztheplayer zhztheplayer merged commit b0d50c0 into apache:main Feb 2, 2024
19 checks passed
@GlutenPerfBot
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

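| query | log/native_4608_time.csv | log/native_master_01_31_2024_f1ed68005_time.csv | difference | percentage |
|-------|-------------------------:|------------------------------------------------:|-----------:|-----------:|
| q1    | 32.93   | 32.75   | -0.173 | 99.47%  |
| q2    | 25.20   | 24.25   | -0.957 | 96.20%  |
| q3    | 36.78   | 37.43   | 0.655  | 101.78% |
| q4    | 38.50   | 37.40   | -1.102 | 97.14%  |
| q5    | 71.76   | 70.15   | -1.611 | 97.75%  |
| q6    | 7.04    | 7.00    | -0.043 | 99.39%  |
| q7    | 83.99   | 83.66   | -0.324 | 99.61%  |
| q8    | 86.59   | 85.87   | -0.722 | 99.17%  |
| q9    | 119.99  | 124.41  | 4.415  | 103.68% |
| q10   | 42.74   | 43.68   | 0.942  | 102.20% |
| q11   | 20.75   | 20.32   | -0.425 | 97.95%  |
| q12   | 29.48   | 26.52   | -2.953 | 89.98%  |
| q13   | 46.38   | 45.62   | -0.756 | 98.37%  |
| q14   | 20.66   | 19.16   | -1.501 | 92.74%  |
| q15   | 28.25   | 28.85   | 0.599  | 102.12% |
| q16   | 13.83   | 14.00   | 0.173  | 101.25% |
| q17   | 102.39  | 102.02  | -0.372 | 99.64%  |
| q18   | 150.04  | 149.84  | -0.207 | 99.86%  |
| q19   | 12.53   | 12.62   | 0.085  | 100.68% |
| q20   | 27.80   | 26.35   | -1.449 | 94.79%  |
| q21   | 226.06  | 228.32  | 2.267  | 101.00% |
| q22   | 13.60   | 13.50   | -0.098 | 99.28%  |
| total | 1237.28 | 1233.72 | -3.558 | 99.71%  |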
