Hi @sagarlakshmipathy
Can you please also share the performance numbers per query? On TPC-DS, Q72 is still troublesome for Gluten and needs some special configuration. Here's some discussion: #1775
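A per-query comparison could be put together with something like the sketch below. It is only an illustration: the function name `slowdown_report` and the dictionaries of timings are hypothetical, and the numbers in the usage section are placeholders, not measurements from this issue.

```python
def slowdown_report(baseline, candidate):
    """Compare per-query runtimes (in seconds) from two runs.

    Returns a list of (query, baseline_s, candidate_s, ratio) tuples,
    sorted so the query with the largest slowdown comes first.
    A ratio above 1.0 means the candidate run was slower.
    """
    rows = []
    # Only compare queries that were measured in both runs.
    for query in sorted(baseline.keys() & candidate.keys()):
        base_s = baseline[query]
        cand_s = candidate[query]
        if base_s <= 0:
            # Skip unusable baseline timings instead of dividing by zero.
            continue
        rows.append((query, base_s, cand_s, cand_s / base_s))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


if __name__ == "__main__":
    # Placeholder timings for illustration only, not real results.
    oss_spark = {"q1": 10.0, "q72": 40.0}
    gluten_velox = {"q1": 5.0, "q72": 120.0}
    for query, base_s, cand_s, ratio in slowdown_report(oss_spark, gluten_velox):
        print(f"{query}: OSS {base_s:.1f}s, Gluten {cand_s:.1f}s, ratio {ratio:.2f}x")
```

Sorting by ratio rather than absolute time makes outliers such as a single problematic query stand out even when its runtime is small.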
Are you testing with HUDI tables by any chance? `--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog`
HUDI support is not ready in Gluten yet. The HUDI scan runs in vanilla Spark code, and a RowToColumn conversion (a memcpy) connects its output to the Gluten native operators, which adds a lot of overhead.
The slowdown is quite likely due to the HUDI table scan falling back to vanilla Spark. Here's the issue tracking the unified data lake design: #3378. ICEBERG and DELTA LAKE are both supported now (though not 100%).
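One rough way to check for this kind of fallback is to look for row/columnar transition operators in the physical plan text (for example, the output of `df.explain()` saved to a string). The sketch below is an assumption-laden illustration: the helper `count_transitions` is hypothetical, and the default marker strings are guesses that should be adjusted to whatever operator names your plan actually prints.

```python
def count_transitions(plan_text, markers=("RowToColumnar", "ColumnarToRow")):
    """Count plan lines that mention each transition marker.

    plan_text: physical plan as plain text.
    markers:   substrings to look for; the defaults are assumptions and
               may not match the operator names in a given Gluten build.
    """
    counts = {marker: 0 for marker in markers}
    for line in plan_text.splitlines():
        for marker in markers:
            if marker in line:
                counts[marker] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical plan text for illustration; not taken from this issue.
    sample_plan = "\n".join([
        "NativeOperator",
        "+- RowToColumnar",
        "   +- FileScan",
    ])
    print(count_transitions(sample_plan))
```

A non-zero count right above a scan node would be consistent with the scan running in vanilla Spark and being converted before reaching native operators.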
Backend
VL (Velox)
Bug description
[Expected behavior] Queries run faster than on OSS Spark.
[Actual behavior] OSS Spark finishes in half the time taken by Gluten+Velox Spark.
Spark version
None
Spark configurations
Gluten+Velox+Spark
OSS Spark
System information
Environment: Amazon EMR - 10 workers, 1 driver, all m5.4xlarge
OS: Amazon Linux 2
Relevant logs
I'm not sure what you need me to capture; please let me know what would help.