
Fix Spark Master build #3928

Open · wants to merge 48 commits into base: master
Conversation

@hvanhovell (Contributor) commented Dec 5, 2024

Which Delta project/connector is this regarding?

  • [x] Spark
  • [ ] Standalone
  • [ ] Flink
  • [ ] Kernel
  • [ ] Other (fill in here)

Description

This PR fixes the master build for Delta Spark.

The fixes address a number of unrelated changes in Apache Spark:

  • Unified Scala SQL Interface
  • Aggregate adds a hint parameter in master.
  • TableSpec adds a collation parameter in master.
  • Condition and errorClass are swapped in the SparkThrowable framework (see the sketch after this list).
  • ...
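On the SparkThrowable point, a hedged sketch of the rename involved (illustrative only, not this PR's shim code): the identifier that Spark 3.5 exposes as an "error class" is exposed as an "error condition" on master, where getErrorClass is deprecated.

import org.apache.spark.SparkThrowable

// Illustrative helper (an assumption, not part of this PR): both accessors
// return the same identifier string; only the name changed between versions.
def errorIdentifier(t: SparkThrowable): String =
  t.getCondition // on Spark 3.5 this would be t.getErrorClass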

I opted to do this in one PR because that is the only way I can be sure the tests pass.

How was this patch tested?

Existing tests.

Does this PR introduce any user-facing changes?

No.

object DeltaThrowableHelperShims {
  /**
   * Handles a breaking change (SPARK-46810) between Spark 3.5 and Spark Master (4.0) where
   * `error-classes.json` was renamed to `error-conditions.json`.
   */
  val SPARK_ERROR_CLASS_SOURCE_FILE = "error/error-conditions.json"
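As a usage illustration (an assumption, not code from this PR), the constant is the classpath location of Spark's error definitions, so a loader can stay version-agnostic:

// Illustrative only: resolve the error definition file that ships with Spark,
// using the shim constant so the path is correct on both 3.5 and master.
val errorFileUrl = Thread.currentThread().getContextClassLoader
  .getResource(DeltaThrowableHelperShims.SPARK_ERROR_CLASS_SOURCE_FILE)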

def showColumnsWithConflictDatabasesError(
@hvanhovell (author):
Needed because of apache/spark@53c1f31

@@ -72,8 +72,7 @@ import org.apache.spark.sql.types._
  * A SQL parser that tries to parse Delta commands. If failing to parse the SQL text, it will
  * forward the call to `delegate`.
  */
-class DeltaSqlParser(val delegateSpark: ParserInterface) extends ParserInterfaceShims {
-  private val delegate = ParserInterfaceShims(delegateSpark)
+class DeltaSqlParser(val delegate: ParserInterface) extends ParserInterface {
@hvanhovell (author):
Needed because of apache/spark@8791767

def unapply(plan: Aggregate): Option[TahoeLogFileIndex] = {
  // GROUP BY is not supported. All AggregateExpressions must be stats optimizable.
  ... isStatsOptimizable(aggExprs) => Some(fileIndex)
  case _ => None
@hvanhovell (author):
Needed because of apache/spark@d6b7334
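Since Aggregate gained an extra constructor parameter on master (the hint), positional extractors written against 3.5 no longer compile. A hedged sketch of the version-tolerant pattern (illustrative, not this PR's exact code): match on the node type and read fields by name instead of by position.

import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, LogicalPlan}

// Matching by type and accessing fields by name stays source-compatible
// when a constructor parameter is added to Aggregate.
def isGlobalAggregate(plan: LogicalPlan): Boolean = plan match {
  case a: Aggregate => a.groupingExpressions.isEmpty
  case _ => false
}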

@@ -182,7 +182,7 @@ class DeltaJobStatisticsTracker(
   override def newTaskInstance(): WriteTaskStatsTracker = {
     val rootPath = new Path(rootUri)
     val hadoopConf = srlHadoopConf.value
-    new DeltaTaskStatisticsTracker(dataCols, statsColExpr, rootPath, hadoopConf)
+    new DeltaTaskStatisticsTracker(dataCols, prepareForEval(statsColExpr), rootPath, hadoopConf)
@hvanhovell (author):
to_json is RuntimeReplaceable now. We need to replace it before we try to execute it. This was the narrowest waist I could find.
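A hedged sketch of what a helper like prepareForEval can do (the name comes from the diff above; the body is an assumption, not the PR's exact code): run the expression through the optimizer rule that rewrites RuntimeReplaceable expressions into their executable form.

import org.apache.spark.sql.catalyst.expressions.{Alias, Expression}
import org.apache.spark.sql.catalyst.optimizer.ReplaceExpressions
import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, Project}

// Wrap the expression in a dummy Project so ReplaceExpressions (the rule that
// rewrites RuntimeReplaceable expressions such as to_json) can process it,
// then unwrap the rewritten expression.
def prepareForEval(expr: Expression): Expression = {
  val dummy = Project(Seq(Alias(expr, "expr")()), LocalRelation())
  ReplaceExpressions(dummy).asInstanceOf[Project]
    .projectList.head.asInstanceOf[Alias].child
}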

@hvanhovell (author):

For anyone who is interested: I tried to fix this by getting the expression from the optimized plan first. For some bizarre reason this made tests fail for Decimal columns. I found that the writer was writing min/max values with rather large precision; these would later get dropped by the DataSkippingReader because they would not fit in a Decimal(3,2).
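To illustrate the failure mode (a minimal sketch with assumed values, not taken from the PR): a statistic whose precision exceeds the column's declared Decimal(3,2) cannot be stored, so data skipping drops it.

import org.apache.spark.sql.types.Decimal

// changePrecision returns false when the value does not fit the target
// precision/scale; 12.345 rounds to 12.35, which needs 4 digits of
// precision, while Decimal(3,2) allows only 3.
val stat = Decimal("12.345")
assert(!stat.changePrecision(3, 2))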

-    constructor.newInstance(
-      newIncrementalExecution,
-      ExpressionEncoder(newIncrementalExecution.analyzed.schema)).asInstanceOf[DataFrame]
+    DataFrameUtils.ofRows(newIncrementalExecution)
@hvanhovell (author):
I am not sure why the previous code was using reflection to do this.
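A hedged sketch of what a helper like DataFrameUtils.ofRows can look like (the name mirrors the diff; the body is an assumption, not the PR's exact code): Dataset's constructor is private[sql], so a forwarder placed in the org.apache.spark.sql package can build the DataFrame without reflection.

package org.apache.spark.sql

import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.execution.QueryExecution

object DataFrameUtils {
  // Builds a DataFrame directly from an existing QueryExecution; legal here
  // because this object lives inside the org.apache.spark.sql package.
  def ofRows(qe: QueryExecution): DataFrame =
    new Dataset[Row](qe, ExpressionEncoder(qe.analyzed.schema))
}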

@@ -227,7 +227,9 @@ class MergeIntoDVsSuite extends MergeIntoDVsTests {
       tableHasDVs = true,
       targetDf = sourceDF.as("s").join(targetDFWithMetadata.as("t"), condition),
       candidateFiles = corruptedFiles,
-      condition = condition.expr
+      condition = condition.expr,
+      fileNameColumnOpt = Option(col("s._metadata.file_name")),
@hvanhovell (author):
To the reviewer: I think this is correct, but I am not 100% sure.

@hvanhovell (author):
For context: both sides of the join now have a matching metadata field. An unqualified _metadata.file_name or _metadata.row_index now fails with an AMBIGUOUS_REFERENCE exception.
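A minimal illustration of the ambiguity (assuming the DataFrames and condition from the diff above; not code from this PR):

import org.apache.spark.sql.functions.col

// Both join inputs expose the hidden _metadata struct, so the reference
// must be qualified through a dataset alias to be resolvable.
val joined = sourceDF.as("s").join(targetDFWithMetadata.as("t"), condition)
joined.select(col("s._metadata.file_name"))  // OK: qualified via the "s" alias
// joined.select(col("_metadata.file_name")) // fails: AMBIGUOUS_REFERENCE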

-    val clonedSession = cloneMethod.invoke(spark).asInstanceOf[SparkSession]
-    clonedSession
-  }
+  def cloneSession(spark: SparkSession): SparkSession = spark.cloneSession()
@hvanhovell (author):
I am not sure why we need these gymnastics. As long as the test is defined under org.apache.spark you can call SparkSession.cloneSession() directly.


private def checkShowColumns(schema1: String, schema2: String, e: AnalysisException): Unit = {
  val expectedMessage = Seq(
    s"SHOW COLUMNS with conflicting databases: '$schema1' != '$schema2'", // SPARK-3.5
@hvanhovell (author):
I didn't want to add shims for this. It is just an error message that, in both cases, is IMO perfectly understandable.

@@ -46,7 +46,7 @@ trait DeltaHiveTest extends SparkFunSuite with BeforeAndAfterAll { self: DeltaSQ
     _sc = new SparkContext("local", this.getClass.getName, conf)
     _hiveContext = new TestHiveContext(_sc)
     _session = _hiveContext.sparkSession
-    SparkSession.setActiveSession(_session)
+    setActiveSession(_session)
@hvanhovell (author):
The use of the relocated class here is a bit weird. We should be able to set the session directly.

     DeltaErrors.multipleSourceRowMatchingTargetRowInMergeException(spark)
     assert(exceptionWithoutContext.getMessage.contains("https") === false)
   }
+  val newSession = spark.newSession()
@hvanhovell (author):
Create a new session so we don't have to clean up.

class DeltaStructuredLoggingSuite extends DeltaStructuredLoggingSuiteBase {
  override def className: String = classOf[DeltaStructuredLoggingSuite].getSimpleName
  override def logFilePath: String = "target/structured.log"

  override def beforeAll(): Unit = {
    super.beforeAll()
    Logging.enableStructuredLogging()
@hvanhovell (author):
Structured logging is not enabled by default (anymore), so we have to enable it.
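For background, a hedged sketch of the toggles involved (assuming the Logging companion object on Spark master; illustrative, not this PR's code):

import org.apache.spark.internal.Logging

// On master, JSON structured logging is opt-in; suites that assert on the
// log format switch it on in beforeAll and off again in afterAll.
Logging.enableStructuredLogging()
assert(Logging.isStructuredLoggingEnabled)
Logging.disableStructuredLogging()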

@@ -200,7 +200,6 @@ def crossSparkSettings(): Seq[Setting[_]] = getSparkVersion() match {
     scalaVersion := scala213,
     crossScalaVersions := Seq(scala213),
     targetJvm := "17",
-    resolvers += "Spark master staging" at "https://repository.apache.org/content/groups/snapshots/",
@hvanhovell (author):
Not needed anymore. Removing it to avoid confusion.

    ex2,
    "Missing field V2",
    "Couldn't resolve positional argument AFTER V2",
    "Renaming column is not supported in Hive-style ALTER COLUMN, " +
@hvanhovell (author):
I am not sure if this is expected.

@raveeram-db requested a review from scottsand-db on March 3, 2025 17:38