[SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md #29608

LuciferYang · 2020-09-01T05:38:19Z

What changes were proposed in this pull request?

sql-expression-schema.md is automatically generated by ExpressionsSchemaSuite, but ExpressionsSchemaSuite only checks the expression entries. So if the file is modified manually, the suite does not necessarily guarantee its correctness. For example, SPARK-24884 added support for the regexp_extract_all expression and manually modified sql-expression-schema.md without updating the Number of queries line, which left the file content inconsistent.

Some additional checks have been added to ExpressionsSchemaSuite to improve the correctness guarantee of sql-expression-schema.md, as follows:

Number of queries should equal the number of expression entries in sql-expression-schema.md
Number of expressions that missing example should equal the size of Expressions missing examples in sql-expression-schema.md
The missing examples found by the test case should be the same as expectedMissingExamples from sql-expression-schema.md
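
As a rough illustration of the three checks listed above, here is a minimal, self-contained sketch. It is not the suite's actual code: the object `GoldenFileCheck`, the `Summary` case class, the `sample` content, and the `headerSize` of 8 are all assumptions made for this example. Only the `lines(n).split(":")(1).trim` extraction mirrors the diff in this PR.

```scala
object GoldenFileCheck {
  // Values read from the summary block, plus the number of table rows found.
  final case class Summary(
      numberOfQueries: Int,
      numberOfMissingExamples: Int,
      expectedMissingExamples: Seq[String],
      rowCount: Int)

  // Assumption: 8 header lines precede the first table row in this sample layout.
  val headerSize = 8

  // Hypothetical, trimmed-down stand-in for sql-expression-schema.md.
  val sample: String = Seq(
    "<!-- Automatically generated by ExpressionsSchemaSuite -->",
    "## Summary",
    "  - Number of queries: 2",
    "  - Number of expressions that missing example: 1",
    "  - Expressions missing examples: foo",
    "## Schema of Built-in Functions",
    "| Class name | Function name or alias | Query example | Output schema |",
    "| ---------- | ---------------------- | ------------- | ------------- |",
    "| example.Abs | abs | SELECT abs(-1) | struct<abs(-1):int> |",
    "| example.Foo | foo | N/A | N/A |"
  ).mkString("\n")

  def parse(content: String): Summary = {
    val lines = content.split("\n")
    // Same extraction as in the diff: take the text after the first ':'.
    val numberOfQueries = lines(2).split(":")(1).trim.toInt
    val numberOfMissingExamples = lines(3).split(":")(1).trim.toInt
    val expectedMissingExamples = lines(4).split(":")(1).trim.split(",").toSeq
    Summary(numberOfQueries, numberOfMissingExamples, expectedMissingExamples,
      lines.length - headerSize)
  }

  // Returns one message per violated check; an empty result means the file is consistent.
  def inconsistencies(s: Summary, actualMissingExamples: Seq[String]): Seq[String] = Seq(
    (s.numberOfQueries != s.rowCount) ->
      s"Number of queries is ${s.numberOfQueries} but the table has ${s.rowCount} rows.",
    (s.numberOfMissingExamples != s.expectedMissingExamples.size) ->
      s"Missing-example count is ${s.numberOfMissingExamples} but " +
        s"${s.expectedMissingExamples.size} names are listed.",
    (actualMissingExamples != s.expectedMissingExamples) ->
      "Missing examples found by the test differ from those listed in the file."
  ).collect { case (true, msg) => msg }
}
```

Calling `GoldenFileCheck.inconsistencies(GoldenFileCheck.parse(GoldenFileCheck.sample), Seq("foo"))` returns an empty sequence for the sample; editing only the Number of queries line by hand makes the first check report a mismatch.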

Why are the changes needed?

Ensure the correctness of sql-expression-schema.md content.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Enhanced ExpressionsSchemaSuite

maropu · 2020-09-01T08:29:07Z

ok to test

SparkQA · 2020-09-01T13:18:27Z

Test build #128142 has finished for PR 29608 at commit 6d36603.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen

Seems reasonable

srowen · 2020-09-01T13:50:09Z

sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala

@@ -152,7 +152,7 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {

    val outputSize = outputs.size
    val headerSize = header.size
-    val expectedOutputs: Seq[QueryOutput] = {
+    val (expectedMissExamples, expectedOutputs): (Array[String], Seq[QueryOutput]) = {


No big deal, but the types can just be added to the two tuple elements, instead of declaring them separately as a type for the whole tuple

I first changed this to val (expectedMissingExamples: Array[String], expectedOutputs: Seq[QueryOutput]), but the type declaration felt redundant here, so I changed it to val (expectedMissingExamples, expectedOutputs). Is this acceptable?
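
For readers unfamiliar with the three spellings discussed in this comment thread, the sketch below shows them side by side. The object `TuplePatternDemo` and the `load` method are hypothetical stand-ins, and `Seq[Int]` replaces `Seq[QueryOutput]` to keep the example self-contained.

```scala
object TuplePatternDemo {
  // Hypothetical stand-in for the block that builds the two expected values.
  def load(): (Array[String], Seq[Int]) = (Array("a", "b"), Seq(1, 2, 3))

  // 1. One type ascription for the whole tuple (the first version in the diff).
  val (m1, o1): (Array[String], Seq[Int]) = load()

  // 2. A type on each tuple element (the reviewer's suggestion).
  val (m2: Array[String], o2: Seq[Int]) = load()

  // 3. No annotations; both types are inferred from the right-hand side.
  val (m3, o3) = load()
}
```

All three forms bind the same values. The third relies on the right-hand side having an obvious type, which is the trade-off raised in the reply above.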

srowen · 2020-09-01T13:50:32Z

sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala

@@ -161,14 +161,28 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
        s"Expected $expectedSize blocks in result file but got " +
          s"${outputSize + headerSize}. Try regenerate the result files.")

-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissExample = lines(3).split(":")(1).trim.toInt


miss -> missed or missing, in most cases in this change

srowen · 2020-09-01T13:50:51Z

sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala

+      // Ensure consistency of the result file.
+      assert(numberOfQueries == expectedOutputs.size,
+        s"outputs size: ${expectedOutputs.size} not same as numberOfQueries: $numberOfQueries " +
+          "record in result file. Try regenerate the result files.")


regenerate -> regenerating

LuciferYang · 2020-09-02T02:57:09Z

@srowen 5c8e29e and bf0ceea address the comments

SparkQA · 2020-09-02T07:05:01Z

Test build #128176 has finished for PR 29608 at commit 5c8e29e.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-09-02T07:05:02Z

Test build #128178 has finished for PR 29608 at commit bf0ceea.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

maropu

Looks okay cc: @cloud-fan @beliefer

maropu · 2020-09-02T23:26:14Z

sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala


-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissingExample = lines(3).split(":")(1).trim.toInt


nit: Example -> Examples

LuciferYang · 2020-09-03T03:28:41Z

The failed check was caused by the Install R linter dependencies and SparkR step

beliefer · 2020-09-03T06:14:52Z

sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala

        val segments = lines(i + headerSize).split('|')
        QueryOutput(
          className = segments(1).trim,
          funcName = segments(2).trim,
          sql = segments(3).trim,
          schema = segments(4).trim)
      }
+
+      // Ensure consistency of the result file.
+      assert(numberOfQueries == expectedOutputs.size,


I think these asserts should be put on line 163

~~numberOfQueries == outputSize~~
Ah, I got it. But if a new function is added here, expectedOutputs.size will not equal numberOfQueries.
expectedOutputs.size == outputSize in fact.

@beliefer This assert is placed here to verify the consistency of the file contents and to avoid inconsistency caused by manual modification. I think the assert on line 189, expectedOutputs.size == outputSize, achieves the same goal as numberOfQueries == outputSize. Is this acceptable?

You can try to mock a new function with comments and test this suite.

You can try to mock a new function with comments and test this suite.

You are right, SPARK-24884 already triggered this problem, caused by an incomplete manual modification.

For the scenario of adding a new function, I think the right way is to run this case with SPARK_GENERATE_GOLDEN_FILES = 1 to automatically regenerate the correct sql-expression-schema.md, because the sql-expression-schema.md header says Automatically generated by ExpressionsSchemaSuite.

Therefore, if the file is manually modified instead of automatically generated, I think the assertion failure caused by incorrect modification should be expected.

Do we need to tell users clearly with Try regenerating the result files with sys env SPARK_GENERATE_GOLDEN_FILES = 1?

I got it. Thanks! Could you put these asserts on line 163, so that it looks clearer?

Addressed in 6e6489b: reordered the assertions.
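
The regeneration flow discussed in this thread can be sketched as follows. This is only an illustration under stated assumptions: the object `GoldenFileMode` and its two helpers are hypothetical names, not code from the suite. The environment variable name and the hint wording are the ones quoted in the comments above.

```scala
object GoldenFileMode {
  // Hypothetical helper: true when the golden file should be regenerated
  // rather than verified. Taking the environment as a parameter keeps it testable.
  def regenerate(env: Map[String, String] = sys.env): Boolean =
    env.get("SPARK_GENERATE_GOLDEN_FILES").contains("1")

  // Hypothetical helper: appends the regeneration hint to an assertion message.
  def failureHint(detail: String): String =
    s"$detail Try regenerating the result files with sys env " +
      "SPARK_GENERATE_GOLDEN_FILES = 1."
}
```

With such a hint in the assertion message, a contributor who edited the file by hand is told to regenerate it instead of patching the summary lines manually.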

beliefer · 2020-09-03T06:17:51Z

sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala

        val segments = lines(i + headerSize).split('|')
        QueryOutput(
          className = segments(1).trim,
          funcName = segments(2).trim,
          sql = segments(3).trim,
          schema = segments(4).trim)
      }
+
+      // Ensure consistency of the result file.
+      assert(numberOfQueries == expectedOutputs.size,


~~numberOfQueries == outputSize~~
Ah, I got it. But if a new function is added here, expectedOutputs.size will not equal numberOfQueries.
expectedOutputs.size == outputSize in fact.

beliefer · 2020-09-03T06:21:29Z

sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala

-      Seq.tabulate(outputSize) { i =>
+      val numberOfQueries = lines(2).split(":")(1).trim.toInt
+      val numberOfMissingExamples = lines(3).split(":")(1).trim.toInt
+      val missingExamples = lines(4).split(":")(1).trim.split(",")


expectedMissingExamples

beliefer · 2020-09-03T06:21:42Z

sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala

+          s"numberOfMissingExamples: $numberOfMissingExamples " +
+          "record in result file. Try regenerating the result files.")
+
+      (missingExamples, expectedOutputs)


beliefer · 2020-09-03T06:23:03Z

Good catch! Except for some minor issues.

SparkQA · 2020-09-03T07:05:02Z

Test build #128221 has finished for PR 29608 at commit cbb0fcd.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-09-03T07:05:02Z

Test build #128236 has finished for PR 29608 at commit 5a574c3.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-09-03T07:05:02Z

Test build #128222 has finished for PR 29608 at commit 85299b1.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

beliefer · 2020-09-03T10:31:31Z

LGTM!

LuciferYang · 2020-09-03T10:44:52Z

Thx for your review @maropu @srowen @beliefer ~

SparkQA · 2020-09-03T13:41:08Z

Test build #128249 has finished for PR 29608 at commit 6e6489b.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

LuciferYang · 2020-09-03T14:01:08Z

@maropu org.apache.spark.sql.hive.thriftserver.CliSuite.* failed... I think It doesn't seem to be caused by this pr, can you help trigger retest?

maropu · 2020-09-03T14:26:30Z

@maropu org.apache.spark.sql.hive.thriftserver.CliSuite.* failed... I think It doesn't seem to be caused by this pr, can you help trigger retest?

All the tests in GitHub Actions passed, so this PR looks fine.

LuciferYang · 2020-09-03T15:18:16Z

Thx ~ @maropu

maropu · 2020-09-04T00:43:55Z

Thanks! Merged to master.

Guarantee sql-expression-schema.md content correctness

6d36603

probot-autolabeler bot added the SQL label Sep 1, 2020

srowen reviewed Sep 1, 2020

fix comments

5c8e29e

fix comments-2

bf0ceea

maropu approved these changes Sep 2, 2020

yangjie added 2 commits September 3, 2020 10:42

fix maropu's comments

cbb0fcd

fix maropu's comments

85299b1

beliefer reviewed Sep 3, 2020

fix expectedMissingExamples

5a574c3

reorder assertion

6e6489b

maropu closed this in 1de272f Sep 4, 2020

LuciferYang deleted the sql-expression-schema branch June 6, 2022 03:44
