
[SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md #29608

Closed
wants to merge 7 commits
Changes from 1 commit
sql-expression-schema.md:
@@ -1,6 +1,6 @@
<!-- Automatically generated by ExpressionsSchemaSuite -->
## Summary
- Number of queries: 338
- Number of queries: 339
- Number of expressions that missing example: 34
- Expressions missing examples: and,string,tinyint,double,smallint,date,decimal,boolean,float,binary,bigint,int,timestamp,struct,cume_dist,dense_rank,input_file_block_length,input_file_block_start,input_file_name,lag,lead,monotonically_increasing_id,ntile,!,not,or,percent_rank,rank,row_number,spark_partition_id,version,window,positive,count_min_sketch
## Schema of Built-in Functions
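
For reference, a minimal standalone sketch (the object name and the sample table row are hypothetical, not taken from the PR) of how the summary lines and a schema table row can be parsed by index, mirroring the suite changes shown below:

// Hedged sketch: parse the summary header of sql-expression-schema.md by line index,
// assuming the layout shown above (comment, "## Summary", then the three bullet lines).
object SummaryHeaderParseSketch extends App {
  val goldenFileHeader =
    """<!-- Automatically generated by ExpressionsSchemaSuite -->
      |## Summary
      |- Number of queries: 339
      |- Number of expressions that missing example: 34
      |- Expressions missing examples: and,string,tinyint
      |""".stripMargin
  val lines = goldenFileHeader.split("\n")

  val numberOfQueries = lines(2).split(":")(1).trim.toInt          // 339
  val numberOfMissingExamples = lines(3).split(":")(1).trim.toInt  // 34
  val missingExamples = lines(4).split(":")(1).trim.split(",")     // and, string, tinyint

  assert(numberOfQueries == 339)
  assert(numberOfMissingExamples == 34)
  assert(missingExamples.sameElements(Array("and", "string", "tinyint")))

  // A markdown table row from the "Schema of Built-in Functions" section (illustrative
  // content only): split('|') leaves an empty first segment because of the leading '|'.
  val row = "| org.apache.spark.sql.catalyst.expressions.Abs | abs | SELECT abs(-1) | struct<abs(-1):int> |"
  val segments = row.split('|')
  assert(segments(1).trim == "org.apache.spark.sql.catalyst.expressions.Abs")
  assert(segments(2).trim == "abs")
  assert(segments(3).trim == "SELECT abs(-1)")
  assert(segments(4).trim == "struct<abs(-1):int>")
}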

ExpressionsSchemaSuite.scala:
@@ -152,7 +152,7 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {

val outputSize = outputs.size
val headerSize = header.size
val expectedOutputs: Seq[QueryOutput] = {
val (expectedMissExamples, expectedOutputs): (Array[String], Seq[QueryOutput]) = {

Member:
No big deal, but the types can just be added to the two tuple elements, instead of declaring them separately as a type for the whole tuple


Contributor Author (@LuciferYang, Sep 2, 2020):
I first changed it to val (expectedMissingExamples: Array[String], expectedOutputs: Seq[QueryOutput]), but I felt the type declarations there were redundant, so I changed it to val (expectedMissingExamples, expectedOutputs). Is this acceptable?
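
To illustrate the three styles under discussion, a tiny standalone sketch (the names and the parse helper are made up for the example):

object TupleTypeStyles {
  // Hypothetical helper standing in for the block that builds the tuple in the suite.
  def parse(): (Array[String], Seq[String]) = (Array("and", "or"), Seq("abs", "acos"))

  // Style in this commit: one type ascription for the whole tuple.
  val (missA, outA): (Array[String], Seq[String]) = parse()

  // Reviewer's alternative: annotate each element of the pattern instead.
  val (missB: Array[String], outB: Seq[String]) = parse()

  // Author's final choice: let the compiler infer both element types.
  val (missC, outC) = parse()
}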

val expectedGoldenOutput = fileToString(resultFile)
val lines = expectedGoldenOutput.split("\n")
val expectedSize = lines.size
@@ -161,14 +161,28 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
s"Expected $expectedSize blocks in result file but got " +
s"${outputSize + headerSize}. Try regenerate the result files.")

Seq.tabulate(outputSize) { i =>
val numberOfQueries = lines(2).split(":")(1).trim.toInt
val numberOfMissExample = lines(3).split(":")(1).trim.toInt

Member:
miss -> missed or missing, in most cases in this change


Contributor Author (@LuciferYang, Sep 2, 2020):
done ~

val missExamples = lines(4).split(":")(1).trim.split(",")
val expectedOutputs = Seq.tabulate(outputSize) { i =>
val segments = lines(i + headerSize).split('|')
QueryOutput(
className = segments(1).trim,
funcName = segments(2).trim,
sql = segments(3).trim,
schema = segments(4).trim)
}

// Ensure consistency of the result file.
assert(numberOfQueries == expectedOutputs.size,

Contributor:
I think these asserts should be put at line 163.


Contributor (@beliefer, Sep 3, 2020):
numberOfQueries == outputSize
Ah, I got it. But if a new function is added here, expectedOutputs.size will not equal numberOfQueries.
In fact, expectedOutputs.size == outputSize.


Contributor Author:
@beliefer This assert is placed here to verify the consistency of the file contents, to avoid inconsistency caused by manual modification, and I think the assert at line 189, expectedOutputs.size == outputSize, achieves the same goal as numberOfQueries == outputSize. Is this acceptable?


Contributor (@beliefer, Sep 3, 2020):
You can try to mock a new function with comments and test this suite.


Contributor Author (@LuciferYang, Sep 3, 2020):
> You can try to mock a new function with comments and test this suite.

You are right, SPARK-24884 already triggered this problem, caused by an incomplete manual modification.

For the scenario where a new function is added, I think the right way is to run this suite with SPARK_GENERATE_GOLDEN_FILES=1 to automatically regenerate the correct sql-expression-schema.md, because the sql-expression-schema.md header says Automatically generated by ExpressionsSchemaSuite.


Contributor Author:
Therefore, if the file is manually modified instead of automatically generated, I think the assertion failure caused by incorrect modification should be expected.

Do we need to tell users clearly with a message like Try regenerating the result files with the sys env SPARK_GENERATE_GOLDEN_FILES=1?
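
As background for this thread, a minimal sketch, not the suite's actual code, of how a golden-file check typically switches between regenerating and verifying the result file based on that environment variable (the object and method names are hypothetical):

// Minimal sketch: regenerate the golden file when SPARK_GENERATE_GOLDEN_FILES=1,
// otherwise verify against it and point users at the regeneration step on failure.
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

object GoldenFileSketch {
  private val regenerateGoldenFiles: Boolean =
    sys.env.get("SPARK_GENERATE_GOLDEN_FILES").contains("1")

  def checkOrRegenerate(resultFile: String, actualOutput: String): Unit = {
    val path = Paths.get(resultFile)
    if (regenerateGoldenFiles) {
      // Regeneration mode: rewrite the golden file from the freshly generated output.
      Files.write(path, actualOutput.getBytes(StandardCharsets.UTF_8))
    } else {
      // Verification mode: fail with a hint to regenerate when the file has drifted.
      val expected = new String(Files.readAllBytes(path), StandardCharsets.UTF_8)
      assert(expected == actualOutput,
        s"$resultFile is stale. Try regenerating the result files with " +
          "sys env SPARK_GENERATE_GOLDEN_FILES=1.")
    }
  }
}

When the variable is not set, the failure message points users at the regeneration step, which is what the question above suggests.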


Contributor (@beliefer, Sep 3, 2020):
I got it. Thanks! Could you put these asserts at line 163, so that it looks clearer?


Contributor Author:

Addressed in 6e6489b, reordering the assertions.

s"outputs size: ${expectedOutputs.size} not same as numberOfQueries: $numberOfQueries " +
"record in result file. Try regenerate the result files.")

Member:
regenerate -> regenerating


Contributor Author:
ok ~


Contributor Author (@LuciferYang, Sep 2, 2020):
done ~

assert(numberOfMissExample == missExamples.size,
s"miss examples size: ${missExamples.size} not same as " +
s"numberOfMissExample: $numberOfMissExample " +
"record in result file. Try regenerate the result files.")

(missExamples, expectedOutputs)
}

// Compare results.
@@ -179,5 +193,13 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
assert(expected.sql == output.sql, "SQL query did not match")
assert(expected.schema == output.schema, s"Schema did not match for query ${expected.sql}")
}

// Compare expressions missing examples
assert(expectedMissExamples.length == missingExamples.size,
"The number of missing examples not equals the number of expected missing examples.")

missingExamples.zip(expectedMissExamples).foreach { case (output, expected) =>
assert(expected == output, "Missing example expression not match")
}
}
}