-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-32762][SQL][TEST] Enhance the verification of ExpressionsSchemaSuite to sql-expression-schema.md #29608
Conversation
ok to test |
Test build #128142 has finished for PR 29608 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable
@@ -152,7 +152,7 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession { | |||
|
|||
val outputSize = outputs.size | |||
val headerSize = header.size | |||
val expectedOutputs: Seq[QueryOutput] = { | |||
val (expectedMissExamples, expectedOutputs): (Array[String], Seq[QueryOutput]) = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No big deal, but the types can just be added to the two tuple elements, instead of declaring them separately as a type for the whole tuple
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First changed to val (expectedMissingExamples: Array[String], expectedOutputs: Seq[QueryOutput])
, but I feel the type declaration here is redundant, so changed to val (expectedMissingExamples, expectedOutputs)
. Is this acceptable?
@@ -161,14 +161,28 @@ class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession { | |||
s"Expected $expectedSize blocks in result file but got " + | |||
s"${outputSize + headerSize}. Try regenerate the result files.") | |||
|
|||
Seq.tabulate(outputSize) { i => | |||
val numberOfQueries = lines(2).split(":")(1).trim.toInt | |||
val numberOfMissExample = lines(3).split(":")(1).trim.toInt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
miss -> missed or missing, in most cases in this change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done ~
// Ensure consistency of the result file. | ||
assert(numberOfQueries == expectedOutputs.size, | ||
s"outputs size: ${expectedOutputs.size} not same as numberOfQueries: $numberOfQueries " + | ||
"record in result file. Try regenerate the result files.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
regenerate -> regenerating
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok ~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done ~
Test build #128176 has finished for PR 29608 at commit
|
Test build #128178 has finished for PR 29608 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks okay cc: @cloud-fan @beliefer
|
||
Seq.tabulate(outputSize) { i => | ||
val numberOfQueries = lines(2).split(":")(1).trim.toInt | ||
val numberOfMissingExample = lines(3).split(":")(1).trim.toInt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Example
-> Examples
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done ~
The failed check cause by |
val segments = lines(i + headerSize).split('|') | ||
QueryOutput( | ||
className = segments(1).trim, | ||
funcName = segments(2).trim, | ||
sql = segments(3).trim, | ||
schema = segments(4).trim) | ||
} | ||
|
||
// Ensure consistency of the result file. | ||
assert(numberOfQueries == expectedOutputs.size, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think should put these assert on line 163
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numberOfQueries == outputSize
Ah, I got it. But if a new function here, expectedOutputs.size
must not equal to numberOfQueries
.
expectedOutputs.size == outputSize
in fact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@beliefer This assert place here to verify the consistency of the file contents to avoid inconsistency caused by manual modification, and I think line 189 assert expectedOutputs.size == outputSize
achieves the same goal as numberOfQueries == outputSize
. Is this acceptable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can try to mock a new function with comments and test this suite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can try to mock a new function with comments and test this suite.
You are right, Spark-24884 already trigger this problem cause by incomplete manual modification.
With the new function add scene, I think the right way is run this case with SPARK_GENERATE_GOLDEN_FILES = 1
to automatically regenerate the correct sql-expression-schema.md
becasuse sql-expression-schema.md
header said Automatically generated by ExpressionsSchemaSuite
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Therefore, if the file is manually modified instead of automatically generated, I think the assertion failure caused by incorrect modification should be expected.
Do we need to tell users clearly with Try regenerating the result files with sys env SPARK_GENERATE_GOLDEN_FILES = 1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got it. Thanks! Could you put these assert on line 163, so that it looks clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Address 6e6489b reorder the assertions.
val segments = lines(i + headerSize).split('|') | ||
QueryOutput( | ||
className = segments(1).trim, | ||
funcName = segments(2).trim, | ||
sql = segments(3).trim, | ||
schema = segments(4).trim) | ||
} | ||
|
||
// Ensure consistency of the result file. | ||
assert(numberOfQueries == expectedOutputs.size, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numberOfQueries == outputSize
Ah, I got it. But if a new function here, expectedOutputs.size
must not equal to numberOfQueries
.
expectedOutputs.size == outputSize
in fact.
Seq.tabulate(outputSize) { i => | ||
val numberOfQueries = lines(2).split(":")(1).trim.toInt | ||
val numberOfMissingExamples = lines(3).split(":")(1).trim.toInt | ||
val missingExamples = lines(4).split(":")(1).trim.split(",") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expectedMissingExamples
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done ~
s"numberOfMissingExamples: $numberOfMissingExamples " + | ||
"record in result file. Try regenerating the result files.") | ||
|
||
(missingExamples, expectedOutputs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done ~
Good catch! Except for some minor issues. |
Test build #128221 has finished for PR 29608 at commit
|
Test build #128236 has finished for PR 29608 at commit
|
Test build #128222 has finished for PR 29608 at commit
|
LGTM! |
Test build #128249 has finished for PR 29608 at commit
|
@maropu |
All the tests in GitHub Actions passed, so this PR looks fine. |
Thx ~ @maropu |
Thanks! Merged to master. |
What changes were proposed in this pull request?
sql-expression-schema.md
automatically generated byExpressionsSchemaSuite
, but only expressions entries are checked inExpressionsSchemaSuite
. So if we manually modify the contents of the file,ExpressionsSchemaSuite
does not necessarily guarantee the correctness of the it some times. For example, Spark-24884 addedregexp_extract_all
expression support, and manually modify thesql-expression-schema.md
but not change the content ofNumber of queries
cause file content inconsistency.Some additional checks have been added to
ExpressionsSchemaSuite
to improve the correctness guarantee ofsql-expression-schema.md
as follow:Number of queries
should equals size ofexpressions entries
insql-expression-schema.md
Number of expressions that missing example
should equals size ofExpressions missing examples
insql-expression-schema.md
MissExamples
from case should same asexpectedMissingExamples
fromsql-expression-schema.md
Why are the changes needed?
Ensure the correctness of
sql-expression-schema.md
content.Does this PR introduce any user-facing change?
No
How was this patch tested?
Enhanced ExpressionsSchemaSuite