-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-49775][SQL][TESTS] Make tests of INVALID_PARAMETER_VALUE.CHARSET
deterministic
#48235
Conversation
b956a6c
to
348dc51
Compare
Merged to master. |
@@ -24,7 +24,7 @@ | |||
private[sql] object CharsetProvider { | |||
|
|||
final lazy val VALID_CHARSETS = | |||
Set("us-ascii", "iso-8859-1", "utf-8", "utf-16be", "utf-16le", "utf-16", "utf-32") | |||
Array("us-ascii", "iso-8859-1", "utf-8", "utf-16be", "utf-16le", "utf-16", "utf-32").sorted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it more performant to use the contains of set than array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's happening once anyway so shouldn't be a biggie. invalidCharsetError
happens for every call
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
charset can be a column value,this could be expensive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, in that case, we should probably keep this as a set, and just create another variable sorted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://www.scala-lang.org/api/2.12.5/scala/collection/SortedSet$.html maybe we can use a SortedSet here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here: #48245
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good!
…ting ### What changes were proposed in this pull request? This PR is a followup of #48235 that addresses #48235 (comment) comment. ### Why are the changes needed? For better performance (in theory) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests should verify them ### Was this patch authored or co-authored using generative AI tooling? No. Closes #48245 from HyukjinKwon/SPARK-49775-followup. Authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…SET` deterministic ### What changes were proposed in this pull request? Make tests of `INVALID_PARAMETER_VALUE.CHARSET` deterministic ### Why are the changes needed? `VALID_CHARSETS` is a Set, so `VALID_CHARSETS.mkString(", ")` is non-deterministic, and cause failures in different testing environments, e.g. ``` org.scalatest.exceptions.TestFailedException: ansi/string-functions.sql Expected "...sets" : "UTF-16LE, U[TF-8, UTF-32, UTF-16BE, UTF-16, US-ASCII, ISO-8859-1]", "functionName...", but got "...sets" : "UTF-16LE, U[S-ASCII, ISO-8859-1, UTF-8, UTF-32, UTF-16BE, UTF-16]", "functionName..." Result did not match for query apache#93 select encode('hello', 'WINDOWS-1252') at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) at org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564) at org.scalatest.Assertions.assertResult(Assertions.scala:847) at org.scalatest.Assertions.assertResult$(Assertions.scala:842) at org.scalatest.funsuite.AnyFunSuite.assertResult(AnyFunSuite.scala:1564) ``` ### Does this PR introduce _any_ user-facing change? No, test only ### How was this patch tested? updated tests ### Was this patch authored or co-authored using generative AI tooling? no Closes apache#48235 from zhengruifeng/sql_test_sort_charset. Authored-by: Ruifeng Zheng <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
…ting ### What changes were proposed in this pull request? This PR is a followup of apache#48235 that addresses apache#48235 (comment) comment. ### Why are the changes needed? For better performance (in theory) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests should verify them ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#48245 from HyukjinKwon/SPARK-49775-followup. Authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…SET` deterministic ### What changes were proposed in this pull request? Make tests of `INVALID_PARAMETER_VALUE.CHARSET` deterministic ### Why are the changes needed? `VALID_CHARSETS` is a Set, so `VALID_CHARSETS.mkString(", ")` is non-deterministic, and cause failures in different testing environments, e.g. ``` org.scalatest.exceptions.TestFailedException: ansi/string-functions.sql Expected "...sets" : "UTF-16LE, U[TF-8, UTF-32, UTF-16BE, UTF-16, US-ASCII, ISO-8859-1]", "functionName...", but got "...sets" : "UTF-16LE, U[S-ASCII, ISO-8859-1, UTF-8, UTF-32, UTF-16BE, UTF-16]", "functionName..." Result did not match for query apache#93 select encode('hello', 'WINDOWS-1252') at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) at org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564) at org.scalatest.Assertions.assertResult(Assertions.scala:847) at org.scalatest.Assertions.assertResult$(Assertions.scala:842) at org.scalatest.funsuite.AnyFunSuite.assertResult(AnyFunSuite.scala:1564) ``` ### Does this PR introduce _any_ user-facing change? No, test only ### How was this patch tested? updated tests ### Was this patch authored or co-authored using generative AI tooling? no Closes apache#48235 from zhengruifeng/sql_test_sort_charset. Authored-by: Ruifeng Zheng <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
…ting ### What changes were proposed in this pull request? This PR is a followup of apache#48235 that addresses apache#48235 (comment) comment. ### Why are the changes needed? For better performance (in theory) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests should verify them ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#48245 from HyukjinKwon/SPARK-49775-followup. Authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
Make tests of
INVALID_PARAMETER_VALUE.CHARSET
deterministicWhy are the changes needed?
VALID_CHARSETS
is a Set, soVALID_CHARSETS.mkString(", ")
is non-deterministic, and cause failures in different testing environments, e.g.Does this PR introduce any user-facing change?
No, test only
How was this patch tested?
updated tests
Was this patch authored or co-authored using generative AI tooling?
no