Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-49775][SQL][TESTS] Make tests of INVALID_PARAMETER_VALUE.CHARSET deterministic #48235

Closed

Conversation

zhengruifeng
Copy link
Contributor

@zhengruifeng zhengruifeng commented Sep 25, 2024

What changes were proposed in this pull request?

Make tests of INVALID_PARAMETER_VALUE.CHARSET deterministic

Why are the changes needed?

VALID_CHARSETS is a Set, so VALID_CHARSETS.mkString(", ") is non-deterministic, and cause failures in different testing environments, e.g.

        org.scalatest.exceptions.TestFailedException: ansi/string-functions.sql
Expected "...sets" : "UTF-16LE, U[TF-8, UTF-32, UTF-16BE, UTF-16, US-ASCII, ISO-8859-1]",
    "functionName...", but got "...sets" : "UTF-16LE, U[S-ASCII, ISO-8859-1, UTF-8, UTF-32, UTF-16BE, UTF-16]",
    "functionName..." Result did not match for query #93
select encode('hello', 'WINDOWS-1252')
      at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
      at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
      at org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564)
      at org.scalatest.Assertions.assertResult(Assertions.scala:847)
      at org.scalatest.Assertions.assertResult$(Assertions.scala:842)
      at org.scalatest.funsuite.AnyFunSuite.assertResult(AnyFunSuite.scala:1564)

Does this PR introduce any user-facing change?

No, test only

How was this patch tested?

updated tests

Was this patch authored or co-authored using generative AI tooling?

no

@HyukjinKwon
Copy link
Member

Merged to master.

@zhengruifeng zhengruifeng deleted the sql_test_sort_charset branch September 25, 2024 10:35
@@ -24,7 +24,7 @@
private[sql] object CharsetProvider {

final lazy val VALID_CHARSETS =
Set("us-ascii", "iso-8859-1", "utf-8", "utf-16be", "utf-16le", "utf-16", "utf-32")
Array("us-ascii", "iso-8859-1", "utf-8", "utf-16be", "utf-16le", "utf-16", "utf-32").sorted
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it more performant to use the contains of set than array?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's happening once anyway so shouldn't be a biggie. invalidCharsetError happens for every call

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

charset can be a column value,this could be expensive?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, in that case, we should probably keep this as a set, and just create another variable sorted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here: #48245

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

dongjoon-hyun pushed a commit that referenced this pull request Sep 25, 2024
…ting

### What changes were proposed in this pull request?

This PR is a followup of #48235 that addresses #48235 (comment) comment.

### Why are the changes needed?

For better performance (in theory)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests should verify them

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48245 from HyukjinKwon/SPARK-49775-followup.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
…SET` deterministic

### What changes were proposed in this pull request?
Make tests of `INVALID_PARAMETER_VALUE.CHARSET` deterministic

### Why are the changes needed?
`VALID_CHARSETS` is a Set, so `VALID_CHARSETS.mkString(", ")` is non-deterministic, and cause failures in different testing environments, e.g.
```
        org.scalatest.exceptions.TestFailedException: ansi/string-functions.sql
Expected "...sets" : "UTF-16LE, U[TF-8, UTF-32, UTF-16BE, UTF-16, US-ASCII, ISO-8859-1]",
    "functionName...", but got "...sets" : "UTF-16LE, U[S-ASCII, ISO-8859-1, UTF-8, UTF-32, UTF-16BE, UTF-16]",
    "functionName..." Result did not match for query apache#93
select encode('hello', 'WINDOWS-1252')
      at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
      at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
      at org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564)
      at org.scalatest.Assertions.assertResult(Assertions.scala:847)
      at org.scalatest.Assertions.assertResult$(Assertions.scala:842)
      at org.scalatest.funsuite.AnyFunSuite.assertResult(AnyFunSuite.scala:1564)

```

### Does this PR introduce _any_ user-facing change?
No, test only

### How was this patch tested?
updated tests

### Was this patch authored or co-authored using generative AI tooling?
no

Closes apache#48235 from zhengruifeng/sql_test_sort_charset.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
…ting

### What changes were proposed in this pull request?

This PR is a followup of apache#48235 that addresses apache#48235 (comment) comment.

### Why are the changes needed?

For better performance (in theory)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests should verify them

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#48245 from HyukjinKwon/SPARK-49775-followup.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024
…SET` deterministic

### What changes were proposed in this pull request?
Make tests of `INVALID_PARAMETER_VALUE.CHARSET` deterministic

### Why are the changes needed?
`VALID_CHARSETS` is a Set, so `VALID_CHARSETS.mkString(", ")` is non-deterministic, and cause failures in different testing environments, e.g.
```
        org.scalatest.exceptions.TestFailedException: ansi/string-functions.sql
Expected "...sets" : "UTF-16LE, U[TF-8, UTF-32, UTF-16BE, UTF-16, US-ASCII, ISO-8859-1]",
    "functionName...", but got "...sets" : "UTF-16LE, U[S-ASCII, ISO-8859-1, UTF-8, UTF-32, UTF-16BE, UTF-16]",
    "functionName..." Result did not match for query apache#93
select encode('hello', 'WINDOWS-1252')
      at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
      at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
      at org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564)
      at org.scalatest.Assertions.assertResult(Assertions.scala:847)
      at org.scalatest.Assertions.assertResult$(Assertions.scala:842)
      at org.scalatest.funsuite.AnyFunSuite.assertResult(AnyFunSuite.scala:1564)

```

### Does this PR introduce _any_ user-facing change?
No, test only

### How was this patch tested?
updated tests

### Was this patch authored or co-authored using generative AI tooling?
no

Closes apache#48235 from zhengruifeng/sql_test_sort_charset.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024
…ting

### What changes were proposed in this pull request?

This PR is a followup of apache#48235 that addresses apache#48235 (comment) comment.

### Why are the changes needed?

For better performance (in theory)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests should verify them

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#48245 from HyukjinKwon/SPARK-49775-followup.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants