Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-36970][SQL] Manual disabled format B of date_format function to make Java 17 compatible with Java 8 #34237

Closed
wants to merge 7 commits into from

Conversation

LuciferYang
Copy link
Contributor

@LuciferYang LuciferYang commented Oct 11, 2021

What changes were proposed in this pull request?

The date_format function with B format has different behavior when use Java 8 and Java 17, select date_format('2018-11-17 13:33:33.333', 'B') in datetime-formatting-invalid.sql can prove this.

The case result with Java 8 is

-- !query
select date_format('2018-11-17 13:33:33.333', 'B')
-- !query schema
struct<>
-- !query output
java.lang.IllegalArgumentException
Unknown pattern letter: B

and the case result with Java 17 is

- datetime-formatting-invalid.sql *** FAILED ***
  datetime-formatting-invalid.sql
  Expected "struct<[]>", but got "struct<[date_format(2018-11-17 13:33:33.333, B):string]>" Schema did not match for query #34
  select date_format('2018-11-17 13:33:33.333', 'B'): -- !query
  select date_format('2018-11-17 13:33:33.333', 'B')
  -- !query schema
  struct<date_format(2018-11-17 13:33:33.333, B):string>
  -- !query output
  in the afternoon (SQLQueryTestSuite.scala:469)

We found that this is due to the new support of format B in Java 17

'B' is used to represent Pattern letters to output a day period in Java 17

     *  Pattern  Count  Equivalent builder methods
     *  -------  -----  --------------------------
     *    B       1      appendDayPeriodText(TextStyle.SHORT)
     *    BBBB    4      appendDayPeriodText(TextStyle.FULL)
     *    BBBBB   5      appendDayPeriodText(TextStyle.NARROW)

And through http://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html , we can confirm that format B is not documented/supported for date_format function currently.

So the main change of this pr is manual disabled format B of date_format function in DateTimeFormatterHelper to make Java 17 compatible with Java 8.

Why are the changes needed?

Ensure that Java 17 and Java 8 have the same behavior.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • Pass the Jenkins or GitHub Action
  • Manual test SQLQueryTestSuite with JDK 17

Before

- datetime-formatting-invalid.sql *** FAILED ***
  datetime-formatting-invalid.sql
  Expected "struct<[]>", but got "struct<[date_format(2018-11-17 13:33:33.333, B):string]>" Schema did not match for query #34
  select date_format('2018-11-17 13:33:33.333', 'B'): -- !query
  select date_format('2018-11-17 13:33:33.333', 'B')
  -- !query schema
  struct<date_format(2018-11-17 13:33:33.333, B):string>
  -- !query output
  in the afternoon (SQLQueryTestSuite.scala:469)

After
The test select date_format('2018-11-17 13:33:33.333', 'B') in datetime-formatting-invalid.sql passed

@github-actions github-actions bot added the SQL label Oct 11, 2021
@LuciferYang
Copy link
Contributor Author

cc @srowen @MaxGekk @dongjoon-hyun

@SparkQA
Copy link

SparkQA commented Oct 11, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48544/

@SparkQA
Copy link

SparkQA commented Oct 11, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48544/

@@ -280,6 +280,7 @@ private object DateTimeFormatterHelper {
// 2.4, the SimpleDateFormat uses Monday as the first day of week.
final val weekBasedLetters = Set('Y', 'W', 'w', 'u', 'e', 'c')
final val unsupportedLetters = Set('A', 'n', 'N', 'p')
final val unknownPatternLetters: Set[Char] = Set('B')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between unsupported and unknown letters from your point of view?

Copy link
Contributor Author

@LuciferYang LuciferYang Oct 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MaxGekk Both unsupportedLetters and unknownPatternLetters are used for explicitly banned some pattern letters and throw IllegalArgumentException

The only difference is the content of the error message, throw IllegalArgumentException with Unknown pattern letter: $c Is the current behavior.

-- !query
select date_format('2018-11-17 13:33:33.333', 'B')
-- !query schema
struct<>
-- !query output
java.lang.IllegalArgumentException
Unknown pattern letter: B

If we accept the change of error message content, we can reuse unsupportedLetters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we accept the change of error message content ...

I think it is ok. Let's put 'B' to unsupportedLetters. BTW, should we standardise the exception by following https://issues.apache.org/jira/browse/SPARK-33539, cc @allisonwang-db ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK ~

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome thanks!

@SparkQA
Copy link

SparkQA commented Oct 11, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48550/

@SparkQA
Copy link

SparkQA commented Oct 11, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48550/

@SparkQA
Copy link

SparkQA commented Oct 11, 2021

Test build #144066 has finished for PR 34237 at commit 1bc02ad.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 11, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48556/

@SparkQA
Copy link

SparkQA commented Oct 11, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48556/

@SparkQA
Copy link

SparkQA commented Oct 11, 2021

Test build #144072 has finished for PR 34237 at commit c6713ce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class SetCatalogAndNamespace(child: LogicalPlan) extends UnaryCommand
  • case class SetNamespaceCommand(namespace: Seq[String]) extends LeafRunnableCommand

@SparkQA
Copy link

SparkQA commented Oct 11, 2021

Test build #144078 has finished for PR 34237 at commit 8271b47.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @LuciferYang , @srowen , @MaxGekk .
Merged to master for Apache Spark 3.3.

@dongjoon-hyun
Copy link
Member

Oh, I'm taking my words back. It seems that there is a TPCDS test failure still.

@dongjoon-hyun
Copy link
Member

Could you re-trigger GitHub Action once more, @LuciferYang ?

@LuciferYang
Copy link
Contributor Author

Could you re-trigger GitHub Action once more, @LuciferYang ?

OK, the exit code is 137. It looks like it was killed

@LuciferYang
Copy link
Contributor Author

@dongjoon-hyun
image
TPCDS passed although other tests are still running

@MaxGekk
Copy link
Member

MaxGekk commented Oct 12, 2021

@LuciferYang Could you re-trigger GAs via an empty commit:

$ git commit --allow-empty -m "Trigger build"

@SparkQA
Copy link

SparkQA commented Oct 12, 2021

Test build #144143 has finished for PR 34237 at commit 0ad1959.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • protected class YarnSchedulerEndpoint(override val rpcEnv: RpcEnv)
  • case class HashedRelationBroadcastMode(key: Seq[Expression], isNullAware: Boolean = false)

@SparkQA
Copy link

SparkQA commented Oct 12, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48620/

@SparkQA
Copy link

SparkQA commented Oct 12, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48620/

@SparkQA
Copy link

SparkQA commented Oct 12, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48630/

@SparkQA
Copy link

SparkQA commented Oct 12, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48628/

@SparkQA
Copy link

SparkQA commented Oct 12, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48628/

@SparkQA
Copy link

SparkQA commented Oct 12, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48630/

@LuciferYang
Copy link
Contributor Author

Already re-trigger GAs many times,it seems that I need to investigate the reasons for the failure of hive SlowTest

@SparkQA
Copy link

SparkQA commented Oct 12, 2021

Test build #144152 has finished for PR 34237 at commit ac2e700.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member

MaxGekk commented Oct 12, 2021

+1, LGTM. Merging to master. All GAs passed
Screenshot 2021-10-12 at 20 11 56
Thank you, @LuciferYang and @srowen @dongjoon-hyun @allisonwang-db for review.

@MaxGekk MaxGekk closed this in 1af7072 Oct 12, 2021
@dongjoon-hyun
Copy link
Member

Thank you all. :)

@LuciferYang
Copy link
Contributor Author

thank all ~

@LuciferYang LuciferYang deleted the SPARK-36970 branch October 22, 2023 07:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants