Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-8420][SQL] Fix comparision of timestamps/dates with strings #6888

Closed
wants to merge 6 commits into from

Conversation

marmbrus
Copy link
Contributor

In earlier versions of Spark SQL we casted TimestampType and DataType to StringType when it was involved in a binary comparison with a StringType. This allowed comparing a timestamp with a partial date as a user would expect.

  • time > "2014-06-10"
  • time > "2014"

In 1.4.0 we tried to cast the String instead into a Timestamp. However, since partial dates are not a valid complete timestamp this results in null which results in the tuple being filtered.

This PR restores the earlier behavior. Note that we still special case equality so that these comparisons are not affected by not printing zeros for subsecond precision.

marmbrus referenced this pull request Jun 18, 2015
If you have a table with TIMESTAMP column, that column can't be used in WHERE clause properly - it is not evaluated properly. [More](https://issues.apache.org/jira/browse/SPARK-3173)

Motivation: http://www.aproint.com/aggregation-with-spark-sql/

- [x] modify SqlParser so it supports casting to TIMESTAMP (workaround for item 2)
- [x] the string literal should be converted into Timestamp if the column is Timestamp.

Author: Zdenek Farana <[email protected]>
Author: Zdenek Farana <[email protected]>

Closes #2084 from byF/SPARK-3173 and squashes the following commits:

442b59d [Zdenek Farana] Fixed test merge conflict
2dbf4f6 [Zdenek Farana] Merge remote-tracking branch 'origin/SPARK-3173' into SPARK-3173
65b6215 [Zdenek Farana] Fixed timezone sensitivity in the test
47b27b4 [Zdenek Farana] Now works in the case of "StringLiteral=TimestampColumn"
96a661b [Zdenek Farana] Code style change
491dfcf [Zdenek Farana] Added test cases for SPARK-3173
4446b1e [Zdenek Farana] A string literal is casted into Timestamp when the column is Timestamp.
59af397 [Zdenek Farana] Added a new TIMESTAMP keyword; CAST to TIMESTAMP now can be used in SQL expression.
@SparkQA
Copy link

SparkQA commented Jun 18, 2015

Test build #35178 has finished for PR 6888 at commit aaa9508.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 18, 2015

Test build #35183 has finished for PR 6888 at commit 4dfc412.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 19, 2015

Test build #35192 has finished for PR 6888 at commit 1172c60.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Copy link
Contributor Author

@yhuai updated to avoid changing equality behavior.



checkAnswer(
df.select("t").filter($"t" >= "2014-06-01"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor: change it to $"t" >= "2014"?

@yhuai
Copy link
Contributor

yhuai commented Jun 19, 2015

lgtm

@SparkQA
Copy link

SparkQA commented Jun 19, 2015

Test build #35319 has finished for PR 6888 at commit 1f09adf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 19, 2015

Test build #35320 has finished for PR 6888 at commit bdef29c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Copy link
Contributor

yhuai commented Jun 19, 2015

Thanks! I am merging it to master and branch 1.4.

@asfgit asfgit closed this in a333a72 Jun 19, 2015
@yhuai
Copy link
Contributor

yhuai commented Jun 20, 2015

It does not merge cleanly with branch 1.4. Will have another pr for 1.4 backport.

asfgit pushed a commit that referenced this pull request Jun 22, 2015
…branch-1.4)

This is branch 1.4 backport of #6888.

Below is the original description.

In earlier versions of Spark SQL we casted `TimestampType` and `DataType` to `StringType` when it was involved in a binary comparison with a `StringType`.  This allowed comparing a timestamp with a partial date as a user would expect.
 - `time > "2014-06-10"`
 - `time > "2014"`

In 1.4.0 we tried to cast the String instead into a Timestamp.  However, since partial dates are not a valid complete timestamp this results in `null` which results in the tuple being filtered.

This PR restores the earlier behavior.  Note that we still special case equality so that these comparisons are not affected by not printing zeros for subsecond precision.

Author: Michael Armbrust <michaeldatabricks.com>

Closes #6888 from marmbrus/timeCompareString and squashes the following commits:

bdef29c [Michael Armbrust] test partial date
1f09adf [Michael Armbrust] special handling of equality
1172c60 [Michael Armbrust] more test fixing
4dfc412 [Michael Armbrust] fix tests
aaa9508 [Michael Armbrust] newline
04d908f [Michael Armbrust] [SPARK-8420][SQL] Fix comparision of timestamps/dates with strings

Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala

Author: Michael Armbrust <[email protected]>

Closes #6914 from yhuai/timeCompareString-1.4 and squashes the following commits:

9882915 [Michael Armbrust] [SPARK-8420] [SQL] Fix comparision of timestamps/dates with strings
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 22, 2015
…branch-1.4)

This is branch 1.4 backport of apache#6888.

Below is the original description.

In earlier versions of Spark SQL we casted `TimestampType` and `DataType` to `StringType` when it was involved in a binary comparison with a `StringType`.  This allowed comparing a timestamp with a partial date as a user would expect.
 - `time > "2014-06-10"`
 - `time > "2014"`

In 1.4.0 we tried to cast the String instead into a Timestamp.  However, since partial dates are not a valid complete timestamp this results in `null` which results in the tuple being filtered.

This PR restores the earlier behavior.  Note that we still special case equality so that these comparisons are not affected by not printing zeros for subsecond precision.

Author: Michael Armbrust <michaeldatabricks.com>

Closes apache#6888 from marmbrus/timeCompareString and squashes the following commits:

bdef29c [Michael Armbrust] test partial date
1f09adf [Michael Armbrust] special handling of equality
1172c60 [Michael Armbrust] more test fixing
4dfc412 [Michael Armbrust] fix tests
aaa9508 [Michael Armbrust] newline
04d908f [Michael Armbrust] [SPARK-8420][SQL] Fix comparision of timestamps/dates with strings

Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala

Author: Michael Armbrust <[email protected]>

Closes apache#6914 from yhuai/timeCompareString-1.4 and squashes the following commits:

9882915 [Michael Armbrust] [SPARK-8420] [SQL] Fix comparision of timestamps/dates with strings
@marmbrus marmbrus deleted the timeCompareString branch August 3, 2015 22:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants