-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SPARK-2407: Added internal implementation of SQL SUBSTR() #1359
Conversation
Merged build started. |
Merged build triggered. |
QA tests have started for PR 1359. This patch merges cleanly. |
|
||
def nullable: Boolean = true | ||
def dataType: DataType = { | ||
if (str.dataType == BinaryType) str.dataType else StringType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Will -- I think we need to check if resolved
is true here to not break Catalyst's contract. Similar to here.
Hey @willb - thanks for working on this, which is going to be very useful for Spark SQL! I left a couple minor comments. Another general concern is the performance of |
Thanks for the comments, @concretevitamin. I've made the changes from your comments and will think about reducing branch overhead before pushing an update. |
@concretevitamin Inlining |
This sounds pretty cool! |
QA results for PR 1359: |
Merged build finished. All automated tests passed. |
All automated tests passed. |
Merged build triggered. |
Merged build started. |
QA tests have started for PR 1359. This patch merges cleanly. |
QA results for PR 1359: |
Merged build finished. All automated tests passed. |
All automated tests passed. |
override def children = str :: pos :: len :: Nil | ||
|
||
@inline | ||
def slice[T, C <% IndexedSeqOptimized[T,_]](str: C, startPos: Int, sliceLen: Int): Any = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to do this without view bounds? I think those are going to disappear soon: https://issues.scala-lang.org/browse/SI-7629
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think I can do it with implicit parameters. I'll push an update once I've run the Catalyst suite against my change.
QA tests have started for PR 1359. This patch merges cleanly. |
QA results for PR 1359: |
QA tests have started for PR 1359. This patch merges cleanly. |
QA results for PR 1359: |
Hey Will, this looks great. Thanks for implementing! Mind fixing the conflict with master so I can merge? |
This replaces the Hive UDF for SUBSTR(ING) with an implementation in Catalyst and adds tests to verify correct operation. Squashes 8969d038 and e6419b4e.
* orders imports in stringOperations.scala * Substring.dataType throws exception if children are unresolved * inlines Substring.slice (~11.5% performance improvement on microbenchmark runs) * adds a special `toString` case for two-argument SUBSTR expressions * removes spurious I_ prefix to SUBSTR(ING) in HiveQL.scala Thanks to @concretevitamin for prompt and useful feedback!
add to whitelist |
QA tests have started for PR 1359. This patch merges cleanly. |
QA results for PR 1359: |
I'm going to go ahead and merge this. Someone can make a follow-up with nullability narrowing. |
Thanks! Merged into master and 1.0. |
This replaces the Hive UDF for SUBSTR(ING) with an implementation in Catalyst and adds tests to verify correct operation. Author: William Benton <[email protected]> Closes #1359 from willb/internalSqlSubstring and squashes the following commits: ccedc47 [William Benton] Fixed too-long line. a30a037 [William Benton] replace view bounds with implicit parameters ec35c80 [William Benton] Adds fixes from review: 4f3bfdb [William Benton] Added internal implementation of SQL SUBSTR() (cherry picked from commit 61de65b) Signed-off-by: Michael Armbrust <[email protected]>
This is a follow-up of #1359 with nullability narrowing. Author: Takuya UESHIN <[email protected]> Closes #1426 from ueshin/issues/SPARK-2504 and squashes the following commits: 5157832 [Takuya UESHIN] Remove unnecessary white spaces. 80958ac [Takuya UESHIN] Fix nullability of Substring expression. (cherry picked from commit 632fb3d) Signed-off-by: Reynold Xin <[email protected]>
This is a follow-up of #1359 with nullability narrowing. Author: Takuya UESHIN <[email protected]> Closes #1426 from ueshin/issues/SPARK-2504 and squashes the following commits: 5157832 [Takuya UESHIN] Remove unnecessary white spaces. 80958ac [Takuya UESHIN] Fix nullability of Substring expression.
hi, it is really very useful for us, i tried this implementation from @willb , in spark-shell, i still got java.lang.UnsupportedOperationException by Query Plan, i made some change in SqlParser: chutium@1de83a7 |
Awesome! Please submit a pull request with that addition.
|
PR submitted #1442 |
follow-up of #1359 Author: chutium <[email protected]> Closes #1442 from chutium/master and squashes the following commits: b49cc8a [chutium] SPARK-2407: Added Parser of SQL SUBSTRING() #1442 9a60ccf [chutium] SPARK-2407: Added Parser of SQL SUBSTR() #1442 06e933b [chutium] Merge https://github.com/apache/spark c870172 [chutium] Merge https://github.com/apache/spark 094f773 [chutium] Merge https://github.com/apache/spark 88cb37d [chutium] Merge https://github.com/apache/spark 1de83a7 [chutium] SPARK-2407: Added Parse of SQL SUBSTR()
This replaces the Hive UDF for SUBSTR(ING) with an implementation in Catalyst and adds tests to verify correct operation. Author: William Benton <[email protected]> Closes apache#1359 from willb/internalSqlSubstring and squashes the following commits: ccedc47 [William Benton] Fixed too-long line. a30a037 [William Benton] replace view bounds with implicit parameters ec35c80 [William Benton] Adds fixes from review: 4f3bfdb [William Benton] Added internal implementation of SQL SUBSTR()
This is a follow-up of apache#1359 with nullability narrowing. Author: Takuya UESHIN <[email protected]> Closes apache#1426 from ueshin/issues/SPARK-2504 and squashes the following commits: 5157832 [Takuya UESHIN] Remove unnecessary white spaces. 80958ac [Takuya UESHIN] Fix nullability of Substring expression.
follow-up of apache#1359 Author: chutium <[email protected]> Closes apache#1442 from chutium/master and squashes the following commits: b49cc8a [chutium] SPARK-2407: Added Parser of SQL SUBSTRING() apache#1442 9a60ccf [chutium] SPARK-2407: Added Parser of SQL SUBSTR() apache#1442 06e933b [chutium] Merge https://github.com/apache/spark c870172 [chutium] Merge https://github.com/apache/spark 094f773 [chutium] Merge https://github.com/apache/spark 88cb37d [chutium] Merge https://github.com/apache/spark 1de83a7 [chutium] SPARK-2407: Added Parse of SQL SUBSTR()
This replaces the Hive UDF for SUBSTR(ING) with an implementation in Catalyst
and adds tests to verify correct operation.