[SPARK-23327] [SQL] Update the description and tests of three external API or functions #20495
@@ -1705,10 +1705,12 @@ def unhex(col):
@ignore_unicode_prefix
@since(1.5)
def length(col):
    """Calculates the length of a string or binary expression.
    """Computes the character length of a given string or number of bytes of a binary string.
    The length of character strings includes the trailing spaces. The length of binary strings
Review comment: As a side note, why is it calling out trailing spaces? What about leading spaces? Aren't all spaces factored into the character length?
Review comment: I ask because I want to understand this better, to see if we should update R: https://github.com/apache/spark/blob/master/R/pkg/R/functions.R#L1029
Review comment: The reason is Yeah. This PR can also update it on the R side.
    includes binary zeros.

    >>> spark.createDataFrame([('ABC',)], ['a']).select(length('a').alias('length')).collect()
    [Row(length=3)]
    >>> spark.createDataFrame([('ABC ',)], ['a']).select(length('a').alias('length')).collect()
Review comment: Actually, not only
Review comment: Done
    [Row(length=4)]
    """
    sc = SparkContext._active_spark_context
    return Column(sc._jvm.functions.length(_to_java_column(col)))
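The question about leading versus trailing spaces can be checked with a small plain-Python stand-in. This is only a sketch of the semantics the docstring describes, not Spark's implementation: `char_length` and `byte_length` are hypothetical helper names, and Python's `len` is assumed to be a fair proxy for the behaviour shown in the doctest above (`'ABC '` gives 4).

```python
def char_length(value: str) -> int:
    """Count characters; every space counts, wherever it sits."""
    return len(value)


def byte_length(value: bytes) -> int:
    """Count bytes; zero bytes are counted like any other byte."""
    return len(value)


# Leading and trailing spaces both contribute to the character length.
for text in ["ABC", "ABC ", " ABC"]:
    print(repr(text), char_length(text))

# Binary zeros contribute to the byte length.
print(byte_length(b"AB\x00\x00"))
```

Under this reading, calling out trailing spaces in the docstring does not mean leading spaces are treated differently; both are ordinary characters.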
@@ -1655,15 +1655,17 @@ case class Left(str: Expression, len: Expression, child: Expression) extends Run
 */
// scalastyle:off line.size.limit
@ExpressionDescription(
  usage = "_FUNC_(expr) - Returns the character length of `expr` or number of bytes in binary data.",
  usage = "_FUNC_(expr) - Returns the character length of `expr` or number of bytes in binary data. " +
Review comment: Why do other places use "binary string" while here we have "binary data"?
Review comment: We should be consistent, either
Review comment: +1 for string data / binary data
    "The length of character strings includes the trailing spaces. The length of binary strings " +
    "includes binary zeros.",
  examples = """
    Examples:
      > SELECT _FUNC_('Spark SQL');
       9
      > SELECT CHAR_LENGTH('Spark SQL');
       9
      > SELECT CHARACTER_LENGTH('Spark SQL');
       9
      > SELECT _FUNC_('Spark SQL ');
       10
      > SELECT CHAR_LENGTH('Spark SQL ');
       10
      > SELECT CHARACTER_LENGTH('Spark SQL ');
       10
  """)
// scalastyle:on line.size.limit
case class Length(child: Expression) extends UnaryExpression with ImplicitCastInputTypes {
Review comment: number of bytes of a binary value?
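The wording debate turns on the difference between a character count and a byte count, which only shows up once the text is not pure ASCII. The sketch below illustrates that distinction in plain Python; it assumes that "character length" means a count of Unicode code points and that binary data is the UTF-8 encoding of the text. The `lengths` helper is hypothetical and is not part of Spark.

```python
def lengths(value: str) -> tuple:
    """Return (character count, UTF-8 byte count) for a string."""
    return len(value), len(value.encode("utf-8"))


# ASCII text with one trailing space: both counts agree.
print(lengths("Spark SQL "))

# A non-ASCII character (U+00EF) takes two bytes in UTF-8,
# so the byte count exceeds the character count.
print(lengths("na\u00efve"))
```

For ASCII input such as the `'Spark SQL '` example in the diff, the two counts coincide, which is why the examples alone do not show the difference the usage string describes.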