[SPARK-23327] [SQL] Update the description and tests of three external API or functions #20495

Closed · wants to merge 6 commits

Changes from 4 commits
8 changes: 5 additions & 3 deletions python/pyspark/sql/functions.py
@@ -1705,10 +1705,12 @@ def unhex(col):
@ignore_unicode_prefix
@since(1.5)
def length(col):
"""Calculates the length of a string or binary expression.
"""Computes the character length of a given string or number of bytes or a binary string.
Contributor commented: number of bytes of a binary value?
The length of character strings include the trailing spaces. The length of binary strings
Member commented: As a side note, why is it calling out trailing spaces? What about leading spaces? Aren't all spaces factored into the character length?
Member commented: I ask because I want to understand this better, to see if we should update R: https://github.com/apache/spark/blob/master/R/pkg/R/functions.R#L1029
Member Author commented: The reason is that LEN in MS SQL Server excludes trailing blanks. :) Yeah, this PR can also update it on the R side.
includes binary zeros.

>>> spark.createDataFrame([('ABC',)], ['a']).select(length('a').alias('length')).collect()
[Row(length=3)]
>>> spark.createDataFrame([('ABC ',)], ['a']).select(length('a').alias('length')).collect()
@dongjoon-hyun (Member) commented on Feb 3, 2018: Actually, it is not only the description: this PR also improves the test coverage and refactors the code. Could you update the PR description/title to be more accurate? Otherwise, we had better split this PR according to @rdblue's recommendations on our dev mailing list.

Member Author commented: Done
[Row(length=4)]
"""
sc = SparkContext._active_spark_context
return Column(sc._jvm.functions.length(_to_java_column(col)))
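As a plain-Python aside (not PySpark code), the semantics the new docstring describes can be checked with the built-in `len`; the values below are illustrative only:

```python
# Illustrative sketch (plain Python, not PySpark) of the documented semantics:
# character length includes trailing spaces, and binary length includes zero bytes.

s = 'ABC '          # character string with a trailing space
print(len(s))       # 4: the trailing space counts toward the character length

b = b'ABC\x00'      # binary string containing a binary zero
print(len(b))       # 4: the binary zero counts toward the byte length
```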
@@ -988,8 +988,11 @@ class SessionCatalog(
// -------------------------------------------------------

/**
* Create a metastore function in the database specified in `funcDefinition`.
* Create a function in the database specified in `funcDefinition`.
* If no such database is specified, create it in the current database.
*
* @param ignoreIfExists: When true, ignore if the function with the specified name exists
* in the specified database.
*/
def createFunction(funcDefinition: CatalogFunction, ignoreIfExists: Boolean): Unit = {
val db = formatDatabaseName(funcDefinition.identifier.database.getOrElse(getCurrentDatabase))
@@ -1061,7 +1064,7 @@ class SessionCatalog(
}

/**
* Check if the specified function exists.
* Check if the function with the specified name exists
*/
def functionExists(name: FunctionIdentifier): Boolean = {
val db = formatDatabaseName(name.database.getOrElse(getCurrentDatabase))
@@ -1655,15 +1655,17 @@ case class Left(str: Expression, len: Expression, child: Expression) extends Run
*/
// scalastyle:off line.size.limit
@ExpressionDescription(
usage = "_FUNC_(expr) - Returns the character length of `expr` or number of bytes in binary data.",
usage = "_FUNC_(expr) - Returns the character length of `expr` or number of bytes in binary data. " +
Member commented: Why do other places use "binary string" while here we have "binary data"?

Contributor commented: We should be consistent: either character string vs. binary string, or string data vs. binary data.

Contributor commented: +1 for string data / binary data

"The length of character strings include the trailing spaces. The length of binary strings " +
"includes binary zeros.",
examples = """
Examples:
> SELECT _FUNC_('Spark SQL');
9
> SELECT CHAR_LENGTH('Spark SQL');
9
> SELECT CHARACTER_LENGTH('Spark SQL');
9
> SELECT _FUNC_('Spark SQL ');
10
> SELECT CHAR_LENGTH('Spark SQL ');
10
> SELECT CHARACTER_LENGTH('Spark SQL ');
10
""")
// scalastyle:on line.size.limit
case class Length(child: Expression) extends UnaryExpression with ImplicitCastInputTypes {
2 changes: 2 additions & 0 deletions sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -2825,6 +2825,7 @@ class Dataset[T] private[sql](
*
* At least one partition-by expression must be specified.
* When no explicit sort order is specified, "ascending nulls first" is assumed.
* Note, the rows are not sorted in each partition of the resulting Dataset.
*
* @group typedrel
* @since 2.3.0
@@ -2848,6 +2849,7 @@ class Dataset[T] private[sql](
*
* At least one partition-by expression must be specified.
* When no explicit sort order is specified, "ascending nulls first" is assumed.
* Note, the rows are not sorted in each partition of the resulting Dataset.
*
* @group typedrel
* @since 2.3.0
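To illustrate the note added in this hunk, here is a hypothetical plain-Python sketch of range partitioning (the helper `range_partition` and its boundary scheme are invented for illustration and are not Spark's implementation): each row lands in the correct range, but rows keep their arrival order within a partition.

```python
import bisect

def range_partition(rows, boundaries):
    """Assign each row to a range partition. Rows keep arrival order
    within each partition: there is no intra-partition sort, which is
    exactly what the added doc note calls out."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for r in rows:
        # bisect_right picks the partition whose range contains r
        parts[bisect.bisect_right(boundaries, r)].append(r)
    return parts

# Partition 0 is range-correct ([1, 3, 2] are all <= 4) but NOT sorted:
print(range_partition([5, 1, 9, 3, 2], [4]))  # [[1, 3, 2], [5, 9]]
```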
@@ -40,24 +40,28 @@ import org.apache.spark.sql.types.{StringType, StructField, StructType}
* CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS] [databaseName.]functionName
* AS className [USING JAR\FILE 'uri' [, JAR|FILE 'uri']]
* }}}
*
* @param ignoreIfExists: When true, ignore if the function with the specified name exists
* in the specified database.
* @param replace: When true, alter the function with the specified name
*/
case class CreateFunctionCommand(
databaseName: Option[String],
functionName: String,
className: String,
resources: Seq[FunctionResource],
isTemp: Boolean,
ifNotExists: Boolean,
ignoreIfExists: Boolean,
replace: Boolean)
extends RunnableCommand {

if (ifNotExists && replace) {
if (ignoreIfExists && replace) {
throw new AnalysisException("CREATE FUNCTION with both IF NOT EXISTS and REPLACE" +
" is not allowed.")
}

// Disallow to define a temporary function with `IF NOT EXISTS`
if (ifNotExists && isTemp) {
if (ignoreIfExists && isTemp) {
throw new AnalysisException(
"It is not allowed to define a TEMPORARY function with IF NOT EXISTS.")
}
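The two guards above can be sketched in plain Python (a hypothetical helper mirroring the renamed `ignoreIfExists` flag, not Spark code):

```python
def validate_create_function(ignore_if_exists, replace, is_temp):
    """Sketch of the two AnalysisException guards in CreateFunctionCommand."""
    if ignore_if_exists and replace:
        # IF NOT EXISTS and OR REPLACE are mutually exclusive
        raise ValueError(
            "CREATE FUNCTION with both IF NOT EXISTS and REPLACE is not allowed.")
    if ignore_if_exists and is_temp:
        # temporary functions cannot be declared with IF NOT EXISTS
        raise ValueError(
            "It is not allowed to define a TEMPORARY function with IF NOT EXISTS.")

validate_create_function(ignore_if_exists=True, replace=False, is_temp=False)  # passes
```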
@@ -79,12 +83,12 @@ case class CreateFunctionCommand(
// Handles `CREATE OR REPLACE FUNCTION AS ... USING ...`
if (replace && catalog.functionExists(func.identifier)) {
// alter the function in the metastore
catalog.alterFunction(CatalogFunction(func.identifier, className, resources))
catalog.alterFunction(func)
} else {
// For a permanent, we will store the metadata into underlying external catalog.
// This function will be loaded into the FunctionRegistry when a query uses it.
// We do not load it into FunctionRegistry right now.
catalog.createFunction(CatalogFunction(func.identifier, className, resources), ifNotExists)
catalog.createFunction(func, ignoreIfExists)
}
}
Seq.empty[Row]
4 changes: 3 additions & 1 deletion sql/core/src/main/scala/org/apache/spark/sql/functions.scala
@@ -2267,7 +2267,9 @@ object functions {
}

/**
* Computes the length of a given string or binary column.
* Computes the character length of a given string or number of bytes or a binary string.
* The length of character strings include the trailing spaces. The length of binary strings
* includes binary zeros.
*
* @group string_funcs
* @since 1.5.0
@@ -236,39 +236,39 @@ class DDLParserSuite extends PlanTest with SharedSQLContext {
Seq(
FunctionResource(FunctionResourceType.fromString("jar"), "/path/to/jar1"),
FunctionResource(FunctionResourceType.fromString("jar"), "/path/to/jar2")),
isTemp = true, ifNotExists = false, replace = false)
isTemp = true, ignoreIfExists = false, replace = false)
val expected2 = CreateFunctionCommand(
Some("hello"),
"world",
"com.matthewrathbone.example.SimpleUDFExample",
Seq(
FunctionResource(FunctionResourceType.fromString("archive"), "/path/to/archive"),
FunctionResource(FunctionResourceType.fromString("file"), "/path/to/file")),
isTemp = false, ifNotExists = false, replace = false)
isTemp = false, ignoreIfExists = false, replace = false)
val expected3 = CreateFunctionCommand(
None,
"helloworld3",
"com.matthewrathbone.example.SimpleUDFExample",
Seq(
FunctionResource(FunctionResourceType.fromString("jar"), "/path/to/jar1"),
FunctionResource(FunctionResourceType.fromString("jar"), "/path/to/jar2")),
isTemp = true, ifNotExists = false, replace = true)
isTemp = true, ignoreIfExists = false, replace = true)
val expected4 = CreateFunctionCommand(
Some("hello"),
"world1",
"com.matthewrathbone.example.SimpleUDFExample",
Seq(
FunctionResource(FunctionResourceType.fromString("archive"), "/path/to/archive"),
FunctionResource(FunctionResourceType.fromString("file"), "/path/to/file")),
isTemp = false, ifNotExists = false, replace = true)
isTemp = false, ignoreIfExists = false, replace = true)
val expected5 = CreateFunctionCommand(
Some("hello"),
"world2",
"com.matthewrathbone.example.SimpleUDFExample",
Seq(
FunctionResource(FunctionResourceType.fromString("archive"), "/path/to/archive"),
FunctionResource(FunctionResourceType.fromString("file"), "/path/to/file")),
isTemp = false, ifNotExists = true, replace = false)
isTemp = false, ignoreIfExists = true, replace = false)
comparePlans(parsed1, expected1)
comparePlans(parsed2, expected2)
comparePlans(parsed3, expected3)