-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-38225][SQL] Adjust input format
of function to_binary
#35533
Conversation
@@ -2565,15 +2565,15 @@ case class ToBinary(expr: Expression, format: Option[Expression], child: Express | |||
case lit if lit.foldable => | |||
val value = lit.eval() | |||
if (value == null) Literal(null, BinaryType) | |||
else { | |||
else if (value.isInstanceOf[UTF8String]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for value
to be an instance of java.lang.String
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, it can only be UTF8String
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems we can remove this check now, as we have checked the data type ahead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense! Removed.
-- !query schema | ||
struct<> | ||
-- !query output | ||
org.apache.spark.sql.AnalysisException |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intentionally failed query.
-- !query schema | ||
struct<> | ||
-- !query output | ||
org.apache.spark.sql.AnalysisException |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intentionally failed query.
format
of function to_binary
Would you please review it when you are free? Thanks! |
@@ -2565,15 +2565,14 @@ case class ToBinary(expr: Expression, format: Option[Expression], child: Express | |||
case lit if lit.foldable => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd put the string type check here, as to_binary(str_col, cast(null as int))
should fail instead of returning null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense.
May I confirm that the case below is acceptable?
spark-sql> select to_binary(cast(null as int), 'utf-8');
NULL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, the type coercion can cast any atomic type to string type implicitly.
@@ -2562,18 +2562,17 @@ case class ToBinary(expr: Expression, format: Option[Expression], child: Express | |||
|
|||
def this(expr: Expression, format: Expression) = this(expr, Option(format), | |||
format match { | |||
case lit if lit.foldable => | |||
case lit if (lit.foldable && Seq(StringType, NullType).contains(lit.dataType)) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, do other systems like snowflake support to_binary('abc', null)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Snowflake doesn't specify that in their document, but BigQuery and Teradata support to_binary('abc', null)
:
Expressions, functions, and operators | BigQuery | Google Cloud "If an operand is NULL, the result is NULL, with the exception of the IS operator."
Teradata Online Documentation | Quick access to technical manuals "If either instring or in_encoding is NULL, the result is NULL."
thanks, merging to master! |
What changes were proposed in this pull request?
Adjust input
format
of functionto_binary
:format
parameterbase2
format supportWhy are the changes needed?
Currently, function to_binary doesn't deal with the non-string
format
parameter properly.For example,
spark.sql("select to_binary('abc', 1)")
raises casting error, rather than hint that encoding format is unsupported.In addition, the current result of the
base2
format is incorrect compared to other DBMSes: ours useCast(expr, BinaryType)
which utilizesutf-8
format, whereas DBMSes (e.g. BigQuery)' convert a BASE2-encoded string to bytes.So we remove the
base2
support now and will follow up with that later.Does this PR introduce any user-facing change?
Yes.
format
parameter. For example:From:
To:
base2
format supportHow was this patch tested?
Unit test.