[SPARK-37507][SQL] Add a new SQL function to_binary
### What changes were proposed in this pull request?

Introduce a SQL function `to_binary`: Converts the input string to a binary value based on the supplied format (of how to interpret the string).

Syntax:

```
to_binary(str_column[, fmt])
```

where

- `fmt` can be a case-insensitive string literal of "hex", "utf-8", "base2", or "base64".
- By default, the binary format for conversion is "hex" if `fmt` is omitted.

### Why are the changes needed?

`to_binary` is a common function available in many DBMSes, for example:

- [TO_VARBYTE function - Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/dg/r_TO_VARBYTE.html)
- [TO_BINARY — Snowflake Documentation](https://docs.snowflake.com/en/sql-reference/functions/to_binary.html)
- [Expressions, functions, and operators | BigQuery | Google Cloud](https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#format_string_as_bytes)
- [Teradata Online Documentation | Quick access to technical manuals](https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/etRo5aTAY9n5fUPjxSEynw)

Introducing it improves compatibility and the ease of migration.

In addition, `to_binary` can unify existing Spark functions: `encode`, `unhex`, `unbase64`, and `binary`, which makes the API easier to remember and use.

### Does this PR introduce _any_ user-facing change?

Yes, a new function for the string to binary conversion with a specified format.

### How was this patch tested?

Unit test.

Closes #35415 from xinrong-databricks/to_binary.

Authored-by: Xinrong Meng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
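
As a rough illustration of the format dispatch described in the commit message, here is a minimal Python sketch. It is a toy model, not the code added by this commit: the Python function name simply mirrors the SQL function, the decoding relies on Python's standard library, and its handling of malformed input is Python's own rather than Spark's. The commit message does not spell out how a "base2" string is interpreted, so the sketch deliberately leaves that branch unimplemented.

```python
import base64


def to_binary(value: str, fmt: str = "hex") -> bytes:
    """Toy model of to_binary(str[, fmt]); an illustration, not Spark's code."""
    key = fmt.lower()  # the format name is matched case-insensitively
    if key == "hex":
        # Default format: every two hex digits become one byte.
        return bytes.fromhex(value)
    if key == "utf-8":
        # The string's characters are encoded as UTF-8 bytes.
        return value.encode("utf-8")
    if key == "base64":
        # The string is decoded from its Base64 representation.
        return base64.b64decode(value)
    if key == "base2":
        # Assumption: left out because the commit message does not
        # describe how a base2 string maps to bytes.
        raise NotImplementedError("base2 is not modeled in this sketch")
    raise ValueError(f"unsupported format: {fmt!r}")
```

With this model, `to_binary("537061726B")`, `to_binary("Spark", "UTF-8")`, and `to_binary("U3Bhcms=", "Base64")` all produce the same five bytes, which reflects the stated goal of covering the roles of `unhex`, `encode`, and `unbase64` with one function.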
1 parent 25a4c5f, commit 25dd425

Showing 6 changed files with 318 additions and 3 deletions.