[SPARK-37228][SQL][PYTHON] Implement DataFrame.mapInArrow in Python
### What changes were proposed in this pull request?

This PR proposes to implement `DataFrame.mapInArrow`, which allows users to apply a function to PyArrow record batches, such as:

```python
def do_something(iterator):
    for arrow_batch in iterator:
        # do something with `pyarrow.RecordBatch` and create new `pyarrow.RecordBatch`.
        # ...
        yield arrow_batch

df.mapInArrow(do_something, df.schema).show()
```

The general idea is simple. It shares the same codebase as `DataFrame.mapInPandas`, except for the pandas conversion logic.

This PR also piggy-backs:

- Removes the check in `spark.udf.register` on `SQL_MAP_PANDAS_ITER_UDF`. This type is only used for `DataFrame.mapInPandas` internally, and it cannot be registered as a SQL UDF.
- Removes the type hints for `pandas_udf` that are used for internal purposes such as `SQL_MAP_PANDAS_ITER_UDF` and `SQL_COGROUPED_MAP_PANDAS_UDF`. Both cannot be used for `pandas_udf` as a SQL expression, and they should be hidden from end users.

Note that documentation will be done in another PR.

### Why are the changes needed?

For usability and technical problems. Both are elaborated in more detail at SPARK-37227. Please also see the discussions at #26783.

### Does this PR introduce _any_ user-facing change?

Yes, this PR adds a new API:

```python
import pyarrow as pa

df = spark.createDataFrame(
    [(1, "foo"), (2, None), (3, "bar"), (4, "bar")], "a int, b string")

def func(iterator):
    for batch in iterator:
        # `batch` is pyarrow.RecordBatch.
        yield batch

df.mapInArrow(func, df.schema).collect()
```

### How was this patch tested?

Manually tested, and unit tests were added.

Closes #34505 from HyukjinKwon/SPARK-37228.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
1 parent 950422f, commit 775e05f

Showing 20 changed files with 468 additions and 131 deletions.