SPARK-30434 Move pandas related functionalities into 'pandas' sub-package #327

zero323 · 2020-01-15T14:35:23Z

@HyukjinKwon If it is not too much to ask, I'd really appreciate it if you could take a quick glance and let me know if you see any obvious issues.

HyukjinKwon · 2020-01-16T03:51:21Z

doc/api-coverage.rst

@@ -150,6 +150,22 @@ API Coverage
 +------------------------------------------------+---------------------+--------------------+------------+
 | `pyspark.sql.group`_                           | ✘                   | ✔                  |            |
 +------------------------------------------------+---------------------+--------------------+------------+
+| `pyspark.sql.pandas`_                          | ✔                   | ✘                  |            |


I intended pandas to be Internal too; however, I may be misunderstanding what Internal and Mixed mean here.

That crossed my mind, but at least the _ops modules affect the public API.

I think I should drop the idea of Internal anyway; these days stub generators are good at keeping these in sync.

HyukjinKwon · 2020-01-16T03:53:12Z

test-data/unit/sql-udf.test

@@ -75,7 +75,7 @@ pandas_udf(lambda *xs: 42, "str", PandasUDFType.GROUPED_AGG)


 [case mapIterUdf]
-from pyspark.sql.functions import pandas_udf, PandasUDFType
+from pyspark.sql.pandas.functions import pandas_udf, PandasUDFType


I am also not very familiar with this code base; however, I just wanted to note:

The original import will still work as before.

MAP_ITER became mapInPandas.
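A minimal sketch of why the original import keeps working, assuming the old module simply re-imports the moved names. The package demo_sql, its layout, and the placeholder pandas_udf are hypothetical stand-ins for pyspark.sql, not Spark's actual code.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Write a tiny hypothetical package, demo_sql, that mimics the move."""
    pkg = root / "demo_sql"
    (pkg / "pandas").mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    (pkg / "pandas" / "__init__.py").write_text("")
    # New home of the moved function (placeholder body, not the real pandas_udf).
    (pkg / "pandas" / "functions.py").write_text(
        "def pandas_udf(f=None, returnType=None, functionType=None):\n"
        "    return f\n"
    )
    # Old module re-imports the moved name, so existing imports keep working.
    (pkg / "functions.py").write_text(
        "from demo_sql.pandas.functions import pandas_udf  # noqa: F401\n"
    )


def same_object_via_both_paths() -> bool:
    """Import pandas_udf from the old and new paths and compare identities."""
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(Path(tmp))
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            old = importlib.import_module("demo_sql.functions")
            new = importlib.import_module("demo_sql.pandas.functions")
            return old.pandas_udf is new.pandas_udf
        finally:
            # Clean up so repeated calls start from a fresh import state.
            sys.path.remove(tmp)
            for name in [m for m in sys.modules if m.split(".")[0] == "demo_sql"]:
                del sys.modules[name]
```

Here same_object_via_both_paths() returns True: both import paths resolve to one function object, so callers of the old path are unaffected by the move.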

The original import will still work as before.

That's primarily to check both sides. In general, MyPy is pretty strict about these things, so

from pyspark.sql.pandas.functions import pandas_udf, PandasUDFType

would type check fine within pyspark.sql.functions, but not as a transitive import. For that, I had to switch to aliases (the X as X form), which mark the names as explicitly re-exported:

from pyspark.sql.pandas.functions import pandas_udf as pandas_udf, PandasUDFType as PandasUDFType
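The rule behind this comes from PEP 484: in a stub, a name brought in with from M import X is not re-exported, while from M import X as X (and a star import) is. A minimal sketch that applies that rule with the standard ast module; reexported_names is a hypothetical helper, not part of MyPy, and it is a simplification that ignores __all__ and any checker-specific options.

```python
from __future__ import annotations

import ast


def reexported_names(stub_source: str) -> set[str]:
    """Names a stub re-exports via imports, following the PEP 484 'X as X' rule.

    Simplified sketch: only module-level imports are inspected, __all__ is
    ignored, and a star import is reported as the single entry "*".
    """
    exported: set[str] = set()
    for node in ast.parse(stub_source).body:
        if not isinstance(node, (ast.Import, ast.ImportFrom)):
            continue
        for alias in node.names:
            if alias.name == "*":
                # `from M import *` in a stub re-exports what it imports.
                exported.add("*")
            elif alias.asname == alias.name:
                # Only the redundant-looking `X as X` form marks a re-export.
                exported.add(alias.name)
    return exported


plain = "from pyspark.sql.pandas.functions import pandas_udf, PandasUDFType\n"
aliased = (
    "from pyspark.sql.pandas.functions import "
    "pandas_udf as pandas_udf, PandasUDFType as PandasUDFType\n"
)
print(reexported_names(plain))  # set()
print(sorted(reexported_names(aliased)))  # ['PandasUDFType', 'pandas_udf']
```

The stub source is only parsed, never imported, so pyspark does not need to be installed to run the sketch.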

HyukjinKwon

Looks good from a cursory look.

zero323 · 2020-01-16T12:21:34Z

Looks good from a cursory look.

I couldn't ask for more. Thank you so much @HyukjinKwon!

zero323 mentioned this pull request Jan 15, 2020

Sync with changes merged after 6378d4bc06cd1bb1a209bd5fb63d10ef52d75eb4 #230 (Closed, 47 tasks)

zero323 force-pushed the SPARK-30434 branch 6 times, most recently from ed01129 to 93c13dc January 15, 2020 21:08

HyukjinKwon reviewed Jan 16, 2020

HyukjinKwon approved these changes Jan 16, 2020

zero323 added 3 commits January 16, 2020 14:43

Add dynamic sql.pandas stubs and update serializers (3439fcf)

Adjust existing code for moves (819e89b)

Update coverage matrix (0238e08)

zero323 force-pushed the SPARK-30434 branch from 93c13dc to 0238e08 January 16, 2020 13:49

zero323 merged commit ae50c65 into master Jan 16, 2020

zero323 deleted the SPARK-30434 branch January 16, 2020 13:53
