SPARK-30434 Move pandas related functionalities into 'pandas' sub-package #327
ed01129 to 93c13dc
@@ -150,6 +150,22 @@ API Coverage
+------------------------------------------------+---------------------+--------------------+------------+
| `pyspark.sql.group`_                           | ✘                   | ✔                  |            |
+------------------------------------------------+---------------------+--------------------+------------+
| `pyspark.sql.pandas`_                          | ✔                   | ✘                  |            |
I intended `pandas` to be `Internal` too; however, I may not fully understand what `Internal` and `Mixed` mean here.
That crossed my mind, but at least the `_ops` modules affect the public API.
I think I should drop the idea of "internal" anyway; these days stub generators are good enough to keep these in sync.
@@ -75,7 +75,7 @@ pandas_udf(lambda *xs: 42, "str", PandasUDFType.GROUPED_AGG)
[case mapIterUdf]
-from pyspark.sql.functions import pandas_udf, PandasUDFType
+from pyspark.sql.pandas.functions import pandas_udf, PandasUDFType
I am also not very familiar with this code base; however, I just wanted to note:
- the original import will still work as before.
- `MAP_ITER` became `mapInPandas`.
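For reference, a minimal sketch of the kind of function `DataFrame.mapInPandas` accepts in Spark 3.0: a plain Python function from an iterator of pandas DataFrames to an iterator of DataFrames. It assumes an input with an integer `id` column; the name `keep_id_one` is hypothetical and not part of PySpark.

```python
from typing import Iterator

import pandas as pd


def keep_id_one(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    # Each element is one pandas DataFrame (one batch of rows); the function
    # yields DataFrames that must match the schema declared to mapInPandas.
    for pdf in batches:
        yield pdf[pdf.id == 1]


# Feeding batches by hand, without Spark, to check the logic.
batches = [pd.DataFrame({"id": [1, 2]}), pd.DataFrame({"id": [1, 1, 3]})]
print([len(pdf) for pdf in keep_id_one(iter(batches))])  # [1, 2]
```

With a running `SparkSession`, it would be applied as `df.mapInPandas(keep_id_one, schema=df.schema)`; because the generator itself does not depend on Spark, it can be unit-tested on plain pandas batches as above.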
> original import will work as was.

That's primarily to check both sides. In general, MyPy is pretty strict about these things, so

`from pyspark.sql.pandas.functions import pandas_udf, PandasUDFType`

would type-check fine inside `pyspark.sql.functions`, but not as a transitive import (that is, when other code imports these names from `pyspark.sql.functions`). For that, I had to switch to explicit aliases:

`from pyspark.sql.pandas.functions import pandas_udf as pandas_udf, PandasUDFType as PandasUDFType`
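The rule behind the aliases can be sketched with the standard library's `ast` module. `exported_names` below is a hypothetical helper, not part of MyPy or pyspark-stubs: it is a simplified approximation of the PEP 484 stub-file convention that an imported name is only re-exported when it uses the redundant `name as name` form. It deliberately ignores `__all__` and star imports, which the real rule also takes into account.

```python
import ast
from typing import Set


def exported_names(stub_source: str) -> Set[str]:
    """Approximate the top-level names a .pyi stub re-exports (simplified)."""
    exported: Set[str] = set()
    for node in ast.parse(stub_source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Definitions written in the stub itself are always exported.
            exported.add(node.name)
        elif isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            exported.add(node.target.id)
        elif isinstance(node, ast.Assign):
            exported.update(t.id for t in node.targets if isinstance(t, ast.Name))
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # Only the redundant `name as name` form re-exports an import.
                if alias.asname is not None and alias.asname == alias.name:
                    exported.add(alias.asname)
    return exported


plain = "from pyspark.sql.pandas.functions import pandas_udf, PandasUDFType\n"
aliased = (
    "from pyspark.sql.pandas.functions import "
    "pandas_udf as pandas_udf, PandasUDFType as PandasUDFType\n"
)
print(sorted(exported_names(plain)))    # []
print(sorted(exported_names(aliased)))  # ['PandasUDFType', 'pandas_udf']
```

Only source text is parsed, so PySpark does not need to be installed; the sketch just illustrates why the plain import leaves `pandas_udf` invisible to code importing it from `pyspark.sql.functions`.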
Looks good from a cursory look.
I couldn't ask for more. Thank you so much @HyukjinKwon!
@HyukjinKwon If it is not too much to ask, I'd really appreciate it if you could take a quick glance and let me know if you see any obvious issues.