[SPARK-23261] [PySpark] Rename Pandas UDFs #20428

Closed

gatorsmile
Member

@gatorsmile gatorsmile commented Jan 29, 2018

What changes were proposed in this pull request?

Rename the public APIs and names of Pandas UDFs.

  • PANDAS SCALAR UDF -> SCALAR PANDAS UDF
  • PANDAS GROUP MAP UDF -> GROUPED MAP PANDAS UDF
  • PANDAS GROUP AGG UDF -> GROUPED AGG PANDAS UDF

How was this patch tested?

The existing tests

@gatorsmile
Member Author

gatorsmile commented Jan 29, 2018

Had an offline discussion with @sameeragarwal and @cloud-fan. To be consistent with the other APIs, we propose making the above changes.

The major question is about the new name of Pandas UDAF. It might not be the best choice once we support partial aggregation.

Also cc @rxin @ueshin @HyukjinKwon @icexelloss @BryanCutler

@icexelloss
Contributor

  • PANDAS SCALAR UDF -> SCALAR PANDAS UDF
    This doesn't really change the API so +1

  • PANDAS GROUP MAP UDF -> GROUPED MAP PANDAS UDF
    The API changes from PandasUDFType.GROUP_MAP to PandasUDFType.GROUPED_MAP. I think this is fine as well.

  • PANDAS GROUP AGG UDF -> PANDAS UDAF
    The API changes from PandasUDFType.GROUP_AGG to PandasUDFType.UDAF. I don't love this because it feels different from SCALAR and GROUPED_MAP. I would probably call it something like GROUPED_AGG. But since this type is not released in 2.3, we can always change later.
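
For readers updating code written against the old names, here is a minimal sketch of the enum-member renames discussed above. It assumes the names given in the PR description (`GROUPED_AGG` rather than `UDAF`). `RENAMED_MEMBERS` and `migrate_source` are hypothetical names used only for illustration and are not part of PySpark.

```python
import re

# Old -> new member names of PandasUDFType, as discussed in this thread.
# SCALAR is unchanged. GROUPED_AGG is assumed here (the name suggested in
# the comment above and used in the PR description), not UDAF.
RENAMED_MEMBERS = {
    "GROUP_MAP": "GROUPED_MAP",
    "GROUP_AGG": "GROUPED_AGG",
}

# Match only whole attribute references such as PandasUDFType.GROUP_MAP.
_PATTERN = re.compile(
    r"\bPandasUDFType\.(" + "|".join(RENAMED_MEMBERS) + r")\b"
)


def migrate_source(text):
    """Rewrite old PandasUDFType member references to the new names.

    Hypothetical helper for illustration; it performs a plain text
    substitution and does not import or inspect PySpark.
    """
    return _PATTERN.sub(
        lambda m: "PandasUDFType." + RENAMED_MEMBERS[m.group(1)], text
    )


old_line = '@pandas_udf("id long, v double", PandasUDFType.GROUP_MAP)'
print(migrate_source(old_line))
```

References that already use the new names, and `PandasUDFType.SCALAR`, pass through unchanged because the pattern only matches the two old member names.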

@HyukjinKwon
Member

HyukjinKwon commented Jan 30, 2018

Yup, fortunately(?) we are currently free to rename SQL_PANDAS_GROUP_AGG_UDF within 2.4.0, but I believe this is a good place to decide based on what we have so far. The proposal seems fine to me for now.

@viirya
Member

viirya commented Jan 30, 2018

The first two changes look good. For the last one, maybe PANDAS GROUP AGG UDF -> GROUPED AGG PANDAS UDF?

@SparkQA

SparkQA commented Jan 30, 2018

Test build #86784 has finished for PR 20428 at commit 8f75cc8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sameeragarwal
Member

+1 on GROUPED AGG as well

@HyukjinKwon
Member

+1 on GROUPED AGG from me too, just to be clear.

@cloud-fan
Contributor

+1 on GROUPED AGG too; we may add a new UDF type when we support partial aggregation.

@SparkQA

SparkQA commented Jan 30, 2018

Test build #86794 has finished for PR 20428 at commit 9a4aada.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

LGTM

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM.

Member

@HyukjinKwon HyukjinKwon left a comment

LGTM

Member

@viirya viirya left a comment

LGTM

@viirya
Member

viirya commented Jan 30, 2018

Let's also update the PR description.

@SparkQA

SparkQA commented Jan 30, 2018

Test build #86798 has finished for PR 20428 at commit 7a71c5a.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin
Member

ueshin commented Jan 30, 2018

Jenkins, retest this please.

@ueshin
Member

ueshin commented Jan 30, 2018

LGTM.

@SparkQA

SparkQA commented Jan 30, 2018

Test build #86804 has finished for PR 20428 at commit 7a71c5a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Merged to master.

@cloud-fan
Contributor

@gatorsmile shall we backport it to 2.3, excluding the GROUPED AGG change?

@asfgit asfgit closed this in 7a2ada2 Jan 30, 2018
@gatorsmile
Member Author

Yes. We need to backport it to 2.3.

@gatorsmile
Member Author

Will submit a new PR to 2.3.

gatorsmile added a commit to gatorsmile/spark that referenced this pull request Jan 30, 2018
Rename the public APIs and names of pandas udfs.

- `PANDAS SCALAR UDF` -> `SCALAR PANDAS UDF`
- `PANDAS GROUP MAP UDF` -> `GROUPED MAP PANDAS UDF`
- `PANDAS GROUP AGG UDF` -> `GROUPED AGG PANDAS UDF`

The existing tests

Author: gatorsmile <[email protected]>

Closes apache#20428 from gatorsmile/renamePandasUDFs.
asfgit pushed a commit that referenced this pull request Jan 31, 2018
This PR is to backport #20428 to Spark 2.3 without adding the changes regarding `GROUPED AGG PANDAS UDF`

---

## What changes were proposed in this pull request?
Rename the public APIs and names of pandas udfs.

- `PANDAS SCALAR UDF` -> `SCALAR PANDAS UDF`
- `PANDAS GROUP MAP UDF` -> `GROUPED MAP PANDAS UDF`

## How was this patch tested?
The existing tests

Author: gatorsmile <[email protected]>

Closes #20439 from gatorsmile/backport2.3.