Cancel jobs only in PRs #28

Ngone51 · 2021-01-11T06:10:39Z

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

… Actions

### What changes were proposed in this pull request?

This is kind of a followup of #31104, but I decided to track it separately with a separate JIRA.

Currently, the jobs are being canceled in the main repo branches as well. If a commit is merged, for example, to the master branch before the test for the previous commit finishes, it cancels the previous builds. This is a problem because we cannot, for example, detect logical conflicts properly.

We should only cancel the jobs in PRs:

![Screen Shot 2021-01-11 at 3 22 24 PM](https://user-images.githubusercontent.com/6477701/104152015-c7f04b80-5421-11eb-9e40-6b0a0e5b8442.png)

This PR proposes not to cancel jobs for commits in the main repo branches, and to cancel them only in PRs.

### Why are the changes needed?

- To keep the test coverage
- To run the tests on the synced master branch instead of relying on the builds made in each PR against an outdated master branch
- To detect test failures caused by logical conflicts when two conflicting PRs are merged at around the same time

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

I manually tested in

- HyukjinKwon#27
- HyukjinKwon#28

I added Yi Wu as a co-author since he helped verify the current fix in the PR above.

I checked that it does not cancel in the main repo branch:

![Screen Shot 2021-01-11 at 3 58 52 PM](https://user-images.githubusercontent.com/6477701/104153656-3afbc100-5426-11eb-9309-85f6f4fd9ff3.png)

I checked that it cancels in PRs:

![Screen Shot 2021-01-11 at 3 58 45 PM](https://user-images.githubusercontent.com/6477701/104153658-3d5e1b00-5426-11eb-89f7-786c3ae6849a.png)

Closes #31121 from HyukjinKwon/SPARK-34065.

Lead-authored-by: hyukjinkwon <[email protected]>
Co-authored-by: yi.wu <[email protected]>
Co-authored-by: HyukjinKwon <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
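The cancellation rule described in this commit (cancel superseded runs only when they were triggered by a pull request, never for pushes to main repo branches) can be sketched in Python. This is a minimal model under stated assumptions, not the actual GitHub Actions workflow from the PR: `WorkflowRun`, its fields, and `runs_to_cancel` are hypothetical names, and `group` stands in for whatever key identifies "the same PR" (for example, the PR head branch).

```python
from dataclasses import dataclass
from typing import List


# Hypothetical model of a workflow run; field names are illustrative only
# and do not follow the GitHub REST API schema.
@dataclass
class WorkflowRun:
    run_id: int
    event: str                  # e.g. "push" or "pull_request"
    group: str                  # branch for pushes, PR head ref for PRs (assumed key)
    status: str = "in_progress"


def runs_to_cancel(runs: List[WorkflowRun], new_run: WorkflowRun) -> List[int]:
    """Return the IDs of in-progress runs superseded by new_run.

    Only pull_request runs are ever cancelled. Pushes to main repo branches
    (e.g. master) always run to completion, so every merged commit is tested
    and logical conflicts between merged PRs can still be detected.
    """
    if new_run.event != "pull_request":
        return []
    return [
        r.run_id
        for r in runs
        if r.event == "pull_request"
        and r.group == new_run.group
        and r.run_id != new_run.run_id
        and r.status == "in_progress"
    ]


if __name__ == "__main__":
    runs = [WorkflowRun(1, "push", "master"), WorkflowRun(2, "pull_request", "pr-28")]
    # A new push to master cancels nothing.
    print(runs_to_cancel(runs, WorkflowRun(3, "push", "master")))
    # A new commit on the same PR cancels the older PR run.
    print(runs_to_cancel(runs, WorkflowRun(4, "pull_request", "pr-28")))
```

Keying the early return on the triggering event, rather than on branch names, keeps the rule independent of which main branches exist (master, branch-3.x, and so on).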

…onnect

### What changes were proposed in this pull request?

Implement Arrow-optimized Python UDFs in Spark Connect.

Please see apache#39384 for the motivation and performance improvements of Arrow-optimized Python UDFs.

### Why are the changes needed?

Parity with vanilla PySpark.

### Does this PR introduce _any_ user-facing change?

Yes. In the Spark Connect Python Client, users can:

1. Set the `useArrow` parameter to True to enable Arrow optimization for a specific Python UDF.

```sh
>>> df = spark.range(2)
>>> df.select(udf(lambda x : x + 1, useArrow=True)('id')).show()
+------------+
|<lambda>(id)|
+------------+
|           1|
|           2|
+------------+

# ArrowEvalPython indicates Arrow optimization
>>> df.select(udf(lambda x : x + 1, useArrow=True)('id')).explain()
== Physical Plan ==
*(2) Project [pythonUDF0#18 AS <lambda>(id)#16]
+- ArrowEvalPython [<lambda>(id#14L)#15], [pythonUDF0#18], 200
   +- *(1) Range (0, 2, step=1, splits=1)
```

2. Enable the `spark.sql.execution.pythonUDF.arrow.enabled` Spark conf to make all Python UDFs Arrow-optimized.

```sh
>>> spark.conf.set("spark.sql.execution.pythonUDF.arrow.enabled", True)
>>> df.select(udf(lambda x : x + 1)('id')).show()
+------------+
|<lambda>(id)|
+------------+
|           1|
|           2|
+------------+

# ArrowEvalPython indicates Arrow optimization
>>> df.select(udf(lambda x : x + 1)('id')).explain()
== Physical Plan ==
*(2) Project [pythonUDF0#30 AS <lambda>(id)#28]
+- ArrowEvalPython [<lambda>(id#26L)#27], [pythonUDF0#30], 200
   +- *(1) Range (0, 2, step=1, splits=1)
```

### How was this patch tested?

Parity unit tests.

Closes apache#40725 from xinrong-meng/connect_arrow_py_udf.

Authored-by: Xinrong Meng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>

HyukjinKwon and others added 4 commits January 11, 2021 14:45

Cancel jobs only in PRs

79672fc

Trigger the test

c23e722

Trigger the test

754911c

.

8f3fd42

github-actions bot added the INFRA label Jan 11, 2021

Ngone51 added 2 commits January 11, 2021 14:12

empty1

c451e37

empty2

c01a7db

HyukjinKwon merged this pull request into HyukjinKwon:master Jan 11, 2021

HyukjinKwon mentioned this pull request Jan 11, 2021

[SPARK-34065][INFRA] Cancel the duplicated jobs only in PRs at GitHub Actions apache/spark#31121

Closed
