
[SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4 #35871

Closed

Conversation

HyukjinKwon
Member

@HyukjinKwon HyukjinKwon commented Mar 16, 2022

What changes were proposed in this pull request?

This PR upgrades Py4J to 0.10.9.4, with the relevant documentation changes.

Why are the changes needed?

Py4J 0.10.9.3 has a resource leak issue when pinned thread mode is enabled; pinned thread mode has been enabled by default in PySpark since 41af409.
We worked around this by requiring users to use `InheritableThread` or `inheritable_thread_target`.
After this upgrade, we no longer need to require this because Py4J automatically cleans up; see also py4j/py4j#471.

Does this PR introduce any user-facing change?

Yes, users no longer have to use `InheritableThread` or `inheritable_thread_target` to avoid the resource leak problem.

How was this patch tested?

CI in this PR should test it out.

@HyukjinKwon
Member Author

cc @WeichenXu123 FYI

@wangyum
Member

wangyum commented Mar 16, 2022

Py4J 0.10.9.4 has a resource leak issue?

@HyukjinKwon
Member Author

Oops, I meant 0.10.9.3. Just fixed.

@HyukjinKwon
Member Author

Thanks!

HyukjinKwon added a commit that referenced this pull request Mar 16, 2022
### What changes were proposed in this pull request?

This PR upgrades Py4J to 0.10.9.4, with the relevant documentation changes.

### Why are the changes needed?

Py4J 0.10.9.3 has a resource leak issue when pinned thread mode is enabled; pinned thread mode has been enabled by default in PySpark since 41af409.
We worked around this by requiring users to use `InheritableThread` or `inheritable_thread_target`.
After this upgrade, we no longer need to require this because Py4J automatically cleans up; see also py4j/py4j#471.

### Does this PR introduce _any_ user-facing change?

Yes, users no longer have to use `InheritableThread` or `inheritable_thread_target` to avoid the resource leak problem.

### How was this patch tested?

CI in this PR should test it out.

Closes #35871 from HyukjinKwon/SPARK-38563.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit 8193b40)
Signed-off-by: Hyukjin Kwon <[email protected]>
@HyukjinKwon
Member Author

Merged to master, branch-3.3 and branch-3.2.

HyukjinKwon added a commit that referenced this pull request Mar 16, 2022
@wangyum wangyum deleted the SPARK-38563 branch March 16, 2022 09:55
Member

@dongjoon-hyun dongjoon-hyun left a comment


+1, LGTM. Thank you, @HyukjinKwon and @wangyum .

Member

@dongjoon-hyun dongjoon-hyun left a comment


Oh, @HyukjinKwon .

Is Py4J 0.10.9.4 tested with Python 3.10.x? This commit seems to break the Python 3.10.x environment, like the following.

apache-spark-master-python/python/lib/py4j-0.10.9.4-src.zip/py4j/clientserver.py", line 12, in <module>
06:58:13 ImportError: cannot import name 'Callable' from 'collections' (/Users/dongjoon/.pyenv/versions/3.10.1/lib/python3.10/collections/__init__.py)
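For context, Python 3.10 removed the ABC aliases (such as `Callable`) from the top-level `collections` namespace, which is why the import in `clientserver.py` fails there. A version-compatible pattern looks like this (a generic sketch, not the exact Py4J patch):

```python
try:
    # Legacy location; removed in Python 3.10 after a long deprecation.
    from collections import Callable
except ImportError:
    # Correct location since Python 3.3.
    from collections.abc import Callable

# Either branch yields the same ABC, usable in isinstance checks.
assert isinstance(print, Callable)
```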

@dongjoon-hyun
Member

It seems that Py4J has no test coverage for Python 3.10.

(Screenshot, 2022-03-17: Py4J's CI test matrix, with no Python 3.10 entry.)

@dongjoon-hyun
Member

dongjoon-hyun commented Mar 17, 2022

Sorry but let me create reverting PRs for this.

@dongjoon-hyun
Member

For Apache Spark 3.2, I'm fine because we didn't support Python 3.10 before.

@HyukjinKwon
Member Author

Yeah I'm making another fix and release now. We can revert it for now too. Thanks for checking this!

@HyukjinKwon
Member Author

I made a fix and am fixing CI now (py4j/py4j#477). I will make another release and upgrade it back soon.

@dongjoon-hyun
Member

Thank you!

HyukjinKwon added a commit that referenced this pull request Mar 18, 2022
### What changes were proposed in this pull request?

This PR is a retry of #35871, bumping the version up to 0.10.9.5.
The original was reverted because it broke Python 3.10, which was not officially supported in Py4J at the time.

In Py4J 0.10.9.5, the issue was fixed (py4j/py4j#475) and official Python 3.10 support was added, with CI set up (py4j/py4j#477).

### Why are the changes needed?

See #35871

### Does this PR introduce _any_ user-facing change?

See #35871

### How was this patch tested?

Py4J now runs CI on Python 3.10, and I manually tested PySpark on Python 3.10 with this patch:

```bash
./bin/pyspark
```

```
import py4j
py4j.__version__
spark.range(10).show()
```

```
Using Python version 3.10.0 (default, Mar  3 2022 03:57:21)
Spark context Web UI available at http://172.30.5.50:4040
Spark context available as 'sc' (master = local[*], app id = local-1647571387534).
SparkSession available as 'spark'.
>>> import py4j
>>> py4j.__version__
'0.10.9.5'
>>> spark.range(10).show()
+---+
| id|
+---+
...
```

Closes #35907 from HyukjinKwon/SPARK-38563-followup.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
HyukjinKwon added a commit that referenced this pull request Mar 18, 2022
HyukjinKwon added a commit that referenced this pull request Mar 18, 2022
kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022