[SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4 #35871
Conversation
cc @WeichenXu123 FYI
Py4J 0.10.9.4 has a resource leak issue?
Oops, I meant 0.10.9.3. Just fixed.
Thanks!
Merged to master, branch-3.3 and branch-3.2.
+1, LGTM. Thank you, @HyukjinKwon and @wangyum.
Oh, @HyukjinKwon. Is Py4J 0.10.9.4 tested with Python 3.10.x? This commit seems to break Python 3.10.x environments like the following:
apache-spark-master-python/python/lib/py4j-0.10.9.4-src.zip/py4j/clientserver.py", line 12, in <module>
06:58:13 ImportError: cannot import name 'Callable' from 'collections' (/Users/dongjoon/.pyenv/versions/3.10.1/lib/python3.10/collections/__init__.py)
It seems that Py4J has no test coverage for Python 3.10.
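For context (this is not the actual Py4J patch): the ImportError comes from Python 3.10 removing the ABC aliases that used to be re-exported from `collections`. Code that still imports them has to switch to `collections.abc`, roughly like this sketch:

```python
# Python 3.10 removed ABC aliases such as Callable from the `collections` module,
# so this import works on Python <= 3.9 but raises ImportError on 3.10:
#   from collections import Callable
# The portable form imports the ABC from `collections.abc` instead:
from collections.abc import Callable

def is_callable(obj):
    # Same isinstance check against the ABC on all supported Python versions.
    return isinstance(obj, Callable)

print(is_callable(len))    # True
print(is_callable("len"))  # False
```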
Sorry, but let me create reverting PRs for this.
For Apache Spark 3.2, I'm fine because we didn't support Python 3.10 before.
Yeah, I'm making another fix and release now. We can revert it for now too. Thanks for checking this!
I made a fix, and I'm fixing CI now (py4j/py4j#477). I will make another release and upgrade it back soon.
Thank you!
### What changes were proposed in this pull request?

This PR is a retry of #35871, bumping the version up to 0.10.9.5. It was reverted because Python 3.10 was broken, and Python 3.10 was not officially supported in Py4J at the time. In Py4J 0.10.9.5, the issue was fixed (py4j/py4j#475), and Python 3.10 is now officially supported with CI set up (py4j/py4j#477).

### Why are the changes needed?

See #35871.

### Does this PR introduce _any_ user-facing change?

See #35871.

### How was this patch tested?

Py4J sets up Python 3.10 CI now, and I manually tested PySpark with Python 3.10 with this patch:

```bash
./bin/pyspark
```

```
import py4j
py4j.__version__
spark.range(10).show()
```

```
Using Python version 3.10.0 (default, Mar 3 2022 03:57:21)
Spark context Web UI available at http://172.30.5.50:4040
Spark context available as 'sc' (master = local[*], app id = local-1647571387534).
SparkSession available as 'spark'.
>>> import py4j
>>> py4j.__version__
'0.10.9.5'
>>> spark.range(10).show()
+---+
| id|
+---+
...
```

Closes #35907 from HyukjinKwon/SPARK-38563-followup.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
### What changes were proposed in this pull request?

This PR upgrades Py4J to 0.10.9.4, with relevant documentation changes.

### Why are the changes needed?

Py4J 0.10.9.3 has a resource leak issue when pinned thread mode is enabled, and pinned thread mode is enabled by default in PySpark as of 41af409. We worked around this by requiring users to use `InheritableThread` or `inheritable_thread_target`. After upgrading, we no longer need to require this because Py4J cleans up automatically; see also py4j/py4j#471. A sketch of the previous workaround is shown below.
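A minimal sketch of the workaround that was previously required (assuming a running SparkSession bound to `spark`, as in the PySpark shell):

```python
# Before this upgrade, threads that call into the JVM had to be created via
# pyspark.InheritableThread, or have their target wrapped with
# pyspark.inheritable_thread_target, so that Py4J thread state was inherited
# and cleaned up when pinned thread mode (PYSPARK_PIN_THREAD) is enabled.
import threading
from pyspark import InheritableThread, inheritable_thread_target

def run_job():
    # Any call that goes through the Py4J gateway, e.g. a simple count.
    spark.range(10).count()

# Option 1: use InheritableThread instead of threading.Thread.
t1 = InheritableThread(target=run_job)
t1.start()
t1.join()

# Option 2: wrap the target function and use a plain threading.Thread.
t2 = threading.Thread(target=inheritable_thread_target(run_job))
t2.start()
t2.join()
```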
### Does this PR introduce _any_ user-facing change?

Yes, users no longer have to use `InheritableThread` or `inheritable_thread_target` to avoid the resource leak problem.

### How was this patch tested?

CI in this PR should test it out.