-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-35303][PYTHON] Enable pinned thread mode by default #32429
Conversation
There are couple of todos such as updating migration guide so I marked it as a draft. I will take a look more and see if there are potential side effects to warn users. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable for Apache Spark 3.2.0. I'll look forward to seeing the migration guide.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Thanks @srowen and @dongjoon-hyun. I will update the migration guide soon. |
2c2c519
to
59a13dd
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
a9d3926
to
ab7f2f6
Compare
This PR is ready for a review. cc @WeichenXu123 too FYI |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
retest this please |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #139924 has finished for PR 32429 at commit
|
Okay, if the users are using a Let me merge this and try it out. The change here is correct in principle. |
Merged to master. |
…hen starting the thread, and use inheritable thread in the current codebase ### What changes were proposed in this pull request? This PR is a followup of #32429 and #32644. I was thinking about creating separate PRs but decided to include all in this PR because it shares the same context, and should be easier to review together. This PR includes: - Use `InheritableThread` and `inheritable_thread_target` in the current code base to prevent potential resource leak (since we enabled pinned thread mode by default now at #32429) - Copy local properties when `start` at `InheritableThread` is called to mimic JVM behaviour. Previously it was copied when `InheritableThread` instance was created (related to #32644). - #32429 missed one place at `inheritable_thread_target` (https://github.com/apache/spark/blob/master/python/pyspark/util.py#L308). More specifically, I missed one place that should enable pinned thread mode by default. ### Why are the changes needed? To mimic the JVM behaviour about thread lifecycle ### Does this PR introduce _any_ user-facing change? Ideally no. One possible case is that users use `InheritableThread` with pinned thread mode enabled. In this case, the local properties will be copied when starting the thread instead of defining the `InheritableThread` object. This is a small difference that wouldn't likely affect end users. ### How was this patch tested? Existing tests should cover this. Closes #32962 from HyukjinKwon/SPARK-35498-SPARK-35303. Authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
@HyukjinKwon So pinned thread mode is enabled by default in spark 3.2.0 onwards? The only thing users need to do if they need to spawn threads, is use
Also, how using a |
Yes. For |
What changes were proposed in this pull request?
PySpark added pinned thread mode at #24898 to sync Python thread to JVM thread. Previously, one JVM thread could be reused which ends up with messed inheritance hierarchy such as thread local especially when multiple jobs run in parallel. To completely fix this, we should enable this mode by default.
Why are the changes needed?
To correctly support parallel job submission and management.
Does this PR introduce any user-facing change?
Yes, now Python thread is mapped to JVM thread one to one.
How was this patch tested?
Existing tests should cover it.