-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-23809][SQL] Active SparkSession should be set by getOrCreate #20927
Conversation
@@ -34,6 +34,9 @@ private[spark] class TestSparkSession(sc: SparkContext) extends SparkSession(sc) | |||
this(new SparkConf) | |||
} | |||
|
|||
SparkSession.setDefaultSession(this) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intentional conflict with #20926 , should resolve once one is merged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will merge that one first.
LGTM |
Test build #88681 has finished for PR 20927 at commit
|
Per @marmbrus 's suggestion also added a |
Test build #88723 has finished for PR 20927 at commit
|
Test build #88724 has finished for PR 20927 at commit
|
@ericl Address the conflict? |
@gatorsmile done! |
LGTM pending Jenkins |
Test build #88856 has finished for PR 20927 at commit
|
Thanks! Merged to master |
* Returns the currently active SparkSession, otherwise the default one. If there is no default | ||
* SparkSession, throws an exception. | ||
* | ||
* @since 2.4.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ericl Could you submit a PR to Spark 2.3 without this new API?
## What changes were proposed in this pull request? Currently, the active spark session is set inconsistently (e.g., in createDataFrame, prior to query execution). Many places in spark also incorrectly query active session when they should be calling activeSession.getOrElse(defaultSession) and so might get None even if a Spark session exists. The semantics here can be cleaned up if we also set the active session when the default session is set. Related: https://github.com/apache/spark/pull/20926/files ## How was this patch tested? Unit test, existing test. Note that if apache#20926 merges first we should also update the tests there. Author: Eric Liang <[email protected]> Closes apache#20927 from ericl/active-session-cleanup.
## What changes were proposed in this pull request? Currently, the active spark session is set inconsistently (e.g., in createDataFrame, prior to query execution). Many places in spark also incorrectly query active session when they should be calling activeSession.getOrElse(defaultSession) and so might get None even if a Spark session exists. The semantics here can be cleaned up if we also set the active session when the default session is set. Related: https://github.com/apache/spark/pull/20926/files ## How was this patch tested? Unit test, existing test. Note that if apache#20926 merges first we should also update the tests there. Author: Eric Liang <[email protected]> Closes apache#20927 from ericl/active-session-cleanup.
## What changes were proposed in this pull request? Currently, the active spark session is set inconsistently (e.g., in createDataFrame, prior to query execution). Many places in spark also incorrectly query active session when they should be calling activeSession.getOrElse(defaultSession) and so might get None even if a Spark session exists. The semantics here can be cleaned up if we also set the active session when the default session is set. Related: https://github.com/apache/spark/pull/20926/files ## How was this patch tested? Unit test, existing test. Note that if apache#20926 merges first we should also update the tests there. Author: Eric Liang <[email protected]> Closes apache#20927 from ericl/active-session-cleanup.
…OrCreate This backports #20927 to branch-2.3 ## What changes were proposed in this pull request? Currently, the active spark session is set inconsistently (e.g., in createDataFrame, prior to query execution). Many places in spark also incorrectly query active session when they should be calling activeSession.getOrElse(defaultSession) and so might get None even if a Spark session exists. The semantics here can be cleaned up if we also set the active session when the default session is set. Related: https://github.com/apache/spark/pull/20926/files ## How was this patch tested? Unit test, existing test. Note that if #20926 merges first we should also update the tests there. Author: Eric Liang <[email protected]> Closes #20971 from ericl/backport-23809.
Currently, the active spark session is set inconsistently (e.g., in createDataFrame, prior to query execution). Many places in spark also incorrectly query active session when they should be calling activeSession.getOrElse(defaultSession) and so might get None even if a Spark session exists. The semantics here can be cleaned up if we also set the active session when the default session is set. Related: https://github.com/apache/spark/pull/20926/files Unit test, existing test. Note that if apache#20926 merges first we should also update the tests there. Author: Eric Liang <[email protected]> Closes apache#20927 from ericl/active-session-cleanup. (cherry picked from commit 359375e) NOTE: This cherry-pick includes only some of the original changes, because LIHADOOP-54684 already made some identical changes as part of test resolution. This now ensures the entirety of the original SPARK-23809 code is backported. Also note that when SPARK-23809 was originally backported to branch-2.3 in PR apache#20971, the new `active` API was _not_ included since new APIs shouldn't generally be added in patch releases. That new API is exactly what is needed, so we backport directly from the original commit. RB=2575134 BUG=BDP-6088 G=spark-reviewers R=mmuralid,wyzhang,smahadik A=mmuralid,wyzhang
What changes were proposed in this pull request?
Currently, the active spark session is set inconsistently (e.g., in createDataFrame, prior to query execution). Many places in spark also incorrectly query active session when they should be calling activeSession.getOrElse(defaultSession) and so might get None even if a Spark session exists.
The semantics here can be cleaned up if we also set the active session when the default session is set.
Related: https://github.com/apache/spark/pull/20926/files
How was this patch tested?
Unit test, existing test. Note that if #20926 merges first we should also update the tests there.