Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-23809][SQL] Active SparkSession should be set by getOrCreate #20927

Closed
wants to merge 6 commits into from

Conversation

ericl
Copy link
Contributor

@ericl ericl commented Mar 28, 2018

What changes were proposed in this pull request?

Currently, the active spark session is set inconsistently (e.g., in createDataFrame, prior to query execution). Many places in spark also incorrectly query active session when they should be calling activeSession.getOrElse(defaultSession) and so might get None even if a Spark session exists.

The semantics here can be cleaned up if we also set the active session when the default session is set.

Related: https://github.com/apache/spark/pull/20926/files

How was this patch tested?

Unit test, existing test. Note that if #20926 merges first we should also update the tests there.

@@ -34,6 +34,9 @@ private[spark] class TestSparkSession(sc: SparkContext) extends SparkSession(sc)
this(new SparkConf)
}

SparkSession.setDefaultSession(this)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentional conflict with #20926 , should resolve once one is merged

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will merge that one first.

@ericl
Copy link
Contributor Author

ericl commented Mar 28, 2018

@gatorsmile @marmbrus

@gatorsmile
Copy link
Member

LGTM

@SparkQA
Copy link

SparkQA commented Mar 29, 2018

Test build #88681 has finished for PR 20927 at commit 8f3cbf3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ericl
Copy link
Contributor Author

ericl commented Mar 30, 2018

Per @marmbrus 's suggestion also added a SparkSession.active call to cover any edge case where the thread-local might not be set correctly. Spark code should prefer to use this instead of getActive.get moving forward.

@SparkQA
Copy link

SparkQA commented Mar 30, 2018

Test build #88723 has finished for PR 20927 at commit f840ffa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 30, 2018

Test build #88724 has finished for PR 20927 at commit b1f59f1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

@ericl Address the conflict?

@ericl
Copy link
Contributor Author

ericl commented Apr 3, 2018

@gatorsmile done!

@gatorsmile
Copy link
Member

LGTM pending Jenkins

@SparkQA
Copy link

SparkQA commented Apr 3, 2018

Test build #88856 has finished for PR 20927 at commit 6514673.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

Thanks! Merged to master

* Returns the currently active SparkSession, otherwise the default one. If there is no default
* SparkSession, throws an exception.
*
* @since 2.4.0
Copy link
Member

@gatorsmile gatorsmile Apr 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ericl Could you submit a PR to Spark 2.3 without this new API?

@asfgit asfgit closed this in 359375e Apr 4, 2018
ericl added a commit to ericl/spark that referenced this pull request Apr 4, 2018
## What changes were proposed in this pull request?

Currently, the active spark session is set inconsistently (e.g., in createDataFrame, prior to query execution). Many places in spark also incorrectly query active session when they should be calling activeSession.getOrElse(defaultSession) and so might get None even if a Spark session exists.

The semantics here can be cleaned up if we also set the active session when the default session is set.

Related: https://github.com/apache/spark/pull/20926/files

## How was this patch tested?

Unit test, existing test. Note that if apache#20926 merges first we should also update the tests there.

Author: Eric Liang <[email protected]>

Closes apache#20927 from ericl/active-session-cleanup.
robert3005 pushed a commit to palantir/spark that referenced this pull request Apr 4, 2018
## What changes were proposed in this pull request?

Currently, the active spark session is set inconsistently (e.g., in createDataFrame, prior to query execution). Many places in spark also incorrectly query active session when they should be calling activeSession.getOrElse(defaultSession) and so might get None even if a Spark session exists.

The semantics here can be cleaned up if we also set the active session when the default session is set.

Related: https://github.com/apache/spark/pull/20926/files

## How was this patch tested?

Unit test, existing test. Note that if apache#20926 merges first we should also update the tests there.

Author: Eric Liang <[email protected]>

Closes apache#20927 from ericl/active-session-cleanup.
mshtelma pushed a commit to mshtelma/spark that referenced this pull request Apr 5, 2018
## What changes were proposed in this pull request?

Currently, the active spark session is set inconsistently (e.g., in createDataFrame, prior to query execution). Many places in spark also incorrectly query active session when they should be calling activeSession.getOrElse(defaultSession) and so might get None even if a Spark session exists.

The semantics here can be cleaned up if we also set the active session when the default session is set.

Related: https://github.com/apache/spark/pull/20926/files

## How was this patch tested?

Unit test, existing test. Note that if apache#20926 merges first we should also update the tests there.

Author: Eric Liang <[email protected]>

Closes apache#20927 from ericl/active-session-cleanup.
asfgit pushed a commit that referenced this pull request Apr 8, 2018
…OrCreate

This backports #20927 to branch-2.3

## What changes were proposed in this pull request?

Currently, the active spark session is set inconsistently (e.g., in createDataFrame, prior to query execution). Many places in spark also incorrectly query active session when they should be calling activeSession.getOrElse(defaultSession) and so might get None even if a Spark session exists.

The semantics here can be cleaned up if we also set the active session when the default session is set.

Related: https://github.com/apache/spark/pull/20926/files

## How was this patch tested?

Unit test, existing test. Note that if #20926 merges first we should also update the tests there.

Author: Eric Liang <[email protected]>

Closes #20971 from ericl/backport-23809.
otterc pushed a commit to linkedin/spark that referenced this pull request Mar 22, 2023
Currently, the active spark session is set inconsistently (e.g., in createDataFrame, prior to query execution). Many places in spark also incorrectly query active session when they should be calling activeSession.getOrElse(defaultSession) and so might get None even if a Spark session exists.

The semantics here can be cleaned up if we also set the active session when the default session is set.

Related: https://github.com/apache/spark/pull/20926/files

Unit test, existing test. Note that if apache#20926 merges first we should also update the tests there.

Author: Eric Liang <[email protected]>

Closes apache#20927 from ericl/active-session-cleanup.

(cherry picked from commit 359375e)

NOTE: This cherry-pick includes only some of the original changes, because LIHADOOP-54684 already made some identical
changes as part of test resolution. This now ensures the entirety of the original SPARK-23809 code is backported.
Also note that when SPARK-23809 was originally backported to branch-2.3 in PR apache#20971, the new `active` API was _not_
included since new APIs shouldn't generally be added in patch releases. That new API is exactly what is needed,
so we backport directly from the original commit.

RB=2575134
BUG=BDP-6088
G=spark-reviewers
R=mmuralid,wyzhang,smahadik
A=mmuralid,wyzhang
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants