[luigi.contrib.pyspark_runner] SparkSession support in PySparkTask #2862

Merged: 6 commits, Jan 28, 2020

Conversation

drowoseque
Contributor

Description

luigi.contrib.spark.PySparkTask now supports spark_session as the first argument of main().
To use it, set pyspark_runner.use_spark_session = True in luigi.cfg.
This change lets users work with a spark_session in their jobs on Apache Spark 2.0 and later.
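
As a minimal sketch of the setting described above, the snippet below parses a luigi.cfg-style fragment with Python's standard configparser. It assumes the dotted name in the description maps to a `[pyspark_runner]` section with a `use_spark_session` option; how luigi itself reads the value is not shown in this PR description, so the parsing code is illustrative only.

```python
import configparser

# Assumed luigi.cfg contents: the dotted name from the description is read
# here as section [pyspark_runner], option use_spark_session.
LUIGI_CFG = """
[pyspark_runner]
use_spark_session = True
"""

parser = configparser.ConfigParser()
parser.read_string(LUIGI_CFG)

# Fall back to False (the SparkContext-only behaviour) when the option is absent.
use_spark_session = parser.getboolean(
    "pyspark_runner", "use_spark_session", fallback=False
)
print(use_spark_session)
```

Defaulting to False in this sketch mirrors the idea that existing jobs written against sc should keep working unless the flag is turned on.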

Motivation and Context

Apache Spark 2.0.0 introduced a new entry point object, SparkSession, in place of the SparkContext used in previous releases.
The main function of luigi.contrib.spark.PySparkTask supports only the sc argument, which doesn't let you use a Spark session object.
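
The difference between the two entry points can be sketched without Spark installed. Everything below is hypothetical: `StubSparkContext`, `StubSparkSession`, the two job classes, and `run` are stand-ins invented for illustration, not the code in luigi/contrib/pyspark_runner.py. The only facts borrowed from PySpark are that a session exposes its context as `sparkContext` and that a context has an `appName`.

```python
class StubSparkContext:
    """Stand-in for pyspark.SparkContext (hypothetical, illustration only)."""
    appName = "demo"


class StubSparkSession:
    """Stand-in for pyspark.sql.SparkSession (hypothetical)."""

    def __init__(self, sc):
        # A real session also exposes its context under this attribute name.
        self.sparkContext = sc


class OldStyleJob:
    # Old-style syntax: main() receives the SparkContext as `sc`.
    def main(self, sc, *args):
        return ("context", sc.appName, args)


class NewStyleJob:
    # New-style syntax: main() receives the session first;
    # the context is still reachable through it.
    def main(self, spark, *args):
        return ("session", spark.sparkContext.appName, args)


def run(job, use_spark_session, *args):
    """Hypothetical runner: choose the first argument of main() from the flag."""
    sc = StubSparkContext()
    entry_point = StubSparkSession(sc) if use_spark_session else sc
    return job.main(entry_point, *args)


old_result = run(OldStyleJob(), False, "input.csv")
new_result = run(NewStyleJob(), True, "input.csv")
```

In this sketch the job code decides which object it expects, and the flag only controls what the runner passes in, so old-style jobs are unaffected when the flag is off.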

Have you tested this? If so, how?

  • I've added unit tests into my PR
  • I've checked locally that this works as expected (for both old-style and new-style syntax)

@drowoseque
Contributor Author

@dlstadther
@honnix
@ntim
please, take a look at that

@Tarrasch
Contributor

You probably want other Spark users to comment on the usefulness of this PR. :)

@drowoseque
Contributor Author

> You probably want other Spark users to comment on the usefulness of this PR. :)

yes, but none of them (PySparkTask contributors) seem to have been online in the last year except @ntim and you @Tarrasch

Contributor

@Tarrasch Tarrasch left a comment

I didn't check the details. But I'm fine with merge.

Can you get a coworker to read this code too?

@drowoseque
Contributor Author

@b2arn
PTAL

@drowoseque drowoseque requested a review from Tarrasch December 15, 2019 11:31
@drowoseque
Contributor Author

@Tarrasch done

@drowoseque
Contributor Author

@honnix
@dlstadther
PTAL

* kwarg reference in _entry_point_class
Collaborator

@dlstadther dlstadther left a comment

lgtm

@drowoseque
Contributor Author

@GoodDok
@mrk-its
PTAL as well

@drowoseque
Contributor Author

@Tarrasch @honnix
please up

@honnix honnix merged commit 2d5fbc8 into spotify:master Jan 28, 2020