[luigi.contrib.pyspark_runner] SparkSession support in PySparkTask #2862
Conversation
@dlstadther
You probably want other Spark users to comment on the usefulness of this PR. :)
I didn't check the details. But I'm fine with merge.
Can you get a coworker to read this code too?
@b2arn
@Tarrasch done
@honnix
* kwarg reference in _entry_point_class
lgtm
Description
luigi.contrib.spark.PySparkTask now supports spark_session as the first argument of main(). To use it, enable pyspark_runner.use_spark_session = True in luigi.cfg. This change allows users to work with spark_session in their jobs on Apache Spark 2+.
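A minimal sketch of how a task might use the new entry point, assuming the use_spark_session option and the main(spark_session, ...) signature described above; the class name, input path, and job logic are hypothetical:

```python
from luigi.contrib.spark import PySparkTask


class SessionWordCount(PySparkTask):
    """Hypothetical example task; name and input path are illustrative."""

    def main(self, spark_session, *args):
        # With use_spark_session enabled in luigi.cfg, the runner passes a
        # pyspark.sql.SparkSession here instead of a SparkContext.
        df = spark_session.read.text("hdfs:///tmp/words.txt")  # made-up path
        print("line count:", df.count())
```

The corresponding luigi.cfg entry, per the description above:

```ini
[pyspark_runner]
use_spark_session = True
```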
Motivation and Context
Apache Spark 2.0.0 introduced a new entry point object, SparkSession, which replaces the SparkContext of previous releases. The main() function of luigi.contrib.spark.PySparkTask supports only the sc argument, which doesn't allow you to use a SparkSession object.
Have you tested this? If so, how?