[SPARK-6869][PySpark] Add pyspark archives path to PYTHONPATH #5580
Changes from all commits
```diff
@@ -468,6 +468,17 @@ private[spark] class Client(
       env("SPARK_YARN_USER_ENV") = userEnvs
     }

+    // if spark.submit.pyArchives is in sparkConf, append pyArchives to PYTHONPATH
+    // that can be passed on to the ApplicationMaster and the executors.
+    if (sparkConf.contains("spark.submit.pyArchives")) {
+      var pythonPath = sparkConf.get("spark.submit.pyArchives")
+      if (env.contains("PYTHONPATH")) {
+        pythonPath = Seq(env.get("PYTHONPATH"), pythonPath).mkString(File.pathSeparator)
+      }
+      env("PYTHONPATH") = pythonPath
+      sparkConf.setExecutorEnv("PYTHONPATH", pythonPath)
+    }
+
     // In cluster mode, if the deprecated SPARK_JAVA_OPTS is set, we need to propagate it to
     // executors. But we can't just set spark.executor.extraJavaOptions, because the driver's
     // SparkContext will not let that set spark* system properties, which is expected behavior for
```

Review comment (on the `sparkConf.setExecutorEnv` line): It feels there's something missing here. If the archives are something that is not `local:`…

Reply: If the archives are not `local:`, SparkSubmit has already put the archives into the distributed files.
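To make the merge rule concrete, here is a minimal, runnable Scala sketch of the same PYTHONPATH handling. The object and method names are hypothetical, not Spark's: an existing PYTHONPATH entry stays in front, and the archives are appended with the platform path separator (":" on Unix, ";" on Windows).

```scala
import java.io.File
import scala.collection.mutable

object PythonPathMergeSketch {
  // Merge pyArchives into any PYTHONPATH already present in the env map,
  // keeping the existing entries first so user-supplied modules take precedence.
  def mergePythonPath(env: mutable.HashMap[String, String], pyArchives: String): String = {
    val merged = env.get("PYTHONPATH") match {
      case Some(existing) => Seq(existing, pyArchives).mkString(File.pathSeparator)
      case None           => pyArchives
    }
    env("PYTHONPATH") = merged
    merged
  }

  def main(args: Array[String]): Unit = {
    val env = mutable.HashMap("PYTHONPATH" -> "/opt/python/lib")
    println(mergePythonPath(env, "pyspark.zip" + File.pathSeparator + "py4j.zip"))
    // On Unix this prints: /opt/python/lib:pyspark.zip:py4j.zip
  }
}
```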
```diff
@@ -1074,7 +1085,7 @@ object Client extends Logging {
       val hiveConf = hiveClass.getMethod("getConf").invoke(hive)
       val hiveConfClass = mirror.classLoader.loadClass("org.apache.hadoop.hive.conf.HiveConf")

-      val hiveConfGet = (param:String) => Option(hiveConfClass
+      val hiveConfGet = (param: String) => Option(hiveConfClass
        .getMethod("get", classOf[java.lang.String])
        .invoke(hiveConf, param))
```
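The `hiveConfGet` lambda is plain `java.lang.reflect` plumbing: look a method up by name and parameter type, invoke it, and wrap the possibly-null result in an `Option`. A self-contained sketch of the same pattern, with `java.util.Properties` standing in for `HiveConf` (the key and value below are made up for illustration):

```scala
import java.util.Properties

object ReflectiveGetSketch {
  def main(args: Array[String]): Unit = {
    val conf = new Properties()
    conf.setProperty("hive.metastore.uris", "thrift://example:9083")

    // Same shape as hiveConfGet: reflective single-String-arg getter,
    // with Option(...) turning a null result into None.
    val confGet = (param: String) => Option(
      conf.getClass
        .getMethod("getProperty", classOf[java.lang.String])
        .invoke(conf, param))

    println(confGet("hive.metastore.uris")) // Some(thrift://example:9083)
    println(confGet("no.such.key"))         // None
  }
}
```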
```diff
@@ -1096,7 +1107,7 @@ object Client extends Logging {

        val hive2Token = new Token[DelegationTokenIdentifier]()
        hive2Token.decodeFromUrlString(tokenStr)
-       credentials.addToken(new Text("hive.server2.delegation.token"),hive2Token)
+       credentials.addToken(new Text("hive.server2.delegation.token"), hive2Token)
        logDebug("Added hive.Server2.delegation.token to conf.")
        hiveClass.getMethod("closeCurrent").invoke(null)
      } else {
```
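For context, a sketch of the token round-trip this hunk touches, assuming only hadoop-common on the classpath. The real code uses Hive's `DelegationTokenIdentifier` loaded alongside `HiveConf`; the generic `TokenIdentifier` here is a stand-in so the sketch compiles on its own.

```scala
import org.apache.hadoop.io.Text
import org.apache.hadoop.security.Credentials
import org.apache.hadoop.security.token.{Token, TokenIdentifier}

object HiveTokenSketch {
  // Decode a delegation token that was serialized as a URL-safe string and
  // register it in the job's Credentials under the same alias as the diff above.
  def addHiveToken(credentials: Credentials, tokenStr: String): Unit = {
    val hive2Token = new Token[TokenIdentifier]()
    hive2Token.decodeFromUrlString(tokenStr)
    credentials.addToken(new Text("hive.server2.delegation.token"), hive2Token)
  }
}
```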
```diff
@@ -1141,13 +1152,13 @@ object Client extends Logging {

        logInfo("Added HBase security token to credentials.")
      } catch {
-       case e:java.lang.NoSuchMethodException =>
+       case e: java.lang.NoSuchMethodException =>
          logInfo("HBase Method not found: " + e)
-       case e:java.lang.ClassNotFoundException =>
+       case e: java.lang.ClassNotFoundException =>
          logDebug("HBase Class not found: " + e)
-       case e:java.lang.NoClassDefFoundError =>
+       case e: java.lang.NoClassDefFoundError =>
          logDebug("HBase Class not found: " + e)
-       case e:Exception =>
+       case e: Exception =>
          logError("Exception when obtaining HBase security token: " + e)
        }
      }
```
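These handlers implement a defensive-reflection pattern: HBase is an optional dependency, so a missing class or method means "skip the token" rather than "fail the submission". A runnable sketch of the same structure (`org.example.HBaseHelper` is a deliberately nonexistent class, so the `ClassNotFoundException` branch fires):

```scala
object OptionalDependencySketch {
  def main(args: Array[String]): Unit = {
    try {
      // Probe for the optional dependency; in this sketch it is always absent.
      val cls = Class.forName("org.example.HBaseHelper")
      cls.getMethod("obtainToken").invoke(null)
      println("Obtained token.")
    } catch {
      case e: java.lang.NoSuchMethodException =>
        println("Method not found: " + e)
      case e: java.lang.ClassNotFoundException =>
        println("Class not found: " + e)
      case e: java.lang.NoClassDefFoundError =>
        println("Class not found: " + e)
      case e: Exception =>
        println("Exception when obtaining token: " + e)
    }
  }
}
```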
Review comment: I think if we just make sure the zip is built during the build, then we don't need to do the zip in the code. Just require that it is already there.

Reply: I don't think that helps. Sometimes we upgrade just by copying spark.jar, so doing the zip in the code is good for that situation.

Review comment: I'm not sure I follow. If you just upgrade spark.jar, then there are no changes to the Python scripts, so you don't need to put a new pyspark.zip there. If there are changes, then you either need to copy over the new Python scripts or put a new pyspark.zip on there, and putting a new pyspark.zip there seems easier. Although you need the Python scripts there anyway for client mode, so you probably need both. Also, in many cases I wouldn't expect a user to have write permissions on the python/lib directory; I would expect that to be a privileged operation, and in that case the zip would fail.

Reply: Yes, I agree with you. Thanks.
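For reference, a hedged sketch of the "zip in the code" approach under discussion, using only `java.util.zip` (all paths and names below are illustrative, not what Spark's build produces). As the thread notes, this fails with an exception if the user cannot write to the target directory.

```scala
import java.io.{File, FileInputStream, FileOutputStream}
import java.util.zip.{ZipEntry, ZipOutputStream}

object ZipPySparkSketch {
  // Recursively package a directory tree into a zip archive.
  def zipDir(srcDir: File, zipFile: File): Unit = {
    // Throws FileNotFoundException if the target directory is not writable,
    // which is the permission concern raised in the review thread.
    val out = new ZipOutputStream(new FileOutputStream(zipFile))
    def add(f: File, prefix: String): Unit = {
      if (f.isDirectory) {
        f.listFiles().foreach(add(_, prefix + f.getName + "/"))
      } else {
        out.putNextEntry(new ZipEntry(prefix + f.getName))
        val in = new FileInputStream(f)
        val buf = new Array[Byte](8192)
        Iterator.continually(in.read(buf)).takeWhile(_ != -1)
          .foreach(n => out.write(buf, 0, n))
        in.close()
        out.closeEntry()
      }
    }
    add(srcDir, "")
    out.close()
  }

  def main(args: Array[String]): Unit = {
    zipDir(new File("python/pyspark"), new File("python/lib/pyspark.zip"))
  }
}
```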