[SPARK-1774] Respect SparkSubmit --jars on YARN (client) #710

Closed
wants to merge 7 commits into apache:master from andrewor14:yarn-jars

Conversation

andrewor14
Contributor

SparkSubmit ignores --jars for YARN client. This is a bug.

This PR also automatically adds the application jar to `spark.jars`. Previously, when running in yarn-client mode, you had to specify the application jar additionally through `--files` (because `--jars` didn't work). Now you don't have to specify it explicitly through either.

Tested on a YARN cluster.
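
A minimal sketch of the behavior described above, assuming an illustrative helper name (this is not SparkSubmit's actual code): in yarn-client mode the primary application jar is folded into the comma-separated `spark.jars` value alongside any `--jars` entries, so it no longer needs to be repeated through `--files`.

```scala
// Illustrative only: fold the application jar into the spark.jars list,
// skipping empty entries and duplicates.
def mergeIntoSparkJars(existingJars: Option[String], appJar: String): String = {
  val entries = existingJars.toSeq.flatMap(_.split(",")).map(_.trim).filter(_.nonEmpty)
  (entries :+ appJar).distinct.mkString(",")
}

// mergeIntoSparkJars(Some("lib1.jar,lib2.jar"), "my-app.jar")
// => "lib1.jar,lib2.jar,my-app.jar"
```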

@andrewor14
Contributor Author

We may not always want to automatically add the jar to spark.jar. I will spend some time looking into cases in which we want to skip this (which likely involves yarn-cluster and python files). Do not merge this yet.
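
A hedged sketch of the kind of guard being considered here (the names and the exact condition are illustrative, not the final SparkSubmit logic): skip the automatic addition when the primary resource is a Python file or when running in yarn-cluster mode, where YARN already stages the jar itself.

```scala
// Illustrative only: decide whether the primary resource should be
// folded into spark.jars.
def shouldAddPrimaryResource(primaryResource: String, master: String): Boolean = {
  val isPython = primaryResource.endsWith(".py")
  val isYarnCluster = master == "yarn-cluster"
  !isPython && !isYarnCluster
}
```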

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14842/

@sryza
Contributor

sryza commented May 9, 2014

Maybe you're already aware of this: in YARN cluster mode, the app jar is renamed to "app.jar" on the cluster machine and addJar'd. So adding it to spark.jars with its original name wouldn't work. It could make sense to not rename it on the cluster machine, but there might be backwards-compatibility issues there with supporting YARN jobs outside spark-submit.

@andrewor14
Contributor Author

Yes, for yarn-cluster, we're not replacing the app.jar, so there should be no backward-compatibility issues. The only question is whether additionally adding it to spark.jars is redundant. From my digging through the code so far, I think it is, because SparkContext goes ahead and adds it to its HTTP server even though it's already pulled in as app.jar on all containers.
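
A rough, simplified sketch of the redundancy being discussed (not the actual SparkContext source): at startup the driver walks the spark.jars list and addJar's each entry, serving it from its file server, so an entry for a jar that YARN has already staged as app.jar would simply be shipped a second time.

```scala
import org.apache.spark.SparkContext

// Illustrative only: what conceptually happens to spark.jars entries on the driver.
def distributeConfiguredJars(sc: SparkContext): Unit = {
  val jars = sc.getConf.get("spark.jars", "").split(",").filter(_.nonEmpty)
  jars.foreach(sc.addJar) // each entry is made available to executors
}
```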

@sryza
Contributor

sryza commented May 9, 2014

That sounds correct to me (that it's redundant). Just looked through the code to confirm.

andrewor14 added 2 commits May 9, 2014 11:44
These scenarios already distribute the primary resource properly. See code comments for more detail.
@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

```diff
       OptionAssigner(args.files, LOCAL | STANDALONE | MESOS, false, sysProp = "spark.files"),
       OptionAssigner(args.files, LOCAL | STANDALONE | MESOS, true, sysProp = "spark.files"),
-      OptionAssigner(args.jars, LOCAL | STANDALONE | MESOS, false, sysProp = "spark.jars")
+      OptionAssigner(args.jars, ALL_CLUSTER_MGRS, false, sysProp = "spark.jars")
```
@andrewor14
Contributor Author

This is the line that makes SparkSubmit respect `--jars` in yarn-client mode.
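
A self-contained sketch of what the changed rule expresses; the bit-flag constants and the OptionAssigner shape are simplified stand-ins for the SparkSubmit internals, shown only to illustrate why widening LOCAL | STANDALONE | MESOS to ALL_CLUSTER_MGRS lets `--jars` reach yarn-client as `spark.jars`.

```scala
// Illustrative only: --jars maps to the spark.jars system property for every
// cluster manager (now including YARN) when deploying in client mode.
object JarsRuleSketch {
  val LOCAL = 1; val STANDALONE = 2; val MESOS = 4; val YARN = 8
  val ALL_CLUSTER_MGRS = LOCAL | STANDALONE | MESOS | YARN

  case class OptionAssigner(value: String, clusterManager: Int, deployOnCluster: Boolean,
      clOption: String = null, sysProp: String = null)

  def jarsRule(jars: String): OptionAssigner =
    OptionAssigner(jars, ALL_CLUSTER_MGRS, deployOnCluster = false, sysProp = "spark.jars")
}
```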

@andrewor14
Contributor Author

Other than the two lines I pointed out in the comments, the rest are basically format and test fixes.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14850/

@pwendell
Contributor

Thanks @andrewor14 - this looks good. I'm merging it!

asfgit pushed a commit that referenced this pull request May 11, 2014
SparkSubmit ignores `--jars` for YARN client. This is a bug.

This PR also automatically adds the application jar to `spark.jar`. Previously, when running as yarn-client, you must specify the jar additionally through `--files` (because `--jars` didn't work). Now you don't have to explicitly specify it through either.

Tested on a YARN cluster.

Author: Andrew Or <[email protected]>

Closes #710 from andrewor14/yarn-jars and squashes the following commits:

35d1928 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-jars
c27bf6c [Andrew Or] For yarn-cluster and python, do not add primaryResource to spark.jar
c92c5bf [Andrew Or] Minor cleanups
269f9f3 [Andrew Or] Fix format
013d840 [Andrew Or] Fix tests
1407474 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-jars
3bb75e8 [Andrew Or] Allow SparkSubmit --jars to take effect in yarn-client mode
(cherry picked from commit 83e0424)

Signed-off-by: Patrick Wendell <[email protected]>
@asfgit asfgit closed this in 83e0424 May 11, 2014
@andrewor14 andrewor14 deleted the yarn-jars branch May 13, 2014 00:32
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
Alexis-D pushed a commit to Alexis-D/spark that referenced this pull request Nov 16, 2020
…'s transitive packages (apache#710)

* use verbosity flag in install

* always pass down file instruction for executors conda env

* check style

* reuse yarn test to double check executor conda env

* fix yarn cluster test python file

* better documentation, tests

* update doc w/ checkstyle

Co-authored-by: Willi Raschkowski <[email protected]>

* remove unnecessary import test

Co-authored-by: Willi Raschkowski <[email protected]>