
[SPARK-6175] Fix standalone executor log links when ephemeral ports or SPARK_PUBLIC_DNS are used #4903

Closed
wants to merge 5 commits

Conversation

JoshRosen
Contributor

This patch fixes two issues with the executor log viewing links added in Spark 1.3. In standalone mode, the log URLs might include a port value of 0 rather than the actual bound port of the UI, which broke the ability to view logs from workers whose web UIs had been configured to bind to ephemeral ports. In addition, the URLs used workers' local hostnames instead of respecting SPARK_PUBLIC_DNS, which prevented this feature from working properly on Spark EC2 clusters because the links would point to internal DNS names instead of external ones.
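In code terms, the two fixes amount to the following sketch. This is a minimal illustration with simplified stand-in types, not the actual Spark source; the names `boundPort` and `publicAddress` follow the commit summaries.

```scala
// Minimal sketch of the two fixes, using a simplified stand-in for Spark's
// WorkerWebUI (not the actual Spark source).

// A web UI configured with port 0 binds to an OS-assigned ephemeral port;
// boundPort records the port that was actually bound.
class WorkerWebUI(val boundPort: Int)

// Before the fix, log links used the configured port (possibly 0) and the
// worker's local hostname. After the fix, they use the bound port and the
// public address, which honors SPARK_PUBLIC_DNS when it is set.
def logUrl(publicAddress: String, webUi: WorkerWebUI,
           appId: String, execId: String, logType: String): String =
  s"http://$publicAddress:${webUi.boundPort}/logPage?appId=$appId" +
    s"&executorId=$execId&logType=$logType"
```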

I included tests for both of these bugs:

  • We now browse to the URLs and verify that they point to the expected pages.
  • To test SPARK_PUBLIC_DNS, I changed the code that reads the environment variable to do so via SparkConf.getenv, then used a custom SparkConf subclass to mock the environment variable (this pattern is used elsewhere in Spark's tests).
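The env-var mocking pattern in the second bullet can be sketched like this. These are simplified stand-in classes, not Spark's actual code (the real class is `org.apache.spark.SparkConf`):

```scala
// Sketch of the SparkConf.getenv mocking pattern (simplified; not Spark's code).

// Production code reads environment variables through the conf, not System.getenv:
class SparkConf {
  def getenv(name: String): String = System.getenv(name)
}

// Tests can then substitute a subclass with a canned environment:
class MockEnvSparkConf(env: Map[String, String]) extends SparkConf {
  override def getenv(name: String): String = env.getOrElse(name, super.getenv(name))
}

// Example consumer: prefer SPARK_PUBLIC_DNS over the local hostname.
def publicAddress(conf: SparkConf, localHostname: String): String =
  Option(conf.getenv("SPARK_PUBLIC_DNS")).getOrElse(localHostname)
```

Because the consumer only sees `conf.getenv`, the test never has to mutate the real process environment.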

@JoshRosen
Contributor Author

/cc @ksakellis @andrewor14. Sorry to let these issues slip by in my code review of #3486.

@SparkQA

SparkQA commented Mar 5, 2015

Test build #28281 has started for PR 4903 at commit cfec135.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Mar 5, 2015

Test build #28282 has started for PR 4903 at commit 5577f41.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Mar 5, 2015

Test build #28281 has finished for PR 4903 at commit cfec135.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28281/

@SparkQA

SparkQA commented Mar 5, 2015

Test build #28282 has finished for PR 4903 at commit 5577f41.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28282/

@@ -538,10 +539,10 @@ private[spark] object Worker extends Logging {
      memory: Int,
      masterUrls: Array[String],
      workDir: String,
-     workerNumber: Option[Int] = None): (ActorSystem, Int) = {
+     workerNumber: Option[Int] = None,
+     conf: SparkConf = new SparkConf): (ActorSystem, Int) = {
Contributor Author


This constructor with a custom conf is only called from LocalSparkCluster. We need to use the supplied SparkConf here so that the environment variable mocking works correctly.
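Why the default parameter matters can be sketched as follows. These are hypothetical stand-ins, not Spark's actual signatures (the real method is in `org.apache.spark.deploy.worker.Worker` and differs in detail): callers that omit `conf` still get a fresh `SparkConf` reading the real environment, while a `LocalSparkCluster`-style test harness can pass one whose `getenv` is mocked.

```scala
// Illustrative stand-ins only; not Spark's actual signatures.

class SparkConf {
  def getenv(name: String): String = System.getenv(name)
}

// Test-only conf with a canned environment, as LocalSparkCluster-style code
// would supply it.
class MockConf(env: Map[String, String]) extends SparkConf {
  override def getenv(name: String): String = env.getOrElse(name, super.getenv(name))
}

// The extra parameter defaults to a real SparkConf, so existing callers are
// unchanged; the method must use the *supplied* conf for env lookups, or the
// mocking above would be silently bypassed.
def startSystemAndActor(host: String, conf: SparkConf = new SparkConf): String =
  Option(conf.getenv("SPARK_PUBLIC_DNS")).getOrElse(host)
```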

@nchammas
Contributor

nchammas commented Mar 5, 2015

Does this PR relate at all to #4038, which tries to fix spark-ec2 clusters launched into a private VPC with no public IPs?

@andrewor14
Contributor

LGTM, feel free to merge

@JoshRosen
Contributor Author

@nchammas It's not related; this is concerned with cases where links in the web UI use internal addresses rather than the public addresses configured through the SPARK_PUBLIC_DNS environment variable. That other issue is concerned with not being able to launch instances in VPCs that are configured without public DNS.

@JoshRosen
Contributor Author

I ran some manual tests, too, using

SPARK_PUBLIC_DNS="helloworld" ./bin/spark-shell --master local-cluster[2,2,512]

and verified that the log URLs respect this environment variable. I'm going to merge this to master (1.4.0) and branch-1.3 (1.3.0). Thanks!

asfgit pushed a commit that referenced this pull request Mar 5, 2015
…r SPARK_PUBLIC_DNS are used


Author: Josh Rosen <[email protected]>

Closes #4903 from JoshRosen/SPARK-6175 and squashes the following commits:

5577f41 [Josh Rosen] Remove println
cfec135 [Josh Rosen] Use webUi.boundPort and publicAddress in log links
27918c7 [Josh Rosen] Add failing unit tests for standalone log URL viewing
c250fbe [Josh Rosen] Respect SparkConf in local-cluster Workers.
422a2ef [Josh Rosen] Use conf.getenv to read SPARK_PUBLIC_DNS

(cherry picked from commit 424a86a)
Signed-off-by: Josh Rosen <[email protected]>
@asfgit asfgit closed this in 424a86a Mar 5, 2015