SPARK-2450 Adds executor log links to Web UI #3486
Conversation
Can one of the admins verify this patch?
add to whitelist. Great to see this being implemented.
Test build #23952 has started for PR 3486 at commit
Test build #23952 has finished for PR 3486 at commit
Test FAILed.
```scala
def extractLogUrls : Map[String, String] = {
  val prefix = "SPARK_LOG_URL_"
  sys.env.filterKeys(_.startsWith(prefix))
      .map(e => (e._1.substring(prefix.length).toLowerCase, e._2))
```
should only be indented 2 spaces
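To make the extraction logic above concrete, here is a standalone sketch (not Spark code) of the same `SPARK_LOG_URL_` stripping, run against a sample map instead of `sys.env`; the object name and sample URLs are illustrative only.

```scala
// Standalone sketch of the SPARK_LOG_URL_ extraction logic reviewed above.
// Run against a sample environment map instead of sys.env.
object LogUrlExtractionSketch {
  private val prefix = "SPARK_LOG_URL_"

  def extractLogUrls(env: Map[String, String]): Map[String, String] =
    env.filter { case (k, _) => k.startsWith(prefix) }
       .map { case (k, v) => (k.substring(prefix.length).toLowerCase, v) }

  def main(args: Array[String]): Unit = {
    val env = Map(
      "SPARK_LOG_URL_STDOUT" -> "http://worker:8081/logPage/?logType=stdout",
      "SPARK_LOG_URL_STDERR" -> "http://worker:8081/logPage/?logType=stderr",
      "JAVA_HOME"            -> "/usr/lib/jvm/default" // ignored: no prefix
    )
    // Keys come out lowercased: "stdout", "stderr"
    println(LogUrlExtractionSketch.extractLogUrls(env))
  }
}
```

Note that only keys carrying the prefix survive, so unrelated environment variables never leak into the UI's log map.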
Force-pushed from e78308e to 864fc47.
Test build #24011 has started for PR 3486 at commit
Test build #24012 has started for PR 3486 at commit
Test build #24011 has finished for PR 3486 at commit
Test PASSed.
Test build #24012 has finished for PR 3486 at commit
Test PASSed.
```scala
/**
 * Called when the driver registers a new executor.
 */
def onExecutorAdded(executorAdded: SparkListenerExecutorAdded) { }
```
Hmmm. This is going to be one of those cases where it breaks existing code that extends this class. Not sure if there's a good workaround (even though it is marked as @DeveloperApi). :-/
BTW doesn't this break the build? There are a few listeners in Spark code itself (e.g. EventLoggingListener) which should have broken because of this.
(BTW fixing that listener means you'll probably need to touch JsonProtocol to serialize these new events to the event log... and you'll need to be careful not to keep the log URLs in the replayed UIs since they'll most probably be broken links at that point. Meaning that probably the UI listener should nuke the log URLs when the "executor removed" message is handled.)
Ah wait. I see. These methods have default implementations, so they'll only affect people extending SparkListener from Java. Still, we should probably save these events to the log for replay later.
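The "nuke the log URLs on executor removed" idea discussed above can be sketched as a small UI-side listener. The event and listener shapes here are simplified stand-ins for illustration, not Spark's real SparkListener API:

```scala
// Simplified stand-ins for the real Spark listener events (illustrative only).
final case class ExecutorAddedEvent(execId: String, logUrls: Map[String, String])
final case class ExecutorRemovedEvent(execId: String)

// UI-side listener that tracks log URLs per executor and drops them when the
// executor goes away, so a replayed UI never renders dead links.
class ExecutorsUiListenerSketch {
  private var logUrlsByExec = Map.empty[String, Map[String, String]]

  def onExecutorAdded(e: ExecutorAddedEvent): Unit =
    logUrlsByExec += (e.execId -> e.logUrls)

  def onExecutorRemoved(e: ExecutorRemovedEvent): Unit =
    logUrlsByExec -= e.execId // the worker may be gone; links would be broken

  def logUrls(execId: String): Map[String, String] =
    logUrlsByExec.getOrElse(execId, Map.empty)
}
```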
Hey @ksakellis is this still WIP? Should I start doing a detailed review yet?
@andrewor14 Yes, you can start doing a review. I wanted to post this just to get sign-off on the strategy I'm using. If people feel good about this, I'll post more unit tests/fix comments before asking for it to be merged.
```scala
context.system.eventStream.subscribe(self, classOf[RemotingLifecycleEvent])
}

def extractLogUrls : Map[String, String] = {
```
no space before :
Test build #26782 has finished for PR 3486 at commit
Test FAILed.
Test build #26777 timed out for PR 3486 at commit
Test FAILed.
Hi @ksakellis,

This patch looks like it's in pretty good shape overall.

When displaying the UI in the HistoryServer or the standalone Master's display of completed application UIs, the executor log links might not always work (e.g. if the worker serving the logs is dead). This will sometimes work, though, since it might be common for the standalone worker or Yarn node daemons to outlive the application. I don't see a great solution for this, but I'm open to any ideas / suggestions. I don't think this should necessarily block this PR, though.

I think I understand the rationale for why we use environment variables for passing the log viewer links: the CoarseGrainedSchedulerBackend, which posts the ExecutorAdded event, is used in both Yarn and Standalone modes and therefore can't know how to map executor ids to container log paths (since it doesn't know about node manager URLs). However, each deployment mode has a different backend for launching executors, so we can determine the URL there and pass it back to the driver. I agree with Andrew that this dataflow seems a bit non-intuitive, but I don't necessarily see a better way to handle this without modifying a bunch of other components.

I don't think that the current design precludes us from changing this in the future. We are kind of locking ourselves into exposing the logName => logUrl map in a listener event, but this seems reasonable (this is independent of how we get the data backing this event to the source where the event is posted to the bus).

Two minor code changes that we need:
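The dataflow described above can be summarized as a round trip through environment variables: the deploy-mode-specific backend encodes its log URLs as `SPARK_LOG_URL_*` entries, and the executor side decodes them back into the logName => logUrl map that ends up in the listener event. Here is a rough sketch with illustrative names (not Spark's actual classes):

```scala
// Round-trip sketch of the env-var dataflow: backend encodes, executor decodes.
object LogUrlEnvSketch {
  private val prefix = "SPARK_LOG_URL_"

  // Backend side (e.g. standalone worker or Yarn): logName -> env entries.
  def toEnv(logUrls: Map[String, String]): Map[String, String] =
    logUrls.map { case (name, url) => (prefix + name.toUpperCase, url) }

  // Executor side: recover the map, mirroring extractLogUrls in this PR.
  def fromEnv(env: Map[String, String]): Map[String, String] =
    env.filter { case (k, _) => k.startsWith(prefix) }
       .map { case (k, v) => (k.substring(prefix.length).toLowerCase, v) }
}
```

Because the encoding is just a key prefix, extra logs beyond stderr/stdout propagate without any new plumbing.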
Sorry for the mess due to my earlier changes.
Adds links to stderr/stdout in the executor tab of the webUI for: 1) Standalone 2) Yarn client 3) Yarn cluster This tries to add the log url support in a general way so as to make it easy to add support for all the cluster managers. This is done by using environment variables to pass to the executor the log urls. The SPARK_LOG_URL_ prefix is used and so additional logs besides stderr/stdout can also be added. To propagate this information to the UI we use the onExecutorAdded spark listener event. Although this commit doesn't add log urls when running on a mesos cluster, it should be possible to add using the same mechanism.
Reset listener after each test.
Don't null listener out at end of main().
Force-pushed from df5ce42 to d190936.
Test build #26883 has started for PR 3486 at commit
Test build #26883 has finished for PR 3486 at commit
Test FAILed.
Jenkins, retest this please.
Test build #26894 has started for PR 3486 at commit
Test build #26894 has finished for PR 3486 at commit
Test PASSed.
@ksakellis Thanks for cleaning up the mess from my earlier commits. This looks good to me, so I'm going to merge it into
Adds links to stderr/stdout in the executor tab of the webUI for: 1) Standalone 2) Yarn client 3) Yarn cluster

This tries to add the log url support in a general way so as to make it easy to add support for all the cluster managers. This is done by using environment variables to pass to the executor the log urls. The SPARK_LOG_URL_ prefix is used and so additional logs besides stderr/stdout can also be added.

To propagate this information to the UI we use the onExecutorAdded spark listener event.

Although this commit doesn't add log urls when running on a mesos cluster, it should be possible to add using the same mechanism.

Author: Kostas Sakellis <[email protected]>
Author: Josh Rosen <[email protected]>

Closes #3486 from ksakellis/kostas-spark-2450 and squashes the following commits:

d190936 [Josh Rosen] Fix a few minor style / formatting nits. Reset listener after each test. Don't null listener out at end of main().
8673fe1 [Kostas Sakellis] CR feedback. Hide the log column if there are no logs available
5bf6952 [Kostas Sakellis] [SPARK-2450] [CORE] Adds executor log links to Web UI

(cherry picked from commit 32e964c)
Signed-off-by: Josh Rosen <[email protected]>
```diff
@@ -134,6 +135,12 @@ private[spark] class ExecutorRunner(
   // In case we are running this from within the Spark Shell, avoid creating a "scala"
   // parent process for the executor command
   builder.environment.put("SPARK_LAUNCH_WITH_SCALA", "0")
+
+  // Add webUI log urls
+  val baseUrl = s"http://$host:$webUiPort/logPage/?appId=$appId&executorId=$execId&logType="
```
It looks like $host here might be a machine's hostname, but we probably want this to reflect SPARK_PUBLIC_DNS (or whatever the new system property equivalent is) so that we generate an externally-accessible URL.
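The fix suggested above — preferring SPARK_PUBLIC_DNS over the local hostname when building the log base URL — might look roughly like this sketch. Method names and the env-map parameter are illustrative, not Spark's actual API:

```scala
// Sketch: build an externally reachable log base URL by preferring
// SPARK_PUBLIC_DNS over the machine's local hostname.
object PublicLogUrlSketch {
  def publicHost(env: Map[String, String], localHost: String): String =
    env.getOrElse("SPARK_PUBLIC_DNS", localHost)

  def logBaseUrl(env: Map[String, String], localHost: String, webUiPort: Int,
                 appId: String, execId: String): String = {
    val host = publicHost(env, localHost)
    s"http://$host:$webUiPort/logPage/?appId=$appId&executorId=$execId&logType="
  }
}
```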