[SPARK-18256] Improve the performance of event log replay in HistoryServer #15756

Closed
wants to merge 3 commits into from

@JoshRosen JoshRosen commented Nov 3, 2016

What changes were proposed in this pull request?

This patch significantly improves the performance of event log replay in the HistoryServer via two simple changes:

  • Don't use extractOpt: it turns out that json4s's extractOpt method uses exceptions for control flow, causing huge performance bottlenecks due to the overhead of initializing exceptions. To avoid this overhead, we can simply use our own Utils.jsonOption method. This patch replaces all uses of extractOpt with Utils.jsonOption and adds a style checker rule to ban the use of the slow extractOpt method.
  • Don't call Utils.getFormattedClassName for every event: the old code called Utils.getFormattedClassName dozens of times per replayed event in order to match up class names in events with SparkListener event names. By simply storing the results of these calls in constants rather than recomputing them, we're able to eliminate a huge performance hotspot by removing thousands of expensive Class.getSimpleName calls.
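To illustrate the first bullet, here is a minimal sketch of the difference between exception-driven optional extraction and a check-first helper in the spirit of `Utils.jsonOption`. The `Json`, `JsonMissing`, and `JsonInt` types and every method name below are hypothetical stand-ins so the example runs without json4s on the classpath; it is not the code from this patch.

```scala
object OptionalFieldSketch {
  // Hypothetical stand-in for a JSON AST (not json4s's own types).
  sealed trait Json
  case object JsonMissing extends Json          // plays the role of an absent field
  final case class JsonInt(value: Int) extends Json

  // Exception-driven style: a missing field is signalled by throwing and catching,
  // so every absent field pays for building an exception and its stack trace.
  def extractWithExceptions(json: Json): Option[Int] =
    try {
      json match {
        case JsonInt(v)  => Some(v)
        case JsonMissing => throw new NoSuchElementException("field is missing")
      }
    } catch {
      case _: NoSuchElementException => None
    }

  // Check-first style: test for absence with a plain pattern match, no exceptions.
  def jsonOption(json: Json): Option[Json] = json match {
    case JsonMissing => None
    case present     => Some(present)
  }

  // Strict extraction, only called once the field is known to be present.
  def extractInt(json: Json): Int = json match {
    case JsonInt(v) => v
    case other      => throw new IllegalArgumentException(s"not an int: $other")
  }

  // Mirrors the shape of the rewritten call site: option first, then extract.
  def stageAttemptId(field: Json): Int =
    jsonOption(field).map(extractInt).getOrElse(0)
}
```

Both styles return the same values; the second simply never throws on the common "field absent" path, which is where the description says the time was going.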

How was this patch tested?

Tested by profiling the replay of a long event log using YourKit. For an event log containing 1000+ jobs, each of which had thousands of tasks, the changes in this patch cut the replay time in half:

image

Prior to this patch's changes, the two largest hotspots in log replay were exceptions thrown internally by json4s and calls to Class.getSimpleName():

image

After this patch, these hotspots are completely eliminated.
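The `Class.getSimpleName` hotspot goes away once the formatted names are computed a single time and matched against as constants. A minimal sketch of that idea follows; `TaskStartEvent`, `TaskEndEvent`, `EventNames`, and `formattedClassName` are hypothetical names chosen for illustration, and the helper is only an assumed simplification of `Utils.getFormattedClassName`.

```scala
// Hypothetical event types standing in for SparkListener events.
final case class TaskStartEvent(stageId: Int)
final case class TaskEndEvent(stageId: Int)

object EventNames {
  // Assumed simplification of a class-name formatter: simple name without '$'.
  def formattedClassName(cls: Class[_]): String =
    cls.getSimpleName.replace("$", "")

  // Computed once when the object is initialized, then reused for every event.
  val taskStart: String = formattedClassName(classOf[TaskStartEvent])
  val taskEnd: String = formattedClassName(classOf[TaskEndEvent])

  // Backticks make the pattern compare against the cached vals
  // instead of binding a fresh variable.
  def describe(eventName: String): String = eventName match {
    case `taskStart` => "task started"
    case `taskEnd`   => "task ended"
    case other       => s"unknown event: $other"
  }
}
```

Matching against the cached `val`s means replaying an event costs a few string comparisons rather than repeated reflective name lookups.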

@@ -540,7 +544,8 @@ private[spark] object JsonProtocol {

   def taskStartFromJson(json: JValue): SparkListenerTaskStart = {
     val stageId = (json \ "Stage ID").extract[Int]
-    val stageAttemptId = (json \ "Stage Attempt ID").extractOpt[Int].getOrElse(0)
+    val stageAttemptId =
+      Utils.jsonOption(json \ "Stage Attempt ID").map(_.extract[Int]).getOrElse(0)

what's the difference? Is extractOpt really slow?

if so maybe we should add a scalastyle to ban it

oh wait you already did 😄

@andrewor14

LGTM. That's a massive amount of time spent in Class.getSimpleName!

@SparkQA
SparkQA commented Nov 3, 2016

Test build #68077 has finished for PR 15756 at commit 2717f79.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14

retest this please

@SparkQA
SparkQA commented Nov 4, 2016

Test build #68149 has finished for PR 15756 at commit 2717f79.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
yhuai commented Nov 4, 2016

LGTM

@yhuai
yhuai commented Nov 5, 2016

Cool. Merging to master!

@asfgit asfgit closed this in 0e3312e Nov 5, 2016
@JoshRosen JoshRosen deleted the speed-up-jsonprotocol branch November 5, 2016 19:50
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…erver

## What changes were proposed in this pull request?

This patch significantly improves the performance of event log replay in the HistoryServer via two simple changes:

- **Don't use `extractOpt`**: it turns out that `json4s`'s `extractOpt` method uses exceptions for control flow, causing huge performance bottlenecks due to the overhead of initializing exceptions. To avoid this overhead, we can simply use our own `Utils.jsonOption` method. This patch replaces all uses of `extractOpt` with `Utils.jsonOption` and adds a style checker rule to ban the use of the slow `extractOpt` method.
- **Don't call `Utils.getFormattedClassName` for every event**: the old code called `Utils.getFormattedClassName` dozens of times per replayed event in order to match up class names in events with SparkListener event names. By simply storing the results of these calls in constants rather than recomputing them, we're able to eliminate a huge performance hotspot by removing thousands of expensive `Class.getSimpleName` calls.

## How was this patch tested?

Tested by profiling the replay of a long event log using YourKit. For an event log containing 1000+ jobs, each of which had thousands of tasks, the changes in this patch cut the replay time in half:

![image](https://cloud.githubusercontent.com/assets/50748/19980953/31154622-a1bd-11e6-9be4-21fbb9b3f9a7.png)

Prior to this patch's changes, the two slowest methods in log replay were internal exceptions thrown by `Json4S` and calls to `Class.getSimpleName()`:

![image](https://cloud.githubusercontent.com/assets/50748/19981052/87416cce-a1bd-11e6-9f25-06a7cd391822.png)

After this patch, these hotspots are completely eliminated.

Author: Josh Rosen <[email protected]>

Closes apache#15756 from JoshRosen/speed-up-jsonprotocol.