Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-20084][Core] Remove internal.metrics.updatedBlockStatuses from history files. #17412

Closed

Conversation

rdblue
Copy link
Contributor

@rdblue rdblue commented Mar 24, 2017

What changes were proposed in this pull request?

Remove accumulator updates for internal.metrics.updatedBlockStatuses from SparkListenerTaskEnd entries in the history file. These can cause history files to grow to hundreds of GB because the value of the accumulator contains all tracked blocks.

How was this patch tested?

Current History UI tests cover use of the history file.

@SparkQA
Copy link

SparkQA commented Mar 24, 2017

Test build #75173 has finished for PR 17412 at commit 386846c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor

vanzin commented Mar 24, 2017

I'm fine with this, assuming there's no need for this information anywhere. (Basically, the question I asked in #16714).

@rdblue
Copy link
Contributor Author

rdblue commented Mar 27, 2017

@vanzin, I looked through the listeners in the Spark UI for uses of it and didn't find any, plus I haven't seen anything in the UI that is obviously based on it. As I said on the issue, the updates duplicate data in metrics so it would be odd for a component to go to accumulators to get this info.

I'm deploying this internally today, maybe it makes sense to give it a couple days to see if we get complaints about something not working.

@rdblue
Copy link
Contributor Author

rdblue commented Mar 30, 2017

@vanzin, we've been running this in production for a few days now and haven't had any problems. I think it should be safe to merge. Could you take another look? Thanks!

@vanzin
Copy link
Contributor

vanzin commented Mar 31, 2017

LGTM. Merging to master / 2.1.

asfgit pushed a commit that referenced this pull request Mar 31, 2017
… history files.

## What changes were proposed in this pull request?

Remove accumulator updates for internal.metrics.updatedBlockStatuses from SparkListenerTaskEnd entries in the history file. These can cause history files to grow to hundreds of GB because the value of the accumulator contains all tracked blocks.

## How was this patch tested?

Current History UI tests cover use of the history file.

Author: Ryan Blue <[email protected]>

Closes #17412 from rdblue/SPARK-20084-remove-block-accumulator-info.

(cherry picked from commit c4c03ee)
Signed-off-by: Marcelo Vanzin <[email protected]>
@asfgit asfgit closed this in c4c03ee Mar 31, 2017
@zhouyejoe
Copy link
Contributor

@rdblue Hi, why not the blockstatusupdates are not filtering out in executorMetricsUpdate? This line https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala#L245
While I am working on SPARK-21961, I filtered those blockstatusupdates while reading from logs in Spark History Server, but it causing some unit test failure.
Should it not be filtered out in both executorMetricsUpdateFromJson and executorMetricsUpdateToJson?
Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants