[SPARK-23306] Fix the OOM caused by contention #20480

Closed
wants to merge 1 commit into from

zhzhan
Contributor

@zhzhan zhzhan commented Feb 1, 2018

What changes were proposed in this pull request?

There is a race condition in TaskMemoryManager that may cause an OOM.

Memory released by a spill may be taken by another task, because there is a gap between releaseMemory and acquireMemory (e.g., in UnifiedMemoryManager). If the current consumer is the only one that can spill, the acquire then fails and the task hits an OOM. This can happen with BytesToBytesMap, as it only spills the required bytes.

The fix is to keep looping on the current consumer while it still has memory to release.
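
To make the gap concrete, here is a minimal, deterministic sketch of the spill-then-acquire race. It is not the code in this patch: `SpillLoopSketch`, `Pool`, `Consumer`, `stealOnce`, and `acquireWithSpill` are hypothetical names, the pool is a toy stand-in for the shared execution memory, and the competing task is modeled as a callback that runs once between release and acquire.

```java
// Hypothetical toy model of the race; not Spark's TaskMemoryManager.
final class SpillLoopSketch {

    /** Toy shared pool standing in for the execution memory shared by tasks. */
    static final class Pool {
        private long free;
        Pool(long free) { this.free = free; }

        /** Grants as many bytes as are currently free, up to the request. */
        synchronized long acquire(long bytes) {
            long granted = Math.min(bytes, free);
            free -= granted;
            return granted;
        }

        synchronized void release(long bytes) { free += bytes; }
    }

    /** Toy consumer that spills only the bytes it is asked for. */
    static final class Consumer {
        private final Pool pool;
        private long used;
        Consumer(Pool pool, long used) { this.pool = pool; this.used = used; }

        long spill(long required) {
            long freed = Math.min(required, used);
            used -= freed;
            pool.release(freed);
            return freed;
        }
    }

    /** Models another task that grabs freed memory the first time the gap is hit. */
    static Runnable stealOnce(Pool pool, long bytes) {
        boolean[] done = {false};
        return () -> {
            if (!done[0]) {
                pool.acquire(bytes);
                done[0] = true;
            }
        };
    }

    /**
     * Acquire, spilling from `self` when the pool is short. `gap` runs between
     * release and acquire. With loop == false there is a single spill attempt;
     * with loop == true we retry while the consumer can still release memory.
     */
    static long acquireWithSpill(Pool pool, Consumer self, long required,
                                 boolean loop, Runnable gap) {
        long got = pool.acquire(required);
        while (got < required) {
            long freed = self.spill(required - got);
            if (freed == 0) {
                break; // nothing left to release, give up
            }
            gap.run(); // another task may take the freed bytes here
            got += pool.acquire(required - got);
            if (!loop) {
                break;
            }
        }
        return got;
    }

    /** Runs one scenario: empty pool, consumer holding 100 bytes, 30 requested. */
    static long run(boolean loop) {
        Pool pool = new Pool(0);
        Consumer self = new Consumer(pool, 100);
        return acquireWithSpill(pool, self, 30, loop, stealOnce(pool, 30));
    }

    public static void main(String[] args) {
        System.out.println("single attempt: " + run(false)); // prints "single attempt: 0"
        System.out.println("with loop: " + run(true));       // prints "with loop: 30"
    }
}
```

In this toy scenario the single attempt ends up with 0 bytes even though the consumer still holds 70 bytes it could spill, which is the situation that would surface as an OOM. Retrying on the same consumer gets the requested 30 bytes on the second pass.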

How was this patch tested?

The race condition is hard to reproduce, but the current logic appears to cause the issue.

Please review http://spark.apache.org/contributing.html before opening a pull request.

@gatorsmile
Member

cc @jiangxb1987 @cloud-fan

@SparkQA

SparkQA commented Feb 2, 2018

Test build #86943 has finished for PR 20480 at commit df96f0c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 2, 2018

Test build #86944 has finished for PR 20480 at commit afe40e5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@jiangxb1987 jiangxb1987 left a comment

LGTM. This should be logically correct, and I can't think of a better way to resolve it. cc @cloud-fan

@cloud-fan
Contributor

thanks, merging to master!

@asfgit asfgit closed this in b3a0428 Feb 2, 2018
5 participants