
[SPARK-14007] [SQL] Manage the memory used by hash map in shuffled hash join #11826

Closed
wants to merge 4 commits into apache:master from davies:cleanup_hash2

Conversation

@davies
Contributor

davies commented Mar 18, 2016

What changes were proposed in this pull request?

This PR tries to acquire the memory for the hash map in shuffled hash join and fails the task if there is not enough memory (otherwise it could OOM the executor).

It also removes the unused HashedRelation.

How was this patch tested?

Existing unit tests. Manual tests with TPCDS Q78.
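
As a rough, hedged illustration of the approach (not the actual patch and not Spark's internal API), the sketch below shows the acquire-before-insert pattern, with a hypothetical `SimpleMemoryManager` trait and a `HashMapBuildException` standing in for Spark's task memory manager and `SparkException`; the per-row estimate and the failure path mirror the review snippets quoted later in this thread.

```scala
import scala.collection.mutable

// Simplified, hypothetical stand-in for Spark's task memory manager (NOT the real API).
trait SimpleMemoryManager {
  def acquireExecutionMemory(numBytes: Long): Long // returns the bytes actually granted
  def releaseExecutionMemory(numBytes: Long): Unit
}

final class HashMapBuildException(msg: String) extends RuntimeException(msg)

object ShuffledHashJoinSketch {
  // Build the hash map only as long as the memory manager keeps granting the estimate;
  // otherwise fail the task rather than risk an executor OOM.
  def buildHashMap(
      rows: Iterator[Array[Byte]],      // stand-in for serialized UnsafeRow payloads
      keyOf: Array[Byte] => Long,       // stand-in for the join-key extractor
      memoryManager: SimpleMemoryManager): mutable.HashMap[Long, Array[Byte]] = {
    val map = mutable.HashMap.empty[Long, Array[Byte]]
    var acquired = 0L
    rows.foreach { row =>
      // Rough per-entry estimate: fixed overhead plus the row payload, mirroring the
      // `150 + row.getSizeInBytes` guess quoted later in this thread.
      val needed = 150L + row.length
      val got = memoryManager.acquireExecutionMemory(needed)
      acquired += got
      if (got < needed) {
        memoryManager.releaseExecutionMemory(acquired)
        throw new HashMapBuildException(
          "Can't acquire enough memory to build hash map in shuffled hash join")
      }
      map.put(keyOf(row), row)
    }
    map
  }
}
```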

@SparkQA

SparkQA commented Mar 18, 2016

Test build #53546 has finished for PR 11826 at commit 312e4ff.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

davies changed the title from "[SPARK-14007] [SQL] Cleanup hash2" to "[SPARK-14007] [SQL] Manage the memory used by hash map in shuffled hash join" on Mar 18, 2016
@SparkQA

SparkQA commented Mar 18, 2016

Test build #53547 has finished for PR 11826 at commit 6382eea.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 18, 2016

Test build #2652 has finished for PR 11826 at commit 6382eea.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies
Contributor Author

davies commented Mar 18, 2016

cc @rxin

@rxin
Contributor

rxin commented Mar 19, 2016

cc @sameeragarwal for review

val copiedIter = iter.map { row =>
  // It's hard to guess exactly how much memory will be used, so we make a rough guess here.
  // TODO: use BytesToBytesMap instead of HashMap for memory efficiency
  val needed = 150 + row.getSizeInBytes
Member

Can you please explain (and possibly comment in code) the reason behind choosing 150?
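
For context, the constant in question looks like a fixed per-entry allowance added on top of the row's own size; the hedged sketch below spells out that arithmetic, treating the 150 bytes as an assumed allowance for JVM object and HashMap entry overhead rather than a value confirmed in this thread.

```scala
object EntrySizeEstimate {
  // The 150-byte constant is assumed to cover JVM object headers and HashMap entry
  // overhead per inserted row; it is a rough allowance, not a measured figure.
  val assumedEntryOverheadBytes = 150L

  def estimatedEntrySizeBytes(rowSizeInBytes: Long): Long =
    assumedEntryOverheadBytes + rowSizeInBytes

  def main(args: Array[String]): Unit = {
    // Example: a 64-byte row is budgeted as 214 bytes in the hash map.
    println(estimatedEntrySizeBytes(64L)) // 214
  }
}
```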

@sameeragarwal
Member

LGTM, just one clarifying question.

@davies
Contributor Author

davies commented Mar 21, 2016

Merging this into master.

asfgit closed this in 9b4e15b on Mar 21, 2016
@SparkQA

SparkQA commented Mar 21, 2016

Test build #53695 has finished for PR 11826 at commit 6a99c42.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val got = memoryManager.acquireExecutionMemory(
  Math.max(memoryManager.pageSizeBytes(), needed), MemoryMode.ON_HEAP, null)
if (got < needed) {
  throw new SparkException("Can't acquire enough memory to build hash map in shuffled" +
Contributor

nit: missing space between shuffled and hash
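
A hedged sketch of the fix this nit suggests, wrapped in a hypothetical `NitFixSketch` object and with a minimal "hash join" continuation, since the rest of the message is not visible in this excerpt:

```scala
import org.apache.spark.SparkException

object NitFixSketch {
  // Sketch of the nit fix: a trailing space after "shuffled" so the concatenated message
  // reads "...in shuffled hash join" rather than "...in shuffledhash join". The short
  // "hash join" continuation is illustrative; the full message is not shown above.
  def failHashMapBuild(): Nothing =
    throw new SparkException("Can't acquire enough memory to build hash map in shuffled " +
      "hash join")
}
```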

roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
[SPARK-14007] [SQL] Manage the memory used by hash map in shuffled hash join

Author: Davies Liu <[email protected]>

Closes apache#11826 from davies/cleanup_hash2.