
[SPARK-14007] [SQL] Manage the memory used by hash map in shuffled hash join #11826

Closed
wants to merge 4 commits into apache:master from davies:cleanup_hash2

Conversation

@davies
Contributor

davies commented Mar 18, 2016

What changes were proposed in this pull request?

This PR tries to acquire the memory for the hash map in shuffled hash join and fails the task if there is not enough memory (otherwise it could OOM the executor).

It also removes the unused HashedRelation.

How was this patch tested?

Existing unit tests. Manual tests with TPCDS Q78.
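
As a rough, hedged illustration of the approach (not the actual patch and not Spark's internal API), the sketch below shows the acquire-before-insert pattern, with a hypothetical `SimpleMemoryManager` trait and a `HashMapBuildException` standing in for Spark's task memory manager and `SparkException`; the per-row estimate and the failure path mirror the review snippets quoted later in this thread.

```scala
import scala.collection.mutable

// Simplified, hypothetical stand-in for Spark's task memory manager (NOT the real API).
trait SimpleMemoryManager {
  def acquireExecutionMemory(numBytes: Long): Long // returns the bytes actually granted
  def releaseExecutionMemory(numBytes: Long): Unit
}

final class HashMapBuildException(msg: String) extends RuntimeException(msg)

object ShuffledHashJoinSketch {
  // Build the hash map only as long as the memory manager keeps granting the estimate;
  // otherwise fail the task rather than risk an executor OOM.
  def buildHashMap(
      rows: Iterator[Array[Byte]],      // stand-in for serialized UnsafeRow payloads
      keyOf: Array[Byte] => Long,       // stand-in for the join-key extractor
      memoryManager: SimpleMemoryManager): mutable.HashMap[Long, Array[Byte]] = {
    val map = mutable.HashMap.empty[Long, Array[Byte]]
    var acquired = 0L
    rows.foreach { row =>
      // Rough per-entry estimate: fixed overhead plus the row payload, mirroring the
      // `150 + row.getSizeInBytes` guess quoted later in this thread.
      val needed = 150L + row.length
      val got = memoryManager.acquireExecutionMemory(needed)
      acquired += got
      if (got < needed) {
        memoryManager.releaseExecutionMemory(acquired)
        throw new HashMapBuildException(
          "Can't acquire enough memory to build hash map in shuffled hash join")
      }
      map.put(keyOf(row), row)
    }
    map
  }
}
```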

@SparkQA

SparkQA commented Mar 18, 2016

Test build #53546 has finished for PR 11826 at commit 312e4ff.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

davies changed the title from "[SPARK-14007] [SQL] Cleanup hash2" to "[SPARK-14007] [SQL] Manage the memory used by hash map in shuffled hash join" on Mar 18, 2016
@SparkQA

SparkQA commented Mar 18, 2016

Test build #53547 has finished for PR 11826 at commit 6382eea.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 18, 2016

Test build #2652 has finished for PR 11826 at commit 6382eea.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies
Contributor Author

davies commented Mar 18, 2016

cc @rxin

@rxin
Contributor

rxin commented Mar 19, 2016

cc @sameeragarwal for review

val copiedIter = iter.map { row =>
  // It's hard to guess exactly how much memory will be used, so we make a rough guess here.
  // TODO: use BytesToBytesMap instead of HashMap for memory efficiency
  val needed = 150 + row.getSizeInBytes
Member

Can you please explain (and possibly comment in code) the reason behind choosing 150?
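
For context, the constant in question looks like a fixed per-entry allowance added on top of the row's own size; the hedged sketch below spells out that arithmetic, treating the 150 bytes as an assumed allowance for JVM object and HashMap entry overhead rather than a value confirmed in this thread.

```scala
object EntrySizeEstimate {
  // The 150-byte constant is assumed to cover JVM object headers and HashMap entry
  // overhead per inserted row; it is a rough allowance, not a measured figure.
  val assumedEntryOverheadBytes = 150L

  def estimatedEntrySizeBytes(rowSizeInBytes: Long): Long =
    assumedEntryOverheadBytes + rowSizeInBytes

  def main(args: Array[String]): Unit = {
    // Example: a 64-byte row is budgeted as 214 bytes in the hash map.
    println(estimatedEntrySizeBytes(64L)) // 214
  }
}
```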

@sameeragarwal
Member

LGTM, just one clarifying question.

@davies
Contributor Author

davies commented Mar 21, 2016

Merging this into master.

asfgit closed this in 9b4e15b on Mar 21, 2016
@SparkQA

SparkQA commented Mar 21, 2016

Test build #53695 has finished for PR 11826 at commit 6a99c42.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val got = memoryManager.acquireExecutionMemory(
  Math.max(memoryManager.pageSizeBytes(), needed), MemoryMode.ON_HEAP, null)
if (got < needed) {
  throw new SparkException("Can't acquire enough memory to build hash map in shuffled" +
Contributor

nit: missing space between shuffled and hash
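
A hedged sketch of the fix this nit suggests, wrapped in a hypothetical `NitFixSketch` object and with a minimal "hash join" continuation, since the rest of the message is not visible in this excerpt:

```scala
import org.apache.spark.SparkException

object NitFixSketch {
  // Sketch of the nit fix: a trailing space after "shuffled" so the concatenated message
  // reads "...in shuffled hash join" rather than "...in shuffledhash join". The short
  // "hash join" continuation is illustrative; the full message is not shown above.
  def failHashMapBuild(): Nothing =
    throw new SparkException("Can't acquire enough memory to build hash map in shuffled " +
      "hash join")
}
```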

roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
[SPARK-14007] [SQL] Manage the memory used by hash map in shuffled hash join

Author: Davies Liu <[email protected]>

Closes apache#11826 from davies/cleanup_hash2.