-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-25081][Core]Nested spill in ShuffleExternalSorter should not access released memory page (branch-2.2) #22072
Conversation
…ry page This issue is pretty similar to [SPARK-21907](https://issues.apache.org/jira/browse/SPARK-21907). "allocateArray" in [ShuffleInMemorySorter.reset](https://github.com/apache/spark/blob/9b8521e53e56a53b44c02366a99f8a8ee1307bbf/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java#L99) may trigger a spill and cause ShuffleInMemorySorter access the released `array`. Another task may get the same memory page from the pool. This will cause two tasks access the same memory page. When a task reads memory written by another task, many types of failures may happen. Here are some examples I have seen: - JVM crash. (This is easy to reproduce in a unit test as we fill newly allocated and deallocated memory with 0xa5 and 0x5a bytes which usually points to an invalid memory address) - java.lang.IllegalArgumentException: Comparison method violates its general contract! - java.lang.NullPointerException at org.apache.spark.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:384) - java.lang.UnsupportedOperationException: Cannot grow BufferHolder by size -536870912 because the size after growing exceeds size limitation 2147483632 This PR resets states in `ShuffleInMemorySorter.reset` before calling `allocateArray` to fix the issue. The new unit test will make JVM crash without the fix. Closes apache#22062 from zsxwing/SPARK-25081. Authored-by: Shixiong Zhu <[email protected]> Signed-off-by: Shixiong Zhu <[email protected]>
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Test build #94581 has finished for PR 22072 at commit
|
retest this please |
Test build #94589 has finished for PR 22072 at commit
|
|
jenkins retest this please |
Test build #94597 has started for PR 22072 at commit |
retest this please |
ok to test |
Test build #94624 has finished for PR 22072 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Merged to 2.2.
…access released memory page (branch-2.2) ## What changes were proposed in this pull request? Backport #22062 to branch-2.2. Just two minor differences in the test: - branch-2.2 doesn't have `SparkOutOfMemoryError`. It's using `OutOfMemoryError` directly. - MockitoSugar is in a different package in old scalatest. ## How was this patch tested? Jenkins Closes #22072 from zsxwing/SPARK-25081-2.2. Authored-by: Shixiong Zhu <[email protected]> Signed-off-by: Xiao Li <[email protected]>
…access released memory page (branch-2.2) ## What changes were proposed in this pull request? Backport apache#22062 to branch-2.2. Just two minor differences in the test: - branch-2.2 doesn't have `SparkOutOfMemoryError`. It's using `OutOfMemoryError` directly. - MockitoSugar is in a different package in old scalatest. ## How was this patch tested? Jenkins Closes apache#22072 from zsxwing/SPARK-25081-2.2. Authored-by: Shixiong Zhu <[email protected]> Signed-off-by: Xiao Li <[email protected]>
…access released memory page (branch-2.2) ## What changes were proposed in this pull request? Backport apache#22062 to branch-2.2. Just two minor differences in the test: - branch-2.2 doesn't have `SparkOutOfMemoryError`. It's using `OutOfMemoryError` directly. - MockitoSugar is in a different package in old scalatest. ## How was this patch tested? Jenkins Closes apache#22072 from zsxwing/SPARK-25081-2.2. Authored-by: Shixiong Zhu <[email protected]> Signed-off-by: Xiao Li <[email protected]>
…access released memory page (branch-2.2) ## What changes were proposed in this pull request? Backport apache#22062 to branch-2.2. Just two minor differences in the test: - branch-2.2 doesn't have `SparkOutOfMemoryError`. It's using `OutOfMemoryError` directly. - MockitoSugar is in a different package in old scalatest. ## How was this patch tested? Jenkins Closes apache#22072 from zsxwing/SPARK-25081-2.2. Authored-by: Shixiong Zhu <[email protected]> Signed-off-by: Xiao Li <[email protected]>
What changes were proposed in this pull request?
Backport #22062 to branch-2.2. Just two minor differences in the test:
SparkOutOfMemoryError
. It's usingOutOfMemoryError
directly.How was this patch tested?
Jenkins