-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-23806] Broadcast.unpersist can cause fatal exception when used… #20924
Conversation
… with dynamic allocation
cc @cloud-fan |
Test build #88671 has finished for PR 20924 at commit
|
Jenkins, test this please |
Test build #88678 has finished for PR 20924 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
val futures = requiredBlockManagers.map { bm => | ||
bm.slaveEndpoint.ask[Int](removeMsg).recover { | ||
case e: IOException => | ||
logWarning(s"Error trying to remove broadcast $broadcastId", e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: also show the BlockManager id which we can't remove the broadcast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea, we can also add block manager id for the RDD case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan are you changing this under separate jira then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's pretty minor, I think we don't need to create a new JIRA ticket. Anyone has time to send a PR?
thanks, merging to master/2.3! |
… with dynamic allocation ## What changes were proposed in this pull request? ignore errors when you are waiting for a broadcast.unpersist. This is handling it the same way as doing rdd.unpersist in https://issues.apache.org/jira/browse/SPARK-22618 ## How was this patch tested? Patch was tested manually against a couple jobs that exhibit this behavior, with the change the application no longer dies due to this and just prints the warning. Please review http://spark.apache.org/contributing.html before opening a pull request. Author: Thomas Graves <[email protected]> Closes #20924 from tgravescs/SPARK-23806. (cherry picked from commit 641aec6) Signed-off-by: Wenchen Fan <[email protected]>
## What changes were proposed in this pull request? Address #20924 (comment), show block manager id when remove RDD/Broadcast fails. ## How was this patch tested? N/A Author: Xingbo Jiang <[email protected]> Closes #20960 from jiangxb1987/bmid. (cherry picked from commit 7cf9fab) Signed-off-by: hyukjinkwon <[email protected]>
## What changes were proposed in this pull request? Address #20924 (comment), show block manager id when remove RDD/Broadcast fails. ## How was this patch tested? N/A Author: Xingbo Jiang <[email protected]> Closes #20960 from jiangxb1987/bmid.
## What changes were proposed in this pull request? Address apache#20924 (comment), show block manager id when remove RDD/Broadcast fails. ## How was this patch tested? N/A Author: Xingbo Jiang <[email protected]> Closes apache#20960 from jiangxb1987/bmid.
… with dynamic allocation ## What changes were proposed in this pull request? ignore errors when you are waiting for a broadcast.unpersist. This is handling it the same way as doing rdd.unpersist in https://issues.apache.org/jira/browse/SPARK-22618 ## How was this patch tested? Patch was tested manually against a couple jobs that exhibit this behavior, with the change the application no longer dies due to this and just prints the warning. Please review http://spark.apache.org/contributing.html before opening a pull request. Author: Thomas Graves <[email protected]> Closes apache#20924 from tgravescs/SPARK-23806.
## What changes were proposed in this pull request? Address apache#20924 (comment), show block manager id when remove RDD/Broadcast fails. ## How was this patch tested? N/A Author: Xingbo Jiang <[email protected]> Closes apache#20960 from jiangxb1987/bmid.
… with dynamic allocation ignore errors when you are waiting for a broadcast.unpersist. This is handling it the same way as doing rdd.unpersist in https://issues.apache.org/jira/browse/SPARK-22618 Patch was tested manually against a couple jobs that exhibit this behavior, with the change the application no longer dies due to this and just prints the warning. Please review http://spark.apache.org/contributing.html before opening a pull request. Author: Thomas Graves <[email protected]> Closes apache#20924 from tgravescs/SPARK-23806. (cherry picked from commit 641aec6) Signed-off-by: Wenchen Fan <[email protected]> Change-Id: I2974094962416d98704777c8aa73d1d0bf8ce4df
… with dynamic allocation apache#20924 What changes were proposed in this pull request? ignore errors when you are waiting for a broadcast.unpersist. This is handling it the same way as doing rdd.unpersist in https://issues.apache.org/jira/browse/SPARK-22618 How was this patch tested? Patch was tested manually against a couple jobs that exhibit this behavior, with the change the application no longer dies due to this and just prints the warning. Please review http://spark.apache.org/contributing.html before opening a pull request.
… with dynamic allocation ignore errors when you are waiting for a broadcast.unpersist. This is handling it the same way as doing rdd.unpersist in https://issues.apache.org/jira/browse/SPARK-22618 Patch was tested manually against a couple jobs that exhibit this behavior, with the change the application no longer dies due to this and just prints the warning. Please review http://spark.apache.org/contributing.html before opening a pull request. Author: Thomas Graves <[email protected]> Closes apache#20924 from tgravescs/SPARK-23806. (cherry-picked from commit 641aec6) Ref: LIHADOOP-39138 RB=1414361 BUG=LIHADOOP-39138 R=fli,mshen,yezhou A=yezhou
… with dynamic allocation
What changes were proposed in this pull request?
ignore errors when you are waiting for a broadcast.unpersist. This is handling it the same way as doing rdd.unpersist in https://issues.apache.org/jira/browse/SPARK-22618
How was this patch tested?
Patch was tested manually against a couple jobs that exhibit this behavior, with the change the application no longer dies due to this and just prints the warning.
Please review http://spark.apache.org/contributing.html before opening a pull request.