[BUG] ClusterLoadFallbackPolicy is not strict when a shuffle with big partitions registers #30
Conversation
@@ -31,7 +31,8 @@ class RssShuffleFallbackPolicyRunner(sparkConf: SparkConf) extends Logging {
  def applyAllFallbackPolicy(dependency: ShuffleDependency[_, _, _],
dependency.partitioner.numPartitions seems enough; could you please optimize this? Thanks.
Yes, I now retrieve it from the dependency param of this function.
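For context, a minimal sketch of the reviewer's suggestion (the class and helper below are illustrative, not the actual Celeborn code): the partition count can be read from the ShuffleDependency the runner already receives, instead of being passed in as a separate argument.

```scala
import org.apache.spark.ShuffleDependency

// Illustrative sketch only: shows where dependency.partitioner.numPartitions
// would be read inside the fallback runner.
class FallbackPolicyRunnerSketch {
  def applyAllFallbackPolicy(dependency: ShuffleDependency[_, _, _]): Boolean = {
    // The partition count comes straight from the dependency, as suggested above.
    val numPartitions = dependency.partitioner.numPartitions
    applyClusterLoadFallbackPolicy(numPartitions)
  }

  // Hypothetical helper standing in for the real cluster-load check.
  private def applyClusterLoadFallbackPolicy(numPartitions: Int): Boolean = {
    // The real code would ask the master whether the cluster can take this load.
    false
  }
}
```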
-  private def handleGetClusterLoadStatus(context: RpcCallContext): Unit = {
-    val (_, _, _, result) = getClusterLoad
+  private def handleGetClusterLoadStatus(context: RpcCallContext, numPartitions: Int): Unit = {
+    val (_, _, _, result) = getClusterLoad(numPartitions)
Why do we need so many return values? Please remove the unused ones.
getClusterLoad is a common function; we may need the first three return values elsewhere. I can split the value out by creating a new function.
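A rough sketch of that idea (all names, the tuple shape, and the numbers below are assumptions for illustration, not the patch itself): keep the four-value getClusterLoad for callers that need the raw metrics, and add a narrow helper so the RPC handler only deals with the final status.

```scala
object ClusterLoadSketch {
  // Assumed tuple shape: (totalSlots, usedSlots, usageRatio, isHighLoad).
  def getClusterLoad(numPartitions: Int): (Int, Int, Double, Boolean) = {
    val totalSlots = 30000            // illustrative numbers only
    val usedSlots  = 12000
    val usageRatio = usedSlots.toDouble / totalSlots
    val isHighLoad = usageRatio > 0.8 || numPartitions > totalSlots - usedSlots
    (totalSlots, usedSlots, usageRatio, isHighLoad)
  }

  // Narrow helper so handleGetClusterLoadStatus no longer ignores three values.
  def isClusterInHighLoad(numPartitions: Int): Boolean =
    getClusterLoad(numPartitions)._4
}
```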
LGTM
[BUG] ClusterLoadFallbackPolicy is not strict when a shuffle with big partitions registers
What changes were proposed in this pull request?
If a shuffle with 50,000 partitions registers while the cluster has only 30,000 slots, the allocation of 50,000 slots will fail even though slot usage is not in a high-load state.
Why are the changes needed?
This change avoids the circumstance where a shuffle that would cost more slots than the cluster has still continues through the RSS handling process.
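A hedged sketch of the intended check (names and the threshold are assumptions, not the actual patch): fall back not only when slot usage is high, but also when the requested partition count exceeds the slots the cluster can still allocate.

```scala
object ClusterLoadFallbackSketch {
  // Assumed signature and threshold; illustrates the 50,000-vs-30,000 case above.
  def shouldFallback(numPartitions: Int,
                     totalSlots: Int,
                     usedSlots: Int,
                     highLoadThreshold: Double = 0.8): Boolean = {
    val usageRatio     = usedSlots.toDouble / totalSlots
    val availableSlots = totalSlots - usedSlots
    usageRatio >= highLoadThreshold || numPartitions > availableSlots
  }

  def main(args: Array[String]): Unit = {
    // 50,000 requested partitions on a 30,000-slot cluster: fall back even
    // though usage alone (5,000 / 30,000) is not "high load".
    println(shouldFallback(numPartitions = 50000, totalSlots = 30000, usedSlots = 5000)) // true
  }
}
```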
What are the items that need reviewer attention?
Related issues.
#17
Related pull requests.
How was this patch tested?
/cc @FMX
/assign @wangshengjie