-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-41525][K8S] Improve onNewSnapshots
to use unique lists of known executor IDs and PVC names
#39070
Conversation
…s in onNewSnapshots
onNewSnapshots
onNewSnapshots
onNewSnapshots
to use unique list of known executor IDs and PVC names
cc @viirya and @attilapiros |
onNewSnapshots
to use unique list of known executor IDs and PVC namesonNewSnapshots
to use unique list of known executor IDs and PVC names
onNewSnapshots
to use unique list of known executor IDs and PVC namesonNewSnapshots
to use unique lists of known executor IDs and PVC names
@@ -152,7 +152,7 @@ class ExecutorPodsAllocator( | |||
applicationId: String, | |||
schedulerBackend: KubernetesClusterSchedulerBackend, | |||
snapshots: Seq[ExecutorPodsSnapshot]): Unit = { | |||
val k8sKnownExecIds = snapshots.flatMap(_.executorPods.keys) | |||
val k8sKnownExecIds = snapshots.flatMap(_.executorPods.keys).distinct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original code is in Spark 3.1.2 and Spark 3.2.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So snapshots
may contain duplicates too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
snapshots
is defined as a sequence of snapshot history originally. Logically, there can be a few snapshot which the same contents, but in the most cases, they will be different by object metadata or status.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying it.
@@ -162,7 +162,7 @@ class ExecutorPodsAllocator( | |||
val k8sKnownPVCNames = snapshots.flatMap(_.executorPods.values.map(_.pod)).flatMap { pod => | |||
pod.getSpec.getVolumes.asScala | |||
.flatMap { v => Option(v.getPersistentVolumeClaim).map(_.getClaimName) } | |||
} | |||
}.distinct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is since Spark 3.2.0.
Thank you, @viirya ! |
All tests passed except the master branch dependency test failure. Merged to master. |
What changes were proposed in this pull request?
This PR improve
ExecutorPodsAllocator.onNewSnapshots
by removing duplications atk8sKnownExecIds
andk8sKnownPVCNames
. In the large cluster, this causes inefficiency.Why are the changes needed?
The existing variables have lots of duplications because
snapshots
isSeq[ExecutorPodsSnapshot]
.For example, if we print out the values, it looks like the following.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Manual review because this is an improvement on the local variable computation.