Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-41525][K8S] Improve onNewSnapshots to use unique lists of known executor IDs and PVC names #39070

Closed
wants to merge 1 commit into from

Conversation

dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Dec 15, 2022

What changes were proposed in this pull request?

This PR improve ExecutorPodsAllocator.onNewSnapshots by removing duplications at k8sKnownExecIds and k8sKnownPVCNames. In the large cluster, this causes inefficiency.

Why are the changes needed?

The existing variables have lots of duplications because snapshots is Seq[ExecutorPodsSnapshot].

val k8sKnownExecIds = snapshots.flatMap(_.executorPods.keys)

For example, if we print out the values, it looks like the following.

22/12/15 07:09:37 INFO ExecutorPodsAllocator: 1
22/12/15 07:09:37 INFO ExecutorPodsAllocator: 2
22/12/15 07:09:37 INFO ExecutorPodsAllocator: 1
22/12/15 07:09:37 INFO ExecutorPodsAllocator: 2
22/12/15 07:09:37 INFO ExecutorPodsAllocator: 1
22/12/15 07:09:37 INFO ExecutorPodsAllocator: 2
22/12/15 07:09:37 INFO ExecutorPodsAllocator: 1
22/12/15 07:09:37 INFO ExecutorPodsAllocator: 2
22/12/15 07:09:37 INFO ExecutorPodsAllocator: 1
22/12/15 07:09:37 INFO ExecutorPodsAllocator: 2
22/12/15 07:09:37 INFO ExecutorPodsAllocator: 1
22/12/15 07:09:37 INFO ExecutorPodsAllocator: 2
22/12/15 07:09:37 INFO ExecutorPodsAllocator: 3
22/12/15 07:09:37 INFO ExecutorPodsAllocator: 1
22/12/15 07:09:37 INFO ExecutorPodsAllocator: 2
22/12/15 07:09:37 INFO ExecutorPodsAllocator: 3
22/12/15 07:09:37 INFO ExecutorPodsAllocator: 1
22/12/15 07:09:37 INFO ExecutorPodsAllocator: 2
22/12/15 07:09:37 INFO ExecutorPodsAllocator: 3
22/12/15 07:09:37 INFO ExecutorPodsAllocator: 1
22/12/15 07:09:37 INFO ExecutorPodsAllocator: 2
22/12/15 07:09:37 INFO ExecutorPodsAllocator: 3
22/12/15 07:09:37 INFO ExecutorPodsAllocator: 1
22/12/15 07:09:37 INFO ExecutorPodsAllocator: 2
22/12/15 07:09:37 INFO ExecutorPodsAllocator: 3

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manual review because this is an improvement on the local variable computation.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-41525][K8S] Use unique list of known executor IDs and PVC names in onNewSnapshots [SPARK-41525][K8S] Use unique list of known executor IDs and PVC names in onNewSnapshots Dec 15, 2022
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-41525][K8S] Use unique list of known executor IDs and PVC names in onNewSnapshots [SPARK-41525][K8S] Improve onNewSnapshots to use unique list of known executor IDs and PVC names Dec 15, 2022
@dongjoon-hyun
Copy link
Member Author

cc @viirya and @attilapiros

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-41525][K8S] Improve onNewSnapshots to use unique list of known executor IDs and PVC names [SPARK-41525][K8S] Improve onNewSnapshots to use unique list of known executor IDs and PVC names Dec 15, 2022
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-41525][K8S] Improve onNewSnapshots to use unique list of known executor IDs and PVC names [SPARK-41525][K8S] Improve onNewSnapshots to use unique lists of known executor IDs and PVC names Dec 15, 2022
@@ -152,7 +152,7 @@ class ExecutorPodsAllocator(
applicationId: String,
schedulerBackend: KubernetesClusterSchedulerBackend,
snapshots: Seq[ExecutorPodsSnapshot]): Unit = {
val k8sKnownExecIds = snapshots.flatMap(_.executorPods.keys)
val k8sKnownExecIds = snapshots.flatMap(_.executorPods.keys).distinct
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original code is in Spark 3.1.2 and Spark 3.2.0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So snapshots may contain duplicates too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

snapshots is defined as a sequence of snapshot history originally. Logically, there can be a few snapshot which the same contents, but in the most cases, they will be different by object metadata or status.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying it.

@@ -162,7 +162,7 @@ class ExecutorPodsAllocator(
val k8sKnownPVCNames = snapshots.flatMap(_.executorPods.values.map(_.pod)).flatMap { pod =>
pod.getSpec.getVolumes.asScala
.flatMap { v => Option(v.getPersistentVolumeClaim).map(_.getClaimName) }
}
}.distinct
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is since Spark 3.2.0.

@dongjoon-hyun
Copy link
Member Author

Thank you, @viirya !

@dongjoon-hyun
Copy link
Member Author

All tests passed except the master branch dependency test failure. Merged to master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants