[SPARK-39006][K8S] Show a directional error message for executor PVC dynamic allocation failure #36374
Conversation
Can one of the admins verify this patch?
Thank you for changing your direction. This is much better.
@@ -120,6 +122,20 @@ private[spark] class MountVolumesFeatureStep(conf: KubernetesConf)
  override def getAdditionalKubernetesResources(): Seq[HasMetadata] = {
    additionalResources.toSeq
  }
  private def checkPVCClaimNameWhenMultiExecutors(claimName: String): Unit = {
checkPVCClaimNameWhenMultiExecutors -> checkPVCClaimName, because this PR checks all cases instead of WhenMultiExecutors and doesn't raise an exception for the single-instance case.
Changed it to checkPVCClaimName :)
  private def checkPVCClaimNameWhenMultiExecutors(claimName: String): Unit = {
    val invalidClaimName =
      if (!claimName.contains(PVC_ON_DEMAND) && !claimName.contains(ENV_EXECUTOR_ID)) true
      else false
This is misleading because this is totally valid when there is only one executor.
Done, it is now checked only when there are multiple executors.
      else false

    val executorInstances = conf.get(EXECUTOR_INSTANCES)
    if (executorInstances.isEmpty) return
Please try to avoid return in Scala methods.
Yeah, removed return.
    if (invalidClaimName && executorInstances.get > 1) {
      throw new IllegalArgumentException("PVC ClaimName should contain " +
        PVC_ON_DEMAND + " or " + ENV_EXECUTOR_ID +
        " when multiple executors are required")
The error message looks informative but please include the current claim name in the error message too.
I have added the current claim name.
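Putting the thread's feedback together (the rename, dropping the early return, and including the claim name in the message), the helper could end up roughly like the sketch below. This is a sketch against the constants visible in the diff (PVC_ON_DEMAND, ENV_EXECUTOR_ID, EXECUTOR_INSTANCES), not the merged code.

  private def checkPVCClaimName(claimName: String): Unit = {
    // A fixed claim name is only a problem when several executors would share it.
    val invalidClaimName =
      !claimName.contains(PVC_ON_DEMAND) && !claimName.contains(ENV_EXECUTOR_ID)
    // EXECUTOR_INSTANCES is optional; exists(_ > 1) avoids both the early return
    // and the unguarded .get flagged in the review.
    if (invalidClaimName && conf.get(EXECUTOR_INSTANCES).exists(_ > 1)) {
      throw new IllegalArgumentException(
        s"PVC ClaimName: $claimName should contain $PVC_ON_DEMAND or " +
          s"$ENV_EXECUTOR_ID when multiple executors are required")
    }
  }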
@@ -148,6 +149,26 @@ class MountVolumesFeatureStepSuite extends SparkFunSuite {
    assert(executorPVC.getClaimName.endsWith("-exec-1-pvc-0"))
  }
test("SPARK-39006 Show a directional error message for PVC Dynamic Allocation Failure") { |
- We need a colon after the JIRA ID, e.g., SPARK-39006 -> SPARK-39006:.
- After this PR, we don't have PVC Dynamic Allocation Failure. Please revise the test case.
I have two changes here:
- Add a colon after SPARK-39006
- Change the test case name to Check PVC ClaimName
      mountReadOnly = true,
      KubernetesPVCVolumeConf("testClaimName")
    )
    val conf = new SparkConf().set(EXECUTOR_INSTANCES, 2)
Please add a positive test case where the executor instance count is 1.
Fine, I added a positive test here.
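Based only on the fragments quoted in this thread, the suite additions could take roughly the following shape; buildExecutorPod is a hypothetical stand-in for whatever setup MountVolumesFeatureStepSuite actually uses around KubernetesVolumeSpec and MountVolumesFeatureStep.

  test("SPARK-39006: Check PVC ClaimName") {
    // Negative case: a fixed claim name plus two executors should fail fast.
    val multiExecConf = new SparkConf().set(EXECUTOR_INSTANCES, 2)
    val e = intercept[IllegalArgumentException] {
      buildExecutorPod("testClaimName", multiExecConf) // hypothetical helper
    }
    assert(e.getMessage.contains("testClaimName"))

    // Positive case requested above: the same claim name is fine with one executor.
    val singleExecConf = new SparkConf().set(EXECUTOR_INSTANCES, 1)
    buildExecutorPod("testClaimName", singleExecConf) // should not throw
  }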
I left a few comments. Thanks, @dcoliversun.
@dongjoon-hyun Thanks for your comments, I have updated the code. Please review again.
+1, LGTM. Thank you, @dcoliversun.
Merged to master for Apache Spark 3.4.
Thank you @dongjoon-hyun
Can we expect the fix to be in the next Spark version release, 3.4.1?
Could you be more specific, @0xdarkman? Which JIRA do you want for Apache Spark 3.4.1? You can check the Apache Spark 3.4.1 RC1 vote for the details.
For this PR (SPARK-39006), it was reverted cleanly via SPARK-43342 in Apache Spark 3.4.1.
I have been migrating from Spark 3.3.2 and deployed Spark 3.4.0, and hit an error with PVC ReadWriteMany access, which is how I came across this issue. Whether it is a fix, a reverted change, or anything else does not matter to me, as long as I can use the new Spark version with PVC access the way Spark versions prior to 3.4.0 did.
Could you try Apache Spark 3.4.1 in advance? The RC1 binary is already there and the vote will finish in 24 hours.
If you can participate in the community vote, that would be great, @0xdarkman.
I am running version 3.4.1 and it looks good.
Thank you for confirming. Apache Spark 3.4.1 has been released officially.
What changes were proposed in this pull request?
This PR aims to show a directional error message for executor PVC dynamic allocation failure.
Why are the changes needed?
#29846 supports dynamic PVC creation/deletion for K8s executors.
#29557 supports the execId placeholder in executor PVC conf.
If spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName is not set with onDemand or SPARK_EXECUTOR_ID, Spark will keep trying to create the executor pod. After this PR, Spark shows a directional error message for this situation.
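For illustration, a configuration along these lines should pass the new check, because the claim name contains a placeholder and each executor can resolve its own PVC. The config keys are the ones quoted above; the exact casing of the OnDemand placeholder depends on the PVC_ON_DEMAND constant, so treat the spelling here as an assumption.

  // Minimal sketch: a per-executor on-demand PVC claim name.
  val conf = new SparkConf()
    .set("spark.executor.instances", "2")
    .set("spark.kubernetes.executor.volumes.persistentVolumeClaim" +
      ".spark-local-dir-1.options.claimName", "OnDemand")
  // A fixed name such as "testClaimName" would now fail with the directional
  // error message instead of multiple executors competing for one claim.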
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added a unit test.