[SPARK-38194][YARN][MESOS][K8S] Make memory overhead factor configurable #35504
The reason should be in here: before this change, kubernetesConf.get(MEMORY_OVERHEAD_FACTOR) was used as the default factor, and it is 0.4 according to my debugging of the IT "Run PySpark on simple pi.py example". But now EXECUTOR_MEMORY_OVERHEAD_FACTOR takes priority, so MEMORY_OVERHEAD_FACTOR is overridden (to 0.1 by default). That is how the default behavior changed. I haven't found why kubernetesConf.get(MEMORY_OVERHEAD_FACTOR) is 0.4 yet; I couldn't find any code in the IT that sets it explicitly. cc @Kimahriman @tgravescs
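To make the precedence change concrete, here is a minimal sketch with plain maps standing in for SparkConf (the key names are the ones I believe these config entries use; this is my reading, not the actual Spark code):

```scala
// Hedged sketch of the behavior change. The driver propagates the legacy
// factor (0.4 here), but the new executor-specific lookup ignores it and
// falls back to its own 0.1 default.
val conf = Map("spark.kubernetes.memoryOverheadFactor" -> "0.4") // set by the driver

// Old behavior: the legacy key was the default factor for executors.
val oldFactor = conf.get("spark.kubernetes.memoryOverheadFactor")
  .map(_.toDouble).getOrElse(0.1) // 0.4

// New behavior: EXECUTOR_MEMORY_OVERHEAD_FACTOR is queried first, and since
// it is unset, its own 0.1 default wins; the propagated 0.4 is ignored.
val newFactor = conf.get("spark.executor.memoryOverheadFactor")
  .map(_.toDouble).getOrElse(0.1) // 0.1

println(s"old=$oldFactor new=$newFactor")
```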
spark/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala, lines 62 to 63 in cd86df8
I found it; it is propagated to the executors from the driver.
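For reference, roughly what I believe those lines do (paraphrased from memory, not a verbatim quote), which would also explain the 0.4 seen in the PySpark IT:

```scala
// Paraphrased sketch (my assumption): for a non-JVM main resource such as
// PySpark, the driver step falls back to a 0.4 non-JVM default when
// MEMORY_OVERHEAD_FACTOR is not set explicitly.
val NON_JVM_MEMORY_OVERHEAD_FACTOR = 0.4
val isNonJVMApp = true                    // pi.py => non-JVM main resource
val userSetFactor: Option[Double] = None  // MEMORY_OVERHEAD_FACTOR unset in the IT

val overheadFactor =
  if (isNonJVMApp) userSetFactor.getOrElse(NON_JVM_MEMORY_OVERHEAD_FACTOR)
  else userSetFactor.getOrElse(0.1)
// overheadFactor == 0.4, which the driver then writes into its system properties
```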
Yeah, thanks, this is what I was also just looking at, but I'm not sure how it was propagated to the executors. I was looking at whether it goes through the KubernetesDriverConf somehow, or possibly through the pod system properties:
MEMORY_OVERHEAD_FACTOR.key -> overheadFactor.toString
If you find it let me know; still investigating.
OK, I think I see how this is happening:
1. We build the driver spec, which includes the added system properties: spec.systemProperties ++ addedSystemProperties.
2. The driver feature steps add the memory overhead setting to those system properties.
3. Then KubernetesClientUtils.buildSparkConfDirFilesMap is called, which propagates it to the executors (I think); rough sketch below.
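A rough sketch of that flow, with plain maps standing in for the real spec and conf objects (the step names follow the files linked above, but the details are my reading, not confirmed):

```scala
// 1. Driver feature steps contribute system properties, including the factor.
val addedSystemProperties = Map("spark.kubernetes.memoryOverheadFactor" -> "0.4")

// 2. They are merged into the driver spec.
val specSystemProperties = Map("spark.app.name" -> "pi") ++ addedSystemProperties

// 3. buildSparkConfDirFilesMap then serializes the spec properties into a
//    spark.properties file mounted into the pods, so the value becomes a
//    default config on the executors.
val sparkPropertiesFile = specSystemProperties
  .map { case (k, v) => s"$k=$v" }
  .mkString("\n")
println(sparkPropertiesFile)
```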
Yep, I think so:
spark/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala, line 72 in cd86df8
Thank you. Then it's simpler, because this PR was backported manually after the feature freeze. :)
We are currently discussing the whitelisting of late arrivals like this PR, aren't we, @tgravescs? We can discuss it there together to get the input you need.
Please unblock the Apache Spark 3.3 K8s module QA period by reverting this. We can land it back once there is a healthy commit.
Okay, I think I get it: those "system properties" end up as default Spark configs on the executor. Clear as mud.
Revert PR: #35900
Thank you for your decision, @tgravescs.
Same as in BasicDriverFeatureStep: change the order of the config query.
Went with if (conf.contains(...)) to more easily handle the type conversions and the default value of the backup setting.
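Roughly the shape of it, if I understand correctly (plain maps standing in for the typed ConfigEntry lookups; not the actual patch):

```scala
// Hedged sketch of the if (conf.contains(...)) approach: use the new executor
// key only when it was set explicitly, otherwise fall back to the legacy key
// and its old default, preserving the previous behavior.
val conf = Map("spark.kubernetes.memoryOverheadFactor" -> "0.4") // legacy key set

val factor =
  if (conf.contains("spark.executor.memoryOverheadFactor")) {
    conf("spark.executor.memoryOverheadFactor").toDouble // explicit new key wins
  } else {
    conf.get("spark.kubernetes.memoryOverheadFactor").map(_.toDouble).getOrElse(0.1)
  }
// factor == 0.4: the legacy default behavior is preserved
```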