
K8s jobs slow due to hard cpu limits #352

Closed
FANNG1 opened this issue Jun 19, 2017 · 16 comments

FANNG1 commented Jun 19, 2017

I ran a simple WordCount app and compared the running time on YARN and on k8s. Jobs on k8s are much slower:

  1. Some time is wasted on pod startup (about 10s more).
  2. The map tasks (especially the first one) also take much longer than on YARN (about 6s more). Note that the k8s job spends more time in GC. Has anyone else encountered this, and is the cause known?

FANNG1 commented Jun 19, 2017

Some configuration:

```
spark.eventLog.dir hdfs://10.196.132.104:49000/spark/eventlog
spark.history.fs.logDirectory hdfs://10.196.132.104:49000/spark/eventlog
spark.eventLog.enabled true
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.initialExecutors 1
spark.dynamicAllocation.maxExecutors 1
spark.executor.memory 1g
spark.executor.cores 1
spark.driver.memory 1g

spark.shuffle.service.enabled true
spark.kubernetes.shuffle.namespace default
spark.kubernetes.shuffle.dir /data2/spark
spark.local.dir /data2/spark
spark.kubernetes.shuffle.labels app=spark-shuffle-service,spark-version=2.1.0

spark.kubernetes.executor.memoryOverhead 5000
```

FANNG1 commented Jun 19, 2017

Updated with logs. You can see that task 0 on k8s takes 10s, while on YARN it takes only 4s. The kubelet and the NodeManager are on the same machine and use the same JVM parameters. Please ignore the log lines added for debugging.
yarn executor log
k8s executor log

FANNG1 commented Jun 22, 2017

I found the main reason: on k8s the cpu limit is set equal to the cpu request, in this case 1 core, while on YARN an executor can exceed its cpu request. I've opened a pull request that adds "spark.kubernetes.executor.limit.cores" to specify the executor's cpu limit; see the sketch below.
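
To illustrate the difference, here is a minimal sketch of the container resources the executor pod effectively ends up with (a hypothetical fragment for illustration; the exact field values are assumptions):

```yaml
# Before the patch: request and limit are both derived from spark.executor.cores,
# so the executor is hard-capped at exactly what it requested.
resources:
  requests:
    cpu: "1"   # minimum guaranteed share
  limits:
    cpu: "1"   # hard cap: the executor can never burst above 1 core
```

With the proposed option, something like `spark.kubernetes.executor.limit.cores 4` (the value 4 is illustrative) would raise the cap independently of the request, and omitting the option would leave the limit unset.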

ash211 commented Jun 22, 2017

Great debugging @sandflee! What you're proposing around cpu limits is definitely a plausible explanation for some of the discrepancy in performance between YARN and k8s.

Looking at these YARN docs, it appears that YARN doesn't limit a YARN container's cpu at the OS level unless cgroups are configured, and because that's not on by default, most YARN installations probably don't enforce cpu limits. So I suspect that when a YARN scheduler allocates vcores to an application, the number of cores is really just an unenforced request, which the application may exceed either intentionally or unintentionally.

So in your testing you were probably comparing a strictly 1-core executor on k8s against a 1-core-but-actually-unlimited executor on YARN. Indeed those performance results should be different!

Bottom line is, making this core limit configurable makes sense to me.

I suspect in many of my own deployments I would want to emulate the YARN behavior of allowing the pod to use an unlimited amount of CPU on the kubelet, especially when no other pods are on the kubelet. Do you (or @foxish) know a way to set no cpu limit, or unlimited cpu?

foxish commented Jun 22, 2017

Setting a request and no limit is the way to ensure there is a minimum guarantee but no upper bound on usage. Making that the default makes sense to me. We should make the limit optional, using the option that @sandflee wrote in his PR.
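
For reference, a minimal sketch of what "a request and no limit" looks like in a container spec (hypothetical fragment):

```yaml
resources:
  requests:
    cpu: "1"   # guaranteed minimum
  # no limits block: cpu usage is unbounded and can burst into idle capacity
```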

foxish commented Jun 22, 2017

CPU is considered a "compressible" resource (unlike memory), so there is no harm in making it unbounded. If the system has insufficient cpu, the executor's CPU usage will be throttled (down to the request of 1 CPU), but the executor will not be killed. See also: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-qos.md#compressible-resource-guarantees

liyinan926 commented

Spark already has spark.cores.max, applicable to standalone and Mesos (static mode) deployments, which limits the total number of cores across the whole cluster. So the core limit per executor can be derived as spark.cores.max/spark.executor.cores. Should we consider using spark.cores.max instead of coming up with a Kubernetes-specific option, given that we honor spark.executor.cores? This matters particularly if an application that already uses spark.cores.max is migrated from a standalone Spark cluster to Kubernetes; it is reasonable to assume the user of the application expects the same behavior.

ash211 commented Jun 22, 2017

I don't think we can rely on always using spark.cores.max, since it's not set by all applications, especially those migrating from YARN, where that setting has no effect.

And the arithmetic spark.cores.max/spark.executor.cores would give the count of executors, rather than the core limit per executor: (total cores across cluster) / (cores per executor) = (executor count). Possibly, as a different default when spark.cores.max is set, we could change the default per-executor cpu limit from unbounded to spark.cores.max, which would apply the whole-cluster cpu cap as the per-executor cpu limit too.
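
A quick worked example (numbers are illustrative): with spark.cores.max = 8 and spark.executor.cores = 2, the quotient 8 / 2 = 4 is the number of executors the scheduler will run, not a 4-core cap on each executor.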

But because cpu is a compressible resource, I think a better default would be unlimited cpu.

liyinan926 commented

My bad, it should be spark.cores.max/executor-instances. I agree that the default should be unlimited and the PR is doing the right thing. I'm just concerned about introducing yet another config key whose purpose could be served by spark.cores.max. It's certainly true that spark.cores.max is not always set, but with a sensible default of unlimited, that's not really an issue. We just need to document it clearly so people know what to do when they do need to limit the cores per executor.

kimoonkim commented

@liyinan926 Thanks for debugging this executor core limit issue. This is quite interesting.

> it should be spark.cores.max/executor-instances.

I am curious how this would work with dynamic allocation, where the number of executors varies as the job runs. Is this suggested only for static allocation? If so, do we need another flag anyway to limit max cores per executor under dynamic allocation?

liyinan926 commented

@kimoonkim That's a good point. Changing the number of executors would change the limit per executor as well if the value of spark.cores.max stays the same. With Kubernetes, that's probably not workable, because the cpu limit is fixed when an executor pod is created.

tangzhankun commented

@sandflee Have you tested the result with your new patch?

If I remember correctly, spec.containers[].resources.requests.cpu is converted to the --cpu-shares flag of docker run, while spec.containers[].resources.limits.cpu is converted to --cpu-quota. So I think we should declare only cpu shares (a request) via the k8s client, rather than a hard limit on cpu time (cpu-quota), if we want to utilize otherwise idle cpu cores.
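
For reference, a sketch of that mapping using Kubernetes' usual conversion (1 CPU = 1024 shares; CFS period of 100000µs), annotated as comments on a hypothetical container fragment:

```yaml
resources:
  requests:
    cpu: "1"   # docker run --cpu-shares=1024 (a relative weight, only binding under contention)
  limits:
    cpu: "1"   # docker run --cpu-quota=100000 --cpu-period=100000 (hard cap of one core)
```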

FANNG1 commented Jun 23, 2017

Yes, I have tested the patch. Setting a hard limit is a user option, with no limit as the default. Users can still cap cpu usage if needed, for example when co-running an online service and a Spark job.

ash211 changed the title from "jobs on k8s are slow" to "K8s jobs slow due to hard cpu limits" on Jun 23, 2017

ash211 commented Jun 23, 2017

Closed by #356

ash211 closed this as completed on Jun 23, 2017

ash211 commented Jun 23, 2017

Thanks again for debugging this @sandflee! Happy to discuss any more performance discrepancies you find vs YARN, or any other improvements, in future issues.

tangzhankun commented

@sandflee I checked the patch. Thanks!

ifilonenko pushed a commit to ifilonenko/spark that referenced this issue Feb 26, 2019