
K8s jobs slow due to hard cpu limits #352

Closed
FANNG1 opened this issue Jun 19, 2017 · 16 comments

FANNG1 commented Jun 19, 2017

I ran a simple WordCount app and compared the running time on YARN and on k8s. Jobs on k8s are much slower:

  1. Some time is wasted on pod startup (about 10s more).
  2. The map tasks (especially the first one) also take much longer than on YARN (about 6s more). Note that the k8s job spends more time in GC. Has anyone else encountered this, and is the cause known?

FANNG1 commented Jun 19, 2017

Some configuration:

```
spark.eventLog.dir hdfs://10.196.132.104:49000/spark/eventlog
spark.history.fs.logDirectory hdfs://10.196.132.104:49000/spark/eventlog
spark.eventLog.enabled true
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.initialExecutors 1
spark.dynamicAllocation.maxExecutors 1
spark.executor.memory 1g
spark.executor.cores 1
spark.driver.memory 1g

spark.shuffle.service.enabled true
spark.kubernetes.shuffle.namespace default
spark.kubernetes.shuffle.dir /data2/spark
spark.local.dir /data2/spark
spark.kubernetes.shuffle.labels app=spark-shuffle-service,spark-version=2.1.0

spark.kubernetes.executor.memoryOverhead 5000
```

FANNG1 commented Jun 19, 2017

Updated with logs. You can see that task 0 on k8s takes 10s, while on YARN it takes only 4s. The kubelet and the NodeManager are on the same machine and use the same JVM parameters. Please ignore the log lines added for debugging.
yarn executor log
k8s executor log

FANNG1 commented Jun 22, 2017

I found the main reason: on k8s the cpu limit is set equal to the cpu request, in this case 1 core, while on YARN an executor can exceed its cpu request. I've opened a pull request that adds "spark.kubernetes.executor.limit.cores" to specify the executor's cpu limit; see the sketch below.
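
To illustrate the difference, here is a minimal sketch of the container resources the executor pod effectively ends up with (a hypothetical fragment for illustration; the exact field values are assumptions):

```yaml
# Before the patch: request and limit are both derived from spark.executor.cores,
# so the executor is hard-capped at exactly what it requested.
resources:
  requests:
    cpu: "1"   # minimum guaranteed share
  limits:
    cpu: "1"   # hard cap: the executor can never burst above 1 core
```

With the proposed option, something like `spark.kubernetes.executor.limit.cores 4` (the value 4 is illustrative) would raise the cap independently of the request, and omitting the option would leave the limit unset.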

ash211 commented Jun 22, 2017

Great debugging @sandflee! What you're proposing around cpu limits is definitely a plausible explanation for some of the discrepancy in performance between YARN and k8s.

Looking at these YARN docs, it appears that YARN doesn't limit a YARN container's cpu at the OS level unless cgroups are configured, and because that's not on by default, most YARN installations probably don't enforce cpu limits. So I suspect that when a YARN scheduler allocates vcores to an application, the number of cores is really just an unenforced request, which the application may exceed either intentionally or unintentionally.

So in your testing you were probably comparing a strictly 1-core executor on k8s against a 1-core-but-actually-unlimited executor on YARN. Indeed those performance results should be different!

Bottom line is, making this core limit configurable makes sense to me.

I suspect in many of my own deployments I would want to emulate the YARN behavior of allowing the pod to use an unlimited amount of CPU on the kubelet, especially when no other pods are on the kubelet. Do you (or @foxish) know a way to set no cpu limit, or unlimited cpu?

foxish commented Jun 22, 2017

Setting a request and no limit is the way to ensure there is a minimum guarantee but no upper bound on usage. Making that the default makes sense to me. We should make the limit optional, using the option that @sandflee wrote in his PR.
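
For reference, a minimal sketch of what "a request and no limit" looks like in a container spec (hypothetical fragment):

```yaml
resources:
  requests:
    cpu: "1"   # guaranteed minimum
  # no limits block: cpu usage is unbounded and can burst into idle capacity
```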

foxish commented Jun 22, 2017

CPU is considered a "compressible" resource (unlike memory), so there is no harm in making it unbounded. If the system has insufficient cpu, the executor's CPU usage will be throttled (down to the request of 1 CPU), but the executor will not be killed. See also: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-qos.md#compressible-resource-guarantees

liyinan926 commented

Spark already has spark.cores.max, applicable to standalone and Mesos (static mode) deployments, which limits the total number of cores across the whole cluster. So the core limit per executor can be derived as spark.cores.max/spark.executor.cores. Should we consider using spark.cores.max instead of coming up with a Kubernetes-specific option, given that we honor spark.executor.cores? This matters particularly if an application that already uses spark.cores.max is migrated from a standalone Spark cluster to Kubernetes; it is reasonable to assume the user of the application expects the same behavior.

ash211 commented Jun 22, 2017

I don't think we can rely on always using spark.cores.max, since it's not set by all applications, especially those migrating from YARN, where that setting has no effect.

And the arithmetic spark.cores.max/spark.executor.cores would give the count of executors, rather than the core limit per executor: (total cores across cluster) / (cores per executor) = (executor count). Possibly, as a different default when spark.cores.max is set, we could change the default per-executor cpu limit from unbounded to spark.cores.max, which would apply the whole-cluster cpu cap as the per-executor cpu limit too.
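
A quick worked example (numbers are illustrative): with spark.cores.max = 8 and spark.executor.cores = 2, the quotient 8 / 2 = 4 is the number of executors the scheduler will run, not a 4-core cap on each executor.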

But because cpu is a compressible resource, I think a better default would be unlimited cpu.

liyinan926 commented

My bad, it should be spark.cores.max/executor-instances. I agree that the default should be unlimited and the PR is doing the right thing. I'm just concerned about introducing yet another config key whose purpose could be served by spark.cores.max. It's certainly true that spark.cores.max is not always set, but with a sensible default of unlimited, that's not really an issue. We just need to document it clearly so people know what to do when they do need to limit the cores per executor.

kimoonkim commented

@liyinan926 Thanks for debugging this executor core limit issue. This is quite interesting.

> it should be spark.cores.max/executor-instances.

I am curious how this would work with dynamic allocation, where the number of executors varies as the job runs. Is this suggested only for static allocation? If so, do we need another flag anyway to limit max cores per executor under dynamic allocation?

liyinan926 commented

@kimoonkim That's a good point. Changing the number of executors would change the limit per executor as well if the value of spark.cores.max stays the same. With Kubernetes, that's probably not workable, because the cpu limit is fixed when an executor pod is created.

tangzhankun commented

@sandflee Have you tested the result with your new patch?

If I remember correctly, spec.containers[].resources.requests.cpu is converted to the --cpu-shares flag of docker run, while spec.containers[].resources.limits.cpu is converted to --cpu-quota. So I think we should declare only cpu shares (a request) via the k8s client, rather than a hard limit on cpu time (cpu-quota), if we want to utilize otherwise idle cpu cores.
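
For reference, a sketch of that mapping using Kubernetes' usual conversion (1 CPU = 1024 shares; CFS period of 100000µs), annotated as comments on a hypothetical container fragment:

```yaml
resources:
  requests:
    cpu: "1"   # docker run --cpu-shares=1024 (a relative weight, only binding under contention)
  limits:
    cpu: "1"   # docker run --cpu-quota=100000 --cpu-period=100000 (hard cap of one core)
```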

FANNG1 commented Jun 23, 2017

Yes, I have tested the patch. Setting a hard limit is a user option, with no limit as the default. Users can still cap cpu usage if needed, for example when co-running an online service and a Spark job.

ash211 changed the title from "jobs on k8s are slow" to "K8s jobs slow due to hard cpu limits" on Jun 23, 2017

ash211 commented Jun 23, 2017

Closed by #356

ash211 closed this as completed on Jun 23, 2017

ash211 commented Jun 23, 2017

Thanks again for debugging this @sandflee! Happy to discuss any more performance discrepancies you find vs YARN, or any other improvements, in future issues.

tangzhankun commented

@sandflee I checked the patch. Thanks!

ifilonenko pushed a commit to ifilonenko/spark that referenced this issue Feb 26, 2019