%title: Kubernetes Workload Troubleshooting %author: Murat Karakaş %date: 05-05-2020
-> # Road Map <-
-
Fundemantal Commands
-
Pod Failures
-
Pod With Volumes
-
Selective Pod Placement
-
I can't access to my pod
-
Being Proactive
-> # Fundemantal Commands <-
-
"kubectl get po" => check pod status
-
"kubectl logs -f" => pod logs
-
"kubectl describe po/deploy/" => show details
-
"kubectl exec -it pod-id" sh/bash
-
"kubectl run -it busybox --image=yauritux/busybox-curl --rm --restart=Never -- sh" => Best friend
-
"kubectl get x --show-labels" => selectors
-> # Pod failures <-
-
Pod Status : CrashLoopBackOff
-
Pod Status : ErrImagePull
-
Pod Status : ImagePullBackOff
-
Pod Ready 0/1
-
Pod Restart count > 0
-
Pod Status: Pending
-> # Pod With Volumes <-
Common Scenario
-
Pod defines storage requirements with PersistentVolumeClaim
-
Claim auto creates or bounds to existing volume
-
Pod starts
+-----------+ +-----------+ +-----------+
| | defines | | storageClass | |
| Pod |----------->| PVC |-------------> matches ----> | PV |
| | | | capacity | |
+-----------+ +-----------+ +-----------+
If pvc does not matches/creates volume , pod will stucked in pending state
-> # Selective Pod Placement <-
Kubernetes provides several mechanisms for custom pod placement and scheduling
Some of them are basic, some of them gives more advenced controls
-
Node Selector
-
Affinity & AntiAffinity
-
Taint & Toleration
If pod placement or scheduling rule does not match , it will stucked in pending state
-> # I can't access to my pod <-
If pod need to be accessed from another pod or external application/user, typical solution is defining "Service" and optionaly "Ingress"(Extenal loadbalancer solution,better alternative to NodePort)
+-----------+ +------------+ +-----------+
| | network call | k8s-dns | resolve | |
| Pod |--------------->| + |-------------------> | Pod |
| | | kube-proxy | load balance | |
+-----------+ +------------+ +-----------+
+ +
Updates/Triggers + + Selectors
+--------------+
|
|
|
|
+-----------+
| |
| Service |
| |
+-----------+
-> # I can't access to my pod <-
Common issues:
-
Wrong service name or servive in another namespace
-
Pod selector does not match pod labels
-
Pods are not in ready state (Readiness probe!)
-
Service definition has in valid target port
-> # Being Proactive <-
-
Monitor Infrastructure & k8s components
-
Define alerts based on metrics
-
Always define resource requets and limits(must) for applications
-
Use liveness,readiness and startup probes
-
Centralized Logs
-
Set limits (defaults & namespace wide)
-
Dev/Prod parity