
Jobs and wrestic containers don't complete #547

Closed
jenwirth opened this issue Oct 29, 2021 · 8 comments
Labels
bug (Something isn't working)

Comments

@jenwirth

jenwirth commented Oct 29, 2021

Description

Hi,

I created a schedule (see below) and it is performing backups (in my case, of a database). However, the Jobs never complete. I am using a MinIO instance on-prem.

NAME                                  COMPLETIONS   DURATION   AGE
backup-backup-schedule-backup-5kghh   0/1           23m        23m
backup-backup-schedule-backup-9k8qd   0/1           28m        28m
backup-backup-schedule-backup-jcmzm   0/1           8m22s      8m23s
backup-backup-schedule-backup-kxznx   0/1           13m        13m
backup-backup-schedule-backup-pzl5x   0/1           18m        18m
backup-backup-schedule-backup-vxv9x   0/1           38m        38m
backup-backup-schedule-backup-zcsbm   0/1           3m23s      3m23s
backup-backup-schedule-backup-zs48r   0/1           33m        33m

This is due to the wrestic containers not completing (I assume):

NAME                                        READY   STATUS     RESTARTS   AGE
backup-backup-schedule-backup-5kghh-fq268   1/2     NotReady   1          25m
backup-backup-schedule-backup-9k8qd-5jq4c   1/2     NotReady   1          30m
backup-backup-schedule-backup-jcmzm-wl2h5   1/2     NotReady   1          10m
backup-backup-schedule-backup-kxznx-5rvcl   1/2     NotReady   1          15m
backup-backup-schedule-backup-mrmc4-b5xpg   1/2     NotReady   1          16s
backup-backup-schedule-backup-pzl5x-z5crx   1/2     NotReady   1          20m
backup-backup-schedule-backup-vxv9x-gv8k8   1/2     NotReady   2          40m
backup-backup-schedule-backup-zcsbm-5w95g   1/2     NotReady   1          5m17s
backup-backup-schedule-backup-zs48r-fjpdx   1/2     NotReady   1          35m

Additional Context

The definition of the schedule:

apiVersion: backup.appuio.ch/v1alpha1
kind: Schedule
metadata:
  name: backup-schedule
spec:
  failedJobsHistoryLimit: 3
  successfulJobsHistoryLimit: 2
  backup:
    schedule: '*/5 * * * *'
    failedJobsHistoryLimit: 2
    successfulJobsHistoryLimit: 2
  prune:
    schedule: '0 1 * * 0'
    retention:
      keepLast: 5
      keepDaily: 14
  backend:
    repoPasswordSecretRef:
      name: backup-credentials
      key: repo-password
    s3:
      endpoint: REDACTED
      bucket: k8s-backups
      secretAccessKeySecretRef:
        name: backup-credentials
        key: password
      accessKeyIDSecretRef:
        name: backup-credentials
        key: username

Logs

I1029 07:30:04.936046       1 main.go:43] wrestic "level"=0 "msg"="Wrestic Version: v0.3.2-dirty"  
I1029 07:30:04.936405       1 main.go:44] wrestic "level"=0 "msg"="Operator Build Date: Fri Jul 30 12:58:49 UTC 2021"  
I1029 07:30:04.936413       1 main.go:45] wrestic "level"=0 "msg"="Go Version: go1.16.6"  
I1029 07:30:04.936419       1 main.go:46] wrestic "level"=0 "msg"="Go OS/Arch: linux/amd64"  
I1029 07:30:04.936879       1 main.go:212] wrestic "level"=0 "msg"="setting up a signal handler"  
I1029 07:30:04.937000       1 command.go:57] wrestic/RepoInit/command "level"=0 "msg"="restic command"  "args"=["init"] "path"="/usr/local/bin/restic"
I1029 07:30:04.937062       1 command.go:83] wrestic/RepoInit/command "level"=0 "msg"="Defining RESTIC_PROGRESS_FPS"  "frequency"=0.016666666666666666
I1029 07:30:05.234127       1 unlock.go:10] wrestic/unlock "level"=0 "msg"="unlocking repository"  "all"=false
I1029 07:30:05.234177       1 command.go:57] wrestic/unlock/command "level"=0 "msg"="restic command"  "args"=["unlock"] "path"="/usr/local/bin/restic"
I1029 07:30:05.234218       1 command.go:83] wrestic/unlock/command "level"=0 "msg"="Defining RESTIC_PROGRESS_FPS"  "frequency"=0.016666666666666666
E1029 07:30:05.966065       1 logging.go:122] wrestic/unlock/restic "msg"="created new cache in /root/.cache/restic" "error"="error during command"  
E1029 07:30:05.994062       1 logging.go:122] wrestic/unlock/restic "msg"="successfully removed locks" "error"="error during command"  
I1029 07:30:05.999037       1 snapshots.go:39] wrestic/snapshots "level"=0 "msg"="getting list of snapshots"  
I1029 07:30:05.999083       1 command.go:57] wrestic/snapshots/command "level"=0 "msg"="restic command"  "args"=["snapshots","--json"] "path"="/usr/local/bin/restic"
I1029 07:30:05.999103       1 command.go:83] wrestic/snapshots/command "level"=0 "msg"="Defining RESTIC_PROGRESS_FPS"  "frequency"=0.016666666666666666
I1029 07:30:07.157646       1 pod_list.go:50] wrestic/k8sClient "level"=0 "msg"="listing all pods"  "annotation"="k8up.syn.tools/backupcommand" "namespace"="fin-val"
I1029 07:30:07.178308       1 pod_list.go:80] wrestic/k8sClient "level"=0 "msg"="adding to backup list"  "namespace"="fin-val" "pod"="risk-db"
I1029 07:30:07.179073       1 pod_exec.go:45] wrestic/k8sExec "level"=0 "msg"="executing command"  "command"="/bin/bash, -c, mysqldump -uroot -p\"${DB_BACKUP_PASSWORD}\" --all-databases" "namespace"="fin-val" "pod"="risk-db"
I1029 07:30:07.179204       1 stdinbackup.go:16] wrestic/stdinBackup "level"=0 "msg"="starting stdin backup"  "extension"="" "filename"="/fin-val-mariadb"
I1029 07:30:07.179227       1 command.go:57] wrestic/stdinBackup/command "level"=0 "msg"="restic command"  "args"=["backup","--json","--stdin","--stdin-filename","/fin-val-mariadb","--host","fin-val"] "path"="/usr/local/bin/restic"
I1029 07:30:07.179239       1 command.go:83] wrestic/stdinBackup/command "level"=0 "msg"="Defining RESTIC_PROGRESS_FPS"  "frequency"=0.016666666666666666
I1029 07:30:08.515120       1 logging.go:160] wrestic/stdinBackup/progress "level"=0 "msg"="backup finished"  "changed files"=1 "errors"=0 "new files"=0
I1029 07:30:08.515161       1 logging.go:161] wrestic/stdinBackup/progress "level"=0 "msg"="stats"  "bytes added"=12759 "bytes processed"=1455146 "time"=0.613062834
E1029 07:30:08.515213       1 stats.go:206] wrestic/MountCollector "msg"="can't list mounted folders for stats" "error"="open /data: no such file or directory"  
I1029 07:30:08.545395       1 main.go:157] wrestic "level"=0 "msg"="backups of annotated jobs have finished successfully"  
I1029 07:30:08.545427       1 backup.go:21] wrestic/backup "level"=0 "msg"="starting backup"  
I1029 07:30:08.545444       1 backup.go:24] wrestic/backup "level"=0 "msg"="backupdir does not exist, skipping"  "dirname"="/data"
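For context: the stdin backup in the log above is triggered by the k8up.syn.tools/backupcommand annotation that wrestic looks for on the database Pod (see the k8sClient lines). A minimal sketch of how that annotation is typically set, with the Pod name and command taken from the log and everything else illustrative:

apiVersion: v1
kind: Pod
metadata:
  name: risk-db
  annotations:
    # Command string as seen in the wrestic log; its stdout is streamed into restic via --stdin.
    k8up.syn.tools/backupcommand: /bin/bash -c 'mysqldump -uroot -p"${DB_BACKUP_PASSWORD}" --all-databases'
spec:
  containers:
    - name: mariadb        # container name and image are illustrative only
      image: mariadb:10.5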


Expected Behavior

For the containers/Pods/Jobs to complete.

Version of K8up

v1.2.0

Version of Kubernetes

1.20.11

Distribution of Kubernetes

On-prem
jenwirth added the bug label on Oct 29, 2021
@tobru
Contributor

tobru commented Oct 29, 2021

Would you mind testing this again with the recently released v2.0.0 of K8up?

To get an idea of what could be wrong, we need more information:

  • A kubectl describe $pod, as it seems something is preventing the Pod from starting correctly (READY 1/2).
  • Logs of the failing Pod containers.

@jenwirth
Author

jenwirth commented Oct 29, 2021

The log for one of the wrestic containers was included in the original message. A describe of the generated Pod can be found below. I notice now that a readiness probe is failing.

I can't find the Helm release for v2.0.0, so I can't update yet.

Name:         backup-backup-schedule-backup-5kghh-fq268
Namespace:    fin-val
Priority:     0
Node:         k8s-node-11/10.0.0.6
Start Time:   Fri, 29 Oct 2021 09:30:01 +0200
Labels:       controller-uid=494189a4-2130-44ef-b010-ae0f5cbdecaa
              job-name=backup-backup-schedule-backup-5kghh
              security.istio.io/tlsMode=istio
              service.istio.io/canonical-name=backup-backup-schedule-backup-5kghh
              service.istio.io/canonical-revision=latest
Annotations:  cni.projectcalico.org/podIP: 192.168.246.78/32
              cni.projectcalico.org/podIPs: 192.168.246.78/32
              kubectl.kubernetes.io/default-container: backup
              kubectl.kubernetes.io/default-logs-container: backup
              prometheus.io/path: /stats/prometheus
              prometheus.io/port: 15020
              prometheus.io/scrape: true
              sidecar.istio.io/status:
                {"initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["istio-envoy","istio-data","istio-podinfo","istio-token","istiod-...
Status:       Running
IP:           192.168.246.78
IPs:
  IP:           192.168.246.78
Controlled By:  Job/backup-backup-schedule-backup-5kghh
Init Containers:
  istio-init:
    Container ID:  cri-o://6ceaf9bbd531162c161838bc921040dfee2090963718a2c29da0c91c90a30917
    Image:         docker.io/istio/proxyv2:1.11.3
    Image ID:      docker.io/istio/proxyv2@sha256:28513eb3706315b26610a53e0d66b29b09a334e3164393b9a0591f34fe47a6fd
    Port:          <none>
    Host Port:     <none>
    Args:
      istio-iptables
      -p
      15001
      -z
      15006
      -u
      1337
      -m
      REDIRECT
      -i
      *
      -x
      
      -b
      *
      -d
      15090,15021,15020
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 29 Oct 2021 09:30:02 +0200
      Finished:     Fri, 29 Oct 2021 09:30:02 +0200
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  1Gi
    Requests:
      cpu:        100m
      memory:     128Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from pod-executor-token-49drf (ro)
Containers:
  backup:
    Container ID:   cri-o://4ecf0a2003629bbbbd8dd6b44382093321a5037738f86391d74e604351f968fe
    Image:          quay.io/vshn/wrestic:v0.3.2
    Image ID:       quay.io/vshn/wrestic@sha256:ca6088a60c2d375f6ad69111396da86d4b8064582b82e32ab7a6385fea0b76d3
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 29 Oct 2021 09:30:04 +0200
      Finished:     Fri, 29 Oct 2021 09:30:08 +0200
    Ready:          False
    Restart Count:  1
    Environment:
      RESTIC_PASSWORD:           <set to the key 'password' in secret 'backup-credentials'>  Optional: false
      AWS_ACCESS_KEY_ID:         <set to the key 'username' in secret 'backup-credentials'>  Optional: false
      RESTIC_REPOSITORY:         s3:REDACTED
      PROM_URL:                  
      FILEEXTENSION_ANNOTATION:  k8up.syn.tools/file-extension
      AWS_SECRET_ACCESS_KEY:     <set to the key 'password' in secret 'backup-credentials'>  Optional: false
      STATS_URL:                 
      BACKUPCOMMAND_ANNOTATION:  k8up.syn.tools/backupcommand
      HOSTNAME:                  REDACTED
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from pod-executor-token-49drf (ro)
  istio-proxy:
    Container ID:  cri-o://0d36f8b97aa68801a32f3c375adbd130ffdfefd5788ea08b8bf601fda600100f
    Image:         docker.io/istio/proxyv2:1.11.3
    Image ID:      docker.io/istio/proxyv2@sha256:28513eb3706315b26610a53e0d66b29b09a334e3164393b9a0591f34fe47a6fd
    Port:          15090/TCP
    Host Port:     0/TCP
    Args:
      proxy
      sidecar
      --domain
      $(POD_NAMESPACE).svc.cluster.local
      --proxyLogLevel=warning
      --proxyComponentLogLevel=misc:error
      --log_output_level=default:info
      --concurrency
      2
    State:          Running
      Started:      Fri, 29 Oct 2021 09:30:03 +0200
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  1Gi
    Requests:
      cpu:      100m
      memory:   128Mi
    Readiness:  http-get http://:15021/healthz/ready delay=1s timeout=3s period=2s #success=1 #failure=30
    Environment:
      JWT_POLICY:                    third-party-jwt
      PILOT_CERT_PROVIDER:           istiod
      CA_ADDR:                       istiod.istio-system.svc:15012
      POD_NAME:                      backup-backup-schedule-backup-5kghh-fq268 (v1:metadata.name)
      POD_NAMESPACE:                 fin-val (v1:metadata.namespace)
      INSTANCE_IP:                    (v1:status.podIP)
      SERVICE_ACCOUNT:                (v1:spec.serviceAccountName)
      HOST_IP:                        (v1:status.hostIP)
      PROXY_CONFIG:                  {}
                                     
      ISTIO_META_POD_PORTS:          [
                                     ]
      ISTIO_META_APP_CONTAINERS:     backup
      ISTIO_META_CLUSTER_ID:         Kubernetes
      ISTIO_META_INTERCEPTION_MODE:  REDIRECT
      ISTIO_META_WORKLOAD_NAME:      backup-backup-schedule-backup-5kghh
      ISTIO_META_OWNER:              kubernetes://apis/batch/v1/namespaces/fin-val/jobs/backup-backup-schedule-backup-5kghh
      ISTIO_META_MESH_ID:            cluster.local
      TRUST_DOMAIN:                  cluster.local
    Mounts:
      /etc/istio/pod from istio-podinfo (rw)
      /etc/istio/proxy from istio-envoy (rw)
      /var/lib/istio/data from istio-data (rw)
      /var/run/secrets/istio from istiod-ca-cert (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from pod-executor-token-49drf (ro)
      /var/run/secrets/tokens from istio-token (rw)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  istio-envoy:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  istio-data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  istio-podinfo:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.labels -> labels
      metadata.annotations -> annotations
  istio-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  43200
  istiod-ca-cert:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      istio-ca-root-cert
    Optional:  false
  pod-executor-token-49drf:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  pod-executor-token-49drf
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  47m                 default-scheduler  Successfully assigned fin-val/backup-backup-schedule-backup-5kghh-fq268 to k8s-node-11
  Normal   Pulled     47m                 kubelet            Container image "docker.io/istio/proxyv2:1.11.3" already present on machine
  Normal   Created    47m                 kubelet            Created container istio-init
  Normal   Started    47m                 kubelet            Started container istio-init
  Normal   Pulled     47m                 kubelet            Container image "docker.io/istio/proxyv2:1.11.3" already present on machine
  Normal   Created    47m                 kubelet            Created container istio-proxy
  Normal   Started    47m                 kubelet            Started container istio-proxy
  Normal   Pulled     47m (x2 over 47m)   kubelet            Container image "quay.io/vshn/wrestic:v0.3.2" already present on machine
  Normal   Created    47m (x2 over 47m)   kubelet            Created container backup
  Normal   Started    47m (x2 over 47m)   kubelet            Started container backup
  Warning  Unhealthy  6m15s (x2 over 7m)  kubelet            Readiness probe failed: Get "http://192.168.246.78:15021/healthz/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

@jenwirth
Author

It might be related to Istio: istio/istio#11045

@jenwirth
Author

This definitely looks like a sidecar/Istio problem. Is there a way for me to influence the creation of the backup container? One workaround is to change the command:

istio/istio#6324 (comment)
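For reference, the workaround from the linked Istio comment wraps the main container command so that, once the real work is done, it asks the sidecar to shut down via pilot-agent's /quitquitquit endpoint. A rough sketch of that pattern for the backup container, assuming a wrestic entrypoint at /usr/local/bin/wrestic, curl being available in the image, and Istio's default pilot-agent port 15020:

containers:
  - name: backup
    image: quay.io/vshn/wrestic:v0.3.2
    command: ["/bin/sh", "-c"]
    args:
      - |
        # Run the actual backup and remember its exit code.
        /usr/local/bin/wrestic   # assumed entrypoint; verify the actual path in the image
        ec=$?
        # Ask the Istio sidecar to quit so the Job's Pod can complete.
        curl -fsS -X POST http://localhost:15020/quitquitquit || true
        exit $ec

Since K8up generates these Job Pods itself, something like this would have to come from the operator rather than from the user.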

@Kidswiss
Contributor

@jenwirth Currently we don't provide such options.

Two options come to my mind:

  • add a "support Istio" flag to the CRDs which sets the command
  • expose the ability to set the commands

The second option could lead to broken Jobs, though, if the user configures the actual K8up commands wrongly.
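To make these more concrete, a purely hypothetical sketch; neither field exists in the current CRDs and the names are made up for illustration:

apiVersion: backup.appuio.ch/v1alpha1
kind: Schedule
metadata:
  name: backup-schedule
spec:
  backup:
    schedule: '*/5 * * * *'
    # Option 1 (hypothetical): a flag telling K8up to wrap the command with an
    # Istio-aware shutdown step.
    supportIstio: true
    # Option 2 (hypothetical): expose the container command directly.
    # command: ["/bin/sh", "-c", "/usr/local/bin/wrestic && curl -X POST http://localhost:15020/quitquitquit"]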

@tobru
Contributor

tobru commented Oct 29, 2021

I can't find the Helm release for v2.0.0, so I can't update yet.

You're right, that one is coming soon: appuio/charts#354

@jenwirth
Author

For now I will manually delete the Jobs (they will be daily).

If you add support for Istio, you will soon be adding flags for other mesh implementations. Perhaps we could add "pre/post commands" instead?

Also keep in mind that this is a workaround for a larger problem; see kubernetes/kubernetes#25908 and kubernetes/enhancements#2872.
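As a sketch of what such generic hooks might look like (field names are entirely hypothetical and not part of any K8up release):

spec:
  backup:
    schedule: '*/5 * * * *'
    # Hypothetical: run inside the backup container before the backup starts.
    preCommand: ["/bin/sh", "-c", "echo 'backup starting'"]
    # Hypothetical: run after the backup finishes, e.g. to stop a service-mesh sidecar.
    postCommand: ["/bin/sh", "-c", "curl -fsS -X POST http://localhost:15020/quitquitquit"]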

@ccremer
Contributor

ccremer commented Nov 2, 2021

I can't find the Helm release for v2.0.0, so I can't update yet.

You're right, that one is coming soon: appuio/charts#354

Meanwhile, the Helm chart for v2.0 is out: https://github.com/appuio/charts/releases/tag/k8up-2.0.0

ccremer closed this as completed on Jan 9, 2023