This repository has been archived by the owner on Feb 22, 2022. It is now read-only.

prometheus-server CrashLoopBackOff #15742

Closed
ratnakarreddyg opened this issue Jul 22, 2019 · 18 comments
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@ratnakarreddyg

Describe the bug
After installing the stable/prometheus chart, the prometheus-server pod never becomes ready and stays in CrashLoopBackOff.

Version of Helm and Kubernetes:
Helm Version: v2.14.2
Kubernetes Version: v1.15.0

Which chart:
stable/prometheus

What happened:
The pod named prometheus-server-66fbdff99b-z4vbj is always in the CrashLoopBackOff state.

What you expected to happen:
The prometheus-server pod is expected to start and stay in the Running state.

How to reproduce it (as minimally and precisely as possible):
helm install stable/prometheus --name prometheus --namespace prometheus --set server.global.scrape_interval=5s,server.global.evaluation_interval=5s

Anything else we need to know:

@marinakog

I have the same issue. Here is the describe output of the pod:
Name: prometheus-server-55479c9d54-6gh9t
Namespace: monitoring
Priority: 0
PriorityClassName: <none>
Node: phx3187268/100.111.143.19
Start Time: Tue, 30 Jul 2019 14:38:10 -0700
Labels: app=prometheus
chart=prometheus-8.15.0
component=server
heritage=Tiller
pod-template-hash=1103575810
release=prometheus
Annotations: <none>
Status: Running
IP: 192.168.0.30
Controlled By: ReplicaSet/prometheus-server-55479c9d54
Containers:
prometheus-server-configmap-reload:
Container ID: docker://405fd0c96cb567d3182a7e6d2baa1d6ff5c7ae062fe79f7f3b8ceebc3032ec46
Image: jimmidyson/configmap-reload:v0.2.2
Image ID: docker-pullable://jimmidyson/configmap-reload@sha256:befec9f23d2a9da86a298d448cc9140f56a457362a7d9eecddba192db1ab489e
Port: <none>
Host Port: <none>
Args:
--volume-dir=/etc/config
--webhook-url=http://127.0.0.1:9090/-/reload
State: Running
Started: Tue, 30 Jul 2019 14:38:25 -0700
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/etc/config from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from prometheus-server-token-kb7pt (ro)
prometheus-server:
Container ID: docker://d5d45806e69bda9abfad75a6210d03ad7d6e9ecbc292de51af56440fc95cf162
Image: prom/prometheus:v2.11.1
Image ID: docker-pullable://prom/prometheus@sha256:8f34c18cf2ccaf21e361afd18e92da2602d0fa23a8917f759f906219242d8572
Port: 9090/TCP
Host Port: 0/TCP
Args:
--storage.tsdb.retention.time=15d
--config.file=/etc/config/prometheus.yml
--storage.tsdb.path=/data
--web.console.libraries=/etc/prometheus/console_libraries
--web.console.templates=/etc/prometheus/consoles
--web.enable-lifecycle
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 30 Jul 2019 14:38:52 -0700
Finished: Tue, 30 Jul 2019 14:38:52 -0700
Ready: False
Restart Count: 2
Liveness: http-get http://:9090/-/healthy delay=30s timeout=30s period=10s #success=1 #failure=3
Readiness: http-get http://:9090/-/ready delay=30s timeout=30s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/data from storage-volume (rw)
/etc/config from config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from prometheus-server-token-kb7pt (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: prometheus-server
Optional: false
storage-volume:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: pvc-prometheus
ReadOnly: false
prometheus-server-token-kb7pt:
Type: Secret (a volume populated by a Secret)
SecretName: prometheus-server-token-kb7pt
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message


Normal Scheduled 47s default-scheduler Successfully assigned monitoring/prometheus-server-55479c9d54-6gh9t to phx3187268
Normal Pulling 44s kubelet, phx3187268 pulling image "jimmidyson/configmap-reload:v0.2.2"
Normal Pulled 32s kubelet, phx3187268 Successfully pulled image "jimmidyson/configmap-reload:v0.2.2"
Normal Created 32s kubelet, phx3187268 Created container
Normal Started 32s kubelet, phx3187268 Started container
Normal Pulling 32s kubelet, phx3187268 pulling image "prom/prometheus:v2.11.1"
Normal Pulled 26s kubelet, phx3187268 Successfully pulled image "prom/prometheus:v2.11.1"
Warning BackOff 20s (x3 over 23s) kubelet, phx3187268 Back-off restarting failed container
Normal Created 5s (x3 over 25s) kubelet, phx3187268 Created container
Normal Started 5s (x3 over 25s) kubelet, phx3187268 Started container
Normal Pulled 5s (x2 over 24s) kubelet, phx3187268 Container image "prom/prometheus:v2.11.1" already present on machine
Warning DNSConfigForming 4s (x8 over 45s) kubelet, phx3187268 Search Line limits were exceeded, some search paths have been omitted, the applied search line is: monitoring.svc.cluster.local svc.cluster.local cluster.local devweblogicphx.oraclevcn.com subnet3ad3phx.devweblogicphx.oraclevcn.com us.oracle.com

@taylorfturner

I'm seeing very similar behavior on my dask-scheduler pod when deploying stable/dask; I opened a ticket for it: #15979

@marinakog

Here is the log:
[opc@marina-kogan-sandbox prometheus]$ kubectl -n monitoring logs prometheus-server-5bc5568444-5s8bk -c prometheus-server
level=info ts=2019-07-31T18:24:39.386Z caller=main.go:329 msg="Starting Prometheus" version="(version=2.11.1, branch=HEAD, revision=e5b22494857deca4b806f74f6e3a6ee30c251763)"
level=info ts=2019-07-31T18:24:39.386Z caller=main.go:330 build_context="(go=go1.12.7, user=root@d94406f2bb6f, date=20190710-13:51:17)"
level=info ts=2019-07-31T18:24:39.386Z caller=main.go:331 host_details="(Linux 4.14.35-1902.2.0.el7uek.x86_64 #2 SMP Fri Jun 14 21:15:44 PDT 2019 x86_64 prometheus-server-5bc5568444-5s8bk (none))"
level=info ts=2019-07-31T18:24:39.386Z caller=main.go:332 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2019-07-31T18:24:39.386Z caller=main.go:333 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2019-07-31T18:24:39.387Z caller=main.go:652 msg="Starting TSDB ..."
level=info ts=2019-07-31T18:24:39.387Z caller=web.go:448 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2019-07-31T18:24:39.388Z caller=main.go:521 msg="Stopping scrape discovery manager..."
level=info ts=2019-07-31T18:24:39.388Z caller=main.go:535 msg="Stopping notify discovery manager..."
level=info ts=2019-07-31T18:24:39.388Z caller=main.go:557 msg="Stopping scrape manager..."
level=info ts=2019-07-31T18:24:39.388Z caller=main.go:531 msg="Notify discovery manager stopped"
level=info ts=2019-07-31T18:24:39.388Z caller=main.go:517 msg="Scrape discovery manager stopped"
level=info ts=2019-07-31T18:24:39.388Z caller=main.go:551 msg="Scrape manager stopped"
level=info ts=2019-07-31T18:24:39.388Z caller=manager.go:776 component="rule manager" msg="Stopping rule manager..."
level=info ts=2019-07-31T18:24:39.388Z caller=manager.go:782 component="rule manager" msg="Rule manager stopped"
level=info ts=2019-07-31T18:24:39.388Z caller=notifier.go:602 component=notifier msg="Stopping notification manager..."
level=info ts=2019-07-31T18:24:39.388Z caller=main.go:722 msg="Notifier manager stopped"
level=error ts=2019-07-31T18:24:39.391Z caller=main.go:731 err="opening storage failed: lock DB directory: open /data/lock: permission denied"
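
The error above points at ownership of the mounted persistent volume rather than at the chart's configuration. A quick way to confirm this, sketched below under the assumption that the claim is the pvc-prometheus PVC from the describe output earlier (the pod spec itself is illustrative, not from this thread), is to mount the same PVC in a throwaway pod and inspect /data:

    apiVersion: v1
    kind: Pod
    metadata:
      name: pvc-inspect                     # hypothetical throwaway pod, delete it afterwards
      namespace: monitoring
    spec:
      restartPolicy: Never
      containers:
        - name: shell
          image: busybox:1.32
          command: ["sh", "-c", "ls -ldn /data && id"]
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: pvc-prometheus       # claim name taken from the describe output above

If /data turns out to be owned by root and not writable by UID 65534 (the non-root user the chart runs Prometheus as), the process cannot create /data/lock, which matches the error reported above. For a ReadWriteOnce claim you may need to scale the prometheus-server deployment to zero first so the volume can be attached to the inspection pod.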

@stale

stale bot commented Aug 30, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

@stale stale bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 30, 2019
@nightcatnl

I'm having the same issue.

Events:
Type Reason Age From Message


Normal Scheduled 7m21s default-scheduler Successfully assigned monitoring/prometheus-server-75959db9-5v6dm to docker02
Normal Pulled 7m20s kubelet, docker02 Container image "jimmidyson/configmap-reload:v0.2.2" already present on machine
Normal Created 7m20s kubelet, docker02 Created container prometheus-server-configmap-reload
Normal Started 7m20s kubelet, docker02 Started container prometheus-server-configmap-reload
Normal Pulled 6m30s (x4 over 7m20s) kubelet, docker02 Container image "prom/prometheus:v2.11.1" already present on machine
Normal Created 6m30s (x4 over 7m20s) kubelet, docker02 Created container prometheus-server
Normal Started 6m30s (x4 over 7m20s) kubelet, docker02 Started container prometheus-server
Warning BackOff 2m19s (x27 over 7m19s) kubelet, docker02 Back-off restarting failed container

@stale stale bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 4, 2019
@elliotpryde

Also seeing this when simply running helm install stable/prometheus.

Helm Version: v2.14.3
Kubernetes Version: v1.14.6

level=info ts=2019-09-06T21:03:04.361Z caller=main.go:329 msg="Starting Prometheus" version="(version=2.11.1, branch=HEAD, revision=e5b22494857deca4b806f74f6e3a6ee30c251763)"
level=info ts=2019-09-06T21:03:04.361Z caller=main.go:330 build_context="(go=go1.12.7, user=root@d94406f2bb6f, date=20190710-13:51:17)"
level=info ts=2019-09-06T21:03:04.361Z caller=main.go:331 host_details="(Linux 4.9.184-linuxkit #1 SMP Tue Jul 2 22:58:16 UTC 2019 x86_64 kissing-warthog-prometheus-server-b94c6d879-n8jj9 (none))"
level=info ts=2019-09-06T21:03:04.361Z caller=main.go:332 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2019-09-06T21:03:04.361Z caller=main.go:333 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2019-09-06T21:03:04.362Z caller=web.go:448 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2019-09-06T21:03:04.362Z caller=main.go:652 msg="Starting TSDB ..."
level=info ts=2019-09-06T21:03:04.363Z caller=main.go:521 msg="Stopping scrape discovery manager..."
level=info ts=2019-09-06T21:03:04.363Z caller=main.go:535 msg="Stopping notify discovery manager..."
level=info ts=2019-09-06T21:03:04.363Z caller=main.go:557 msg="Stopping scrape manager..."
level=info ts=2019-09-06T21:03:04.363Z caller=main.go:531 msg="Notify discovery manager stopped"
level=info ts=2019-09-06T21:03:04.363Z caller=main.go:517 msg="Scrape discovery manager stopped"
level=info ts=2019-09-06T21:03:04.363Z caller=main.go:551 msg="Scrape manager stopped"
level=info ts=2019-09-06T21:03:04.363Z caller=manager.go:776 component="rule manager" msg="Stopping rule manager..."
level=info ts=2019-09-06T21:03:04.363Z caller=manager.go:782 component="rule manager" msg="Rule manager stopped"
level=info ts=2019-09-06T21:03:04.363Z caller=notifier.go:602 component=notifier msg="Stopping notification manager..."
level=info ts=2019-09-06T21:03:04.363Z caller=main.go:722 msg="Notifier manager stopped"
level=error ts=2019-09-06T21:03:04.364Z caller=main.go:731 err="opening storage failed: lock DB directory: open /data/lock: permission denied"

@ivanstreams

I'm seeing the same problem. Is there a solution or workaround?

@tjg184
Contributor

tjg184 commented Oct 17, 2019

Same problem here using helm install stable/prometheus.

caller=main.go:731 err="opening storage failed: lock DB directory: open /data/lock: permission denied"

@ivanstreams

I tried setting server.skipTSDBLock=true. It bypasses that step, but it fails at the next one:

main.go:731 err="opening storage failed: create dir: mkdir /data/wal: permission denied"

Then I tried server.persistentVolume.mountPath=/tmp as a test, and it also fails:

main.go:731 err="opening storage failed: create dir: mkdir /tmp/wal: permission denied"
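
For reference, the two attempts above would have looked roughly like this (the release name prometheus is an assumption; the flag names are the ones quoted in the comment):

    # First attempt: skip the TSDB lock file
    helm upgrade prometheus stable/prometheus --reuse-values \
      --set server.skipTSDBLock=true

    # Second attempt: move the server's data mount to /tmp
    helm upgrade prometheus stable/prometheus --reuse-values \
      --set server.persistentVolume.mountPath=/tmp

Both fail for the same underlying reason: wherever the persistent volume is mounted, the Prometheus process (running as UID 65534 by default) cannot write to it, so skipping the lock file only moves the permission error to the next write (/data/wal or /tmp/wal).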

@KshamaG

KshamaG commented Oct 21, 2019

I was seeing the same error. I was able to resolve the issue by applying the workaround given here.
Note: replace prometheus-alertmanager with prometheus-server in the workaround steps.
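
The linked workaround is not quoted in this thread, so the snippet below is only a sketch of one commonly used approach (my assumption, not confirmed by the comment): an initContainer added to the prometheus-server deployment that chowns the data volume to the chart's default user 65534 before Prometheus starts. The container name is hypothetical; the volume name storage-volume is taken from the describe output above.

      initContainers:
        - name: fix-data-permissions       # hypothetical name, for illustration only
          image: busybox:1.32
          command: ["sh", "-c", "chown -R 65534:65534 /data"]
          securityContext:
            runAsUser: 0                   # must run as root to change ownership
          volumeMounts:
            - name: storage-volume         # volume name used by the chart's deployment
              mountPath: /data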

@guswns531

guswns531 commented Oct 22, 2019

I solved this problem in the following way:

kubectl edit deploy prometheus-server -n prometheus

from

      securityContext:
        fsGroup: 65534
        runAsGroup: 65534
        runAsNonRoot: true
        runAsUser: 65534

to

  securityContext:
    fsGroup: 0
    runAsGroup: 0
    runAsUser: 0

Honestly, I am not sure this change won't cause other problems, but it works now.
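
The same change can be applied without an interactive edit; here is a one-line sketch, assuming the deployment name and namespace used in the comment above:

    kubectl -n prometheus patch deploy prometheus-server --type=json -p='[
      {"op": "replace",
       "path": "/spec/template/spec/securityContext",
       "value": {"fsGroup": 0, "runAsGroup": 0, "runAsUser": 0}}
    ]'

Keep in mind that manual edits or patches to a Helm-managed deployment may be overwritten the next time the release is upgraded, so setting this through the chart's values is the more durable route.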

@mram0509

I am having the same issue, and changing securityContext does not fix it.
Has anybody found a workaround for this?

@stale

stale bot commented Dec 16, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

@stale stale bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 16, 2019
@stale

stale bot commented Dec 30, 2019

This issue is being automatically closed due to inactivity.

@stale stale bot closed this as completed Dec 30, 2019
@ZacHaque

ZacHaque commented Jun 25, 2020

Nice, it worked after setting:

      securityContext:
        fsGroup: 0
        runAsGroup: 0
        runAsUser: 0

@uweeby

uweeby commented Jul 31, 2020

This fixed my issue as well:

      securityContext:
        fsGroup: 0
        runAsGroup: 0
        runAsUser: 0

So is there some other part of the setup/config that is expected to be done ahead of time that's missing?

@smakintel

smakintel commented Jun 14, 2021

Worked for me (using Rancher; I edited the prometheus-server deployment YAML).

      securityContext:
        fsGroup: 0
        runAsGroup: 0
        runAsNonRoot: false
        runAsUser: 0

@tutstechnology

tutstechnology commented Oct 14, 2021

That worked for me. Thank you very much.

  securityContext:
    fsGroup: 0
    runAsGroup: 0
    runAsNonRoot: false
    runAsUser: 0

I did the installation via Helm, edited the values file, and put in these values.
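
A sketch of what that values override might look like; the server.securityContext key is my assumption based on the stable/prometheus chart's layout, so verify it against the chart's values.yaml before relying on it:

    # custom-values.yaml (file name is illustrative)
    server:
      securityContext:
        fsGroup: 0
        runAsGroup: 0
        runAsNonRoot: false
        runAsUser: 0

Installed with the same Helm 2 style command used earlier in the thread:

    helm install stable/prometheus --name prometheus --namespace prometheus -f custom-values.yaml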
