monasca agent running error #433

zhangjianweibj · 2018-11-05T08:15:55Z

kubernetes version:Kubernetes v1.9.5

zhangjianweibj · 2018-11-05T08:21:30Z

hello,i run command in kubernetes environment(1.9.5)
$ helm repo add monasca http://monasca.io/monasca-helm
$ helm install monasca/monasca --name monasca --namespace monitoring
find agent and aggregator pods crash,is anything wrong? thanks

timothyb89 · 2018-11-05T19:24:03Z

It looks like RBAC isn't turned on, have you set rbac.create to true in your helm values (e.g. by adding --set rbac.create=true to the install/upgrade command)?

zhangjianweibj · 2018-11-06T02:09:52Z

@timothyb89 very thanks,i reinstall monasca with rbac value.monasca works well.but aggregator and cleanup pods still crash.

zhangjianweibj · 2018-11-06T02:11:52Z

aggerator pod no error log.

timothyb89 · 2018-11-06T18:23:36Z

The cleanup job will have trouble as it's probably running with the old configuration that had no RBAC enabled - you can just delete the job (kubectl delete job monasca-cleanup-job-...) and any leftover pods manually. I'm not sure about the aggregator, did you check the previous log? e.g. kubectl logs -p monasca-aggregator-...

The most likely cause for the aggregator crashing is that it received no metrics. That's normal for the first hour or so after a fresh deployment, but if it keeps crashing it might be a sign of Kafka issues or the agent pods failing to collect any metrics. Logs for both of those would be helpful if things continue to crash.

zhangjianweibj · 2018-11-07T01:28:18Z

@timothyb89 very thanks.i will reinstall monasca.i think aggregator crashing is the pod resource limit and request too low.you see the aggregator pod restart 5 times.,but no error logs appear.

zhangjianweibj · 2018-11-07T06:15:15Z

why monasca-thresh pod restart 40 times in 4h?
thresh pod logs:

zhangjianweibj · 2018-11-07T06:25:39Z

timothyb89 · 2018-11-08T16:48:43Z

Hmm, I don't see any errors in those logs - are those the previous container logs (kubectl logs -p ...)?

It looks like thresh is running alright in that 2nd screenshot ("no left over resources ..." is unrelated to CPU/memory resources), so I think it's either being OOM killed and logging nothing or is actually running alright in that log and we need to look at the logs generated before the last crash.

zhangjianweibj · 2018-11-09T08:58:32Z

kubectl logs -p monasca-thresh-74758d6db-fk8zg -n monitoring ,result is:

resource limit:cpu 2 memory:2G,is resource too low to run thresh pod or jvm heap size not enough？

timothyb89 · 2018-11-09T18:46:16Z

That definitely should be enough memory, at least if you only have a few agents running. You might need more resources if you have more nodes (like 10+) but it's probably fine as-is.

Based on those errors, It looks like thresh is having trouble keeping its connection to zookeeper. Do the zookeeper logs show anything interesting? Possibly some network trouble between nodes/pods?

zhangjianweibj · 2018-11-12T01:06:35Z

ok,thanks.zookeeper pod contain many error logs.but it seems those errors can not cause thresh pod crashed.
kubectl logs monasca-zookeeper-5bc74dc5f-dk6zz -n monitoring |grep Error

zhangjianweibj · 2018-11-12T02:18:46Z

@timothyb89 i thank the reason is resource not enough for storm.i edit deployment and daemonset ,set resource.limit.cpu=4 resource.limit.memory=8G THRESH_STACK_SIZE=4096K.
now it is works well,not restart any more.

but i find some metircs contin negative number?is this a bug?

and slave6 has two agents

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

monasca agent running error #433

monasca agent running error #433

zhangjianweibj commented Nov 5, 2018

zhangjianweibj commented Nov 5, 2018

timothyb89 commented Nov 5, 2018

zhangjianweibj commented Nov 6, 2018

zhangjianweibj commented Nov 6, 2018

timothyb89 commented Nov 6, 2018

zhangjianweibj commented Nov 7, 2018

zhangjianweibj commented Nov 7, 2018

zhangjianweibj commented Nov 7, 2018

timothyb89 commented Nov 8, 2018

zhangjianweibj commented Nov 9, 2018

timothyb89 commented Nov 9, 2018

zhangjianweibj commented Nov 12, 2018

zhangjianweibj commented Nov 12, 2018

monasca agent running error #433

monasca agent running error #433

Comments

zhangjianweibj commented Nov 5, 2018

zhangjianweibj commented Nov 5, 2018

timothyb89 commented Nov 5, 2018

zhangjianweibj commented Nov 6, 2018

zhangjianweibj commented Nov 6, 2018

timothyb89 commented Nov 6, 2018

zhangjianweibj commented Nov 7, 2018

zhangjianweibj commented Nov 7, 2018

zhangjianweibj commented Nov 7, 2018

timothyb89 commented Nov 8, 2018

zhangjianweibj commented Nov 9, 2018

timothyb89 commented Nov 9, 2018

zhangjianweibj commented Nov 12, 2018

zhangjianweibj commented Nov 12, 2018