Monasca agent running error #433
Hello, I ran the command in a Kubernetes environment (v1.9.5).
It looks like RBAC isn't turned on. Have you set the RBAC value?
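One quick way to confirm whether RBAC is active on the cluster is to list the API groups the server advertises; this is a minimal sketch using standard `kubectl` commands (the deployment and ClusterRoleBinding names are illustrative, not taken from the thread):

```shell
# Check whether the cluster advertises the RBAC API group.
# If this prints nothing, RBAC is not enabled on the API server.
kubectl api-versions | grep rbac.authorization.k8s.io

# List existing ClusterRoleBindings to see what RBAC objects are present.
kubectl get clusterrolebindings
```

If the RBAC API group is missing, the agent's API calls will be rejected regardless of how the chart is configured.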
@timothyb89 Thanks very much. I reinstalled Monasca with the RBAC value and it works well now, but the aggregator and cleanup pods still crash.
The cleanup job will have trouble because it's probably still running with the old configuration that had no RBAC enabled; you can just delete the job. The most likely cause for the aggregator crashing is that it received no metrics. That's normal for the first hour or so after a fresh deployment, but if it keeps crashing it might be a sign of Kafka issues or of the agent pods failing to collect any metrics. Logs for both of those would be helpful if things continue to crash.
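Deleting the stale job so it is recreated with the new configuration looks roughly like this; the job name here is hypothetical, so list the jobs first to find the real one:

```shell
# Find the cleanup job's actual name in the namespace Monasca runs in.
kubectl get jobs -n monitoring

# Delete it; "monasca-cleanup" is a placeholder name for illustration.
# The next scheduled run will pick up the current (RBAC-enabled) config.
kubectl delete job monasca-cleanup -n monitoring
```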
@timothyb89 Thanks very much, I will reinstall Monasca. I think the aggregator is crashing because the pod resource limits and requests are too low; you can see the aggregator pod has restarted 5 times, but no error logs appear.
Hmm, I don't see any errors in those logs. Are those the previous container's logs? It looks like thresh is running alright in that second screenshot ("no left over resources ..." is unrelated to CPU/memory resources), so I think it's either being OOM-killed and logging nothing, or it is actually running fine in that log and we need to look at the logs generated before the last crash.
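When a pod is crash-looping, the current container's log often looks clean because the interesting output belongs to the container that died. The `--previous` flag of `kubectl logs` retrieves that earlier log (the pod name below is a placeholder):

```shell
# Fetch logs from the container instance that ran before the last restart.
# "monasca-aggregator-0" is an assumed pod name; substitute the real one.
kubectl logs --previous monasca-aggregator-0 -n monitoring
```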
That definitely should be enough memory, at least if you only have a few agents running. You might need more resources if you have more nodes (10+), but it's probably fine as-is. Based on those errors, it looks like thresh is having trouble keeping its connection to ZooKeeper. Do the ZooKeeper logs show anything interesting? Possibly some network trouble between nodes/pods?
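Two standard checks help distinguish an OOM kill from a ZooKeeper connectivity problem; this sketch assumes placeholder pod names:

```shell
# If the container was OOM-killed, the last termination state says so.
# "monasca-thresh-0" is an assumed pod name.
kubectl describe pod monasca-thresh-0 -n monitoring | grep -A3 "Last State"

# Pull the ZooKeeper logs to look for session expirations or dropped
# connections around the time of the thresh crashes.
kubectl logs zookeeper-0 -n monitoring
```

An `OOMKilled` reason in the last state would explain a crash with no error output; repeated session timeouts in the ZooKeeper log would point at the network instead.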
@timothyb89 I think the reason is that the resources are not enough for Storm. I edited the Deployment and DaemonSet to set resources.limits.cpu=4, resources.limits.memory=8G, and THRESH_STACK_SIZE=4096K. But I find that some metrics contain negative numbers. Is this a bug? Also, slave6 has two agents.
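The resource edit described above can also be applied without hand-editing manifests, using `kubectl set resources`; the workload names here are assumptions, not confirmed names from this deployment:

```shell
# Raise the CPU/memory limits on the thresh deployment.
# "deployment/monasca-thresh" is an assumed workload name.
kubectl set resources deployment/monasca-thresh -n monitoring \
  --limits=cpu=4,memory=8Gi

# The agent runs as a DaemonSet; the same command works for it.
kubectl set resources daemonset/monasca-agent -n monitoring \
  --limits=cpu=4,memory=8Gi
```

Note that `kubectl` expects the `Gi` suffix for binary gigabytes; a bare `8G` means decimal gigabytes, which is slightly less memory.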
Kubernetes version: v1.9.5