waiting for ClickHouse cluster to become available #368
Comments
@guidoiaquinti Hi Guido, do you have any idea here? Thanks |
👋 Hi @visla-xugeng! I'm sorry to hear about this issue. Can you please check both CH pods to see if there's anything out of the ordinary in the logs? We've been tracking a possible regression in the upstream It happens when the Can you please verify the above? The current workaround is to manually kill the operator pod. |
@guidoiaquinti Thanks for your response. I checked the logs but did not see what you mentioned.
In the log of chi-posthog-posthog-0-0-0, I see one error about ZooKeeper. However, the ZooKeeper pod itself shows no errors.
Log from ZooKeeper (assuming ZooKeeper is up and ready):
|
Update: Troubleshooting came back to the original point. Let me re-organize my findings here.
2: The job, posthog-migrate-2022-05-10-09-24-27--1-zdlwx, is still in Running status and has not completed. It has 2 containers. One is wait-for-service-dependencies, which has terminated and completed; I think this is good. The second container is migrate-job, which is still running but shows no logs. I think something got stuck, but I could not figure out what.
3: The other pods (events, plugins, web, worker) are all in "Init:1/2" status. Each of these pods has 3 containers. Only one, wait-for-service-dependencies, has completed and terminated. The other two are wait-for-migrations (in running status) and posthog-events (in waiting status). I think this is caused by the pod posthog-migrate-2022-05-10-09-24-27--1-zdlwx: since it cannot finish its job, all the other pods keep waiting for it and never start. Based on the analysis above, I think the issue is in the pod posthog-migrate-2022-05-10-09-24-27--1-zdlwx, but I cannot dig deeper. Do you have any idea? |
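The per-pod analysis above can be automated. The following is a minimal sketch, assuming the pod description has been obtained as JSON (for example with `kubectl get pod <name> -o json`) and parsed into a dict. The function names are hypothetical. It relies only on the standard `status.initContainerStatuses` field of a Kubernetes Pod.

```python
from typing import Optional


def init_container_states(pod: dict) -> dict:
    """Map each init container name to its state key (waiting/running/terminated)."""
    states = {}
    for status in pod.get("status", {}).get("initContainerStatuses", []):
        state = status.get("state", {})
        # A container state object holds one of: waiting, running, terminated.
        states[status["name"]] = next(iter(state), "unknown")
    return states


def first_blocking_init_container(pod: dict) -> Optional[str]:
    """Return the first init container that has not terminated, or None."""
    for name, state in init_container_states(pod).items():
        if state != "terminated":
            return name
    return None
```

For a pod stuck in Init:1/2 as described, this should point at wait-for-migrations, which confirms that the pod is blocked on the migrate job rather than on the service dependencies. To use it, pass the result of `json.load(sys.stdin)` when piping the `kubectl` output into the script.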
|
describe of migrate pod
|
values.yaml
|
@hazzadous Regarding editing the command section to add "set -x": I tried several times and every attempt failed.
|
Thanks 🙏 Re editing: indeed, you’d need to apply this as a new manifest, as command is immutable. It’s late here in London, so I’ll have a look in the morning. One thing that is obviously interesting is that you have both Postgres enabled and external Postgres settings, which isn’t the typical use, and I’m not sure about the behaviour there |
Oops no that’s pgbouncer! I’ll look in more detail in the morning! |
One last thing that may not be relevant but I’ll mention anyway: the migrations do not run via pgbouncer iirc, so you’ll need to make sure security groups are set such that Postgres is directly open to migration pods. Although it looks like you are using the same node groups? |
@hazzadous Thanks a lot. I will update more details later. (Have a good night.) |
😊 |
@hazzadous some updates
2: I double-checked the security group of my external PostgreSQL, which allows traffic from the whole pod subnet. So the migration pod should be able to connect to my external PostgreSQL directly.
3: Yes, all of the PostHog pods are in the same node group. |
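A security group rule that looks right does not prove the pod can actually reach the database. The sketch below checks plain TCP reachability from inside a pod using only the Python standard library, assuming a Python interpreter is available in that pod. The function name `can_connect` is hypothetical.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Calling `can_connect("<your-postgres-host>", 5432)` from the migrate pod separates network problems (security groups, routing, DNS) from application-level problems such as credentials. It only confirms that the port is open, not that PostgreSQL accepts the login.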
This issue has 2029 words at 14 comments. Issues this long are hard to read or contribute to, and tend to take very long to reach a conclusion. Instead, why not:
Is this issue intended to be sprawling? Consider adding label |
@visla-xugeng interestingly from your I would expect this to run very quickly. You can verify this by updating the job definition. I'm not sure if you can edit this or if you'd need to create a new one. Then remove the notify line. If that works then we need to figure out a way to make it not hang. It could be security groups, my go to whenever something is hanging! |
(you could also, while in the migrate pod, run the migration part manually.) |
@hazzadous Thanks for your update. Does the migrate pod also need to access the Postgres DB? I logged in to the migrate pod's shell and tried to use psql, but the command was not found in this pod. |
@visla-xugeng yes, it accesses PostgreSQL directly. It doesn't have psql installed. You should be able to install it, but you might need to use an image that allows you to install things on it. But you could just try running the |
@hazzadous I found one interesting thing. Remember that I was using external Redis and external PostgreSQL when I installed PostHog and hit the issue. Today I tested skipping the external Redis and staying with the internal one (while still using the external PostgreSQL), and the installation went through smoothly without any problem. The migration pod ran as expected. All pods are up and running now.
I tested several times: whenever I use ONLY the internal Redis, the installation finishes without any problem. If I enable the external Redis and disable the internal one, the installation fails. Any thoughts here? Thanks, |
@visla-xugeng ok, sounds like the thing to do now is debug the connectivity between the migration pod and the external Redis. Can you spin up a pod in local.node_group_name and try to connect? If that fails, make sure the security groups are set up and that you're able to route to the Redis address. |
@hazzadous I checked the security group, and it looks good.
If I remove --tls, I cannot establish this connection. Is there any place to set up TLS for the migrate pod? When I create a new Redis instance without a password, the PostHog chart installs without problems.
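The difference between connecting with and without --tls can be reproduced without a Redis client. This is a minimal sketch using only the Python standard library: it sends a Redis PING (optionally preceded by AUTH) over either a plain or a TLS-wrapped socket. The function names and parameters are hypothetical, and it assumes the server certificate is trusted by the system CA store.

```python
import socket
import ssl


def encode_command(*parts: str) -> bytes:
    """Encode a Redis command as a RESP array of bulk strings."""
    out = [f"*{len(parts)}\r\n".encode()]
    for part in parts:
        data = part.encode()
        out.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(out)


def redis_ping(host, port=6379, use_tls=False, password=None, timeout=3.0):
    """Send optional AUTH, then PING. Return the server's reply or an error string."""
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return f"connect failed: {exc}"
    conn = raw
    try:
        if use_tls:
            # Wrap the TCP socket in TLS, verifying the certificate against host.
            conn = ssl.create_default_context().wrap_socket(raw, server_hostname=host)
        if password is not None:
            conn.sendall(encode_command("AUTH", password))
            auth_reply = conn.recv(1024).decode(errors="replace").strip()
            if not auth_reply.startswith("+OK"):
                return f"auth failed: {auth_reply}"
        conn.sendall(encode_command("PING"))
        return conn.recv(1024).decode(errors="replace").strip()
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return f"error: {exc}"
    finally:
        conn.close()
```

Running `redis_ping(host, use_tls=False)` and `redis_ping(host, use_tls=True)` from a pod in the same node group shows which mode the endpoint accepts. A reply of `+PONG` means the connection works. If only the TLS variant succeeds, a client that connects without TLS would be expected to hang or fail, which matches the behaviour described above.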
|
Afaik we don't have any setting for Redis TLS. That would need to be added to the chart/application, although I'd need to look closer to verify that. |
@hazzadous I attached a screenshot of my Redis, which has the password disabled. |
@visla-xugeng I know it's not an ideal solution, but would using the provided in-cluster Redis be an acceptable solution, at least for now? There shouldn't be anything in there that requires durability, so moving to ElastiCache later would be relatively straightforward. Having said that, it's still very annoying that it's not working. I have this working on our cluster 🤔 |
@hazzadous Thanks for the quick update. I will switch to the internal redis. Hope you can figure out why the external redis did not work as expected. |
I get this issue when trying a fresh install of PostHog via the Helm chart... |
Bug description
I tried to install PostHog in a brand new EKS cluster on AWS using Helm commands, but several pods still end up in Init:1/2 status.
Expected behavior
All pods should be up and running.
Actual behavior
Several pods are stuck in Init:1/2 status and report errors.
How to reproduce
Follow the installation instructions and you will get these errors.
Environment
I deployed the chart in EKS on AWS
Additional context
logs from pod, posthog-migrate , container, wait-for-service-dependencies
logs from pod, posthog-events, container, wait-for-service-dependencies