[bitnami/clickhouse] Connection to Clickhouse-Keeper broken/unresponsive #15935
Hi, does the issue happen when using the ZooKeeper included in the chart? Just to pinpoint where the issue could be.
Hi, I have it configured like this now:

```yaml
keeper:
  enabled: false
zookeeper:
  enabled: true
  replicaCount: 3
```

And now the command just completes normally.
Thanks @marcleibold for letting us know. Have you faced the issue with the default …?
Hi @fmulero,
Hi @marcleibold, I've reproduced the same issue in a simpler scenario, just enabling keeper:

```bash
helm install myrelease bitnami/clickhouse --set keeper.enabled=true --set zookeeper.enabled=false
```

I've checked the keeper status:

```
$ echo stat | nc localhost 2181
ClickHouse Keeper version: v23.3.1.2823-testing-46e85357ce2da2a99f56ee83a079e892d7ec3726
Clients:
 10.42.1.26:45740(recved=0,sent=0)
 10.42.1.26:49358(recved=5005,sent=5006)

Latency min/avg/max: 0/0/6
Received: 5005
Sent: 5006
Connections: 1
Outstanding: 0
Zxid: 961
Mode: follower
Node count: 80
```

It seems something is misconfigured in keeper. It needs further investigation; please bear with us.
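For anyone debugging this, a quick way to see whether the keeper ensemble ever elects a leader is to run the `stat` four-letter command against each keeper pod and look at `Mode:`; exactly one node should report `leader`. This is only a sketch: the release and pod names below are placeholders for a 2-shard / 2-replica install, and it assumes `nc` is available inside the container (otherwise use `kubectl port-forward` to 2181 and run the same pipeline against localhost).

```bash
# Sketch: inspect each keeper's role (pod names are placeholders).
for pod in myrelease-clickhouse-shard0-0 myrelease-clickhouse-shard0-1 \
           myrelease-clickhouse-shard1-0 myrelease-clickhouse-shard1-1; do
  echo "== $pod =="
  # Assumes netcat exists in the image; hedge accordingly.
  kubectl exec "$pod" -- bash -c 'echo stat | nc -w 2 localhost 2181 | grep -E "Mode|Connections"'
done
```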
I think the issue may be here; I don't think …
It seems like that is the issue. I also do not see the …
The variable should be set by this script, though. The line also works completely fine, as I just tested inside my container:

```
I have no name!@clickhouse-replicated-shard1-0:/$ echo $KEEPER_SERVER_ID

I have no name!@clickhouse-replicated-shard1-0:/$ if [[ -f "/bitnami/clickhouse/keeper/data/myid" ]]; then
    export KEEPER_SERVER_ID="$(cat /bitnami/clickhouse/keeper/data/myid)"
else
    HOSTNAME="$(hostname -s)"
    if [[ $HOSTNAME =~ (.*)-([0-9]+)$ ]]; then
        export KEEPER_SERVER_ID=${BASH_REMATCH[2]}
    else
        echo "Failed to get index from hostname $HOST"
        exit 1
    fi
fi
I have no name!@clickhouse-replicated-shard1-0:/$ echo $KEEPER_SERVER_ID
0
I have no name!@clickhouse-replicated-shard1-0:/$
```

The script is also present in the configmap and all, but it is apparently just not executed for some reason.
Another thing I checked: there should be a process called …
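For anyone following along, one way to verify whether a keeper process is actually running inside a pod is to read the container's process list. This is only a sketch: the pod name is a placeholder, and /proc is used directly because minimal images often ship without ps.

```bash
# Sketch: look for a keeper process inside the pod (pod name is a placeholder).
kubectl exec myrelease-clickhouse-shard0-0 -- \
  sh -c 'for d in /proc/[0-9]*; do tr "\0" " " < "$d/cmdline"; echo; done' | grep -i keeper
```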
Thanks a lot for all the clues! I did some changes and tests, but it is taking me longer than expected and I also have some issues with the shards. I've just opened an internal task to address it. We will keep you posted on any news.
Sorry, there are no updates on this 😞
Any workaround here?
Not as far as I know; just use the built-in ZooKeeper.
Is there any update?
Sorry, there are no updates on this. I'll try to bump the priority, but we are a small team and can't give you any ETA, sorry.
Hi, this issue is affecting us since we can't switch over to clickhouse-keeper completely, and ZooKeeper isn't officially supported by ClickHouse anymore.
This is not true. We still support ZooKeeper for the sake of backward compatibility and our users. However, ClickHouse Keeper has proved to be much better, and we've implemented several extensions which allow us to get better performance in certain scenarios.
We have a support contract with ClickHouse and they really want us to use clickhouse-keeper.
Any updates?
Would like this to be fixed.
I've just bumped the priority.
I have been messing with the chart and I am pretty sure the issue is that a set of keeper replicas is created for every shard. Looking over the documentation for shards and replicas, I believe that all nodes should share a single set of keepers. Whether the right fix is to create a separate StatefulSet of keepers (which would probably be easiest) or to only point servers at the keepers on shard 0, I will leave up to the maintainers.
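One way to see this behaviour without deploying anything is to render the chart locally and inspect the `KEEPER_NODE_*` environment variables each shard's StatefulSet receives; if every shard only lists hosts from its own shard, the keepers are effectively split into per-shard ensembles. A rough sketch, with the release name and flag values as examples only:

```bash
# Sketch: render the chart and show which keeper hosts each pod is pointed at.
helm template myrelease bitnami/clickhouse --version 3.1.5 \
  --set keeper.enabled=true --set zookeeper.enabled=false \
  --set shards=2 --set replicaCount=2 \
  | grep -A1 'KEEPER_NODE_'
```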
Has a release date been decided for the fix of this issue?
Any updates?
values.yaml:

```diff
 <node>
-  <host from_env="{{ printf "KEEPER_NODE_%d" $node }}"></host>
+  <host from_env="{{ printf "ZOOKEEPER_NODE_%d" $node }}"></host>
   <port>{{ $.Values.service.ports.keeper }}</port>
 </node>
```

statefulset.yaml:

```diff
 {{- if $.Values.keeper.enabled }}
 {{- $replicas := $.Values.replicaCount | int }}
 {{- range $j, $r := until $replicas }}
 - name: {{ printf "KEEPER_NODE_%d" $j }}
   value: {{ printf "%s-shard%d-%d.%s.%s.svc.%s" (include "common.names.fullname" $ ) $i $j (include "clickhouse.headlessServiceName" $) (include "common.names.namespace" $) $.Values.clusterDomain }}
+- name: {{ printf "ZOOKEEPER_NODE_%d" $j }}
+  value: {{ printf "%s-shard%d-%d.%s.%s.svc.%s" (include "common.names.fullname" $ ) 0 $j (include "clickhouse.headlessServiceName" $) (include "common.names.namespace" $) $.Values.clusterDomain }}
 {{- end }}
```
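If you want to try the patch above before an official fix lands, one option is to vendor the chart locally, apply the edits, and install from the local directory. A minimal sketch, where the release name, version, and flag values are placeholders:

```bash
# Sketch: pull the chart, apply the template edits shown above, install locally.
helm pull bitnami/clickhouse --version 3.1.5 --untar
# ...edit clickhouse/templates/statefulset.yaml and the keeper <node> section
#    in values.yaml as in the diff above...
helm install myrelease ./clickhouse \
  --set keeper.enabled=true --set zookeeper.enabled=false \
  --set shards=2 --set replicaCount=2
```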
Name and Version
bitnami/clickhouse 3.1.5
What architecture are you using?
amd64
What steps will reproduce the bug?
Result: Pods are running without any suspicious logs, but when you either exec into them or execute some command from the web UI that runs "ON CLUSTER", the progress indicator never goes past 49%. This was tried with a CREATE TABLE statement trying to create a ReplicatedMergeTree table on the cluster. The ClickHouse cluster consists of 2 shards and 2 replicas.
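The original reproduction steps are not included above; based on the setup described (2 shards, 2 replicas, keeper enabled), a minimal reproduction might look like the following. All names, the cluster name, and the macros are assumptions about the chart defaults, not the reporter's actual values.

```bash
# Sketch: deploy with keeper instead of ZooKeeper, then run a DDL ON CLUSTER.
helm install myrelease bitnami/clickhouse \
  --set shards=2 --set replicaCount=2 \
  --set keeper.enabled=true --set zookeeper.enabled=false

# Assumes the chart's default cluster name and {shard}/{replica} macros.
kubectl exec -it myrelease-clickhouse-shard0-0 -- clickhouse-client --query "
  CREATE TABLE default.test ON CLUSTER default
  (id UInt64, v String)
  ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/test', '{replica}')
  ORDER BY id"
# Symptom reported in this issue: the statement hangs around 49% and never returns.
```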
Are you using any custom parameters or values?
Our values.yaml: …
What is the expected behavior?
The expected behaviour is normal creation of the tables within the distributed_ddl_task_timeout.
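For context, distributed_ddl_task_timeout is the ClickHouse setting (180 seconds by default) that controls how long an ON CLUSTER query waits for all hosts to acknowledge the DDL. A quick way to inspect it, sketched with a placeholder pod name:

```bash
# Sketch: check the current DDL timeout from inside a server pod.
kubectl exec -it myrelease-clickhouse-shard0-0 -- clickhouse-client --query \
  "SELECT name, value FROM system.settings WHERE name = 'distributed_ddl_task_timeout'"
```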
What do you see instead?
The table creation (tested with the clickhouse-client command after exec-ing into the pod) is stuck at 49% progress. When aborted, the table seems to have been created.
When trying to drop the tables, the same problem occurs: the tables do seem to have been created, but the command doesn't finish, so I believe ClickHouse Keeper doesn't answer the command but does execute it.
When trying to create the table again (because I assumed Keeper had executed the last command), the error says that the replica already exists, not the table itself. So the problem seems to lie somewhere with the replicas.
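If a half-finished CREATE or DROP leaves stale replica metadata behind in keeper (the "replica already exists" error above), ClickHouse's SYSTEM DROP REPLICA statement can clean it up. A hedged sketch only: the pod, replica name, database, and table are placeholders and must match your actual macros, and the replica being dropped cannot be the local server's own replica.

```bash
# Sketch: remove stale replica metadata for a table (all names are placeholders).
kubectl exec -it myrelease-clickhouse-shard0-0 -- clickhouse-client --query \
  "SYSTEM DROP REPLICA 'myrelease-clickhouse-shard0-1' FROM TABLE default.test"
```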
Additional information
No response