[bitnami/clickhouse] Connection to Clickhouse-Keeper broken/unresponsive #15935
Hi, does the issue happen when using the ZooKeeper included in the chart? Just to pinpoint where the issue could be.
Hi, I have it configured like this now:

```yaml
keeper:
  enabled: false
zookeeper:
  enabled: true
  replicaCount: 3
```

And now the command just completes normally.
Thanks @marcleibold for letting us know. Have you faced the issue with the default …?
Hi @fmulero,
Hi @marcleibold, I've reproduced the same issue in a simpler scenario, just enabling keeper:

```bash
helm install myrelease bitnami/clickhouse --set keeper.enabled=true --set zookeeper.enabled=false
```

I've checked the keeper status:

```
$ echo stat | nc localhost 2181
ClickHouse Keeper version: v23.3.1.2823-testing-46e85357ce2da2a99f56ee83a079e892d7ec3726
Clients:
 10.42.1.26:45740(recved=0,sent=0)
 10.42.1.26:49358(recved=5005,sent=5006)

Latency min/avg/max: 0/0/6
Received: 5005
Sent: 5006
Connections: 1
Outstanding: 0
Zxid: 961
Mode: follower
Node count: 80
```

It seems something is misconfigured in keeper. It needs further investigation; please bear with us.
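For anyone debugging this, a quick way to see whether the keeper ensemble ever elects a leader is to run the `stat` four-letter command against each keeper pod and look at `Mode:`; exactly one node should report `leader`. This is only a sketch: the release and pod names below are placeholders for a 2-shard / 2-replica install, and it assumes `nc` is available inside the container (otherwise use `kubectl port-forward` to 2181 and run the same pipeline against localhost).

```bash
# Sketch: inspect each keeper's role (pod names are placeholders).
for pod in myrelease-clickhouse-shard0-0 myrelease-clickhouse-shard0-1 \
           myrelease-clickhouse-shard1-0 myrelease-clickhouse-shard1-1; do
  echo "== $pod =="
  # Assumes netcat exists in the image; hedge accordingly.
  kubectl exec "$pod" -- bash -c 'echo stat | nc -w 2 localhost 2181 | grep -E "Mode|Connections"'
done
```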
I think the issue may be here; I don't think …
It seems like that is the issue. I also do not see the …
The variable should be set by this script, though. The line also works completely fine, as I just tested inside my container:

```
I have no name!@clickhouse-replicated-shard1-0:/$ echo $KEEPER_SERVER_ID

I have no name!@clickhouse-replicated-shard1-0:/$ if [[ -f "/bitnami/clickhouse/keeper/data/myid" ]]; then
    export KEEPER_SERVER_ID="$(cat /bitnami/clickhouse/keeper/data/myid)"
else
    HOSTNAME="$(hostname -s)"
    if [[ $HOSTNAME =~ (.*)-([0-9]+)$ ]]; then
        export KEEPER_SERVER_ID=${BASH_REMATCH[2]}
    else
        echo "Failed to get index from hostname $HOST"
        exit 1
    fi
fi
I have no name!@clickhouse-replicated-shard1-0:/$ echo $KEEPER_SERVER_ID
0
I have no name!@clickhouse-replicated-shard1-0:/$
```

The script is also present in the configmap and all, but it is apparently just not executed for some reason.
Another thing I checked: there should be a process called …
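For anyone following along, one way to verify whether a keeper process is actually running inside a pod is to read the container's process list. This is only a sketch: the pod name is a placeholder, and /proc is used directly because minimal images often ship without ps.

```bash
# Sketch: look for a keeper process inside the pod (pod name is a placeholder).
kubectl exec myrelease-clickhouse-shard0-0 -- \
  sh -c 'for d in /proc/[0-9]*; do tr "\0" " " < "$d/cmdline"; echo; done' | grep -i keeper
```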
Thanks a lot for all the clues! I did some changes and tests, but it is taking me longer than expected and I also have some issues with the shards. I've just opened an internal task to address it. We will keep you posted on any news.
Sorry, there are no updates on this 😞
Any workaround here?
Not as far as I know; just use the built-in ZooKeeper.
Is there any update?
Sorry, there are no updates on this. I'll try to bump the priority, but we are a small team and can't give you any ETA, sorry.
Hi, this issue is affecting us since we can't switch over to clickhouse-keeper completely, and ZooKeeper isn't officially supported by ClickHouse anymore.
This is not true. We still support ZooKeeper for the sake of backward compatibility and our users. However, ClickHouse Keeper has proved to be much better, and we've implemented several extensions which allow us to get better performance in certain scenarios.
We have a support contract with ClickHouse and they really want us to use clickhouse-keeper.
Any updates?
Would like this to be fixed.
I've just bumped the priority.
I have been messing with the chart and I am pretty sure the issue is that a set of keeper replicas is created for every shard. Looking over the documentation for shards and replicas, I believe that all nodes should share a single set of keepers. Whether the right fix is to create a separate StatefulSet of keepers (which would probably be easiest) or to only point servers at the keepers on shard 0, I will leave up to the maintainers.
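One way to see this behaviour without deploying anything is to render the chart locally and inspect the `KEEPER_NODE_*` environment variables each shard's StatefulSet receives; if every shard only lists hosts from its own shard, the keepers are effectively split into per-shard ensembles. A rough sketch, with the release name and flag values as examples only:

```bash
# Sketch: render the chart and show which keeper hosts each pod is pointed at.
helm template myrelease bitnami/clickhouse --version 3.1.5 \
  --set keeper.enabled=true --set zookeeper.enabled=false \
  --set shards=2 --set replicaCount=2 \
  | grep -A1 'KEEPER_NODE_'
```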
Has a release date been decided for the fix of this issue?
Any updates?
values.yaml:

```diff
 <node>
-  <host from_env="{{ printf "KEEPER_NODE_%d" $node }}"></host>
+  <host from_env="{{ printf "ZOOKEEPER_NODE_%d" $node }}"></host>
   <port>{{ $.Values.service.ports.keeper }}</port>
 </node>
```

statefulset.yaml:

```diff
 {{- if $.Values.keeper.enabled }}
 {{- $replicas := $.Values.replicaCount | int }}
 {{- range $j, $r := until $replicas }}
 - name: {{ printf "KEEPER_NODE_%d" $j }}
   value: {{ printf "%s-shard%d-%d.%s.%s.svc.%s" (include "common.names.fullname" $ ) $i $j (include "clickhouse.headlessServiceName" $) (include "common.names.namespace" $) $.Values.clusterDomain }}
+- name: {{ printf "ZOOKEEPER_NODE_%d" $j }}
+  value: {{ printf "%s-shard%d-%d.%s.%s.svc.%s" (include "common.names.fullname" $ ) 0 $j (include "clickhouse.headlessServiceName" $) (include "common.names.namespace" $) $.Values.clusterDomain }}
 {{- end }}
```
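If you want to try the patch above before an official fix lands, one option is to vendor the chart locally, apply the edits, and install from the local directory. A minimal sketch, where the release name, version, and flag values are placeholders:

```bash
# Sketch: pull the chart, apply the template edits shown above, install locally.
helm pull bitnami/clickhouse --version 3.1.5 --untar
# ...edit clickhouse/templates/statefulset.yaml and the keeper <node> section
#    in values.yaml as in the diff above...
helm install myrelease ./clickhouse \
  --set keeper.enabled=true --set zookeeper.enabled=false \
  --set shards=2 --set replicaCount=2
```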
Name and Version
bitnami/clickhouse 3.1.5
What architecture are you using?
amd64
What steps will reproduce the bug?
Result: Pods are running without any suspicious logs, but when you either exec into them or execute some command from the web UI that runs "ON CLUSTER", the progress indicator never goes past 49%. This was tried with a CREATE TABLE statement trying to create a ReplicatedMergeTree table on the cluster. The ClickHouse cluster consists of 2 shards and 2 replicas.
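The original reproduction steps are not included above; based on the setup described (2 shards, 2 replicas, keeper enabled), a minimal reproduction might look like the following. All names, the cluster name, and the macros are assumptions about the chart defaults, not the reporter's actual values.

```bash
# Sketch: deploy with keeper instead of ZooKeeper, then run a DDL ON CLUSTER.
helm install myrelease bitnami/clickhouse \
  --set shards=2 --set replicaCount=2 \
  --set keeper.enabled=true --set zookeeper.enabled=false

# Assumes the chart's default cluster name and {shard}/{replica} macros.
kubectl exec -it myrelease-clickhouse-shard0-0 -- clickhouse-client --query "
  CREATE TABLE default.test ON CLUSTER default
  (id UInt64, v String)
  ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/test', '{replica}')
  ORDER BY id"
# Symptom reported in this issue: the statement hangs around 49% and never returns.
```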
Are you using any custom parameters or values?
Our values.yaml: …
What is the expected behavior?
The expected behaviour is normal creation of the tables within the distributed_ddl_task_timeout.
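For context, distributed_ddl_task_timeout is the ClickHouse setting (180 seconds by default) that controls how long an ON CLUSTER query waits for all hosts to acknowledge the DDL. A quick way to inspect it, sketched with a placeholder pod name:

```bash
# Sketch: check the current DDL timeout from inside a server pod.
kubectl exec -it myrelease-clickhouse-shard0-0 -- clickhouse-client --query \
  "SELECT name, value FROM system.settings WHERE name = 'distributed_ddl_task_timeout'"
```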
What do you see instead?
The table creation (tested with the clickhouse-client command after exec-ing into the pod) is stuck at 49% progress. When aborted, the table seems to have been created.
When trying to drop the tables, the same problem occurs: the tables do seem to have been created, but the command doesn't finish, so I believe ClickHouse Keeper doesn't answer the command but does execute it.
When trying to create the table again (because I assumed Keeper had executed the last command), the error says that the replica already exists, not the table itself. So the problem seems to lie somewhere with the replicas.
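If a half-finished CREATE or DROP leaves stale replica metadata behind in keeper (the "replica already exists" error above), ClickHouse's SYSTEM DROP REPLICA statement can clean it up. A hedged sketch only: the pod, replica name, database, and table are placeholders and must match your actual macros, and the replica being dropped cannot be the local server's own replica.

```bash
# Sketch: remove stale replica metadata for a table (all names are placeholders).
kubectl exec -it myrelease-clickhouse-shard0-0 -- clickhouse-client --query \
  "SYSTEM DROP REPLICA 'myrelease-clickhouse-shard0-1' FROM TABLE default.test"
```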
Additional information
No response