We have a cluster of three etcd nodes and use it from various shell scripts running in systemd services, all of which use etcdctl. Two nodes run on Debian Stretch, the third on CentOS 7. All nodes use etcd downloaded directly from GitHub.
etcdctl version: 3.3.10
API version: 3.3
etcd Version: 3.3.10
Git SHA: 27fc7e2
Go Version: go1.10.4
Go OS/Arch: linux/amd64
Many of the systemd services check some system parameter (e.g. an IP address, or whether a service is running) and write data about it to etcd. All of the scripts attach leases to their keys to ensure no outdated information remains in etcd. Some use put; some use txn with a check for whether the key already exists, followed by a put.
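For context, the lease-based write pattern the scripts use looks roughly like this (a minimal sketch; the key name, value, and TTL are made up for illustration, and a reachable cluster with the environment from below is assumed — note the non-interactive txn input format of compares, success ops, and failure ops separated by blank lines):

```shell
# Grant a 30s lease; the output line looks like
# "lease 694d71ddacfda227 granted with TTL(30s)", so field 2 is the ID.
LEASE_ID=$(etcdctl lease grant 30 | awk '{print $2}')

# Simple variant: put with the lease attached, so the key
# disappears automatically if the script stops refreshing it.
etcdctl put --lease="$LEASE_ID" /internet/elected_router node1

# Txn variant: write only if the key does not exist yet
# (create revision 0 means "never created").
etcdctl txn <<EOF
create("/internet/elected_router") = "0"

put /internet/elected_router node1


EOF
```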
etcdctl is configured via environment variables (IPs changed for privacy).
In the etcd logs we get several messages per second showing the IPs of the two cluster nodes on which the etcdctl processes run:
rejected connection from "1.2.3.4:23456" (error "EOF", ServerName "")
rejected connection from "1.2.3.5:34567" (error "EOF", ServerName "")
as well as the occasional (once every couple of seconds):
rejected connection from "1.2.3.4:54321" (error "read tcp 1.2.3.4:2379->1.2.3.4:54321: read: connection reset by peer", ServerName "")
(the quoted source address and the address after the arrow are identical here)
I can't quite pin down what causes this, but I strongly suspect it is related to how etcdctl connects to the cluster when multiple endpoints are supplied. Looking at the debug output of a simple get call:
~ # etcdctl --debug get /internet/elected_router
ETCDCTL_CACERT=/etc/etcd/client_certs/ca.crt
ETCDCTL_CERT=/etc/etcd/client_certs/client.crt
ETCDCTL_COMMAND_TIMEOUT=5s
ETCDCTL_DEBUG=true
ETCDCTL_DIAL_TIMEOUT=3s
ETCDCTL_DISCOVERY_SRV=
ETCDCTL_ENDPOINTS=[https://1.2.3.4:2379,https://1.2.3.5:2379,https://1.2.3.6:2379]
ETCDCTL_HEX=false
ETCDCTL_INSECURE_DISCOVERY=true
ETCDCTL_INSECURE_SKIP_TLS_VERIFY=false
ETCDCTL_INSECURE_TRANSPORT=true
ETCDCTL_KEEPALIVE_TIME=2s
ETCDCTL_KEEPALIVE_TIMEOUT=6s
ETCDCTL_KEY=/etc/etcd/client_keys/client.key
ETCDCTL_USER=
ETCDCTL_WRITE_OUT=simple
INFO: 2019/01/09 14:42:15 ccBalancerWrapper: updating state and picker called by balancer: IDLE, 0xc4202d2a20
INFO: 2019/01/09 14:42:15 dialing to target with scheme: ""
INFO: 2019/01/09 14:42:15 could not get resolver for scheme: ""
INFO: 2019/01/09 14:42:15 balancerWrapper: is pickfirst: false
INFO: 2019/01/09 14:42:15 balancerWrapper: got update addr from Notify: [{1.2.3.4:2379 <nil>} {1.2.3.5:2379 <nil>} {1.2.3.6:2379 <nil>}]
INFO: 2019/01/09 14:42:15 ccBalancerWrapper: new subconn: [{1.2.3.4:2379 0 <nil>}]
INFO: 2019/01/09 14:42:15 ccBalancerWrapper: new subconn: [{1.2.3.5:2379 0 <nil>}]
INFO: 2019/01/09 14:42:15 ccBalancerWrapper: new subconn: [{1.2.3.6:2379 0 <nil>}]
INFO: 2019/01/09 14:42:15 balancerWrapper: handle subconn state change: 0xc4201986b0, CONNECTING
INFO: 2019/01/09 14:42:15 ccBalancerWrapper: updating state and picker called by balancer: CONNECTING, 0xc4202d2a20
INFO: 2019/01/09 14:42:15 balancerWrapper: handle subconn state change: 0xc420198700, CONNECTING
INFO: 2019/01/09 14:42:15 ccBalancerWrapper: updating state and picker called by balancer: CONNECTING, 0xc4202d2a20
INFO: 2019/01/09 14:42:15 balancerWrapper: handle subconn state change: 0xc420198750, CONNECTING
INFO: 2019/01/09 14:42:15 ccBalancerWrapper: updating state and picker called by balancer: CONNECTING, 0xc4202d2a20
INFO: 2019/01/09 14:42:15 balancerWrapper: handle subconn state change: 0xc420198700, READY
INFO: 2019/01/09 14:42:15 clientv3/balancer: pin "1.2.3.5:2379"
INFO: 2019/01/09 14:42:15 ccBalancerWrapper: updating state and picker called by balancer: READY, 0xc4202d2a20
INFO: 2019/01/09 14:42:15 balancerWrapper: got update addr from Notify: [{1.2.3.5:2379 <nil>}]
INFO: 2019/01/09 14:42:15 ccBalancerWrapper: removing subconn
INFO: 2019/01/09 14:42:15 ccBalancerWrapper: removing subconn
INFO: 2019/01/09 14:42:15 balancerWrapper: handle subconn state change: 0xc4201986b0, SHUTDOWN
INFO: 2019/01/09 14:42:15 ccBalancerWrapper: updating state and picker called by balancer: READY, 0xc4202d2a20
INFO: 2019/01/09 14:42:15 balancerWrapper: handle subconn state change: 0xc420198750, SHUTDOWN
INFO: 2019/01/09 14:42:15 ccBalancerWrapper: updating state and picker called by balancer: READY, 0xc4202d2a20
/internet/elected_router
node1
it looks like etcdctl opens connections to all three endpoints but ends up using only one; presumably it races them and pins the first one that becomes READY, then shuts the other subconns down again. I suspect the EOF errors in the logs are caused by those two abandoned connection attempts.
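If the torn-down subconns really are the source of the noise, one workaround might be to point each etcdctl invocation at a single endpoint so no extra connections are opened in the first place. This is only a sketch of that idea, and it trades away the client-side failover that listing all three endpoints provides:

```shell
# Override the endpoint list for one call only, e.g. pinning each host's
# scripts to its local etcd member (address is the placeholder from above).
# With one endpoint there is nothing for the balancer to race, so no
# half-opened connections should be rejected on the other members.
ETCDCTL_ENDPOINTS=https://1.2.3.4:2379 etcdctl get /internet/elected_router
```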
Sadly, there seems to be no way to suppress the logging of TLS handshake failures like these.