Multiple endpoints in ETCDCTL_ENDPOINTS lead to rejected connection from "1.2.3.4:5678" (error "EOF", ServerName "") spam in server log #10391

taladar · 2019-01-09T13:49:38Z

We have a Cluster of 3 etcd nodes and use it from various shell scripts running in systemd services which all use etcdctl. Two nodes run on Debian stretch, the third on Centos 7. All nodes use etcd downloaded directly from Github.

etcdctl version: 3.3.10
API version: 3.3

etcd Version: 3.3.10
Git SHA: 27fc7e2
Go Version: go1.10.4
Go OS/Arch: linux/amd64

A lot of the systemd services check some system parameter (e.g. an IP or whether a service is running) and write data about it to etcd. All of the scripts use leases to ensure no outdated information remains in etcd. Some use put, some use txn with a check whether the key exists followed by put.

etcdctl is configured via environment variables (IPs changed for privacy)

ETCDCTL_API=3
ETCDCTL_CACERT=/etc/etcd/client_certs/ca.crt
ETCDCTL_CERT=/etc/etcd/client_certs/client.crt
ETCDCTL_DIAL_TIMEOUT=3s
ETCDCTL_ENDPOINTS=https://1.2.3.4:2379,https://1.2.3.5:2379,https://1.2.3.6:2379
ETCDCTL_KEY=/etc/etcd/client_keys/client.key
`
etcd is started by a systemd unit we wrote (adjusted heavily)

`
[Unit]
Description=etcd
Documentation=https://github.com/coreos/etcd
Conflicts=etcd2.service

[Service]
Type=notify
Restart=always
RestartSec=5s
LimitNOFILE=40000
TimeoutStartSec=60s
User=etcd
Group=etcd

ExecStart=/opt/etcd/etcd \
    --name node1 \
    --data-dir /var/lib/etcd \
    --listen-client-urls https://1.2.3.4:2379,https://127.0.0.1:2379 \
    --cert-file=/etc/etcd/server_certs/server_client.crt \
    --key-file=/etc/etcd/server_keys/server_client.key \
    --client-cert-auth \
    --trusted-ca-file=/etc/etcd/server_certs/ca.crt \
    --advertise-client-urls https://1.2.3.4:2379 \
    --listen-peer-urls https://1.2.3.4:2380 \
    --peer-cert-file=/etc/etcd/server_certs/server_peer.crt \
    --peer-key-file=/etc/etcd/server_keys/server_peer.key \
    --peer-client-cert-auth \
    --peer-trusted-ca-file=/etc/etcd/server_certs/ca.crt \
    --initial-advertise-peer-urls https://1.2.3.4:2380 \
    --initial-cluster node1=https://1.2.3.4:2380,node2=https://1.2.3.5:2380,node3=https://1.2.3.6:2380 \
    --initial-cluster-token "etcd_cluster1" \
    --initial-cluster-state new \
    --auto-compaction-mode=revision \
    --auto-compaction-retention=1000

[Install]
WantedBy=multi-user.target

We get several messages per second with the IPs of the two nodes where etcdctl processes run (two of the nodes that are part of the cluster)

rejected connection from "1.2.3.4:23456" (error "EOF", ServerName "")
rejected connection from "1.2.3.5:34567" (error "EOF", ServerName "")

as well as occasional (once every couple of seconds)

rejected connection from "1.2.3.4:54321" (error "read tcp 1.2.3.4:2379->1.2.3.4:54321: read: connection reset by peer", ServerName "")

(the first and last IP and port are identical here)

I can't quite pin down what causes this but I strongly assume that it is related to the way etcdctl connects to the cluster when multiple endpoints are supplied. Looking at the debug output of a simple get call

~ # etcdctl --debug get /internet/elected_router                                                                                                                                                                    
ETCDCTL_CACERT=/etc/etcd/client_certs/ca.crt
ETCDCTL_CERT=/etc/etcd/client_certs/client.crt
ETCDCTL_COMMAND_TIMEOUT=5s
ETCDCTL_DEBUG=true
ETCDCTL_DIAL_TIMEOUT=3s
ETCDCTL_DISCOVERY_SRV=
ETCDCTL_ENDPOINTS=[https://1.2.3.4:2379,https://1.2.3.5:2379,https://1.2.3.6:2379]
ETCDCTL_HEX=false
ETCDCTL_INSECURE_DISCOVERY=true
ETCDCTL_INSECURE_SKIP_TLS_VERIFY=false
ETCDCTL_INSECURE_TRANSPORT=true
ETCDCTL_KEEPALIVE_TIME=2s
ETCDCTL_KEEPALIVE_TIMEOUT=6s
ETCDCTL_KEY=/etc/etcd/client_keys/client.key
ETCDCTL_USER=
ETCDCTL_WRITE_OUT=simple
INFO: 2019/01/09 14:42:15 ccBalancerWrapper: updating state and picker called by balancer: IDLE, 0xc4202d2a20
INFO: 2019/01/09 14:42:15 dialing to target with scheme: ""
INFO: 2019/01/09 14:42:15 could not get resolver for scheme: ""
INFO: 2019/01/09 14:42:15 balancerWrapper: is pickfirst: false
INFO: 2019/01/09 14:42:15 balancerWrapper: got update addr from Notify: [{1.2.3.4:2379 <nil>} {1.2.3.5:2379 <nil>} {1.2.3.6:2379 <nil>}]
INFO: 2019/01/09 14:42:15 ccBalancerWrapper: new subconn: [{1.2.3.4:2379 0  <nil>}]
INFO: 2019/01/09 14:42:15 ccBalancerWrapper: new subconn: [{1.2.3.5:2379 0  <nil>}]
INFO: 2019/01/09 14:42:15 ccBalancerWrapper: new subconn: [{1.2.3.6:2379 0  <nil>}]
INFO: 2019/01/09 14:42:15 balancerWrapper: handle subconn state change: 0xc4201986b0, CONNECTING
INFO: 2019/01/09 14:42:15 ccBalancerWrapper: updating state and picker called by balancer: CONNECTING, 0xc4202d2a20
INFO: 2019/01/09 14:42:15 balancerWrapper: handle subconn state change: 0xc420198700, CONNECTING
INFO: 2019/01/09 14:42:15 ccBalancerWrapper: updating state and picker called by balancer: CONNECTING, 0xc4202d2a20
INFO: 2019/01/09 14:42:15 balancerWrapper: handle subconn state change: 0xc420198750, CONNECTING
INFO: 2019/01/09 14:42:15 ccBalancerWrapper: updating state and picker called by balancer: CONNECTING, 0xc4202d2a20
INFO: 2019/01/09 14:42:15 balancerWrapper: handle subconn state change: 0xc420198700, READY
INFO: 2019/01/09 14:42:15 clientv3/balancer: pin "1.2.3.5:2379"
INFO: 2019/01/09 14:42:15 ccBalancerWrapper: updating state and picker called by balancer: READY, 0xc4202d2a20
INFO: 2019/01/09 14:42:15 balancerWrapper: got update addr from Notify: [{1.2.3.5:2379 <nil>}]
INFO: 2019/01/09 14:42:15 ccBalancerWrapper: removing subconn
INFO: 2019/01/09 14:42:15 ccBalancerWrapper: removing subconn
INFO: 2019/01/09 14:42:15 balancerWrapper: handle subconn state change: 0xc4201986b0, SHUTDOWN
INFO: 2019/01/09 14:42:15 ccBalancerWrapper: updating state and picker called by balancer: READY, 0xc4202d2a20
INFO: 2019/01/09 14:42:15 balancerWrapper: handle subconn state change: 0xc420198750, SHUTDOWN
INFO: 2019/01/09 14:42:15 ccBalancerWrapper: updating state and picker called by balancer: READY, 0xc4202d2a20
/internet/elected_router
node1

it looks like etcdctl connects to all three endpoints and only uses one connection, presumably it races them and uses the first one that responds. I suspect the EOF errors in the logs are caused by those two other connection attempts.

Sadly there seems to be no way to suppress the logging of TLS handshake failures like this.

The text was updated successfully, but these errors were encountered:

taladar · 2019-01-09T13:56:17Z

The closed issue (for lack of information from submitter) #10040 seems to show a similar problem.

jsok · 2019-01-15T23:48:11Z

This is a duplicate of #9949

hexfusion · 2019-01-16T14:14:33Z

This is a duplicate of #9949
yeah closing lets track #9949

hexfusion added the area/clientv3 label Jan 13, 2019

hexfusion closed this as completed Jan 16, 2019

jcrowthe mentioned this issue Mar 26, 2019

(error "EOF", ServerName "") error on etcd servers #10587

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multiple endpoints in ETCDCTL_ENDPOINTS lead to rejected connection from "1.2.3.4:5678" (error "EOF", ServerName "") spam in server log #10391

Multiple endpoints in ETCDCTL_ENDPOINTS lead to rejected connection from "1.2.3.4:5678" (error "EOF", ServerName "") spam in server log #10391

taladar commented Jan 9, 2019 •

edited

Loading

taladar commented Jan 9, 2019

jsok commented Jan 15, 2019

hexfusion commented Jan 16, 2019

Multiple endpoints in ETCDCTL_ENDPOINTS lead to rejected connection from "1.2.3.4:5678" (error "EOF", ServerName "") spam in server log #10391

Multiple endpoints in ETCDCTL_ENDPOINTS lead to rejected connection from "1.2.3.4:5678" (error "EOF", ServerName "") spam in server log #10391

Comments

taladar commented Jan 9, 2019 • edited Loading

taladar commented Jan 9, 2019

jsok commented Jan 15, 2019

hexfusion commented Jan 16, 2019

taladar commented Jan 9, 2019 •

edited

Loading