
embedded etcd: dropped internal Raft message since sending buffer is full (overloaded network) #3511

Closed
iameli opened this issue Jun 24, 2021 · 2 comments


iameli commented Jun 24, 2021

Environmental Info:
K3s Version:

# k3s -v
k3s version v1.20.6+k3s1 (8d043282)
go version go1.15.10

Node(s) CPU architecture, OS, and Version:
Linux dp4605 5.4.0-72-generic #80-Ubuntu SMP Mon Apr 12 17:35:00 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
Three servers, zero agents. Using embedded etcd, wireguard backend, embedded containerd runtime.

Describe the bug:
One of the servers in the cluster had a hardware failure. Since then, this message has been spammed over and over into the k3s journald logs on one of the two remaining servers:

Jun 24 18:20:04 dp4605 k3s[855297]: {"level":"warn","ts":"2021-06-24T18:20:04.739Z","caller":"rafthttp/peer.go:267","msg":"dropped internal Raft message since sending buffer is full (overloaded network)","message-type":"MsgHeartbeat","local-member-id":"d1c004e84980efe9","from":"d1c004e84980efe9","remote-peer-id":"865cdac463481cfd","remote-peer-active":false}
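The record above is one JSON object per log line: `message-type` is the Raft message that was dropped (here a heartbeat), `remote-peer-id` is the etcd member it was addressed to, and `"remote-peer-active":false` indicates the local member currently has no active connection to that peer. In this thread the sending buffer is presumably full because the peer cannot be reached, not because the network is actually overloaded. A rough way to see which peers the noise targets is the sketch below. The function name `count_dropped_by_peer` is hypothetical, and it assumes the field order matches the line quoted above and that the k3s server runs as the systemd unit `k3s`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count etcd "dropped internal Raft message" warnings
# per (remote peer, message type). Reads journald-style lines on stdin and
# assumes "message-type" appears before "remote-peer-id" in each record.
count_dropped_by_peer() {
  # Keep only the dropped-message warnings.
  grep -F 'dropped internal Raft message' |
    # Emit "peer-id message-type" for each matching record.
    sed -n 's/.*"message-type":"\([^"]*\)".*"remote-peer-id":"\([^"]*\)".*/\2 \1/p' |
    # Tally each pair and print "peer-id message-type count".
    awk '{ n[$1 " " $2]++ } END { for (k in n) print k, n[k] }' |
    sort
}

# Example (assumes the k3s server runs as the systemd unit "k3s"):
#   journalctl -u k3s --no-pager | count_dropped_by_peer
```

If a single peer ID dominates the output, that is consistent with one member having gone away, as with the hardware failure described above.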

Digging through the surrounding log lines for other pertinent messages, I found these, which make sense considering the node is down:

Jun 24 18:20:03 dp4605 k3s[855297]: {"level":"warn","ts":"2021-06-24T18:20:03.847Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"865cdac463481cfd","rtt":"0s","error":"dial tcp REDACTED:2380: connect: no route to host"}
Jun 24 18:20:03 dp4605 k3s[855297]: {"level":"warn","ts":"2021-06-24T18:20:03.850Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"865cdac463481cfd","rtt":"0s","error":"dial tcp REDACTED:2380: connect: no route to host"}
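Both prober lines name the same `remote-peer-id` as the dropped-message warning, and the `no route to host` dial error shows that peer is unreachable at the network level. The sketch below pairs each unhealthy peer with the distinct errors reported for it. The function name `unhealthy_peers` is hypothetical; it assumes one JSON record per log line with `remote-peer-id` appearing before `error`, as in the lines quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list peers flagged by etcd's rafthttp prober, with each
# distinct dial error seen for them. Reads journald-style lines on stdin and
# assumes "remote-peer-id" appears before "error" in each record.
unhealthy_peers() {
  # Keep only the prober warnings.
  grep -F 'prober detected unhealthy status' |
    # Emit "peer-id error-text" for each matching record.
    sed -n 's/.*"remote-peer-id":"\([^"]*\)".*"error":"\([^"]*\)".*/\1 \2/p' |
    # Collapse repeats (e.g. the SNAPSHOT and RAFT_MESSAGE probes) into one line.
    sort -u
}

# Example (assumes the k3s server runs as the systemd unit "k3s"):
#   journalctl -u k3s --no-pager | unhealthy_peers
```

Because both round trippers fail with the same error, the two warnings collapse to a single line for the failed peer.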

The other

Steps To Reproduce:

Unsure yet. Presumably this happens when a node gets taken out of the cluster in some kind of unhealthy way?

brandond (Member) commented

These messages are coming from the embedded etcd; there's not really any way to turn them off. You will see these messages until the node comes back up, or is deleted from the cluster.

stale bot commented Dec 21, 2021

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

stale bot added the status/stale label Dec 21, 2021
stale bot closed this as completed Jan 4, 2022