Communication between pods in different nodes not working (Fedora IoT 31, HyperV VMs) #2049
Flannel is built into k3s, you will not see a separate service for it. Can you try stopping k3s, stopping firewalld (or any other iptables-based firewall), and then starting k3s - just to ensure that it's not conflicting with something else? |
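For reference, that test sequence might look something like this (a sketch assuming the default k3s and k3s-agent unit names created by the install script):
# On the server node
systemctl stop k3s
systemctl stop firewalld
systemctl start k3s
# On the agent node
systemctl stop k3s-agent
systemctl stop firewalld
systemctl start k3s-agent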
Stopped firewalld in both primary and secondary and restarted k3s and k3s-agent; still the same result:
[root@first ~]# systemctl status firewalld.service
● firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Wed 2020-07-22 15:00:53 PDT; 1min 35s ago
Docs: man:firewalld(1)
Process: 815 ExecStart=/usr/sbin/firewalld --nofork --nopid $FIREWALLD_ARGS (code=exited, status=0/SUCCESS)
Main PID: 815 (code=exited, status=0/SUCCESS)
Jul 22 09:54:04 first.mshome.net systemd[1]: Starting firewalld - dynamic firewall daemon...
Jul 22 09:54:05 first.mshome.net systemd[1]: Started firewalld - dynamic firewall daemon.
Jul 22 15:00:52 first.mshome.net systemd[1]: Stopping firewalld - dynamic firewall daemon...
Jul 22 15:00:53 first.mshome.net systemd[1]: firewalld.service: Succeeded.
Jul 22 15:00:53 first.mshome.net systemd[1]: Stopped firewalld - dynamic firewall daemon.
[root@first ~]# kubectl get pods -l app=kubernetes-bootcamp -o go-template='{{range .items}}{{.status.podIP}}{{"\n"}}{{end}}'
10.42.1.13
10.42.1.11
10.42.0.28
10.42.0.31
[root@first ~]# kubectl run -it --rm --restart=Never alpine --image=alpine sh
If you don't see a command prompt, try pressing enter.
/ # wget -qO- --timeout 5 10.42.1.13:8080
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-sffrn | v=1
/ # wget -qO- --timeout 5 10.42.0.28:8080
wget: download timed out |
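One way to narrow down where the cross-node traffic dies (a diagnostic sketch; it assumes the default flannel VXLAN backend, which encapsulates pod traffic in UDP on port 8472, and that eth0 is the node uplink) is to watch for encapsulated packets on both nodes while repeating the failing wget:
# Run on both nodes while the wget is retried from the alpine pod
tcpdump -ni eth0 udp port 8472
# Also confirm the flannel interface and pod-subnet routes exist on each node
ip -d link show flannel.1
ip route | grep 10.42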
Additional information: Using the Kube Proxy, I can communicate with all the pods:
[root@first ~]# kubectl proxy
Starting to serve on 127.0.0.1:8001
[root@first ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
kubernetes-bootcamp-6f6656d949-lqkgk 1/1 Running 1 13h
kubernetes-bootcamp-6f6656d949-dnsjk 1/1 Running 3 24h
kubernetes-bootcamp-6f6656d949-p7l8f 1/1 Running 3 24h
kubernetes-bootcamp-6f6656d949-z4hkc 1/1 Running 1 13h
[root@first ~]# curl http://localhost:8001/api/v1/namespaces/default/pods/kubernetes-bootcamp-6f6656d949-lqkgk:8080/proxy/
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-lqkgk | v=1
[root@first ~]# curl http://localhost:8001/api/v1/namespaces/default/pods/kubernetes-bootcamp-6f6656d949-dnsjk:8080/proxy/
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-dnsjk | v=1
[root@first ~]# curl http://localhost:8001/api/v1/namespaces/default/pods/kubernetes-bootcamp-6f6656d949-p7l8f:8080/proxy/
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-p7l8f | v=1
[root@first ~]# curl http://localhost:8001/api/v1/namespaces/default/pods/kubernetes-bootcamp-6f6656d949-z4hkc:8080/proxy/
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-z4hkc | v=1 |
FWIW, I tried this on Fedora IoT 32 and wasn't even able to get it to start due to changes to kernel cgroups. I'll try again on 31. |
About the SELinux comment that I don't see now: I am installing the following:
rpm-ostree install --reboot https://rpm.rancher.io/k3s-selinux-0.1.1-rc1.el7.noarch.rpm
I have these SELinux packages installed:
[root@first ~]# rpm -qa selinux*
selinux-policy-3.14.4-50.fc31.noarch
selinux-policy-targeted-3.14.4-50.fc31.noarch
[root@first ~]# rpm -qa *-selinux
k3s-selinux-0.1.1-rc1.el7.noarch
rpm-plugin-selinux-4.15.1-1.fc31.x86_64
container-selinux-2.124.0-3.fc31.noarch
cockpit-selinux-220-1.fc31.noarch
Also, after installing k3s, I need to do a restorecon on the created directory, otherwise the policy throws errors:
$ restorecon -R /var/lib/rancher
And also, for testing purposes, I am running with the container_t context as permissive:
[root@first ~]# semodule -l | grep permissive
permissive_container_t
permissivedomains
For the cgroups problem, you are right: you need to downgrade to cgroups v1 in F32 to make any container system work. This is my kernel command line:
[root@first ~]# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/ostree/fedora-iot-108a44c2f0a50881cdd8c62efa9680697e3ae3eca304c727a89507ba1e53e219/vmlinuz-5.5.17-200.fc31.x86_64 ima_policy=tcb user_namespace.enable=1 systemd.unified_cgroup_hierarchy=0 lockdown=confidentiality resume=/dev/mapper/system-swap rd.lvm.lv=system/root rd.lvm.lv=system/swap rd.shell=0 root=/dev/mapper/system-root ostree=/ostree/boot.1/fedora-iot/108a44c2f0a50881cdd8c62efa9680697e3ae3eca304c727a89507ba1e53e219/0 |
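On an ostree-based system like Fedora IoT, the same cgroup v1 fallback can be appended with rpm-ostree kargs (a sketch; the rest of the kernel command line stays machine-specific):
# Add the cgroup v1 fallback to the kernel arguments and reboot into it
rpm-ostree kargs --append="systemd.unified_cgroup_hierarchy=0"
systemctl reboot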
Thanks - I hadn't gotten that far yet since I just had a spare moment to test. I didn't see mention of those steps on the fedoramagazine post which is odd since it won't work at all otherwise. I was able to reproduce this, but I don't have any idea off the top of my head what it might be. |
I tried restarting the k3s server node with
There was a kernel vxlan issue that was triggered by kube-proxy's iptables rules, but that was supposed to have been fixed (and I confirmed the fix on several different systems) as of the most recent releases of 1.16/1.17/1.18, so I'm not sure what it might be. In the meantime, you can use host-gw instead of vxlan. |
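If the kernel VXLAN/checksum interaction is suspected, a workaround floated in similar reports (hedged: the flannel.1 interface name assumes the default VXLAN backend, and the setting does not persist across reboots) is to disable TX checksum offload on the VXLAN device while testing:
# Disable checksum offload on the flannel VXLAN interface, then retry pod-to-pod traffic
ethtool -K flannel.1 tx-checksum-ip-generic off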
Updated k3s service to use host-gw backend: [root@first ~]# systemctl cat k3s.service
# /etc/systemd/system/k3s.service
[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
Wants=network-online.target
[Install]
WantedBy=multi-user.target
[Service]
Type=notify
EnvironmentFile=/etc/systemd/system/k3s.service.env
KillMode=process
Delegate=yes
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s \
server \
--flannel-backend=host-gw
Restarted/rebooted the primary. In the secondaries, I edited the following file to use host-gw. Is this necessary?
[root@second agent]# cat /var/lib/rancher/k3s/agent/etc/flannel/net-conf.json
{
"Network": "10.42.0.0/16",
"Backend": {
"Type": "host-gw"
}
}
Restarted the secondaries. At this point, I could communicate with the pod running in one of the secondaries, but not the other one (freshly deployed). After typing random commands in the non-working secondary, I applied the firewall-cmd customization:
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 0 -i cni0 -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 1 -s 10.42.0.0/15 -j ACCEPT
firewall-cmd --reload
And then, everything works! I can talk with my pods :) After that, I tested the same with a service:
[root@first ~]# kubectl expose deployment/kubernetes-bootcamp --type="NodePort" --port 8080
[root@first ~]# kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.43.0.1 <none> 443/TCP 2d
kubernetes-bootcamp NodePort 10.43.6.56 <none> 8080:31175/TCP 96s
And from the alpine pod:
[root@first ~]# kubectl run -it --rm --restart=Never alpine --image=alpine sh
If you don't see a command prompt, try pressing enter.
/ # wget -qO- 10.43.6.56:8080
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-vm7p7 | v=1
/ # wget -qO- 10.43.6.56:8080
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-mslrq | v=1
/ # wget -qO- 10.43.6.56:8080
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-zgg82 | v=1
/ # wget -qO- 10.43.6.56:8080
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-pvc4k | v=1
/ # wget -qO- 10.43.6.56:8080
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-mslrq | v=1
/ # wget -qO- 10.43.6.56:8080
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-vm7p7 | v=1
NodePort Service working fine! Follow-up questions:
Thank you! |
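As a side note on the firewall-cmd rules above, a quick way to confirm the direct rules actually took effect after the reload (a sketch using standard firewall-cmd queries):
# Permanent direct rules vs. the rules currently loaded in the runtime configuration
firewall-cmd --permanent --direct --get-all-rules
firewall-cmd --direct --get-all-rules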
You only need to set |
Thanks! I have updated my kickstart configuration to apply these changes, and re-deployed my cluster from scratch. My primary is deployed with:
curl -sfL https://get.k3s.io | INSTALL_K3S_SKIP_START=true sh -
semanage permissive -a container_t
restorecon -R /var/lib/rancher
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 0 -i cni0 -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 1 -s 10.42.0.0/15 -j ACCEPT
mkdir -p /etc/systemd/system/k3s.service.d/
cat > /etc/systemd/system/k3s.service.d/override.conf << EOA
[Unit]
After=var-lib-rancher.mount
[Service]
ExecStart=
ExecStart=/usr/local/bin/k3s server --flannel-backend=host-gw
EOA
The empty ExecStart line is mandatory, otherwise systemd refuses to run the service because of a bad configuration (something like "service of type oneshot cannot have multiple ExecStart"). The secondaries are deployed with:
curl -sfL https://get.k3s.io | K3S_URL=https://first.mshome.net:6443 INSTALL_K3S_SKIP_START=true \
K3S_TOKEN=<TOKEN> sh -
semanage permissive -a container_t
restorecon -R /var/lib/rancher
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 0 -i cni0 -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 1 -s 10.42.0.0/15 -j ACCEPT
mkdir -p /etc/systemd/system/k3s-agent.service.d/
cat > /etc/systemd/system/k3s-agent.service.d/override.conf << EOA
[Unit]
After=var-lib-rancher.mount
EOA
And the
After this new deployment, I have executed these tests to check connectivity:
# Deployment
kubectl create deployment kubernetes-bootcamp --image=gcr.io/google-samples/kubernetes-bootcamp:v1
kubectl scale deployment kubernetes-bootcamp --replicas=4
# Communication between pods
kubectl get pods -l app=kubernetes-bootcamp -o go-template='{{range .items}}{{.status.podIP}}{{"\n"}}{{end}}'
kubectl run -it --rm --restart=Never alpine --image=alpine sh
for i in <POD 1 IP> <POD 2 IP> <POD 3 IP> <POD 4 IP>; do wget -qO- $i:8080; done
# Communication with Service from inside pod
kubectl expose deployment/kubernetes-bootcamp --type="NodePort" --port 8080
kubectl get service kubernetes-bootcamp
kubectl run -it --rm --restart=Never alpine --image=alpine sh
wget -qO- <SERVICE IP>:8080
# Communication with Kube Proxy
kubectl get pods
kubectl proxy
curl http://localhost:8001/api/v1/namespaces/default/pods/<POD NAME>:8080/proxy/
# Communication with Service using Kube Proxy
kubectl expose deployment/kubernetes-bootcamp --type="NodePort" --port 8080
kubectl get service kubernetes-bootcamp
curl localhost:8001/api/v1/namespaces/default/services/kubernetes-bootcamp:8080/proxy/
Communication between pods
Works!
[root@first ~]# kubectl get pods -l app=kubernetes-bootcamp -o go-template='{{range .items}}{{.status.podIP}}{{"\n"}}{{end}}'
10.42.0.8
10.42.0.9
10.42.1.3
10.42.1.4
[root@first ~]# kubectl run -it --rm --restart=Never alpine --image=alpine sh
If you don't see a command prompt, try pressing enter.
/ # for i in 10.42.0.8 10.42.0.9 10.42.1.3 10.42.1.4; do wget -qO- $i:8080; done
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-6gnjj | v=1
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-txbh8 | v=1
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-lh7s4 | v=1
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-rtcz4 | v=1
/ # exit
pod "alpine" deleted Communication with Service from inside podWorks! [root@first ~]# kubectl expose deployment/kubernetes-bootcamp --type="NodePort" --port 8080
service/kubernetes-bootcamp exposed
[root@first ~]# kubectl get service kubernetes-bootcamp
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes-bootcamp NodePort 10.43.187.101 <none> 8080:31812/TCP 5s
[root@first ~]# kubectl run -it --rm --restart=Never alpine --image=alpine sh
If you don't see a command prompt, try pressing enter.
/ # wget -qO- 10.43.187.101:8080
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-txbh8 | v=1
/ # wget -qO- 10.43.187.101:8080
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-lh7s4 | v=1
/ # wget -qO- 10.43.187.101:8080
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-txbh8 | v=1
/ # wget -qO- 10.43.187.101:8080
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-6gnjj | v=1
Communication with Kube Proxy
Works only for pods in the primary. This was working before, as in #2049 (comment).
[root@first ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
kubernetes-bootcamp-6f6656d949-6gnjj 1/1 Running 0 30m
kubernetes-bootcamp-6f6656d949-txbh8 1/1 Running 0 30m
kubernetes-bootcamp-6f6656d949-lh7s4 1/1 Running 0 30m
kubernetes-bootcamp-6f6656d949-rtcz4 1/1 Running 0 30m
[root@first ~]# kubectl proxy &
[1] 23995
[root@first ~]# Starting to serve on 127.0.0.1:8001
[root@first ~]# curl http://localhost:8001/api/v1/namespaces/default/pods/kubernetes-bootcamp-6f6656d949-6gnjj:8080/proxy/
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-6gnjj | v=1
[root@first ~]# curl http://localhost:8001/api/v1/namespaces/default/pods/kubernetes-bootcamp-6f6656d949-txbh8:8080/proxy/
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-txbh8 | v=1
[root@first ~]# curl http://localhost:8001/api/v1/namespaces/default/pods/kubernetes-bootcamp-6f6656d949-lh7s4:8080/proxy/
Error trying to reach service: 'dial tcp 10.42.1.3:8080: connect: no route to host'
[root@first ~]# curl http://localhost:8001/api/v1/namespaces/default/pods/kubernetes-bootcamp-6f6656d949-rtcz4:8080/proxy/
Error trying to reach service: 'dial tcp 10.42.1.4:8080: connect: no route to host'
[root@first ~]# kill 23995
Communication with Service using Kube Proxy
Only working for pods located on the same node as the service:
[root@first ~]# kubectl expose deployment/kubernetes-bootcamp --type="NodePort" --port 8080
service/kubernetes-bootcamp exposed
[root@first ~]# kubectl get service kubernetes-bootcamp
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes-bootcamp NodePort 10.43.62.217 <none> 8080:30303/TCP 4s
[root@first ~]# kubectl proxy &
[1] 24592
[root@first ~]# Starting to serve on 127.0.0.1:8001
[root@first ~]# curl localhost:8001/api/v1/namespaces/default/services/kubernetes-bootcamp:8080/proxy/
Error trying to reach service: 'dial tcp 10.42.1.4:8080: connect: no route to host'
[root@first ~]# curl localhost:8001/api/v1/namespaces/default/services/kubernetes-bootcamp:8080/proxy/
Error trying to reach service: 'dial tcp 10.42.1.3:8080: connect: no route to host'
[root@first ~]# curl localhost:8001/api/v1/namespaces/default/services/kubernetes-bootcamp:8080/proxy/
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-txbh8 | v=1
[root@first ~]# curl localhost:8001/api/v1/namespaces/default/services/kubernetes-bootcamp:8080/proxy/
Hello Kubernetes bootcamp! | Running on: kubernetes-bootcamp-6f6656d949-6gnjj | v=1
So, pod-to-pod networking and pod-to-service-to-pod are apparently working fine. This was the reported issue, and it seems to be solved by using host-gw and the firewall rules.
Kube Proxy networking seems to have additional problems. I could create a new issue for that one, or continue here, whatever you prefer! |
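Since kubectl proxy requests are ultimately dialed toward the pod IP from the server node (in a default single-server k3s setup the apiserver runs in the k3s process on the primary), the "no route to host" errors above can usually be narrowed down from the primary itself. A diagnostic sketch; the pod IP and interface name are taken from the output above and may differ:
# On the primary: is there a host-gw route for the secondary's pod subnet?
ip route | grep 10.42.1.
# Expected something along the lines of: 10.42.1.0/24 via <second node IP> dev eth0
# Then try reaching the remote pod directly from the host
curl --max-time 5 http://10.42.1.3:8080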
Let's do another issue for that one. |
Done! So the state of this issue is:
|
Closing due to age; this seems to be a relatively isolated incident. |
Environmental Info:
K3s Version:
Node(s) CPU architecture, OS, and Version:
[root@second test]# uname -a
Linux second.mshome.net 5.5.17-200.fc31.x86_64 #1 SMP Mon Apr 13 15:29:42 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration:
2 VMs running Fedora IoT 31
1 VM (first.mshome.net) is the primary
1 VM (second.mshome.net) is the secondary
Describe the bug:
Running the Kubernetes Basic sample to test my deployment.
The hosts were deployed following the Kubernetes on Fedora IoT with k3s post
The primary is deployed with:
curl -sfL https://get.k3s.io | INSTALL_K3S_SKIP_START=true sh -
The secondary is deployed with:
curl -sfL https://get.k3s.io | K3S_URL=https://first.mshome.net:6443 INSTALL_K3S_SKIP_START=true K3S_TOKEN=K10977bcc7b0cf459d06084e75a4055c4899c4c1a83f9b7df59f6b1565e95383821::server:183f8d6761f5ea728f8b15142b0c43d4 sh -
After everything is started, nodes are up and running
After following the tutorial, pods are running:
Getting the pods' IP addresses:
Start a test container for pinging pods:
The sample app publishes port 8080, but I can only wget it from the pods running on the same host:
I have seen several similar issues, and I have tried some of the firewalld commands posted.
This was done in both nodes (primary and secondary), with several reboots tried, and the issue is still there.
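For reference, the firewalld rules that circulate in those issues (and that are applied elsewhere in this thread) are roughly the following; an illustrative sketch, not necessarily the exact set that was tried here:
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 0 -i cni0 -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 1 -s 10.42.0.0/15 -j ACCEPT
firewall-cmd --reload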
One thing that I did not see in any issue is that nmcli is telling me that flannel is not running. Also, I do not have flanneld; I am not sure if I need it, though.
Steps To Reproduce:
Expected behavior:
wget -qO- --timeout 5 10.42.1.11:8080 should not fail.
Actual behavior:
wget -qO- --timeout 5 10.42.1.11:8080 fails.
Additional context / logs:
Additionally, I can do several Kubernetes-related operations without problems: drain, uncordon, update, rollback, service creation; everything works. But when two pods on different hosts try to talk, or when I use a service to load-balance the request, things start failing.