
CoreDNS name resolution fails, NodePort is not available on nodes without the pod #4588

Closed
VladimirKorzh opened this issue Nov 26, 2021 · 2 comments

Comments

VladimirKorzh commented Nov 26, 2021

Environmental Info:
K3s Version:

k3s version v1.21.5+k3s2 (724ef70)
go version go1.16.8

Node(s) CPU architecture, OS, and Version:

Linux vova-test 5.4.0-88-generic #99-Ubuntu SMP Thu Sep 23 17:29:00 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:

Looking for a way to orchestrate a fleet of ODROIDs. Trying to set up a web API that makes requests to a payload running on the ODROIDs.
In this example I've used two bare-metal cloud machines from Equinix, a personal VMware virtual machine, and an ODROID XU4.

Describe the bug:

  1. nslookup kubernetes.default fails from nodes that don't have CoreDNS running
  2. Cannot access NodePort services on nodes that aren't running the backing pod
  3. It appears to be an overlay network issue.
  4. I've spent 24+ hours trying to fix it.
  5. Tried Rancher's Kubernetes as well; same issue.
  6. Tried the following iptables fixes:
sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
sudo update-alternatives --set arptables /usr/sbin/arptables-legacy
sudo update-alternatives --set ebtables /usr/sbin/ebtables-legacy
sudo iptables -P FORWARD ACCEPT
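
To narrow down whether the overlay network is at fault, checks along these lines may help (a sketch; names assume K3s defaults, where flannel's VXLAN backend carries traffic on UDP port 8472):

# Confirm the flannel VXLAN interface exists on every node
ip addr show flannel.1
# List pod IPs and the nodes they are scheduled on
kubectl get pods -A -o wide
# On the destination node, watch for VXLAN traffic while pinging a pod IP from another node
sudo tcpdump -ni any udp port 8472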


Steps To Reproduce:

Expected behavior:

Exec into a busybox container and run curl -X GET http://some-service:port
OR
curl -X GET http://any-node-ip-in-cluster:port
and reach the target service.

Actual behavior:

The request succeeds only on the node that is running the service's pod.

Additional context / logs:

root@vova-test:~# kubectl run -it --rm --restart=Never busybox --image=radial/busyboxplus:curl sh
If you don't see a command prompt, try pressing enter.
[ root@busybox:/ ]$ curl -X GET http://147.75.87.81:30009
^C
[ root@busybox:/ ]$ curl -X GET http://147.75.87.81:30009/api/magicbox/healthcheck
^C
[ root@busybox:/ ]$ curl -X GET http://magicbox-1:30009/api/magicbox/healthcheck
root@vova-test:~# kubectl exec -i -t dnsutils -- nslookup kubernetes.default
;; connection timed out; no servers could be reached

command terminated with exit code 1
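
To separate a DNS failure from a broader overlay failure, it can help to query CoreDNS by its service IP directly (a sketch; 10.43.0.10 is the K3s default cluster DNS address, confirm it with the first command):

# Find the cluster DNS service IP
kubectl -n kube-system get svc kube-dns
# Query CoreDNS directly, bypassing the pod's /etc/resolv.conf search path
kubectl exec -i -t dnsutils -- nslookup kubernetes.default 10.43.0.10

If the direct query also times out from pods on nodes without CoreDNS, the problem is reachability over the overlay rather than CoreDNS configuration.

The Service and Pod manifests used: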
apiVersion: v1
kind: Service
metadata:
  name: magicbox-1
spec:
  type: NodePort
  selector:
    app: magicbox-1
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
      nodePort: 30009
---
apiVersion: v1
kind: Pod
metadata:
  name: magicbox-1
  labels:
    app: magicbox-1
spec:
  containers:
  - name: pm0
    image: #################
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - mountPath: /dev/LOCKER
      name: locker
    securityContext:
      allowPrivilegeEscalation: true
      capabilities: {}
      privileged: true
      readOnlyRootFilesystem: false
  volumes:
  - name: locker
    hostPath:
      path: /dev/ttyUSB0
      type: ""
  imagePullSecrets:
    - name: externaljustin
  nodeSelector:
    server: "odroid"
@VladimirKorzh VladimirKorzh changed the title CoreDNS name resolution fails, NodePort is not available on every node CoreDNS name resolution fails, NodePort is not available on nodes without the pod Nov 26, 2021
@brandond
Member

nslookup kubernetes.default fails from nodes that don't have CoreDNS running

This sounds like an issue with the cluster overlay network. Can you update to the most recent version of K3s on whatever branch you prefer? There was an issue with flannel's vxlan on newer kernels that should now be fixed.
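
For reference, the kernel issue mentioned above was a VXLAN UDP checksum offload bug on some 5.x kernels that corrupted flannel traffic. Besides upgrading K3s, the commonly cited mitigation was to disable transmit checksum offload on the flannel interface on every node (a temporary workaround, not a fix, and it does not persist across reboots):

# Disable tx checksum offload on the flannel VXLAN interface
sudo ethtool -K flannel.1 tx-checksum-ip-generic off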


stale bot commented Jul 3, 2022

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

@stale stale bot added the status/stale label Jul 3, 2022
@stale stale bot closed this as completed Jul 17, 2022