K3s reapplies "master" node-role label to single node cluster on server restart. #2124

Closed

jgreat opened this issue Aug 13, 2020 · 13 comments

Labels: kind/bug (Something isn't working), priority/important-soon (Must be staffed and worked on either currently, or very soon, ideally in time for the next release.)

@jgreat commented Aug 13, 2020

Environmental Info:

k3s version v1.18.6+k3s1 (6f56fa1d)

Node(s) CPU architecture, OS, and Version:

AWS - t3a.medium - 4GB, 2CPU
Linux ip-172-20-3-222 5.4.0-1018-aws #18-Ubuntu SMP Wed Jun 24 01:15:00 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:

  • Single node.
  • AWS out-of-tree CCM.
  • Aurora PostgreSQL RDS backed.
➜ kubectl get node
NAME                                         STATUS   ROLES    AGE   VERSION
ip-172-20-3-222.us-east-2.compute.internal   Ready    single   13d   v1.18.6+k3s1

Describe the bug:

Occasionally the k3s server instance hits an internal timeout (I will file a separate issue for that) and systemd restarts the k3s service. When this happens, k3s re-applies the node-role.kubernetes.io/master=true label to the node. This "breaks" the AWS CCM, since that service will not add master nodes to the CCM-managed AWS load balancer pools.
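For reference, the node's current label set can be inspected with something like:

➜ kubectl get node ip-172-20-3-222.us-east-2.compute.internal --show-labels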

Steps To Reproduce:

Add the single role label and remove the master role label:

➜ kubectl label node ip-172-20-3-222.us-east-2.compute.internal node-role.kubernetes.io/single=true
➜ kubectl label node ip-172-20-3-222.us-east-2.compute.internal node-role.kubernetes.io/master-

Validate that only the single role is enabled

➜ kubectl get node
NAME                                         STATUS   ROLES    AGE   VERSION
ip-172-20-3-222.us-east-2.compute.internal   Ready    single   13d   v1.18.6+k3s1

Restart k3s service

systemctl restart k3s.service

Expected behavior:
Only the single role label should be applied:

kubectl get node
NAME                                         STATUS   ROLES    AGE   VERSION
ip-172-20-3-222.us-east-2.compute.internal   Ready    single   13d   v1.18.6+k3s1

Actual behavior:
The master label has been reapplied:

kubectl get node
NAME                                         STATUS   ROLES           AGE   VERSION
ip-172-20-3-222.us-east-2.compute.internal   Ready    master,single   13d   v1.18.6+k3s1
@davidnuzik added the [zube]: To Triage and kind/bug (Something isn't working) labels and removed the [zube]: To Triage label on Aug 14, 2020
@cjellick (Contributor)

@jgreat - semi-unrelated - I'm curious how you are sidestepping this issue with an external CCM: #1807
(we are introducing a fix and I wonder how it meshes with whatever you are doing to work around it)

@jgreat (Author) commented Aug 14, 2020

I'm adding the "raw" CCM manifest via cloud-init to /var/lib/rancher/k3s/server/manifests, and I have an install script that wraps the k3s install to provide the provider-id info and re-label the node after it comes up.

install.sh

#!/bin/bash

provider_id="$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)/$(curl -s http://169.254.169.254/latest/meta-data/instance-id)"

curl -sfL https://get.k3s.io -o /k3s_install.sh
chmod +x /k3s_install.sh

/k3s_install.sh \
  --disable-cloud-controller \
  --disable servicelb \
  --disable local-storage \
  --disable traefik \
  --kubelet-arg="cloud-provider=external" \
  --kubelet-arg="provider-id=aws:///${provider_id}"

rm /k3s_install.sh

# Wait for the cluster to come up.
return=1
while [ ${return} -ne 0 ]; do
  sleep 2
  kubectl get nodes "$(hostname -f)" >/dev/null 2>&1
  return=$?
done

# Re-label if this is a single-node cluster. The AWS CCM doesn't run on "master" nodes.
# NODE_ROLE is expected to be set in the environment (e.g. via cloud-init).
if [ "${NODE_ROLE}" == "single" ]; then
  is_master=$(kubectl get node -o json | jq -r ".items[] | select(.metadata.name == \"$(hostname -f)\") | .metadata.labels.\"node-role.kubernetes.io/master\"")

  if [ "${is_master}" == "true" ]; then
    kubectl label node "$(hostname -f)" node-role.kubernetes.io/master- node-role.kubernetes.io/single=true
  fi
fi

00-aws-ccm.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: cloud-controller-manager
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: system:cloud-controller-manager
  labels:
    kubernetes.io/cluster-service: "true"
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
- apiGroups:
  - ""
  resources:
  - services
  verbs:
  - list
  - watch
  - patch
- apiGroups:
  - ""
  resources:
  - services/status
  verbs:
  - update
  - patch
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - create
  - patch
  - update
# For leader election
- apiGroups:
  - ""
  resources:
  - endpoints
  verbs:
  - create
- apiGroups:
  - ""
  resources:
  - endpoints
  resourceNames:
  - "cloud-controller-manager"
  verbs:
  - get
  - list
  - watch
  - update
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - create
- apiGroups:
  - ""
  resources:
  - configmaps
  resourceNames:
  - "cloud-controller-manager"
  verbs:
  - get
  - update
- apiGroups:
  - ""
  resources:
  - serviceaccounts
  verbs:
  - create
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - get
  - list
- apiGroups:
  - "coordination.k8s.io"
  resources:
  - leases
  verbs:
  - get
  - create
  - update
  - list
# For the PVL
- apiGroups:
  - ""
  resources:
  - persistentvolumes
  verbs:
  - list
  - watch
  - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: aws-cloud-controller-manager
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: cloud-controller-manager
  namespace: kube-system
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: aws-cloud-controller-manager-ext
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: cloud-controller-manager
  namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: aws-cloud-controller-manager
  namespace: kube-system
  labels:
    k8s-app: aws-cloud-controller-manager
spec:
  selector:
    matchLabels:
      component: aws-cloud-controller-manager
      tier: control-plane
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        component: aws-cloud-controller-manager
        tier: control-plane
    spec:
      serviceAccountName: cloud-controller-manager
      hostNetwork: true
      # If this is a single node we do not want this selector
      # and we need to remove the node-role.kubernetes.io/master label
      # Maybe set node-role.kubernetes.io/combined: "true"
      # nodeSelector:
      #   node-role.kubernetes.io/master: "true"
      tolerations:
      - key: node.cloudprovider.kubernetes.io/uninitialized
        value: "true"
        effect: NoSchedule
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      containers:
        - name: aws-cloud-controller-manager
          image: jgreat/aws-cloud-controller-manager:20200331-095641

@jgreat (Author) commented Sep 2, 2020

Here's a temporary workaround just in case someone else runs into this.

As part of my install, I'm adding a couple of 'ExecStartPost' commands to the k3s systemd service. This should remove the "master" label if the service is restarted.

# Remove the master label if this is a single-node cluster. The AWS CCM doesn't run on "master" nodes.
if [ "${NODE_ROLE}" == "single" ]; then
  /usr/local/bin/k3s kubectl label node --all --overwrite node-role.kubernetes.io/master-

  # Add an extra line because of the trailing backslash in the k3s-install-generated systemd unit file.
  echo '' >> /etc/systemd/system/k3s.service
  echo 'ExecStartPost=/usr/bin/sleep 10' >> /etc/systemd/system/k3s.service
  echo 'ExecStartPost=/usr/local/bin/k3s kubectl label node --all --overwrite node-role.kubernetes.io/master-' >> /etc/systemd/system/k3s.service
  systemctl daemon-reload
fi
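A variant of the same idea, sketched here only as an illustration (the drop-in file name is hypothetical), is to put the ExecStartPost lines into a systemd drop-in instead of appending to the generated unit file:

# /etc/systemd/system/k3s.service.d/remove-master-label.conf (hypothetical path)
[Service]
ExecStartPost=/usr/bin/sleep 10
ExecStartPost=/usr/local/bin/k3s kubectl label node --all --overwrite node-role.kubernetes.io/master-

Then run systemctl daemon-reload so the drop-in is picked up.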

@davidnuzik davidnuzik added this to the v1.20 - Backlog milestone Sep 15, 2020
@dweomer dweomer added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Nov 23, 2020
@tomerleib

I tried the methods above on an RKE2 cluster (3 nodes) - the master label is still applied and the CCM still does not add the nodes to the ELB.

@brandond (Member) commented Jan 4, 2021

Yes, there is not currently any way to prevent K3s from reapplying the master/control-plane roles on startup.

FWIW I don't see anything in the out-of-tree cloud provider at https://github.com/kubernetes/cloud-provider-aws, nor in the legacy in-tree cloud provider, that would prevent nodes with a master role from serving the ELB. Where are you seeing this restriction?

@tomerleib

@brandond I'm attempting to run RKE2 with the aws-cloud-provider, 3 nodes.
I've created a Service of type LoadBalancer; however, no nodes were registered. I'm hitting the same issue mentioned here:
kubernetes/kubernetes#65618

So unless I remove the master label (which currently isn't working), I'm forced to create more worker nodes in order to be served by the ELB.

@brandond (Member) commented Jan 5, 2021

Ah, I see - it's not that it's just excluded from the ELB by the cloud provider; it's that it's excluded from showing up in Service endpoints.

As per kubernetes/kubernetes#90126 it looks like this should be fixable now, and fixed by default in v1.20 - have you tried turning the LegacyNodeRoleBehavior feature gate off?

The LegacyNodeRoleBehavior gate is now beta and will be turned off by default in 1.20. Clusters that rely on the 'node-role.kubernetes.io/master' label to exclude nodes from service load balancers should set the node.kubernetes.io/exclude-from-external-load-balancers label on their master nodes if they still wish masters to not be included in service load balancing. The feature gate may be manually enabled in 1.20 but will be removed in 1.21.
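For completeness, applying the exclusion label the changelog refers to would look something like this (shown only as an illustration, reusing the node name from this issue):

➜ kubectl label node ip-172-20-3-222.us-east-2.compute.internal node.kubernetes.io/exclude-from-external-load-balancers=true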

@dgiebert

I have tried with k3s version v1.20.4+k3s1 using the latest k3os build and get:

Warning  UnAvailableLoadBalancer  1s     service-controller  There are no available nodes for LoadBalancer

@brandond (Member) commented Apr 13, 2021

Have you tried turning the referenced FeatureGate off? It appears that upstream did not in fact disable it by default in 1.20 as they had proposed doing in that PR. According to the docs, 1.19 moved it to Beta but it remains true by default.

https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/#:~:text=LegacyNodeRoleBehavior

@dgiebert

I actually tried adding it as a kubelet arg --kubelet-arg=feature-gates=LegacyNodeRoleBehavior=false

I'm currently not sure whether this is now a problem with the cloud controller or with k8s/k3s.

@brandond (Member) commented Apr 13, 2021

I don't think that's a kubelet feature gate; you'll probably need to pass it as a controller-manager and/or apiserver arg. If you're running an out-of-tree cloud controller, you'd need to pass it to that.

@dgiebert

Thanks for the hint - it needs to be added to the kubelet and the controller-manager 👍
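For anyone following along, the combination ends up looking something like this as k3s server flags (a sketch using the standard arg-passthrough flags; adjust to your install method):

k3s server \
  --kubelet-arg=feature-gates=LegacyNodeRoleBehavior=false \
  --kube-controller-manager-arg=feature-gates=LegacyNodeRoleBehavior=false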

@brandond (Member)

Closing - K3s will continue to apply the role labels; anyone that has an issue with this due to the endpoint controller not including addresses for nodes with master/control-plane role labels can use the LegacyNodeRoleBehavior FeatureGate to turn this off.
