
Backport 1807 - Deploying External Cloud Provider with Helm Controller - Chicken:Egg problem #2140

Closed
davidnuzik opened this issue Aug 18, 2020 · 3 comments

davidnuzik commented Aug 18, 2020

From #1807

Version:
v1.18.2+k3s1

K3s arguments:

# Build the AWS provider ID (<availability-zone>/<instance-id>) from the EC2
# instance metadata service.
provider_id="$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)/$(curl -s http://169.254.169.254/latest/meta-data/instance-id)"

# ${k3s_db_endpoint}, ${k3s_server_token}, ${k3s_agent_token}, and ${elb_dns}
# are supplied by the provisioning environment.
k3s server \
  --disable-cloud-controller \
  --disable servicelb \
  --disable local-storage \
  --disable traefik \
  --datastore-endpoint=${k3s_db_endpoint} \
  --token="${k3s_server_token}" \
  --agent-token="${k3s_agent_token}" \
  --node-name="$(hostname -f)" \
  --kubelet-arg="cloud-provider=external" \
  --kubelet-arg="provider-id=aws:///$provider_id" \
  --write-kubeconfig-mode=644 \
  --tls-san=${elb_dns}
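
For context, with these flags every node joins carrying the cloud-provider taint until a CCM initializes it; a quick way to confirm this (assuming kubectl access to the cluster) is:

# List each node's taints; nodes registered with cloud-provider=external carry
# node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule until a CCM runs.
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'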

Describe the bug
I would like to install the AWS CCM (external cloud provider) with a Helm chart via the Helm chart controller, by placing a HelmChart manifest in /var/lib/rancher/k3s/server/manifests. However, on a new cluster the helm-install pod won't schedule, because all the nodes are tainted with node.cloudprovider.kubernetes.io/uninitialized: true. So I need a cloud provider in order to install my cloud provider.

I can remove the taint on one of the nodes (see the command sketched below), but then that node never gets properly initialized.

Is there a way to give the Helm chart controller's install jobs a toleration so they can run on an uninitialized node?
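
For reference, the manual workaround mentioned above amounts to something like the following (the node name is a placeholder), which unblocks scheduling but leaves that node uninitialized by the CCM:

# Remove the uninitialized taint from a single node so the helm-install pod can
# schedule there; the CCM will then never finish initializing this node, so this
# is a stopgap rather than a fix.
kubectl taint nodes <node-name> node.cloudprovider.kubernetes.io/uninitialized-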

To Reproduce
Launch a new cluster configured for an external cloud provider and try to install that cloud provider with a HelmChart manifest:

/var/lib/rancher/k3s/server/manifests/00-aws-ccm.yaml

apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: aws-cloud-controller-manager
  namespace: kube-system
spec:
  chart: aws-cloud-controller-manager
  repo: http://charts.jgreat.me
  version: 0.0.0-20200508.T071542
  targetNamespace: kube-system
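
What the question above is effectively asking for is a toleration along these lines on the pod spec of the resulting helm-install job (a sketch of the needed toleration, not the controller's actual job template):

# Toleration the helm-install pod would need in order to schedule onto nodes
# still carrying the cloud-provider taint (sketch only).
tolerations:
  - key: node.cloudprovider.kubernetes.io/uninitialized
    operator: Equal
    value: "true"
    effect: NoSchedule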

Expected behavior
The Helm controller launches the included chart.

Actual behavior
The Helm controller's install pods don't tolerate the node.cloudprovider.kubernetes.io/uninitialized: true taint, so they never schedule and the chart is never installed.

Additional context / logs

➜ kubectl -n kube-system describe pods helm-install-aws-cloud-controller-manager-nd5cm
Name:         helm-install-aws-cloud-controller-manager-nd5cm
Namespace:    kube-system
Priority:     0
Node:         ip-172-20-61-193.us-east-2.compute.internal/172.20.61.193
Start Time:   Mon, 18 May 2020 15:51:20 -0500
Labels:       controller-uid=0e7af8c6-7bab-4a40-bb49-f97f782a1dd1
              helmcharts.helm.cattle.io/chart=aws-cloud-controller-manager
              job-name=helm-install-aws-cloud-controller-manager
Annotations:  <none>
Status:       Running
IP:           10.42.1.2
IPs:
  IP:           10.42.1.2
Controlled By:  Job/helm-install-aws-cloud-controller-manager
Containers:
  helm:
    Container ID:  containerd://25af9567c5dc8692f6f812530b2dfcb2602e57600bfcee72284e23b9a8adb9e6
    Image:         rancher/klipper-helm:v0.2.5
    Image ID:      docker.io/rancher/klipper-helm@sha256:b694f931ffb70c4e0b6aedf69171936cad98e79a5df49372f0e553d7d610062d
    Port:          <none>
    Host Port:     <none>
    Args:
      install
      --namespace
      kube-system
      --repo
      http://charts.jgreat.me
      --version
      0.0.0-20200508.T071542
    State:          Running
      Started:      Mon, 18 May 2020 15:51:24 -0500
    Ready:          True
    Restart Count:  0
    Environment:
      NAME:             aws-cloud-controller-manager
      VERSION:          0.0.0-20200508.T071542
      REPO:             http://charts.jgreat.me
      VALUES_HASH:      e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
      HELM_DRIVER:      secret
      CHART_NAMESPACE:  kube-system
      CHART:            aws-cloud-controller-manager
      HELM_VERSION:     
      NO_PROXY:         ,10.42.0.0/16,10.43.0.0/16
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from helm-aws-cloud-controller-manager-token-rcgx9 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  helm-aws-cloud-controller-manager-token-rcgx9:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  helm-aws-cloud-controller-manager-token-rcgx9
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age        From                                                  Message
  ----     ------            ----       ----                                                  -------
  Warning  FailedScheduling  <unknown>  default-scheduler                                     0/2 nodes are available: 2 node(s) had taint {node.cloudprovider.kubernetes.io/uninitialized: true}, that the pod didn't tolerate.
  Warning  FailedScheduling  <unknown>  default-scheduler                                     0/2 nodes are available: 2 node(s) had taint {node.cloudprovider.kubernetes.io/uninitialized: true}, that the pod didn't tolerate.
davidnuzik (author) commented:

The fix for the original issue was needed in 1.18.x (it originally had a 1.18 milestone). We'll need to backport it for our next v1.18.x patch release.

davidnuzik (author) commented:

@rancher-max this is the same as #1807, just backported into the v1.18 release branch. We should do a quick check to confirm it's good to go: build from that branch, test, and close this out.

rancher-max commented:

Validated the backport using commit b9542ef0014fd4c5b7d6d70a29f0a953ab7d85dd from the release-1.18 branch, following the same steps used to validate the original issue.
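
One way to spot-check a build containing the fix (a sketch, not necessarily the exact validation steps used) is to confirm the helm-install pod now carries the needed toleration:

# Print the tolerations on the helm-install pod created by the HelmChart above;
# the pod name suffix will differ per cluster.
kubectl -n kube-system get pod -l job-name=helm-install-aws-cloud-controller-manager \
  -o jsonpath='{.items[0].spec.tolerations}'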
