This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

Run cloud-controller-manager (CCM) on AWS #145

Open
iaguis opened this issue Mar 12, 2020 · 8 comments · May be fixed by #707

Labels
area/kubernetes Core Kubernetes stuff kind/roadmap Roadmap issues platform/aws AWS-related

Comments

@iaguis
Contributor

iaguis commented Mar 12, 2020

Right now we don't run the CCM on AWS, so, for example, creating Services of type LoadBalancer doesn't work. We should run it on the AWS platform.
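For context, here is a sketch of what such a Service looks like, built as a Python dict that can be dumped to a JSON manifest for kubectl. It also checks whether an address has been published in status.loadBalancer.ingress; without a running CCM that list stays empty and kubectl shows the EXTERNAL-IP as <pending>. The name, selector and ports are hypothetical placeholders.

```python
import json
from typing import Any, Dict


def load_balancer_service(name: str, app: str, port: int, target_port: int) -> Dict[str, Any]:
    """Build a minimal Service of type LoadBalancer (placeholder values)."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "LoadBalancer",
            "selector": {"app": app},
            "ports": [{"port": port, "targetPort": target_port, "protocol": "TCP"}],
        },
    }


def has_external_address(service: Dict[str, Any]) -> bool:
    """True once a cloud controller has published a load balancer address."""
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress", [])
    return any("hostname" in entry or "ip" in entry for entry in ingress)


if __name__ == "__main__":
    svc = load_balancer_service("web", "web", 80, 8080)
    print(json.dumps(svc, indent=2))
    # Without a CCM, the Service status never gets an ingress entry.
    print(has_external_address({"status": {"loadBalancer": {}}}))  # False
```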

Cloud providers in Kubernetes

It seems that originally the code that communicated with the cloud provider lived in each core Kubernetes component (except the scheduler and kube-proxy), so they all had a --cloud-provider flag and each of them talked to the cloud directly.

Today there are also out-of-tree providers and a new component, the cloud-controller-manager. With this design, only that component communicates with the cloud, and it can be released independently of Kubernetes. This is the recommended way forward. Check https://kubernetes.io/blog/2019/04/17/the-future-of-cloud-providers-in-kubernetes/ for more details.

Options

One option is to run the cloud-controller-manager component with --cloud-provider=aws. You can find an example DaemonSet here: https://kubernetes.io/docs/tasks/administer-cluster/running-cloud-controller/#examples. As mentioned there, the kubelet needs to run with the flag --cloud-provider=external, and neither the API server nor kube-controller-manager should have a --cloud-provider flag. We might also need to tweak other CLI flags to get it working correctly.
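The flag layout described in that doc can be expressed as a small consistency check. This is a minimal sketch, not part of any Kubernetes tooling: the component-to-flags mapping and the helper names are hypothetical, and it only understands flags written as --name=value.

```python
from typing import Dict, List, Optional

# Hypothetical layout: component name -> list of command-line flags it runs with.
ComponentFlags = Dict[str, List[str]]


def flag_value(flags: List[str], name: str) -> Optional[str]:
    """Return the value of --name=value in flags, or None if absent."""
    prefix = f"--{name}="
    for flag in flags:
        if flag.startswith(prefix):
            return flag[len(prefix):]
    return None


def check_ccm_flags(components: ComponentFlags) -> List[str]:
    """Report deviations from the external-CCM flag layout."""
    problems = []
    if flag_value(components.get("kubelet", []), "cloud-provider") != "external":
        problems.append("kubelet must run with --cloud-provider=external")
    # The API server and kube-controller-manager must not talk to the cloud.
    for name in ("kube-apiserver", "kube-controller-manager"):
        if flag_value(components.get(name, []), "cloud-provider") is not None:
            problems.append(f"{name} must not set --cloud-provider")
    ccm = flag_value(components.get("cloud-controller-manager", []), "cloud-provider")
    if ccm != "aws":
        problems.append("cloud-controller-manager must run with --cloud-provider=aws")
    return problems


if __name__ == "__main__":
    cluster = {
        "kubelet": ["--cloud-provider=external"],
        "kube-apiserver": ["--secure-port=6443"],
        "kube-controller-manager": ["--cloud-provider=aws"],
        "cloud-controller-manager": ["--cloud-provider=aws"],
    }
    for problem in check_ccm_flags(cluster):
        print(problem)  # kube-controller-manager must not set --cloud-provider
```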

There's also the cloud-provider-aws repo, which adds to the confusion. It seems to be the home of the out-of-tree AWS cloud provider, but development doesn't seem very active (see kubernetes/cloud-provider-aws#42). Its README lists the required IAM policy and the node naming requirement, but it says to pass --cloud-provider=external to the kubelet, API server and kube-controller-manager, contradicting the previous paragraph. There's also no example or YAML manifest showing how to deploy it.

What to do

I think that, for now, we should use the cloud-controller-manager and try to deploy it as described in Kubernetes Cloud Controller Manager; if cloud-provider-aws becomes more active in the future, we can consider switching to it.

@iaguis iaguis added area/kubernetes Core Kubernetes stuff platform/aws AWS-related labels Mar 12, 2020
@johananl johananl added the proposed/next-sprint Issues proposed for next sprint label Mar 17, 2020
@Wenzil

Wenzil commented Mar 18, 2020

I'm confused by this as well. I'm not very fluent in Go, but looking at the source code of the in-tree cloud-controller-manager command, I don't see the code that gets loaded based on the --cloud-provider flag passed to it (i.e. "aws"). Could it be the legacy AWS cloud provider code? That would make sense as a stop-gap solution until the out-of-tree implementation (cloud-provider-aws) gains traction.

In any case, have you had the chance to try it?

@surajssd
Member

Also, node naming is important for this to work: we need to change the way nodes are named on the AWS platform.
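The AWS cloud provider expects the Kubernetes node name to be the instance's EC2 private DNS name. A minimal sketch of deriving that name from a private IPv4 address, assuming the default IP-based hostnames of a VPC with default DHCP options (a custom domain-name in the DHCP options set would change the suffix); both helpers are hypothetical.

```python
import ipaddress


def ec2_private_dns_name(private_ip: str, region: str) -> str:
    """Default IP-based EC2 private DNS name (assumes default VPC DHCP options)."""
    ip = ipaddress.IPv4Address(private_ip)  # raises ValueError for malformed input
    host = "ip-" + str(ip).replace(".", "-")
    # us-east-1 uses the ec2.internal suffix; other regions embed the region name.
    if region == "us-east-1":
        return f"{host}.ec2.internal"
    return f"{host}.{region}.compute.internal"


def node_name_matches(node_name: str, private_ip: str, region: str) -> bool:
    """Check whether a Kubernetes node name follows the private DNS convention."""
    return node_name == ec2_private_dns_name(private_ip, region)


if __name__ == "__main__":
    print(ec2_private_dns_name("10.0.1.23", "eu-central-1"))
    # ip-10-0-1-23.eu-central-1.compute.internal
    print(node_name_matches("mycluster-worker-0", "10.0.1.23", "eu-central-1"))  # False
```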

@BrainBlasted
Contributor

BrainBlasted commented Mar 18, 2020 via email

@invidian
Member

This repository is the right location for the external cloud controller manager, and I'll be spending much more time investing in it this year. At some point, likely this year, we will migrate the source for the AWS cloud provider from upstream to this repo. At that point, development will shift from upstream to here. For now, we are importing the upstream cloud provider and relying on bug fixes upstream. That being said, significant work this year needs to be done on testing and documentation in this repository to make it usable, and that's one of my highest priority goals.

kubernetes/cloud-provider-aws#42 (comment)

It seems that this project is not usable yet 😕 I guess we need to use the built-in cloud provider for now, then...

@iaguis iaguis added this to the v0.2.0 milestone Mar 18, 2020
@iaguis iaguis removed the proposed/next-sprint Issues proposed for next sprint label Mar 18, 2020
@iaguis
Contributor Author

iaguis commented Apr 1, 2020

It seems the CCM doesn't handle Dynamic Provisioning of PVCs, so I was looking at https://github.com/kubernetes-sigs/aws-ebs-csi-driver/, which seems to be the way to go.

Here are my notes:

After that, I followed their Dynamic Provisioning example and everything worked as expected: the StorageClass was created, the PersistentVolumeClaim was created and became Bound after a few seconds, and then the Pod using that claim started and I could see that the volume worked.
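The three objects involved can be sketched as follows. This is not a copy of the driver's example: only the provisioner name ebs.csi.aws.com comes from the driver, while the object names, the 4Gi size, the busybox image and the WaitForFirstConsumer binding mode are assumptions. The output is a v1 List that can be piped into kubectl apply -f -.

```python
import json
from typing import Any, Dict, List


def ebs_dynamic_provisioning_manifests(size: str = "4Gi") -> List[Dict[str, Any]]:
    """StorageClass, PVC and Pod for EBS dynamic provisioning (illustrative names)."""
    storage_class = {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": "ebs-sc"},
        "provisioner": "ebs.csi.aws.com",  # CSI driver name of aws-ebs-csi-driver
        # Assumption: delay volume creation until a Pod is scheduled, so the
        # EBS volume lands in the same availability zone as that Pod.
        "volumeBindingMode": "WaitForFirstConsumer",
    }
    claim = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": "ebs-claim"},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class["metadata"]["name"],
            "resources": {"requests": {"storage": size}},
        },
    }
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "app"},
        "spec": {
            "containers": [{
                "name": "app",
                "image": "busybox",
                # Keep writing to the volume so it is easy to verify it works.
                "command": ["sh", "-c", "while true; do date >> /data/out.txt; sleep 5; done"],
                "volumeMounts": [{"name": "persistent-storage", "mountPath": "/data"}],
            }],
            "volumes": [{
                "name": "persistent-storage",
                "persistentVolumeClaim": {"claimName": claim["metadata"]["name"]},
            }],
        },
    }
    return [storage_class, claim, pod]


if __name__ == "__main__":
    items = ebs_dynamic_provisioning_manifests()
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

With WaitForFirstConsumer the claim stays Pending until the Pod is scheduled; an Immediate binding mode would bind it first, as in the sequence described above.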

As their README mentions, there are two ways to grant permissions to the ebs-csi-driver:

  • Creating an IAM user and adding its credentials to the cluster as a secret (the way I used)
  • Granting the worker nodes' InstanceProfile the required permissions.

I'm not sure what's best so we should discuss it: the InstanceProfile route would mean that anything on that machine can do the operations listed in the policy IIUC so it seems creating an IAM user and placing it in a secret is safer (making sure the cluster is set up properly so only the CSI driver has access to the secret, of course).
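The Secret route, plus an RBAC Role restricted to that one Secret, can be sketched like this. The Secret name aws-secret, the kube-system namespace and the key_id / access_key keys are assumptions based on what the driver's README described at the time and should be checked against the deployed version; the Role name is hypothetical and still has to be bound to the driver's ServiceAccount.

```python
import base64
import json
import os
from typing import Any, Dict


def _b64(value: str) -> str:
    """Kubernetes Secret data values are base64-encoded strings."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def ebs_csi_credentials_secret(key_id: str, access_key: str,
                               name: str = "aws-secret",
                               namespace: str = "kube-system") -> Dict[str, Any]:
    """Secret holding an IAM user's credentials for the EBS CSI driver."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {"key_id": _b64(key_id), "access_key": _b64(access_key)},
    }


def secret_reader_role(secret_name: str = "aws-secret",
                       namespace: str = "kube-system") -> Dict[str, Any]:
    """Role that allows reading only the named Secret (hypothetical Role name)."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": f"{secret_name}-reader", "namespace": namespace},
        "rules": [{
            "apiGroups": [""],
            "resources": ["secrets"],
            "resourceNames": [secret_name],
            "verbs": ["get"],
        }],
    }


if __name__ == "__main__":
    # Defaults are AWS's documented placeholder keys, not real credentials.
    secret = ebs_csi_credentials_secret(
        os.environ.get("AWS_ACCESS_KEY_ID", "AKIAIOSFODNN7EXAMPLE"),
        os.environ.get("AWS_SECRET_ACCESS_KEY", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"),
    )
    items = [secret, secret_reader_role()]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

A scoped Role only adds access; anyone who already has broader permissions on Secrets in that namespace can still read the credentials, so those bindings need auditing too.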

@invidian
Member

invidian commented Apr 1, 2020

I'm not sure what's best so we should discuss it: the InstanceProfile route would mean that anything on that machine can do the operations listed in the policy IIUC so it seems creating an IAM user and placing it in a secret is safer (making sure the cluster is set up properly so only the CSI driver has access to the secret, of course).

It also seems to me that creating credentials as a secret is a nicer way to go.

@iaguis iaguis added proposed/next-sprint Issues proposed for next sprint and removed proposed/next-sprint Issues proposed for next sprint labels Apr 8, 2020
@iaguis iaguis added the kind/roadmap Roadmap issues label Apr 22, 2020
@iaguis iaguis mentioned this issue Apr 22, 2020
@BrainBlasted
Contributor

A summary of the work so far: we've been able to get the CCM running manually on AWS and get the --cloud-provider flags configured correctly. However, integrating the CCM properly into Lokomotive is where we hit a roadblock: when setting up the Helm chart within Lokomotive, bootstrapping failed. Mateusz and Johannes looked into it and found that we need to add the KubernetesCluster = <cluster name> tag to the controller node, pass the --cluster-name=<cluster name> parameter to the CCM, and set the right IAM role. Setting the IAM role through Terraform should be the last step needed for everything to work.
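The tag and flag requirement found above can be captured as a pre-flight check. A minimal sketch, assuming the controller instance's EC2 tags have already been fetched into a dict; the helper names are hypothetical and it only reads --cluster-name in its --name=value form.

```python
from typing import Dict, List, Optional


def cluster_name_arg(args: List[str]) -> Optional[str]:
    """Return the value of --cluster-name=<name> from the CCM args, if present."""
    prefix = "--cluster-name="
    for arg in args:
        if arg.startswith(prefix):
            return arg[len(prefix):]
    return None


def check_cluster_tagging(instance_tags: Dict[str, str], ccm_args: List[str]) -> List[str]:
    """Verify the KubernetesCluster tag exists and agrees with --cluster-name."""
    problems = []
    cluster = cluster_name_arg(ccm_args)
    if not cluster:
        problems.append("cloud-controller-manager is missing --cluster-name")
    tag = instance_tags.get("KubernetesCluster")
    if tag is None:
        problems.append("controller instance is missing the KubernetesCluster tag")
    elif cluster and tag != cluster:
        problems.append(f"KubernetesCluster tag {tag!r} does not match --cluster-name {cluster!r}")
    return problems


if __name__ == "__main__":
    tags = {"Name": "demo-controller-0", "KubernetesCluster": "demo"}
    args = ["--cloud-provider=aws", "--cluster-name=demo"]
    print(check_cluster_tagging(tags, args))  # []
```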

@iaguis
Contributor Author

iaguis commented Apr 29, 2020

We discussed this OOB, so I'll summarize the discussion:

  • The relevant features we're interested in for the CCM are handling LoadBalancer services and Dynamic Provisioning of persistent volumes.
  • LoadBalancer services are not really needed to get Ingress working.

So, for now, we decided to put the CCM work on hold and focus our efforts on running the EBS CSI driver, as mentioned in #145 (comment).

Here's a new issue about the EBS CSI driver: #379

@iaguis iaguis removed this from the v0.2.0 milestone Apr 30, 2020
@BrainBlasted BrainBlasted self-assigned this Jul 7, 2020
@BrainBlasted BrainBlasted linked a pull request Jul 7, 2020 that will close this issue
BrainBlasted pushed a commit that referenced this issue Jul 7, 2020
Changes the hostname for AWS clusters to the naming scheme
preferred by Cloud Controller Manager in order to allow us
to set up LoadBalancer services on AWS.

Required for #145