Bind the kubelet to the local ipv4 address #4417

Merged: 2 commits merged into kubernetes:master on Mar 2, 2018

Conversation

@dezmodue (Contributor) commented Feb 9, 2018

No description provided.

@k8s-ci-robot added the cncf-cla: yes (indicates the PR's author has signed the CNCF CLA), needs-ok-to-test (indicates a PR that requires an org member to verify it is safe to test), and size/S (denotes a PR that changes 10-29 lines, ignoring generated files) labels on Feb 9, 2018
@chrislovecnm (Contributor)

@liwenwu-amazon double checking with you that this was the recommended solution.

/ok-to-test

@k8s-ci-robot removed the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) on Feb 12, 2018
@justinsb added this to the 1.9 milestone on Feb 21, 2018
@bksteiny (Contributor)

@chrislovecnm, I tested this based on our Slack conversation, and --node-ip is added to the kubelet environment file.

root@ip-10-4-233-250:/home/admin# cat /etc/sysconfig/kubelet 
DAEMON_ARGS="--allow-privileged=true --cgroup-root=/ --cloud-provider=aws --cluster-dns=101.64.0.10 --cluster-domain=cluster.local --enable-debugging-handlers=true --eviction-hard=memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<10%,imagefs.inodesFree<5% --feature-gates=ExperimentalCriticalPodAnnotation=true --hostname-override=ip-10-4-233-250.us-west-2.compute.internal --kubeconfig=/var/lib/kubelet/kubeconfig --network-plugin=cni --node-labels=kubernetes.io/role=master,node-role.kubernetes.io/master= --non-masquerade-cidr=101.64.0.0/10 --pod-infra-container-image=gcr.io/google_containers/pause-amd64:3.0 --pod-manifest-path=/etc/kubernetes/manifests --register-schedulable=true --register-with-taints=node-role.kubernetes.io/master=:NoSchedule --require-kubeconfig=true --v=2 --cni-bin-dir=/opt/cni/bin/ --cni-conf-dir=/etc/cni/net.d/ --node-ip=10.4.233.250"
HOME="/root"

Log from journalctl:

Feb 26 01:04:27 ip-10-4-233-250 kubelet[1663]: I0226 01:04:27.254687    1663 kubelet_node_status.go:455] Using node IP: "10.4.233.250"

However, there is an issue pulling down the amazon-k8s-cni image from ECR. In order to pull it down, I had to:

  1. Add AmazonEC2ContainerRegistryReadOnly permission to my Kops user
  2. Run aws configure to set up my Kops user
  3. Authenticate to ECR using: aws ecr get-login and run docker login ..... https://602401143452.dkr.ecr.us-west-2.amazonaws.com

Is this expected or am I doing something wrong?

if err != nil {
    glog.Fatalf("Couldn't fetch the local-ipv4 address from the ec2 meta-data: %v", err)
} else {
    flags += " --node-ip=" + localIpv4
A Contributor commented on this snippet:
Don’t need the else, as I think glog.Fatal will return an err. We may want to log an error and return err here.
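
glog.Fatalf logs the message and then exits the process, so the else branch is not needed; below is a minimal sketch of the log-and-return variant the comment mentions. The package, function, and helper names are assumptions for illustration, not the actual kops code.

package example

import (
	"github.com/golang/glog"
)

// appendNodeIPFlag is a hypothetical sketch of the reviewer's suggestion:
// log the failure and return the error to the caller instead of calling
// glog.Fatalf (which exits the process); with an early return, the else
// branch is no longer needed.
func appendNodeIPFlag(flags string, getLocalIPv4 func() (string, error)) (string, error) {
	localIpv4, err := getLocalIPv4() // assumed helper that reads the EC2 metadata
	if err != nil {
		glog.Errorf("couldn't fetch the local-ipv4 address from the ec2 meta-data: %v", err)
		return flags, err
	}
	return flags + " --node-ip=" + localIpv4, nil
}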

@chrislovecnm (Contributor)

@bksteiny no idea on the image; I would reach out to the AWS folks.

@dezmodue (Contributor, Author)

@chrislovecnm changed as requested

@chrislovecnm (Contributor)

Run ./hack/update-bazel.sh

You need to rebase and run that command as CI is failing.

@chrislovecnm (Contributor)

Looks good ... CI is not happy

@justinsb (Member)

Code looks fine once CI is happy.

Another option is to do this (though with local-ipv4), as it is just an HTTP request I believe: https://github.com/kubernetes/kops/blob/master/upup/pkg/fi/nodeup/command.go#L344-L348
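
For reference, that approach boils down to a plain HTTP GET against the EC2 instance metadata service. Below is a minimal sketch, assuming the IMDSv1 endpoint for local-ipv4; the function name and error handling are illustrative, not the kops implementation.

package example

import (
	"fmt"
	"io/ioutil"
	"net/http"
	"strings"
)

// getLocalIPv4 is an illustrative sketch: fetch the instance's private IPv4
// address with a plain HTTP GET against the EC2 instance metadata service.
func getLocalIPv4() (string, error) {
	resp, err := http.Get("http://169.254.169.254/latest/meta-data/local-ipv4")
	if err != nil {
		return "", fmt.Errorf("error querying EC2 metadata for local-ipv4: %v", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %q from EC2 metadata service", resp.Status)
	}
	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		return "", fmt.Errorf("error reading EC2 metadata response: %v", err)
	}
	return strings.TrimSpace(string(body)), nil
}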

@dezmodue (Contributor, Author)

@justinsb let me know if I should update the PR as you suggested; I don't have a preference.

@chrislovecnm (Contributor)

Let’s just get CI fixed ;)

@chrislovecnm (Contributor)

You need to run the command to fix Bazel; I mentioned it in a previous comment.

@chrislovecnm (Contributor)

@justinsb this needs to go into 1.9 since it fixes a bug with the AWS CNI provider.

@liwenwu-amazon commented Feb 28, 2018

I pulled this PR locally and built a dev version of kops. I am NOT able to bring up a kops cluster successfully. Not sure if it is an issue with my environment or with this PR.
Here is the output:

ubuntu@ip-10-0-1-11:~/workspace/src/k8s.io/kops$ kops update cluster cni-feb28.k8s-test.com --yes
W0228 19:10:04.388084   25171 apply_cluster.go:778] unable to parse kops version "dev"
W0228 19:10:04.404572   25171 urls.go:71] Using base url from KOPS_BASE_URL env var: "https://k8s-test-com-state-store.s3.amazonaws.com/kops/dev/"
I0228 19:10:04.460920   25171 dns.go:92] Private DNS: skipping DNS validation
I0228 19:10:04.612321   25171 executor.go:91] Tasks: 0 done / 77 total; 30 can run
I0228 19:10:04.880869   25171 vfs_castore.go:715] Issuing new certificate: "ca"
I0228 19:10:05.062297   25171 vfs_castore.go:715] Issuing new certificate: "apiserver-aggregator-ca"
I0228 19:10:05.707267   25171 executor.go:91] Tasks: 30 done / 77 total; 24 can run
I0228 19:10:06.197135   25171 vfs_castore.go:715] Issuing new certificate: "kube-controller-manager"
I0228 19:10:06.432073   25171 vfs_castore.go:715] Issuing new certificate: "kubecfg"
I0228 19:10:06.680989   25171 vfs_castore.go:715] Issuing new certificate: "kube-proxy"
I0228 19:10:06.698087   25171 vfs_castore.go:715] Issuing new certificate: "kubelet-api"
I0228 19:10:06.783840   25171 vfs_castore.go:715] Issuing new certificate: "kops"
I0228 19:10:06.854512   25171 vfs_castore.go:715] Issuing new certificate: "master"
I0228 19:10:06.880286   25171 vfs_castore.go:715] Issuing new certificate: "apiserver-proxy-client"
I0228 19:10:06.932544   25171 vfs_castore.go:715] Issuing new certificate: "kube-scheduler"
I0228 19:10:07.045471   25171 vfs_castore.go:715] Issuing new certificate: "kubelet"
I0228 19:10:07.128042   25171 vfs_castore.go:715] Issuing new certificate: "apiserver-aggregator"
I0228 19:10:07.523679   25171 executor.go:91] Tasks: 54 done / 77 total; 21 can run
I0228 19:10:07.695420   25171 launchconfiguration.go:333] waiting for IAM instance profile "nodes.cni-feb28.k8s-test.com" to be ready
I0228 19:10:07.747283   25171 launchconfiguration.go:333] waiting for IAM instance profile "masters.cni-feb28.k8s-test.com" to be ready
I0228 19:10:18.184210   25171 executor.go:91] Tasks: 75 done / 77 total; 2 can run
I0228 19:10:18.901682   25171 executor.go:91] Tasks: 77 done / 77 total; 0 can run
I0228 19:10:18.901717   25171 dns.go:153] Pre-creating DNS records
I0228 19:10:19.156283   25171 update_cluster.go:253] Exporting kubecfg for cluster
W0228 19:10:19.256569   25171 create_kubecfg.go:58] Did not find API endpoint for gossip hostname; may not be able to reach cluster
kops has set your kubectl context to cni-feb28.k8s-test.com

Cluster is starting.  It should be ready in a few minutes.

Here is the output of the kops validate error:

kops validate cluster
Using cluster from kubectl context: cni-feb28.k8s-test.com

Validating cluster cni-feb28.k8s-test.com

Validation Failed

The dns-controller Kubernetes deployment has not updated the Kubernetes cluster's API DNS entry to the correct IP address.  The API DNS IP address is the placeholder address that kops creates: 203.0.113.123.  Please wait about 5-10 minutes for a master to start, dns-controller to launch, and DNS to propagate.  The protokube container and dns-controller deployment logs may contain more diagnostic information.  Etcd and the API DNS entries must be updated for a kops Kubernetes cluster to start.


Cannot reach cluster's API server: unable to Validate Cluster: cni-feb28.k8s-test.com

The master instance has already been up for over 10 minutes.
Here is the command I used:

kops create cluster --zones us-east-1a,us-east-1b,us-east-1c --dns private --vpc vpc-0066bd79 --node-count 3 --master-size m3.xlarge  --networking amazon-vpc-routed-eni --kubernetes-version 1.9.3 $NAME -v 10

@chrislovecnm (Contributor)

@dezmodue can we get really detailed instructions on how you tested?

@dezmodue (Contributor, Author) commented Mar 1, 2018

Hi, at the time I sent in the PR I had built kops and nodeup from the modified version in my repo:

make crossbuild
make crossbuild-nodeup
shasum $GOPATH/src/k8s.io/kops/.build/dist/linux/amd64/nodeup | cut -d\  -f1 > $GOPATH/src/k8s.io/kops/.build/dist/linux/amd64/nodeup.sha1
aws s3 cp --acl public-read $GOPATH/src/k8s.io/kops/.build/dist/linux/amd64/nodeup s3://mybucket-kops-binaries/testing/
aws s3 cp --acl public-read $GOPATH/src/k8s.io/kops/.build/dist/linux/amd64/nodeup.sha1 s3://mybucket-dev-kops-binaries/testing/
export NODEUP_URL=https://mybucket-kops-binaries.s3.amazonaws.com/testing/nodeup
export NODEUP_HASH=$(cat .build/dist/linux/amd64/nodeup.sha1)

Then I built 2 clusters from a YAML definition like the one in issue #4218 -- they are still running, FWIW.

$GOPATH/src/k8s.io/kops/.build/dist/darwin/amd64/kops create -f ${NAME}.yaml

Today I rebased on 034bad8 and ran a test cluster again:

go version go1.9.2 darwin/amd64
make crossbuild
make crossbuild-nodeup
shasum $GOPATH/src/k8s.io/kops/.build/dist/linux/amd64/nodeup | cut -d\  -f1 > $GOPATH/src/k8s.io/kops/.build/dist/linux/amd64/nodeup.sha1
aws s3 cp --acl public-read $GOPATH/src/k8s.io/kops/.build/dist/linux/amd64/nodeup s3://${BUCKET}-kops-binaries/testing/
aws s3 cp --acl public-read $GOPATH/src/k8s.io/kops/.build/dist/linux/amd64/nodeup.sha1 s3://${BUCKET}-kops-binaries/testing/
export NODEUP_URL=https://${BUCKET}-kops-binaries.s3.amazonaws.com/testing/nodeup
export NODEUP_HASH=$(cat .build/dist/linux/amd64/nodeup.sha1)

export NODE_SIZE=${NODE_SIZE:-m4.large}
export MASTER_SIZE=${MASTER_SIZE:-m4.large}
export ZONES=${ZONES:-"eu-west-1a,eu-west-1b,eu-west-1c"}
$GOPATH/src/k8s.io/kops/.build/dist/darwin/amd64/kops create cluster test-cni.${DOMAIN} \
  --node-count 1 \
  --zones $ZONES \
  --node-size $NODE_SIZE \
  --master-size $MASTER_SIZE \
  --master-zones $ZONES \
  --networking amazon-vpc-routed-eni \
  --kubernetes-version 1.9.3 \
  --topology private \
  --ssh-public-key ~/.ssh/${SSHKEY} \
  --ssh-access ${ACCESS} \
  --api-loadbalancer-type public \
  --authorization rbac \
  --admin-access ${ACCESS} \
  --bastion="true"

The result is a healthy cluster as far as I can tell:

$GOPATH/src/k8s.io/kops/.build/dist/darwin/amd64/kops validate cluster --name test-cni.${DOMAIN}
Validating cluster test-cni.my.domain.com

INSTANCE GROUPS
NAME                    ROLE    MACHINETYPE     MIN     MAX     SUBNETS
bastions                Bastion t2.micro        1       1       utility-eu-west-1a,utility-eu-west-1b,utility-eu-west-1c
master-eu-west-1a       Master  m4.large        1       1       eu-west-1a
master-eu-west-1b       Master  m4.large        1       1       eu-west-1b
master-eu-west-1c       Master  m4.large        1       1       eu-west-1c
nodes                   Node    m4.large        1       1       eu-west-1a,eu-west-1b,eu-west-1c

NODE STATUS
NAME                                            ROLE    READY
ip-172-20-122-136.eu-west-1.compute.internal    master  True
ip-172-20-41-87.eu-west-1.compute.internal      master  True
ip-172-20-87-136.eu-west-1.compute.internal     node    True
ip-172-20-87-29.eu-west-1.compute.internal      master  True

Your cluster test-cni.my.domain.com is ready

I logged into the node and checked the running kubelet process:

admin@ip-172-20-87-136:~$ ps -efw | grep kubelet
root      2503     1  1 15:17 ?        00:00:20 /usr/local/bin/kubelet --allow-privileged=true --cgroup-root=/ --cloud-provider=aws --cluster-dns=172.20.0.10 --cluster-domain=cluster.local --enable-debugging-handlers=true --eviction-hard=memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<10%,imagefs.inodesFree<5% --feature-gates=ExperimentalCriticalPodAnnotation=true --hostname-override=ip-172-20-87-136.eu-west-1.compute.internal --kubeconfig=/var/lib/kubelet/kubeconfig --network-plugin=cni --node-labels=kops.k8s.io/instancegroup=nodes,kubernetes.io/role=node,node-role.kubernetes.io/node= --non-masquerade-cidr=172.20.0.0/16 --pod-infra-container-image=gcr.io/google_containers/pause-amd64:3.0 --pod-manifest-path=/etc/kubernetes/manifests --register-schedulable=true --v=2 --cni-bin-dir=/opt/cni/bin/ --cni-conf-dir=/etc/cni/net.d/ --node-ip=172.20.87.136

And the config:

admin@ip-172-20-87-136:~$ cat /lib/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet Server
Documentation=https://github.com/kubernetes/kubernetes
After=docker.service

[Service]
EnvironmentFile=/etc/sysconfig/kubelet
ExecStart=/usr/local/bin/kubelet "$DAEMON_ARGS"
Restart=always
RestartSec=2s
StartLimitInterval=0
KillMode=process
User=root

admin@ip-172-20-87-136:~$ cat /etc/sysconfig/kubelet
DAEMON_ARGS="--allow-privileged=true --cgroup-root=/ --cloud-provider=aws --cluster-dns=172.20.0.10 --cluster-domain=cluster.local --enable-debugging-handlers=true --eviction-hard=memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<10%,imagefs.inodesFree<5% --feature-gates=ExperimentalCriticalPodAnnotation=true --hostname-override=ip-172-20-87-136.eu-west-1.compute.internal --kubeconfig=/var/lib/kubelet/kubeconfig --network-plugin=cni --node-labels=kops.k8s.io/instancegroup=nodes,kubernetes.io/role=node,node-role.kubernetes.io/node= --non-masquerade-cidr=172.20.0.0/16 --pod-infra-container-image=gcr.io/google_containers/pause-amd64:3.0 --pod-manifest-path=/etc/kubernetes/manifests --register-schedulable=true --v=2 --cni-bin-dir=/opt/cni/bin/ --cni-conf-dir=/etc/cni/net.d/ --node-ip=172.20.87.136"
HOME="/root"

I can see in the AWS console that the node and the masters have secondary private IPs assigned, as expected.

kubectl describe node ip-172-20-87-136.eu-west-1.compute.internal shows:

....
Addresses:
  InternalIP:  172.20.87.136
  Hostname:    ip-172-20-87-136.eu-west-1.compute.internal
....
  Namespace                  Name                                                      CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------                  ----                                                      ------------  ----------  ---------------  -------------
  kube-system                aws-node-hk8t6                                            10m (0%)      0 (0%)      0 (0%)           0 (0%)
  kube-system                kube-dns-autoscaler-787d59df8f-t7trg                      20m (1%)      0 (0%)      10Mi (0%)        0 (0%)
  kube-system                kube-dns-c58977f6c-9w7hs                                  260m (13%)    0 (0%)      110Mi (1%)       170Mi (2%)
  kube-system                kube-dns-c58977f6c-fv769                                  260m (13%)    0 (0%)      110Mi (1%)       170Mi (2%)
  kube-system                kube-proxy-ip-172-20-87-136.eu-west-1.compute.internal    100m (5%)     0 (0%)      0 (0%)           0 (0%)

I launched a pod and I can see it gets assigned an IP from the correct range:

root@test-5b77c64966-8xznp:/# ifconfig
eth0      Link encap:Ethernet  HWaddr 3e:d8:7d:fa:13:3d
          inet addr:172.20.84.59  Bcast:172.20.84.59  Mask:255.255.255.255

kubectl describe pod test-5b77c64966-8xznp
Name:           test-5b77c64966-8xznp
Namespace:      default
Node:           ip-172-20-87-136.eu-west-1.compute.internal/172.20.87.136
Start Time:     Thu, 01 Mar 2018 17:11:24 +0100
Labels:         pod-template-hash=1633720522
                run=test
Annotations:    kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container test
Status:         Running
IP:             172.20.84.59

I have torn down the cluster now, but if you have more questions please let me know.

@chrislovecnm I have rebased and run ./hack/update-bazel.sh -- FWIW, I am also available in the kops-users channel in Slack.

@dezmodue (Contributor, Author) commented Mar 1, 2018

/retest

@liwenwu-amazon

@dezmodue @justinsb @chrislovecnm
I have just tested the PR. It works for me. I am able to "kubectl exec" into a Pod.

Also, I have changed to use a gossip-based cluster. Not sure why the private-DNS-based cluster suddenly stopped working for me.

@KashifSaadat (Contributor) left a comment:

LGTM

@chrislovecnm (Contributor)

@dezmodue I think we want to cherry-pick this into the release-1.9 branch. Do you mind?

@chrislovecnm (Contributor)

/lgtm

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) on Mar 2, 2018
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chrislovecnm, dezmodue

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Mar 2, 2018
@dezmodue (Contributor, Author) commented Mar 2, 2018

Fine by me, anything I should do?

@chrislovecnm (Contributor)

@dezmodue just create a PR into the release branch; we are moving towards doing individual cherry-picks.

@k8s-ci-robot merged commit e634143 into kubernetes:master on Mar 2, 2018
@dezmodue (Contributor, Author) commented Mar 3, 2018

#4568 -- hope this is correct
