Setting up a single-master Kubernetes cluster with kubeadm
Machines:

hostname | ip | role |
---|---|---|
host01 | 10.11.11.11 | master |
host02 | 10.11.11.12 | node1 |
host03 | 10.11.11.13 | node2
Software versions:
- OS: centos 7
- Docker: 18.03.0-ce (installed in advance)
- Kubernetes: 1.13.2
master: apiserver, scheduler, controller-manager
node: kubelet, proxy
addons:
- pod 网络: Flannel
- dns: CoreDNS (the default since v1.13)
- ui: dashboard
- monitor: Heapster
- monitor: Prometheus + Grafana (TODO)
Cluster (service) network: 192.66.0.0/16
Pod network: 192.88.0.0/16
Disable the firewall
systemctl stop firewalld
systemctl disable firewalld
Disable SELinux
setenforce 0
vi /etc/selinux/config
SELINUX=disabled
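The same change can be made non-interactively; a sketch, assuming the current value is SELINUX=enforcing:
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config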
Disable swap
swapoff -a
Comment out the swap entry in /etc/fstab
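For example, a minimal sketch that comments out the swap entries (assumes GNU sed, as shipped with CentOS 7):
sed -i '/\sswap\s/ s/^/#/' /etc/fstab
free -m    # after swapoff -a the Swap line should show 0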
iptables --list
Check that the FORWARD chain policy is ACCEPT. Starting with version 1.13, Docker changed its default firewall rules and sets the FORWARD chain of the iptables filter table to DROP, which breaks pod-to-pod communication across nodes in a Kubernetes cluster. If the policy is not ACCEPT, modify the docker service configuration on every Docker node:
Add the following to the [Service] section of /usr/lib/systemd/system/docker.service:
ExecStartPost=/usr/sbin/iptables -P FORWARD ACCEPT
Restart docker
systemctl daemon-reload
systemctl restart docker
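After the restart, it is worth confirming that the FORWARD policy really is ACCEPT, e.g.:
iptables -nL FORWARD | head -n 1
# Chain FORWARD (policy ACCEPT)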
Add the Aliyun Kubernetes yum repository
cat <<EOF > /etc/yum.repos.d/kubernetes-aliyun.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
EOF
yum update
Check the available versions
yum list | grep kube
cockpit-kubernetes.x86_64 176-4.el7.centos extras
cri-tools.x86_64 1.12.0-0 kubernetes
kubeadm.x86_64 1.13.2-0 kubernetes
kubectl.x86_64 1.13.2-0 kubernetes
kubelet.x86_64 1.13.2-0 kubernetes
kubernetes.x86_64 1.5.2-0.7.git269f928.el7 extras
kubernetes-ansible.noarch 0.6.0-0.1.gitd65ebd5.el7 epel
kubernetes-client.x86_64 1.5.2-0.7.git269f928.el7 extras
kubernetes-cni.x86_64 0.6.0-0 kubernetes
kubernetes-master.x86_64 1.5.2-0.7.git269f928.el7 extras
kubernetes-node.x86_64 1.5.2-0.7.git269f928.el7 extras
rkt.x86_64 1.27.0-1 kubernetes
rsyslog-mmkubernetes.x86_64 8.24.0-34.el7 os
Install on all servers
yum install -y kubelet-1.13.2 kubeadm-1.13.2 kubectl-1.13.2
systemctl enable kubelet && systemctl start kubelet
On RHEL/CentOS 7 there are reports of iptables being bypassed for bridged traffic; set the bridge sysctls so that traffic does not bypass iptables:
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system
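On some machines these bridge sysctls only exist after the br_netfilter kernel module is loaded; if sysctl --system reports that the keys are unknown, load the module first and re-apply:
modprobe br_netfilter
lsmod | grep br_netfilter
sysctl --system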
At this point kubelet will restart every few seconds in a crash loop; this is expected, as it is waiting for instructions from kubeadm.
Make sure the cgroup driver used by kubelet matches Docker's; kubelet defaults to cgroupfs.
Check Docker's driver: docker info | grep cgroup
Output: Cgroup Driver: cgroupfs
If it is not cgroupfs, follow the official documentation to align the two drivers (a sketch follows below).
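If Docker reports a different driver (for example systemd), one common way to bring the two in line is to set Docker's cgroup driver explicitly and restart it. A sketch, assuming /etc/docker/daemon.json does not already contain other settings you need to keep:
cat <<EOF > /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=cgroupfs"]
}
EOF
systemctl restart docker
docker info | grep cgroup    # should print: Cgroup Driver: cgroupfs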
List the images required by the chosen Kubernetes version
kubeadm config images list --kubernetes-version v1.13.2
k8s.gcr.io/kube-apiserver:v1.13.2
k8s.gcr.io/kube-controller-manager:v1.13.2
k8s.gcr.io/kube-scheduler:v1.13.2
k8s.gcr.io/kube-proxy:v1.13.2
k8s.gcr.io/pause:3.1
k8s.gcr.io/etcd:3.2.24
k8s.gcr.io/coredns:1.2.6
If the servers cannot reach k8s.gcr.io (it is blocked), manually pull the images from alternative registries on every machine and re-tag them:
docker pull mirrorgooglecontainers/kube-apiserver:v1.13.2
docker tag mirrorgooglecontainers/kube-apiserver:v1.13.2 k8s.gcr.io/kube-apiserver:v1.13.2
docker pull mirrorgooglecontainers/kube-controller-manager:v1.13.2
docker tag mirrorgooglecontainers/kube-controller-manager:v1.13.2 k8s.gcr.io/kube-controller-manager:v1.13.2
docker pull mirrorgooglecontainers/kube-scheduler:v1.13.2
docker tag mirrorgooglecontainers/kube-scheduler:v1.13.2 k8s.gcr.io/kube-scheduler:v1.13.2
docker pull mirrorgooglecontainers/kube-proxy:v1.13.2
docker tag mirrorgooglecontainers/kube-proxy:v1.13.2 k8s.gcr.io/kube-proxy:v1.13.2
docker pull mirrorgooglecontainers/pause:3.1
docker tag mirrorgooglecontainers/pause:3.1 k8s.gcr.io/pause:3.1
docker pull mirrorgooglecontainers/etcd:3.2.24
docker tag mirrorgooglecontainers/etcd:3.2.24 k8s.gcr.io/etcd:3.2.24
docker pull netonline/coredns:1.2.6
docker tag netonline/coredns:1.2.6 k8s.gcr.io/coredns:1.2.6
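The pull/tag pairs above can also be wrapped in a small loop; a sketch for the mirrorgooglecontainers images (coredns is not published there, which is why it is pulled from a different repository above):
for img in kube-apiserver:v1.13.2 kube-controller-manager:v1.13.2 \
           kube-scheduler:v1.13.2 kube-proxy:v1.13.2 pause:3.1 etcd:3.2.24; do
  docker pull mirrorgooglecontainers/$img                   # pull from the mirror
  docker tag mirrorgooglecontainers/$img k8s.gcr.io/$img    # re-tag as the name kubeadm expects
done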
Run on the master server
kubeadm init \
--kubernetes-version=v1.13.2 \
--service-cidr=192.66.0.0/16 \
--pod-network-cidr=192.88.0.0/16 \
--apiserver-advertise-address=10.11.11.11
The installation progress is printed:
[init] Using Kubernetes version: v1.13.2
[preflight] Running pre-flight checks
[WARNING Service-Docker]: docker service is not enabled, please run 'systemctl enable docker.service'
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.03.1-ce. Latest validated version: 18.06
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [host01 localhost] and IPs [10.11.11.11 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [host01 localhost] and IPs [10.11.11.11 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [host01 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [192.66.0.1 10.11.11.11]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 20.001856 seconds
[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.13" in namespace kube-system with the configuration for the kubelets in the cluster
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "host01" as an annotation
[mark-control-plane] Marking the node host01 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node host01 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: gdh6ih.abw6qwc1ox16e7a3
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstraptoken] creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes master has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of machines by running the following on each node
as root:
kubeadm join 10.11.11.11:6443 --token tvgqy4.dh2ocb4sanqwwfs8 --discovery-token-ca-cert-hash sha256:c113734c9333cf088b66c2058fd79925b765790f3420acdb36c490837975bc90
To use kubectl as a regular (non-root) user, run the following:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Or, as root, run: export KUBECONFIG=/etc/kubernetes/admin.conf
Check the cluster status: kubectl get cs
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy {"health": "true"}
Install the Flannel pod network. Download the manifest:
wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Edit kube-flannel.yml as follows (a sed sketch follows below):
- Of the multiple kind: DaemonSet entries, delete all but the amd64 one.
- In that DaemonSet, change the image from image: quay.io/coreos/flannel:v0.10.0-amd64 (quay.io may be unreachable without a proxy) to jmgao1983/flannel:v0.10.0-amd64 (2 occurrences).
- Change "Network": "10.244.0.0/16" to "Network": "192.88.0.0/16".
Create it: kubectl create -f kube-flannel.yml
Check: kubectl get daemonset --all-namespaces
On the master: kubectl get pod --all-namespaces -o wide
Confirm that all pods are in the Running state.
Create a new bootstrap token on the master
kubeadm token create
inbhxg.5v4valxaaa0ggv4w
Compute the CA certificate hash on the master
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
c113734c9333cf088b66c2058fd79925b765790f3420acdb36c490837975bc90
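Alternatively, kubeadm can mint a new token and print the complete join command (including the CA cert hash) in one step:
kubeadm token create --print-join-command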
On each node, run the join command to add it to the cluster:
kubeadm join 10.11.11.11:6443 --token ujpk9a.e4md2z6c6qm9up3a --discovery-token-ca-cert-hash sha256:c113734c9333cf088b66c2058fd79925b765790f3420acdb36c490837975bc90
Check the nodes from the master
kubectl get nodes
NAME STATUS ROLES AGE VERSION
host02 Ready <none> 17m v1.13.2
host01 Ready master 44m v1.13.2
t-shhq-data-art-1 Ready <none> 13s v1.13.2
Verify DNS inside the cluster: kubectl run curl --image=radial/busyboxplus:curl -it
Inside the container, run:
nslookup kubernetes.default
Server: 192.66.0.10
Address 1: 192.66.0.10 kube-dns.kube-system.svc.cluster.local
Name: kubernetes.default
Address 1: 192.66.0.1 kubernetes.default.svc.cluster.local
Install the dashboard. Download the manifest:
wget https://raw.githubusercontent.com/kubernetes/dashboard/v1.10.1/src/deploy/recommended/kubernetes-dashboard.yaml
The image k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.1 is not reachable without a proxy, so change the image in the yaml to:
image: siriuszg/kubernetes-dashboard-amd64:v1.10.1
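The same substitution can be made with sed (a sketch, assuming the image line matches the string above):
sed -i 's#k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.1#siriuszg/kubernetes-dashboard-amd64:v1.10.1#' kubernetes-dashboard.yaml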
Expose the port by changing the Service to NodePort:
kind: Service
apiVersion: v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kube-system
spec:
  type: NodePort
  ports:
    - port: 443
      targetPort: 8443
      nodePort: 30001
  selector:
    k8s-app: kubernetes-dashboard
Create a new file kubernetes-dashboard-admin.rbac.yaml:
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard-admin
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: kubernetes-dashboard-admin
  labels:
    k8s-app: kubernetes-dashboard
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: kubernetes-dashboard-admin
    namespace: kube-system
kubectl create -f .
Visit https://NODEIP:30001
Look up the token: kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep admin | awk '{print $1}')
Log in with the token.
To show graphical cluster metrics, the dashboard needs the Heapster add-on.
Note that Heapster was deprecated in v1.11 and is recommended only for versions <= v1.7.
Download the yaml files
wget https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/grafana.yaml
wget https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/heapster.yaml
wget https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/influxdb.yaml
wget https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/rbac/heapster-rbac.yaml
Modify the images (replace each k8s.gcr.io image on the left with the mirror on the right; a sed sketch follows below):
k8s.gcr.io/heapster-grafana-amd64:v5.0.4 -> fishchen/heapster-grafana-amd64:v5.0.4
k8s.gcr.io/heapster-amd64:v1.5.4 -> fishchen/heapster-amd64:v1.5.4
k8s.gcr.io/heapster-influxdb-amd64:v1.5.2 -> fishchen/heapster-influxdb-amd64:v1.5.2
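A sed sketch of the same replacements, run in the directory containing the downloaded yaml files (assuming the image references match the strings listed above):
sed -i 's#k8s.gcr.io/heapster-grafana-amd64:v5.0.4#fishchen/heapster-grafana-amd64:v5.0.4#' grafana.yaml
sed -i 's#k8s.gcr.io/heapster-amd64:v1.5.4#fishchen/heapster-amd64:v1.5.4#' heapster.yaml
sed -i 's#k8s.gcr.io/heapster-influxdb-amd64:v1.5.2#fishchen/heapster-influxdb-amd64:v1.5.2#' influxdb.yaml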
heapster reports an error:
Error in scraping containers from kubelet:10.11.11.13:10255: failed to get all container stats from Kubelet URL "http://10.11.11.13:10255/stats/container/": Post http://10.11.11.13:10255/stats/container/: dial tcp 10.11.11.13:10255: getsockopt: connection refused
Modify heapster.yaml:
#- --source=kubernetes:https://kubernetes.default
- --source=kubernetes.summary_api:''?kubeletPort=10250&kubeletHttps=true&insecure=true
Apply the update: kubectl apply -f .
Another error:
Error in scraping containers from kubelet_summary:10.11.11.11:10250: request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:heapster, verb=get, resource=nodes, subresource=stats)"
The system:heapster ClusterRole lacks permission on nodes/stats; as a shortcut, bind the heapster ServiceAccount to the system:kubelet-api-admin ClusterRole instead.
Modify heapster-rbac.yaml:
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: heapster
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kubelet-api-admin
subjects:
  - kind: ServiceAccount
    name: heapster
    namespace: kube-system
Delete and recreate:
kubectl delete -f .
kubectl create -f .
Resource usage graphs are now visible in the dashboard.
The official documentation gives two high-availability options: stacked control-plane nodes (each master also runs an etcd member) or an external etcd cluster.
An installation done with kubeadm as above runs only a single-node etcd and a single, stateless kube-apiserver.
Stacked etcd topology diagram:
External etcd topology diagram:
Also note that kube-dns/CoreDNS should run more than one replica; a single replica is a single point of failure.
The installation environment can be reset with kubeadm reset, followed by a manual cleanup:
# clean up cni and flannel
ifconfig cni0 down
ip link delete cni0
ifconfig flannel.1 down
ip link delete flannel.1
rm -rf /var/lib/cni/
Flush iptables:
iptables -F && iptables -X && iptables -F -t nat && iptables -X -t nat
Delete the bridge interface:
ip link del flannel.1
Restart the docker service: systemctl restart docker
To remove a node, run on the master:
kubectl get nodes
NAME STATUS ROLES AGE VERSION
host02 Ready <none> 8m35s v1.13.2
host01 NotReady master 9m29s v1.13.2
kubectl drain host02 --delete-local-data --force --ignore-daemonsets
kubectl delete node host02
On the node being removed, run:
kubeadm reset
ifconfig cni0 down
ip link delete cni0
ifconfig flannel.1 down
ip link delete flannel.1
rm -rf /var/lib/cni/