
K8s Cluster Setup

Kubernetes 1.22.2 Installation Guide (Ubuntu, non-HA)

Pre-installation preparation

(Important!) Hardware requirements for installing k8s 1.22.2:

All nodes:

2 or more CPUs

4 GB or more of memory

Virtualization does not need to be enabled

(Important!!) Do not clone the VMs; create each one separately, otherwise problems will occur.

Take a snapshot before Docker is installed.

Take another snapshot after installing Docker and kubelet and initializing the cluster.

Take a snapshot before the CNI plugin is installed, so you can switch between calico and flannel at any time.

If kubeadm initialization fails, check the errors and reset kubeadm.

IP address plan:

master: 10.4.7.60/24 

gw:10.4.7.254

dns:114.114.114.114


worker1: 10.4.7.61/24

gw:10.4.7.254

dns:114.114.114.114


worker2: 10.4.7.62/24

gw:10.4.7.254

dns:114.114.114.114
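How the static addresses are applied depends on your environment; on Ubuntu 20.04 they are usually configured with netplan. A minimal sketch for the master node, assuming the interface is named ens33 and the file is /etc/netplan/00-installer-config.yaml (both names are assumptions, adjust them to your machine; use 10.4.7.61 and 10.4.7.62 on worker1 and worker2):

cat <<EOF > /etc/netplan/00-installer-config.yaml
network:
  version: 2
  ethernets:
    ens33:                           # interface name is an assumption
      addresses: [10.4.7.60/24]      # master IP from the plan above
      gateway4: 10.4.7.254
      nameservers:
        addresses: [114.114.114.114]
EOF
netplan apply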

Switch the apt source on every node before installing (updates and deb package installs will be much faster!):

http://mirrors.tuna.tsinghua.edu.cn/ubuntu/

1. Set the hostname and hosts resolution

Nodes: run on all nodes

  • Change the hostname. The hostname may only contain lowercase letters, digits, ".", and "-", and must start and end with a lowercase letter or digit.
# on the master node
hostnamectl set-hostname master   # set the master node's hostname

# on the worker1 (slave1) node
hostnamectl set-hostname worker1

# on the worker2 (slave2) node
hostnamectl set-hostname worker2
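The heading also mentions hosts resolution; if you want the nodes to reach each other by name, an optional sketch that appends the plan above to /etc/hosts on every node (adjust the IPs if yours differ):

cat <<EOF >> /etc/hosts
10.4.7.60 master
10.4.7.61 worker1
10.4.7.62 worker2
EOF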

2. Adjust the system configuration

Nodes: run on all master and slave nodes (master, worker1, worker2)

The steps below use master as the example; the other nodes are configured the same way (substitute each machine's real IP and hostname).

  • Set iptables

iptables -P FORWARD ACCEPT
/etc/init.d/ufw stop
ufw disable
  • Disable swap
swapoff -a
# prevent the swap partition from being mounted automatically at boot
rm /swap.img  
sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
  • Adjust kernel parameters
cat <<EOF >  /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward=1
vm.max_map_count=262144
EOF
modprobe br_netfilter
sysctl -p /etc/sysctl.d/k8s.conf
  • Configure the apt sources
apt-get update && apt-get install -y apt-transport-https ca-certificates software-properties-common 
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add - 
curl -fsSL https://mirrors.ustc.edu.cn/docker-ce/linux/ubuntu/gpg | sudo apt-key add -
add-apt-repository "deb [arch=amd64] https://mirrors.ustc.edu.cn/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
add-apt-repository "deb [arch=amd64] https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial main"

apt-get update   
# If the previous step reports NO_PUBKEY, see https://www.cnblogs.com/jiangzuo/p/13667011.html
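Before moving on to Docker, a quick sanity check of the swap and kernel settings configured above (swapon --show should print nothing, both sysctl values should be 1, and br_netfilter should appear in lsmod):

swapon --show
sysctl net.ipv4.ip_forward net.bridge.bridge-nf-call-iptables
lsmod | grep br_netfilter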

3. Install Docker

Nodes: all nodes

apt-get install docker-ce=5:20.10.9~3-0~ubuntu-focal
systemctl enable docker && systemctl start docker

Note: you must change Docker's default cgroup driver, otherwise kubeadm initialization will fail!!!!

This is a cgroup driver problem. By default the kubelet in Kubernetes 1.22 expects the systemd cgroup driver, while Docker defaults to cgroupfs. We need to change Docker's cgroup driver by creating the configuration file /etc/docker/daemon.json and adding the following line:

{"exec-opts": ["native.cgroupdriver=systemd"]}

Then, for the change to take effect, you must restart Docker and the kubelet.

systemctl daemon-reload
systemctl restart docker
systemctl restart kubelet
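For reference, a minimal sketch of the whole step, assuming Docker is already installed; the final docker info line simply confirms which cgroup driver is now active:

cat <<EOF > /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
systemctl daemon-reload
systemctl restart docker
docker info | grep -i "cgroup driver"   # should print: Cgroup Driver: systemd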

Reset kubeadm

sudo kubeadm reset

Check whether any nodes have taints

kubectl get nodes -o json | jq '.items[].spec.taints'

Set k as a permanent alias for kubectl.

Log out of the current user and log back in for it to take effect.

vi ~/.bashrc          # add the line below to the file
alias k='kubectl'
source ~/.bashrc

Remove the taint on the master node; once it is removed, pods can be scheduled onto the master.

kubectl taint nodes --all node-role.kubernetes.io/master-

Force-delete a pod with kubectl

kubectl delete pod podName -n NAMESPACE --force --grace-period=0

Enable kubectl auto-completion

apt install bash-completion
# locate bash_completion
source /usr/share/bash-completion/bash_completion
source <(kubectl completion bash)
echo "source <(kubectl completion bash)" >> ~/.bashrc
source ~/.bashrc
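If you use the k alias set up earlier, completion can be extended to cover it as well (a small sketch; __start_kubectl is the completion function defined by kubectl completion bash):

echo "complete -o default -F __start_kubectl k" >> ~/.bashrc
source ~/.bashrc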

List all pods

kubectl get po -owide  -A 

Deploy Kubernetes

1. Install kubeadm, kubelet, and kubectl

Nodes: run on all master and slave nodes (master, worker1, worker2)

apt-get install kubelet=1.22.2-00 kubectl=1.22.2-00 kubeadm=1.22.2-00
## check the kubeadm version
kubeadm version
## enable kubelet to start at boot
systemctl enable kubelet 
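Optionally (not part of the original steps), holding the packages prevents a later apt upgrade from accidentally bumping the versions:

apt-mark hold kubelet kubeadm kubectl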

2. Generate the initialization configuration file

Nodes: run only on the master node

kubeadm config print init-defaults > kubeadm.yaml

apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 10.4.7.60   # change to the master's IP
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: node     # delete this line
  imagePullPolicy: IfNotPresent
  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: 1.22.2
networking:
  dnsDomain: cluster.local
  podSubnet: 10.244.0.0/16   # add this line
  serviceSubnet: 10.96.0.0/12
scheduler: {}
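Before running the real initialization, the edited file can be sanity-checked with a dry run (a sketch; --dry-run runs the preflight logic without changing the host):

kubeadm init --config kubeadm.yaml --dry-run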

3. Pull the images in advance

Nodes: run only on the master node

# pull the images to the local node in advance
kubeadm config images pull --config kubeadm.yaml

[config/images] Pulled registry.aliyuncs.com/google_containers/kube-apiserver:v1.22.2
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-controller-manager:v1.22.2
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-scheduler:v1.22.2
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-proxy:v1.22.2
[config/images] Pulled registry.aliyuncs.com/google_containers/pause:3.5
[config/images] Pulled registry.aliyuncs.com/google_containers/etcd:3.5.0-0
[config/images] Pulled registry.aliyuncs.com/google_containers/coredns:v1.8.4
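A quick check that the images are now present in the local Docker image store:

docker images | grep registry.aliyuncs.com/google_containers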

4. Initialize the master node

Nodes: run only on the master node

kubeadm init --config kubeadm.yaml

If the initialization succeeds, the following message is printed at the end:

...
To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 10.4.7.60:6443 --token abcdef.0123456789abcdef \
        --discovery-token-ca-cert-hash sha256:3a7987c9f5007ebac7980e6614281ee0e064c760c8db012471f9f662289cc9ce

Next, follow the instructions above to configure the kubectl client's credentials:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

**⚠️ Note:** at this point kubectl get nodes should show the nodes in NotReady status, because the network plugin has not yet been installed.

If an error occurs during initialization, fix it according to the error message, run kubeadm reset, and then run the init again.

5. Join the slave nodes to the cluster

Nodes: run on all slave nodes (worker1, worker2). On each slave node, run the following command. It is the command printed in the kubeadm init success message, so replace it with the one actually printed by your init.

kubeadm join 10.4.7.60:6443 --token abcdef.0123456789abcdef \
        --discovery-token-ca-cert-hash sha256:3a7987c9f5007ebac7980e6614281ee0e064c760c8db012471f9f662289cc9ce
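If the token has expired (the TTL in kubeadm.yaml above is 24h) or the original join command has been lost, a new one can be generated on the master:

kubeadm token create --print-join-command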

6. Install the Calico plugin

Nodes: run only on the master node

Option 1: direct installation (not used here!!!)

  • Download the resource manifest

    wget https://docs.projectcalico.org/manifests/calico-etcd.yaml
    
  • Modify the configuration

    • Comment out the first 22 lines of the file

      1 ---
        2 # Source: calico/templates/calico-etcd-secrets.yaml
        3 # The following contains k8s Secrets for use with a TLS enabled etcd cluster.
        4 # For information on populating Secrets, see http://kubernetes.io/docs/user-guide/secrets/
        5 #apiVersion: v1
        6 #kind: Secret
        7 #type: Opaque
        8 #metadata:
        9 #  name: calico-etcd-secrets
       10 #  namespace: kube-system
       11 #data:
       12   # Populate the following with etcd TLS configuration if desired, but leave blank if
       13   # not using TLS for etcd.
       14   # The keys below should be uncommented and the values populated with the base64
       15   # encoded contents of each file that would be associated with the TLS data.
       16   # Example command for encoding a file contents: cat <file> | base64 -w 0
       17   # etcd-key: null
       18   # etcd-cert: null
       19   # etcd-ca: null
       20 ---
       21 # Source: calico/templates/calico-config.yaml
       22 # This ConfigMap is used to configure a self-hosted Calico installation.
       23 kind: ConfigMap
       24 apiVersion: v1
       25 metadata:
       26   name: calico-config
       27   namespace: kube-system
       ...
      
    • Modify the ConfigMap

      Note lines 30-35: replace etcd_endpoints with the etcd address of your environment.

       23 kind: ConfigMap
       24 apiVersion: v1
       25 metadata:
       26   name: calico-config
       27   namespace: kube-system
       28 data:
       29   # Configure this with the location of your etcd cluster.
       30   etcd_endpoints: "https://10.4.7.60:2379"
       31   # If you're using TLS enabled etcd uncomment the following.
       32   # You must also populate the Secret below with these files.
       33   etcd_ca: "/calico-secrets/etcd-ca"   # "/calico-secrets/etcd-ca"
       34   etcd_cert: "/calico-secrets/etcd-cert" # "/calico-secrets/etcd-cert"
       35   etcd_key: "/calico-secrets/etcd-key"  # "/calico-secrets/etcd-key"
       36   # Typha is disabled.
       37   typha_service_name: "none"
       38   # Configure the backend to use.
       39   calico_backend: "bird"
      
    • Add calico-node environment variables

      Lines 297-302 below are newly added.

      285       containers:
      286         # Runs calico-node container on each Kubernetes node. This
      287         # container programs network policy and routes on each
      288         # host.
      289         - name: calico-node
      290           image: docker.io/calico/node:v3.20.0
      291           envFrom:
      292           - configMapRef:
      293               # Allow KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT to be overridden for eBPF mode.
      294               name: kubernetes-services-endpoint
      295               optional: true
      296           env:
      297             - name: KUBERNETES_SERVICE_HOST
      298               value: "10.4.7.60"
      299             - name: KUBERNETES_SERVICE_PORT
      300               value: "6443"
      301             - name: KUBERNETES_SERVICE_PORT_HTTPS
      302               value: "6443"
      303             # The location of the etcd cluster.
      304             - name: ETCD_ENDPOINTS
      305               valueFrom:
      306                 configMapKeyRef:
      307                   name: calico-config
      308                   key: etcd_endpoints
      309             # Location of the CA certificate for etcd.
      310             - name: ETCD_CA_CERT_FILE
      
    • Modify the CIDR

      Note lines 371-372: the value is the pod-network-cidr used when initializing the k8s cluster.

      370             # no effect. This should fall within `--cluster-cidr`.
      371             - name: CALICO_IPV4POOL_CIDR
      372               value: "10.244.0.0/16"
      373             # Disable file logging so `kubectl logs` works.
      374             - name: CALICO_DISABLE_FILE_LOGGING
      375               value: "true"
      

      Create the secret

      kubectl -n kube-system create secret generic calico-etcd-secrets --from-file=etcd-ca=/etc/kubernetes/pki/etcd/ca.crt --from-file=etcd-cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --from-file=etcd-key=/etc/kubernetes/pki/etcd/healthcheck-client.key
      

      Create the Calico resources from the manifest

      kubectl apply -f calico-etcd.yaml
      

Wait for the pods to start:

kubectl -n kube-system get po 

NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-59db5cf8fd-fpzdq   1/1     Running   1          32m
calico-node-d2xq4                          1/1     Running   1          32m
calico-node-ppzjk     

Calico plugin images:

docker.io/calico/cni:v3.20.2

docker.io/calico/pod2daemon-flexvol:v3.20.2

docker.io/calico/node:v3.20.2
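Since these pulls can be slow, a small sketch to pre-pull the listed images on each node before applying the manifest (tags as listed above):

for img in docker.io/calico/cni:v3.20.2 docker.io/calico/pod2daemon-flexvol:v3.20.2 docker.io/calico/node:v3.20.2; do
  docker pull "$img"
done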

Option 2: install with the operator (this is the method that worked!!!)

  • Install the operator

    kubectl create -f https://docs.projectcalico.org/manifests/tigera-operator.yaml
    
  • Wait for the operator pod to finish starting

    kubectl -n tigera-operator get po
    
    NAME                               READY   STATUS    RESTARTS   AGE
    tigera-operator-698876cbb5-kfpb2   1/1     Running   0          38m
    

    The image pull can be slow; you can pull it manually with docker pull on each node.

  • Edit the Calico configuration (custom-resources.yaml, which can be downloaded from https://docs.projectcalico.org/manifests/custom-resources.yaml)

    vim custom-resources.yaml
    
    apiVersion: operator.tigera.io/v1
    kind: Installation
    metadata:
      name: default
    spec:
      # Configures Calico networking.
      calicoNetwork:
        # Note: The ipPools section cannot be modified post-install.
        ipPools:
        - blockSize: 26
          cidr: 10.244.0.0/16        # change to match the pod CIDR
          encapsulation: VXLANCrossSubnet
          natOutgoing: Enabled
          nodeSelector: all()
    
    ---
    
    # This section configures the Calico API server.
    # For more information, see: https://docs.projectcalico.org/v3.20/reference/installation/api#operator.tigera.io/v1.APIServer
    apiVersion: operator.tigera.io/v1
    kind: APIServer
    metadata:
      name: default
    spec: {}
    
  • Apply the Calico configuration

    kubectl apply -f custom-resources.yaml
    
  • Wait for the operator to create the Calico pods automatically

# the operator automatically creates the calico-apiserver and calico-system namespaces and the required pods; just wait for them to finish starting
kubectl get ns

NAME               STATUS   AGE
calico-apiserver   Active   13m
calico-system      Active   19m

kubectl -n calico-apiserver get po

NAME                                READY   STATUS    RESTARTS   AGE
calico-apiserver-554fbf9554-b6kzv   1/1     Running   0          13m

kubectl -n calico-system get po

NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-868b656ff4-hn6qv   1/1     Running   0          20m
calico-node-qqrp9                          1/1     Running   0          20m
calico-node-r45z2                          1/1     Running   0          20m
calico-typha-5b64cf4b48-vws5j              1/1     Running   0          20m
calico-typha-5b64cf4b48-w6wqf              1/1     Running   0          20m

7. Verify the cluster

Nodes: run on the master node

kubectl get nodes   # check whether all cluster nodes are Ready

Create a test nginx pod

kubectl run  test-nginx --image=nginx:alpine

Check that the pod was created successfully, then curl the pod IP to verify it works.

kubectl get po -o wide

NAME                          READY   STATUS    RESTARTS   AGE   IP           NODE         NOMINATED NODE   READINESS GATES
test-nginx-5bd8859b98-5nnnw   1/1     Running   0          9s    10.244.1.2   k8s-slave1   <none>           <none>
curl 10.244.1.2

...
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
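Once the test succeeds, the test pod can be removed (use the exact pod name shown by kubectl get po):

kubectl delete pod podName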

8. Clean up the environment

If you ran into other problems during the installation, you can reset everything with the commands below (the flannel-related lines only matter if flannel was installed; with Calico the interface names differ):

# run on all cluster nodes
kubeadm reset
ifconfig cni0 down && ip link delete cni0
ifconfig flannel.1 down && ip link delete flannel.1
rm -rf /run/flannel/subnet.env
rm -rf /var/lib/cni/
mv /etc/kubernetes/ /tmp
mv /var/lib/etcd /tmp
mv ~/.kube /tmp
iptables -F
iptables -t nat -F
ipvsadm -C
ip link del kube-ipvs0
ip link del dummy0

If configuring the apt sources reports errors, switch to these mirrors instead:

apt-get update && apt-get install -y apt-transport-https ca-certificates software-properties-common 
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add - 
curl -fsSL http://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | apt-key add -
add-apt-repository "deb [arch=amd64] http://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
add-apt-repository "deb [arch=amd64] https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial main"

apt-get update   
# If the previous step reports NO_PUBKEY, see https://www.cnblogs.com/jiangzuo/p/13667011.html

Pod status before the CNI plugin is installed

The coredns-7f6cbbb7b8-c778v pod cannot start because the CNI plugin has not been installed.
  events:
  Warning  FailedScheduling  43s (x2 over 2m13s)  default-scheduler  0/3 nodes are available: 3 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
  
  root@master:~# k get po -A -owide 
NAMESPACE     NAME                             READY   STATUS    RESTARTS   AGE     IP          NODE      NOMINATED NODE   READINESS GATES
kube-system   coredns-7f6cbbb7b8-c778v         0/1     Pending   0          3m54s   <none>      <none>    <none>           <none>
kube-system   coredns-7f6cbbb7b8-px7cm         0/1     Pending   0          3m54s   <none>      <none>    <none>           <none>
kube-system   etcd-master                      1/1     Running   0          4m7s    10.4.7.70   master    <none>           <none>
kube-system   kube-apiserver-master            1/1     Running   0          4m7s    10.4.7.70   master    <none>           <none>
kube-system   kube-controller-manager-master   1/1     Running   0          4m7s    10.4.7.70   master    <none>           <none>
kube-system   kube-proxy-5w6c4                 1/1     Running   0          3m23s   10.4.7.71   worker1   <none>           <none>
kube-system   kube-proxy-6jnkr                 1/1     Running   0          3m20s   10.4.7.72   worker2   <none>           <none>
kube-system   kube-proxy-hcwtz                 1/1     Running   0          3m54s   10.4.7.70   master    <none>           <none>
kube-system   kube-scheduler-master            1/1     Running   0          4m8s    10.4.7.70   master    <none>           <none>
Warning: this article was last updated on January 2, 2022; its contents may be outdated, use with caution.