2022-07-06 10:39:00 Zhuang Xiaoyan


Kubernetes runs into all kinds of problems during operation, so I have collected the k8s-related problems I encountered and their solutions.

One. k8s restart error: The connection to the server was refused

1.1 Symptom

After k8s is restarted, kubectl reports an error:

# kubectl get pods
The connection to the server xxx:6443 was refused - did you specify the right host or port?

1.2 Troubleshooting

According to the error message, the connection to port 6443 (the kube-apiserver port) was refused. Checking the port shows that nothing is listening on it:

ss -antulp | grep :6443

Port 6443 is the API listening port of kube-apiserver, which in a kubeadm cluster is started by kubelet, so kubelet has probably failed to start. Try restarting kubelet and check its state; sure enough, the startup failed, so analyze the log:

systemctl status kubelet
journalctl -xefu kubelet

Some components may have failed to start. Checking the container status shows that none of the components are running. Restarting docker and the related containers reports errors:

# docker ps -a
CONTAINER ID        IMAGE                                               COMMAND                  CREATED             STATUS                      PORTS               NAMES
56f463b5684b        9b60aca1d818                                        "kube-controller-man…"   40 hours ago        Exited (2) 39 hours ago                         k8s_kube-controller-manager_kube-controller-manager-master_kube-system_8f99a56fb3eeae0c61283d6071bfb1f4_5
5043f1103f1f        aaefbfa906bd                                        "kube-scheduler --au…"   40 hours ago        Exited (2) 39 hours ago                         k8s_kube-scheduler_kube-scheduler-master_kube-system_285062c53852ebaf796eba8548d69e43_5
2d707069ab22        bfe3a36ebd25                                        "/coredns -conf /etc…"   41 hours ago        Exited (0) 39 hours ago                         k8s_coredns_coredns-6d56c8448f-mt7vz_kube-system_abc65488-0a54-4a1a-8e23-339f3f23f6d2_0
0dadfca20cb7        bfe3a36ebd25                                        "/coredns -conf /etc…"   41 hours ago        Exited (0) 39 hours ago                         k8s_coredns_coredns-6d56c8448f-hdtlf_kube-system_e1f90d02-77d0-4529-bea5-b4a72cdb4cf5_0
f25051c775cf        registry.aliyuncs.com/google_containers/pause:3.2   "/pause"                 41 hours ago        Exited (0) 39 hours ago                         k8s_POD_coredns-6d56c8448f-mt7vz_kube-system_abc65488-0a54-4a1a-8e23-339f3f23f6d2_0
b24a10712152        registry.aliyuncs.com/google_containers/pause:3.2   "/pause"                 41 hours ago        Exited (0) 39 hours ago                         k8s_POD_coredns-6d56c8448f-hdtlf_kube-system_e1f90d02-77d0-4529-bea5-b4a72cdb4cf5_0
fed8e33864c1        e708f4bb69e3                                        "/opt/bin/flanneld -…"   41 hours ago        Exited (137) 39 hours a

# docker start $(docker ps -a | awk '{ print $1}' | tail -n +2)
Error response from daemon: cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice"
Error response from daemon: cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice"
Error response from daemon: cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice"
Error response from daemon: cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice"

1.3 Solution

According to the error message, the cgroup driver setting in the docker configuration file is wrong. Comment out (or remove) that setting, restart docker, then restart kubelet. Do not restart the containers manually: the containers have a startup order, and if you do not know it, a manual restart is not recommended.
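The check described above could be sketched roughly as follows. This assumes Docker reads its daemon options from /etc/docker/daemon.json and that the driver is set through an exec-opts entry such as native.cgroupdriver=systemd; neither the path nor the key is named in the original notes, and the helper names show_cgroup_driver and restart_docker_then_kubelet are hypothetical.

```shell
# Hypothetical helper: print the cgroup driver configured in a Docker
# daemon config file (assumed default path: /etc/docker/daemon.json).
show_cgroup_driver() {
  config="${1:-/etc/docker/daemon.json}"
  # Matches entries like "native.cgroupdriver=systemd" inside exec-opts.
  grep -o 'native\.cgroupdriver=[a-z]*' "$config" | cut -d= -f2
}

# Restart order from the text: docker first, then kubelet.
# Wrapped in a function so nothing restarts until it is called explicitly.
restart_docker_then_kubelet() {
  systemctl restart docker && systemctl restart kubelet
}
```

Running show_cgroup_driver before editing the file shows which driver is configured; once the setting has been corrected, restart_docker_then_kubelet applies the change and lets kubelet bring the containers back in its own order.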

Two. kubectl command error: Unable to connect to the server: x509: certificate signed by unknown authority

2.1 Symptom

Running kubectl get nodes reports an error:

kubectl  get nodes
Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")

2.2 Cause

The certificate in /root/.kube/config fails verification, so the correct certificate must be used. A likely cause is that kubeadm reset did not delete the old certificate.

2.3 Solution

Delete the original certificate and cache files:

# rm -rf /root/.kube/*

Then copy the connection certificate (the kubeconfig) from the master node into that directory.
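A minimal sketch of that copy step, assuming a kubeadm layout in which the admin kubeconfig lives at /etc/kubernetes/admin.conf on the master node. The helper name install_kubeconfig and its default paths are illustrative assumptions, not part of the original notes.

```shell
# Hypothetical helper: copy a kubeconfig into place for the current user.
# Defaults assume a kubeadm layout; both paths can be overridden.
install_kubeconfig() {
  src="${1:-/etc/kubernetes/admin.conf}"
  dest="${2:-$HOME/.kube/config}"
  mkdir -p "$(dirname "$dest")" || return 1
  cp "$src" "$dest" || return 1
  # The file holds client credentials, so keep it private to the owner.
  chmod 600 "$dest"
}
```

Called without arguments on the master node, it refreshes the current user's kubeconfig (/root/.kube/config when run as root); on another machine, pass the path of a copy fetched from the master as the first argument.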

Three. Switching the k8s cluster container runtime: docker -> containerd

3.1 Solution for kubelet installed with kubeadm

# Use kubeadm to view the default configuration:

kubeadm config print init-defaults --component-configs KubeletConfiguration

To switch the container runtime from the default docker to containerd, modify this file:

vim /var/lib/kubelet/kubeadm-flags.env

Add the following parameters to KUBELET_KUBEADM_ARGS:

--container-runtime=remote --container-runtime-endpoint=unix:///run/containerd/containerd.sock

The resulting line looks like this:

KUBELET_KUBEADM_ARGS="--container-runtime=remote --container-runtime-endpoint=unix:///run/containerd/containerd.sock --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.5"
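For repeating this edit on several nodes, a sed-based sketch along the following lines may help. The helper name add_kubelet_flags is hypothetical, and it assumes the file has a single line starting with KUBELET_KUBEADM_ARGS=" and that the added flags contain no | or & characters, which sed would treat specially here.

```shell
# Hypothetical helper: prepend extra flags to KUBELET_KUBEADM_ARGS in a
# kubeadm-flags.env style file. The result goes to stdout and the input
# file is left untouched, so the change can be reviewed first.
add_kubelet_flags() {
  env_file="$1"
  extra="$2"
  sed "s|^KUBELET_KUBEADM_ARGS=\"|KUBELET_KUBEADM_ARGS=\"$extra |" "$env_file"
}
```

For example, add_kubelet_flags /var/lib/kubelet/kubeadm-flags.env "--container-runtime=remote" prints the modified line; redirect the output to a temporary file and move it into place once it looks right, then restart kubelet.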

3.2 Solution for kubelet deployed directly from executable files

Modify the /usr/lib/systemd/system/kubelet.service file and add these startup parameters:

--container-runtime=remote --container-runtime-endpoint=unix:///run/containerd/containerd.sock

# systemctl daemon-reload && systemctl restart kubelet

Note: kubeadm manages the kubelet service through a systemd drop-in, so on a kubeadm installation, changing the kubelet startup parameters by editing /usr/lib/systemd/system/kubelet.service directly will not work. Use the method in 3.1 instead.
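To see whether a drop-in is in play on a given node, systemctl cat kubelet prints the unit together with its drop-ins. The sketch below does a similar lookup by hand; the helper name list_dropins is hypothetical, and the default search roots are the usual systemd unit directories, which may differ on some distributions.

```shell
# Hypothetical helper: list systemd drop-in files for a unit.
# Extra arguments override the search roots.
list_dropins() {
  unit="$1"; shift
  [ "$#" -gt 0 ] || set -- /etc/systemd/system /run/systemd/system /usr/lib/systemd/system
  for root in "$@"; do
    for f in "$root/$unit.d"/*.conf; do
      # Skip the unexpanded glob pattern when a directory has no drop-ins.
      [ -e "$f" ] && printf '%s\n' "$f"
    done
  done
  return 0
}
```

If list_dropins kubelet.service prints a kubeadm drop-in file, the startup parameters come from that drop-in and the files it references rather than from kubelet.service itself.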

Four. coredns certificate access error

4.1 Symptom

kubectl describe pod -n kube-system coredns-757569d647-qj8ts

Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "b7ea16c5b21e06069d1418b322e04bd2da482acdf21f863f47c96a80c551eab5" network for pod "coredns-757569d647-qj8ts": networkPlugin cni failed to set up pod "coredns-757569d647-qj8ts_kube-system" network: error getting ClusterInformation: Get https://[]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes"), failed to clean up sandbox container "b7ea16c5b21e06069d1418b322e04bd2da482acdf21f863f47c96a80c551eab5" network for pod "coredns-757569d647-qj8ts": networkPlugin cni failed to teardown pod "coredns-757569d647-qj8ts_kube-system" network: error getting ClusterInformation: Get https://[]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")]
I tried several ways of comparing the CA data in the secret generated for coredns:

kubectl get secrets -n kube-system coredns-token-xc8kc -o yaml

The CA data turned out to be exactly the same as the CA data recorded in /etc/kubernetes/admin.conf on the host, yet the pod still could not access the kube-apiserver service.

Checking with ipvsadm -Ln did not reveal any problem either.

The final solution was to decode the base64-encoded CA data in admin.conf. Copy the value after certificate-authority-data: into a text file such as ca.txt, restore the certificate with base64 -d ./ca.txt, and save the output to /etc/pki/ca-trust/source/anchors/kube.pem. Then modify the volume mounts of the coredns deployment to add a pki mount.
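The decode step can be sketched as a small helper. The name extract_ca_cert is hypothetical, and the sketch assumes the kubeconfig contains the field certificate-authority-data: followed by its value on the same line, as kubeadm-generated files normally do.

```shell
# Hypothetical helper: decode the certificate-authority-data field of a
# kubeconfig and write the resulting certificate to the given output file.
extract_ca_cert() {
  kubeconfig="$1"
  out="$2"
  # Take the first matching field, then base64-decode its value.
  awk '$1 == "certificate-authority-data:" { print $2; exit }' "$kubeconfig" \
    | base64 -d > "$out"
}
```

With the paths from the text, the call would be extract_ca_cert /etc/kubernetes/admin.conf /etc/pki/ca-trust/source/anchors/kube.pem.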

4.2 Solution

        # under the coredns container:
        volumeMounts:
        - name: config-volume
          mountPath: /etc/coredns
          readOnly: true
        - name: etc-pki
          mountPath: /etc/pki
          readOnly: true
      # under the pod spec:
      volumes:
        - name: config-volume
          configMap:
            name: coredns
            items:
            - key: Corefile
              path: Corefile
        - hostPath:
            path: /etc/pki
            type: DirectoryOrCreate
          name: etc-pki


