Today, after deploying metrics-server, I checked the pod log and found a pile of errors:
]# kubectl logs -f -n kube-system metrics-server-d8669575f-xl6mw
I1202 09:09:31.217954 1 serving.go:312] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
I1202 09:09:37.725863 1 secure_serving.go:116] Serving securely on [::]:443
E1202 09:09:49.807117 1 reststorage.go:135] unable to fetch node metrics for node "master": no metrics known for node
E1202 09:09:49.807185 1 reststorage.go:135] unable to fetch node metrics for node "node1": no metrics known for node
E1202 09:09:49.807202 1 reststorage.go:135] unable to fetch node metrics for node "node2": no metrics known for node
E1202 09:09:50.940606 1 reststorage.go:160] unable to fetch pod metrics for pod linux40/nginx-deployment-7d8599fbc9-68pf8: no metrics known for pod
E1202 09:09:53.825493 1 reststorage.go:135] unable to fetch node metrics for node "node1": no metrics known for node
E1202 09:09:53.825540 1 reststorage.go:135] unable to fetch node metrics for node "node2": no metrics known for node
E1202 09:09:53.825551 1 reststorage.go:135] unable to fetch node metrics for node "master": no metrics known for node
E1202 09:10:05.976306 1 reststorage.go:160] unable to fetch pod metrics for pod linux40/nginx-deployment-7d8599fbc9-68pf8: no metrics known for pod
E1202 09:10:21.291923 1 reststorage.go:160] unable to fetch pod metrics for pod linux40/nginx-deployment-7d8599fbc9-68pf8: no metrics known for pod
E1202 09:10:31.601208 1 reststorage.go:135] unable to fetch node metrics for node "master": no metrics known for node
E1202 09:10:31.601330 1 reststorage.go:135] unable to fetch node metrics for node "node1": no metrics known for node
E1202 09:10:31.601353 1 reststorage.go:135] unable to fetch node metrics for node "node2": no metrics known for node
E1202 09:10:31.610963 1 reststorage.go:160] unable to fetch pod metrics for pod kube-system/kube-flannel-ds-64qdh: no metrics known for pod
E1202 09:10:31.611032 1 reststorage.go:160] unable to fetch pod metrics for pod linux40/magedu-tomcat-app1-deployment-6cd664c5bd-wprjb: no metrics known for pod
Describing the pod did not show any useful error message either:
]# kubectl describe pod metrics-server-6c97c89fd5-j2rql -n kube-system
Name:                 metrics-server-6c97c89fd5-j2rql
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 node2/192.168.64.112
Start Time:           Thu, 02 Dec 2021 16:50:50 +0800
Labels:               k8s-app=metrics-server
                      pod-template-hash=6c97c89fd5
Annotations:          <none>
Status:               Running
IP:                   10.244.2.61
IPs:
  IP:           10.244.2.61
Controlled By:  ReplicaSet/metrics-server-6c97c89fd5
Containers:
  metrics-server:
    Container ID:  docker://eac4a2db02ca75315047eb778b7d3e1d7543d10ed6d33b4b1eddb006f824e34e
    Image:         mirrorgooglecontainers/metrics-server-amd64:v0.3.6
    Image ID:      docker://sha256:9dd718864ce61b4c0805eaf75f87b95302960e65d4857cb8b6591864394be55b
    Port:          4443/TCP
    Host Port:     0/TCP
    Args:
      --cert-dir=/tmp
      --secure-port=4443
      --kubelet-preferred-address-types=InternalIP
      --kubelet-use-node-status-port
      --kubelet-insecure-tls
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Thu, 02 Dec 2021 16:51:10 +0800
      Finished:     Thu, 02 Dec 2021 16:51:11 +0800
    Ready:          False
    Restart Count:  2
    Liveness:       http-get https://:https/livez delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get https://:https/readyz delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /tmp from tmp-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from metrics-server-token-4xrbc (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  tmp-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  metrics-server-token-4xrbc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  metrics-server-token-4xrbc
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  <unknown>          default-scheduler  Successfully assigned kube-system/metrics-server-6c97c89fd5-j2rql to node2
  Normal   Pulled     16s (x3 over 34s)  kubelet, node2     Container image "mirrorgooglecontainers/metrics-server-amd64:v0.3.6" already present on machine
  Normal   Created    16s (x3 over 34s)  kubelet, node2     Created container metrics-server
  Normal   Started    16s (x3 over 34s)  kubelet, node2     Started container metrics-server
  Warning  BackOff    8s (x5 over 32s)   kubelet, node2     Back-off restarting failed container
The cause is that the certificates generated when the cluster was deployed were not signed with each node's IP address. When metrics-server connects to a kubelet by IP, verification fails because the certificate has no matching IP entry (error: x509: cannot validate certificate for 192.168.33.11 because it doesn't contain any IP SANs).
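To confirm this diagnosis before changing the Deployment, one option is to look at the SANs in the certificate a kubelet actually serves. The following is only a minimal sketch and not part of the original setup: `has_ip_san` and `check_kubelet_cert` are hypothetical helper names, and it assumes `openssl` is installed and that the kubelet listens on its default port 10250.

```shell
# Hypothetical helper: reads the text output of `openssl x509 -noout -text`
# on stdin and succeeds only if the given IP is listed as an IP SAN.
has_ip_san() {
  # Escape the dots so they match literally in the regular expression.
  escaped_ip="$(printf '%s' "$1" | sed 's/\./\\./g')"
  grep -Eq "IP Address:${escaped_ip}(,|[[:space:]]|\$)"
}

# Hypothetical helper: fetches the certificate served on NODE_IP:PORT
# (port 10250 is assumed) and reports whether NODE_IP appears as a SAN.
check_kubelet_cert() {
  node_ip="$1"
  port="${2:-10250}"
  if openssl s_client -connect "${node_ip}:${port}" </dev/null 2>/dev/null \
      | openssl x509 -noout -text 2>/dev/null \
      | has_ip_san "$node_ip"; then
    echo "IP SAN present for ${node_ip}"
  else
    echo "no IP SAN for ${node_ip} (or the certificate could not be fetched)"
  fi
}
```

Calling `check_kubelet_cert 192.168.33.11` from a machine that can reach the node would then indicate whether the missing IP SAN is the likely reason for the x509 error.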
We can add the --kubelet-insecure-tls parameter to skip certificate verification:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
      - name: tmp-dir
        emptyDir: {}
      containers:
      - name: metrics-server
        image: mirrorgooglecontainers/metrics-server-amd64:v0.3.6
        imagePullPolicy: IfNotPresent
        command:
        - /metrics-server
        - --kubelet-insecure-tls                         # skip TLS verification of the kubelet certificates
        - --kubelet-preferred-address-types=InternalIP   # use the node's internal IP for communication
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp
        resources:
          limits:
            cpu: 300m
            memory: 200Mi
          requests:
            cpu: 200m
            memory: 100Mi
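Once the manifest is updated, it can be applied and checked roughly as follows. This is a sketch under assumptions: `apply_and_verify` is a hypothetical helper name, `metrics-server.yaml` is an assumed filename for the manifest above, and the 120s timeout is an arbitrary choice.

```shell
# Hypothetical helper: applies the manifest, waits for the rollout to finish,
# then queries node metrics through the metrics API.
apply_and_verify() {
  manifest="$1"   # path to the Deployment manifest (assumed: metrics-server.yaml)
  kubectl apply -f "$manifest" || return 1
  # Wait until the new metrics-server pod is ready (timeout value is an assumption).
  kubectl -n kube-system rollout status deployment/metrics-server --timeout=120s || return 1
  # Metrics may take a short while to appear after the pod becomes ready.
  kubectl top nodes
}
```

Run it as `apply_and_verify metrics-server.yaml`. If `kubectl top nodes` still reports that metrics are not available right after the rollout, retrying a little later is usually worth trying before digging further.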