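Starting with Kubernetes v1.8, resource usage metrics (such as container CPU and memory usage) are available through the Metrics API. Note:
The Metrics API only serves current metric values; it does not store historical data.
The Metrics API URI is /apis/metrics.k8s.io/, and the API is maintained in k8s.io/metrics.
metrics-server must be deployed before the API can be used; metrics-server collects its data by calling the Kubelet Summary API.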
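Below are my notes on installing metrics-server on Kubernetes 1.13, including the pitfalls I ran into and how I fixed them. I installed it from the YAML manifests.
Clone the metrics-server repository onto the target machine (any machine that can run kubectl commands):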
git clone https://github.com/kubernetes-incubator/metrics-server.git
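At the time of writing, the latest metrics-server release is v0.3.1. For v0.2 and later, use the 1.8+ directory; for versions before v0.2, use the 1.7 directory. I deployed v0.3.1, so I used the 1.8+ directory: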
[root@localhost ~]# ls metrics-server/deploy/1.8+
aggregated-metrics-reader.yaml
auth-delegator.yaml
auth-reader.yaml
metrics-apiservice.yaml
metrics-server-deployment.yaml
metrics-server-service.yaml
resource-reader.yaml
Install by applying all the manifests from inside the 1.8+ directory:
cd metrics-server/deploy/1.8+
kubectl apply -f ./
Verify the installation. Output like the following shows that the installation succeeded and the API is reachable:
[root@localhost ~]# kubectl api-versions | grep metrics
metrics.k8s.io/v1beta1
[root@localhost ~]# kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq
{
  "kind": "NodeMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes"
  },
  ...
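The kubectl get --raw call returns a NodeMetricsList whose items carry each node's usage.cpu and usage.memory as Kubernetes quantity strings (for example 244m or 1222Mi). As a minimal sketch, not part of metrics-server or kubectl, the Python below turns that JSON into plain numbers. The helper names and the sample payload are hypothetical, and the suffix table only covers the common suffixes (it does not handle P/E, Pi/Ei, or exponent notation).

```python
import json

# Unit multipliers for common Kubernetes quantity suffixes:
# binary (Ki, Mi, Gi, Ti) and decimal (n, u, m, k, M, G, T).
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "n": 1e-9, "u": 1e-6, "m": 1e-3,
    "k": 1e3, "M": 1e6, "G": 1e9, "T": 1e12,
}


def parse_quantity(quantity: str) -> float:
    """Convert a quantity such as '244m' (CPU) or '1222Mi' (memory) to a float."""
    # Try two-letter suffixes first so that 'Mi' is not mistaken for 'M'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * _SUFFIXES[suffix]
    return float(quantity)  # no suffix, e.g. '2' (cores)


def summarize_node_metrics(payload: str) -> dict:
    """Map node name -> (CPU cores, memory bytes) from a NodeMetricsList JSON document."""
    data = json.loads(payload)
    summary = {}
    for item in data.get("items", []):
        usage = item["usage"]
        summary[item["metadata"]["name"]] = (
            parse_quantity(usage["cpu"]),
            parse_quantity(usage["memory"]),
        )
    return summary


if __name__ == "__main__":
    # Hypothetical payload with made-up values; real input would come from
    # kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes".
    sample = (
        '{"kind": "NodeMetricsList", "apiVersion": "metrics.k8s.io/v1beta1",'
        ' "items": [{"metadata": {"name": "node1"},'
        ' "usage": {"cpu": "113m", "memory": "718Mi"}}]}'
    )
    for name, (cores, mem_bytes) in summarize_node_metrics(sample).items():
        print(f"{name}: {cores:.3f} cores, {mem_bytes / 2**20:.0f} MiB")
```

In practice the payload would be the saved output of the kubectl get --raw command above rather than the inline sample.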
Use kubectl top to view node resource usage:
kubectl top node
error: metrics not available yet
The command above does not work, so let's troubleshoot, starting with the metrics-server logs:
kubectl logs metrics-server-68d85f76bb-8mmw6 -n kube-system
E0304 09:14:52.776119 1 reststorage.go:129] unable to fetch node metrics for node "node1": no metrics known for node
E0304 09:15:03.649147 1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:node1: unable to fetch metrics from Kubelet node1 (node1): Get https://node1:10250/stats/summary/: dial tcp: lookup node1 on 10.80.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:k8s-master: unable to fetch metrics from Kubelet k8s-master (k8s-master): Get https://k8s-master:10250/stats/summary/: dial tcp: lookup k8s-master on 10.80.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:k8s-node1: unable to fetch metrics from Kubelet k8s-node1 (k8s-node1): Get https://k8s-node1:10250/stats/summary/: dial tcp: lookup k8s-node1 on 10.80.0.10:53: no such host]
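The logs point to two problems:
1. CoreDNS cannot resolve the node hostnames (node1, k8s-master, k8s-node1), so metrics-server cannot reach the kubelets to fetch node metrics.
2. Port 10250 is the kubelet's HTTPS port, so connecting to it also involves TLS certificate verification.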
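On a larger cluster the log line listing every failed lookup gets long. As a rough diagnostic sketch (a hypothetical helper, not a metrics-server feature), the Python below pulls the unresolved hostnames out of log text by matching the "lookup <host> on <server>:<port>: no such host" fragment that appears in the log above; it assumes the message wording stays the same as in this metrics-server version.

```python
import re

# Matches the DNS failure fragment in the metrics-server log, for example
# "lookup node1 on 10.80.0.10:53: no such host". The optional whitespace
# before the colon also tolerates a stray space, e.g. "10.80.0.10 :53".
_LOOKUP_FAILURE = re.compile(r"lookup (\S+) on (\S+?)\s*:(\d+): no such host")


def unresolved_hosts(log_text: str) -> list:
    """Return the hostnames, in first-seen order, that DNS could not resolve."""
    hosts = []
    for host, _server, _port in _LOOKUP_FAILURE.findall(log_text):
        if host not in hosts:
            hosts.append(host)
    return hosts
```

Given the kubectl logs output for the metrics-server pod, it returns the node names that need either hosts entries in CoreDNS or the InternalIP workaround described next.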
Edit metrics-server-deployment.yaml and add the following arguments to the container command:
command:
- /metrics-server
- --kubelet-insecure-tls  # do not verify the CA of the kubelet's serving certificate (intended for testing only)
- --kubelet-preferred-address-types=InternalIP  # connect to each node by its InternalIP instead of its hostname
The complete edited file:
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
      # mount in tmp so we can safely use from-scratch images and/or read-only containers
      - name: tmp-dir
        emptyDir: {}
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.1
        imagePullPolicy: IfNotPresent
        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp
Then apply the updated Deployment:
kubectl apply -f metrics-server-deployment.yaml
Run kubectl top node again; this time it returns data:
[root@localhost 1.8+]# kubectl top nodes
NAME                    CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
localhost.localdomain   244m         24%    1222Mi          70%
node1                   113m         11%    718Mi           19%
node2                   68m          6%     444Mi           12%
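If this output needs to feed a script (for example, a simple memory alert), it can be parsed directly. The Python below is a minimal sketch with a hypothetical helper name; it assumes the five-column layout and the m, Mi and % units shown above, so rows such as nodes reported as <unknown> would need extra handling.

```python
def _strip_suffix(value: str, suffix: str) -> int:
    """Return the integer in a value such as '244m' after removing its unit suffix."""
    if not value.endswith(suffix):
        raise ValueError(f"expected {value!r} to end with {suffix!r}")
    return int(value[: -len(suffix)])


def parse_top_nodes(output: str) -> dict:
    """Parse `kubectl top nodes` text (header plus rows) into a dict keyed by node name."""
    lines = [line.split() for line in output.strip().splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    if header[0] != "NAME":
        raise ValueError("unexpected header: " + " ".join(header))
    nodes = {}
    for name, cpu, cpu_pct, mem, mem_pct in rows:
        nodes[name] = {
            "cpu_millicores": _strip_suffix(cpu, "m"),
            "cpu_percent": _strip_suffix(cpu_pct, "%"),
            "memory_mib": _strip_suffix(mem, "Mi"),
            "memory_percent": _strip_suffix(mem_pct, "%"),
        }
    return nodes
```

Passing the table above (without the shell prompt line) yields one entry per node, e.g. node1 with 113 millicores and 718 MiB.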
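Even after these changes, hostname lookups through CoreDNS still fail (kubectl top works now only because metrics-server connects to the nodes by InternalIP). There are two ways to fix the DNS side.
Both involve editing the CoreDNS ConfigMap:
kubectl edit configmap coredns -n kube-system
Method 1: add hosts entries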
Add a hosts block to the Corefile, replacing $IP with the node's actual IP address and adding one line per node hostname that fails to resolve. The fallthrough option passes names not listed in the block on to the next plugin:
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        hosts {
            $IP node1
            fallthrough
        }
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            upstream
            fallthrough in-addr.arpa ip6.arpa
        }
        ...
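With several nodes, typing the hosts entries by hand is error-prone. The Python below is a small sketch (a hypothetical helper, not a CoreDNS or kubectl feature) that renders the block from a hostname-to-IP mapping. The hostnames are the ones from the log above, but the IP addresses are placeholders; the real values can be read from the INTERNAL-IP column of kubectl get nodes -o wide.

```python
import ipaddress


def hosts_block(nodes: dict, indent: str = "        ") -> str:
    """Render a CoreDNS `hosts` block that maps each node hostname to its IP."""
    lines = [indent + "hosts {"]
    for name, ip in sorted(nodes.items()):
        ipaddress.ip_address(ip)  # raises ValueError for a malformed address
        lines.append(f"{indent}    {ip} {name}")
    # Names not listed here are passed on to the next plugin (e.g. kubernetes).
    lines.append(indent + "    fallthrough")
    lines.append(indent + "}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder IPs: substitute the INTERNAL-IP values of your own nodes.
    print(hosts_block({
        "k8s-master": "192.168.1.10",
        "k8s-node1": "192.168.1.11",
        "node1": "192.168.1.12",
    }))
```

The default indent matches the plugin indentation inside the Corefile, so the printed block can be pasted in place of the hand-written one.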
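Method 2: forward lookups to external DNS servers. The proxy . /etc/resolv.conf line sends queries that CoreDNS cannot answer itself to the nameservers listed in /etc/resolv.conf, so this only helps if those upstream servers can resolve the node hostnames. (Newer CoreDNS releases replace the proxy plugin with forward.)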
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        log stdout
        health
        kubernetes cluster.local {
          cidrs 10.3.0.0/24
        }
        proxy . /etc/resolv.conf
        cache 30
    }
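To see which upstream servers Method 2 would actually forward to, it helps to look at the nameserver lines of the resolv.conf that the CoreDNS pod sees; with the usual CoreDNS deployment (dnsPolicy: Default) that is typically the node's own file. The Python below is a minimal sketch with a hypothetical helper name that lists those servers in order.

```python
def upstream_nameservers(resolv_conf: str) -> list:
    """Return the nameserver addresses listed in resolv.conf-style text, in order."""
    servers = []
    for raw in resolv_conf.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines, which start with '#' or ';'.
        if not line or line[0] in "#;":
            continue
        parts = line.split()
        if parts[0] == "nameserver" and len(parts) >= 2:
            servers.append(parts[1])
    return servers
```

If none of the listed servers can resolve a name such as node1 (for example, nslookup node1 <server> fails when run on a node), Method 2 alone will not fix the lookups and Method 1 is the simpler option.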