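Starting with Kubernetes v1.8, resource usage metrics (such as container CPU and memory usage) are available through the Metrics API. Note:
The Metrics API only serves current metric values; it does not store historical data
The Metrics API URI is /apis/metrics.k8s.io/, and the API is maintained in k8s.io/metrics
metrics-server must be deployed before the API can be used; metrics-server collects its data by calling the Kubelet Summary API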
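To make the URI layout concrete, here is a minimal sketch of a helper that builds Metrics API paths for use with kubectl get --raw. The function name metrics_path is made up for this illustration and is not part of kubectl; the v1beta1 version is the one reported by kubectl api-versions later in this post.

```shell
#!/bin/sh
# metrics_path is a hypothetical helper, not a kubectl feature.
# Usage: metrics_path nodes
#        metrics_path pods [namespace]
metrics_path() {
    base="/apis/metrics.k8s.io/v1beta1"
    case "$1" in
        nodes)
            echo "$base/nodes"
            ;;
        pods)
            if [ -n "$2" ]; then
                # Pod metrics restricted to one namespace
                echo "$base/namespaces/$2/pods"
            else
                # Pod metrics across all namespaces
                echo "$base/pods"
            fi
            ;;
        *)
            echo "usage: metrics_path nodes|pods [namespace]" >&2
            return 1
            ;;
    esac
}

# Example (needs a cluster with metrics-server running):
#   kubectl get --raw "$(metrics_path pods kube-system)"
```

Because the API keeps no history, every such request returns only the latest sample.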
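The notes below record how I installed metrics-server on Kubernetes 1.13 and the pitfalls I ran into. I installed it from the YAML manifests.
Clone metrics-server onto the target machine (one that can run kubectl commands)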
git clone https://github.com/kubernetes-incubator/metrics-server.git
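At the time of writing, the latest metrics-server release is v0.3.1. To deploy v0.2 or later, use the 1.8+ directory; for versions below v0.2, use the 1.7 directory. I am deploying v0.3.1, so I use the 1.8+ directory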
[root@localhost ~]# ls metrics-server/deploy/1.8+
aggregated-metrics-reader.yaml  auth-delegator.yaml  auth-reader.yaml  metrics-apiservice.yaml  metrics-server-deployment.yaml  metrics-server-service.yaml  resource-reader.yaml
Install (run from inside the 1.8+ directory)
kubectl apply -f ./
Verify the installation. Output like the following shows that the installation succeeded and the API is reachable
[root@localhost ~]# kubectl api-versions |grep metrics
metrics.k8s.io/v1beta1
[root@localhost ~]# kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq
{
  "kind": "NodeMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes"
  },
Use kubectl top to check node usage
kubectl top node
error: metrics not available yet
The command fails, so let's troubleshoot
kubectl logs metrics-server-68d85f76bb-8mmw6 -n kube-system
E0304 09:14:52.776119 1 reststorage.go:129] unable to fetch node metrics for node "node1": no metrics known for node
E0304 09:15:03.649147 1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:node1: unable to fetch metrics from Kubelet node1 (node1): Get https://node1:10250/stats/summary/: dial tcp: lookup node1 on 10.80.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:k8s-master: unable to fetch metrics from Kubelet k8s-master (k8s-master): Get https://k8s-master:10250/stats/summary/: dial tcp: lookup k8s-master on 10.80.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:k8s-node1: unable to fetch metrics from Kubelet k8s-node1 (k8s-node1): Get https://k8s-node1:10250/stats/summary/: dial tcp: lookup k8s-node1 on 10.80.0.10:53: no such host]
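The log reveals two problems:
1. CoreDNS cannot resolve node hostnames such as node1, so metrics-server cannot fetch node metrics
2. Port 10250 is an HTTPS port, so certificate verification is required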
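When the log line is as long as the one above, it can help to pull out just the hostnames that failed to resolve. The following is a rough sketch; unresolved_hosts is a name invented for this example, and it assumes the "lookup <host> on <server>: no such host" wording seen in the log above.

```shell
#!/bin/sh
# unresolved_hosts (hypothetical helper) reads metrics-server log text on
# stdin and prints each hostname that failed DNS resolution, one per line,
# sorted and without duplicates.
unresolved_hosts() {
    grep 'no such host' \
        | grep -o 'lookup [^ ]* on' \
        | awk '{ print $2 }' \
        | sort -u
}

# Example (replace <metrics-server-pod> with the actual pod name):
#   kubectl logs <metrics-server-pod> -n kube-system | unresolved_hosts
```

If the list contains every node in the cluster, the cause is more likely cluster DNS as a whole than a single misnamed node.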
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
      # mount in tmp so we can safely use from-scratch images and/or read-only containers
      - name: tmp-dir
        emptyDir: {}
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.1
        imagePullPolicy: IfNotPresent
        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp
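Edit metrics-server-deployment.yaml and add the following arguments to the container command (the full manifest above already includes them)
command:
- /metrics-server
- --kubelet-insecure-tls    # skip verification of the Kubelet's serving certificate
- --kubelet-preferred-address-types=InternalIP    # connect to each Kubelet by its InternalIP instead of its hostname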
kubectl apply -f metrics-server-deployment.yaml
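Before applying, a quick text check can confirm that both flags really made it into the manifest. This is only a sketch: check_flags is a hypothetical helper, and it does a plain grep rather than parsing the YAML, so a commented-out flag would also pass. It is mostly useful for catching flags pasted from a web page with typographic dashes, which would not match.

```shell
#!/bin/sh
# check_flags (hypothetical helper) returns 0 when the given manifest
# contains both kubelet flags as list items, and 1 otherwise.
check_flags() {
    manifest="$1"
    rc=0
    for flag in --kubelet-insecure-tls --kubelet-preferred-address-types=InternalIP; do
        # -e keeps grep from treating the leading "-" as an option
        if ! grep -q -e "- $flag" "$manifest"; then
            echo "missing flag: $flag" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Example:
#   check_flags metrics-server-deployment.yaml && kubectl apply -f metrics-server-deployment.yaml
```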
Run kubectl top nodes again
[root@localhost 1.8+]# kubectl top nodes
NAME                    CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
localhost.localdomain   244m         24%    1222Mi          70%
node1                   113m         11%    718Mi           19%
node2                   68m          6%     444Mi           12%
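Once kubectl top works, its output is easy to post-process. As a small sketch, the function below lists nodes whose MEMORY% is at or above a threshold. The name high_memory_nodes and the default of 60 are arbitrary choices for this example, and the column order is assumed to match the output shown above.

```shell
#!/bin/sh
# high_memory_nodes (hypothetical helper) reads `kubectl top nodes` output
# on stdin and prints the name and MEMORY% of every node at or above the
# given threshold in percent (default 60).
high_memory_nodes() {
    threshold="${1:-60}"
    awk -v limit="$threshold" '
        NR == 1 { next }            # skip the header row
        {
            pct = $5                # fifth column is MEMORY%
            sub(/%/, "", pct)       # "70%" -> "70"
            if (pct + 0 >= limit + 0) print $1, $5
        }
    '
}

# Example:
#   kubectl top nodes | high_memory_nodes 60
```

With the sample output above and a threshold of 60, only localhost.localdomain (70%) is reported.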
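After the steps above, CoreDNS still reports errors. Here are two ways to fix that
Edit the CoreDNS ConfigMap (for example with kubectl edit configmap coredns -n kube-system)
Method 1: add hosts entries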
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        hosts {
           $IP node1
           fallthrough
        }
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           upstream
           fallthrough in-addr.arpa ip6.arpa
        }
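Typing one hosts line per node by hand gets tedious on larger clusters. The sketch below generates the hosts stanza from "IP NAME" pairs; hosts_block is a hypothetical helper, and the jsonpath query in the trailing comment is one possible way to produce its input, not something the original setup used.

```shell
#!/bin/sh
# hosts_block (hypothetical helper) reads "IP NAME" pairs on stdin, one per
# line, and prints a CoreDNS hosts stanza ending in fallthrough so that
# names not listed here still reach the other plugins.
hosts_block() {
    echo "hosts {"
    awk 'NF >= 2 { printf "    %s %s\n", $1, $2 }'
    echo "    fallthrough"
    echo "}"
}

# One possible way to feed it from a live cluster (an assumption):
#   kubectl get nodes -o jsonpath='{range .items[*]}{.status.addresses[?(@.type=="InternalIP")].address}{" "}{.metadata.name}{"\n"}{end}' | hosts_block
```

The generated stanza still has to be pasted into the Corefile by hand, and it goes stale whenever nodes are added or their addresses change.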
Method 2: resolve through external DNS
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        log stdout
        health
        kubernetes cluster.local {
            cidrs 10.3.0.0/24
        }
        proxy . /etc/resolv.conf
        cache 30
    }