08/25/2020

Kubernetes Monitoring – Prometheus 설치

Table of Contents

1 Prometheus
2 metric-server, metric-state-server
3 kube-state-metrics 설치
4 Prometheus 설치
5 마치며

이 글은 Kubernetes 에 모니터링을 위해 Prometheus 설치에 대한 것이다. Kubernetes 는 여러가지 오브젝트들과 컴포넌트들로 구성 된다. 각각에 구성요소들은 컴퓨터의 자원을 사용하게 되는데 이러한 자원 사용량은 metric-server 를 설치함으로써 CLI 를 통해서 실시간으로 모니터링이 가능하다.

하지만 CLI 를 하나하나 다 치면서 하는데에는 한계가 있어, 각종 구성요소들의 자원 사용량을 데이터베이스로 저장하고 이것을 기반으로 그래프로 보여주는 것이 훨씬 좋을 것이다. 특히나 각종 자원의 모니터링은 시인성이 아주 좋아야 하는게 핵심이기도 한데 Kubernetes 는 이를 위해서 Prometheus 를 공식적(?) 으로 밀고 있다.

Prometheus

프로메테우스(Prometheus) 는 한 회사에서 만들기 시작해 오픈소스화 되었으며 현재는 Kubernetes 를 제작지원하는 CNCF(Cloud Native Computing Foundation) 의 두번째 프로젝트다. 이렇게 되면 당연히 Kubernetes 에서 Prometheus 는 거의 표준 모니터링 시스템이라고 봐야 한다.

인터넷을 검색해보면 위와같은 그림을 볼 수 있는데, 처음 접하는 사람들은 당연히 이게 뭔지 잘 모른다. Prometheus 를 이루는 기본 요소는 단 두개다.

Prometheus Server
Node Exporter

간단하게 Server/Client 구조를 가지는 것인데, 특이한 것은 Prometheus Server 가 데이터를 가지고 오는 방법이다. Node Exporter 는 어떤 정보를 수집할지에 대해서 정의를 해두고 정해진 시간마다 정보를 수집한다. 그리고네트워크를 통해서 Prometheus Server 가 가지고 갈 수 있도록 네트워크 데몬으로 존재한다.

Prometheus Server 는 Node Exporter 목록을 가지고 있으면서 주기적으로 접속해서 데이터를 긁어온다. 이렇게 긁어온 데이터는 Prometheus 가 TimeSeries 데이터베이스로 저장하게 된다.

위 그림에 왼쪽 아래쪽에 KUBERNETES NODES 로 향하는 화살표에 Pull 이라고 적혀있는데, 이것이 Prometheus 가 Node Exporter 에 접속해 데이터를 긁어오는 것을 말하는 것이다.

metric-server, metric-state-server

Kubernetes 는 자신의 상태를 자동으로 모니터링해서 보여주지 않는다. 이것도 관리자가 설치해야 한다. 이를 위해서 인터넷을 검색해보면 다음과 같이 두가지 정도가 나온다.

metric-server
kube-state-metrics

metric-server 는 Kubernetes 자신에 대한 자원 모니터링으로 Node, Pod 등의 자원을 모니터링 한다. 이것은 Resource Metrics API 를 구현한 것으로 보면 된다. 이를 설치하면 HPA(Horizontal Pod Autoscaler), Scheduler 등에서 활용하게 된다.

HPA, Scheduler 등은 실시간으로 CPU, Memory 등에 변화를 감지해야 하며 이를 통해서 액션을 취해야하기 때문에 Kubernetes 의 Health 상태도 더블어 체크 된다. 자원이 Health 상태가 되지 않는다면 Kubernetes 의 상태를 Unknown 으로 바꾸놓을 가능성이 있다. 그리고 이 상태가 지속되면 Terminated 시켜버리고 다른 오브젝트를 올릴려고 할 것이다.

문제는 Prometheus 에서 이들에 대해서 스크랩을 할 수 없다는데 있다.

kube-state-metrics 는 Kubernetes 자원의 Health 상태를 전혀 고려하지 않는다. 그냥 자원에 대한 정보를 뿌려줄뿐이고 어떤 원인으로 인해서 자원 모니터링이 안된다면 그냥 정보를 안 뿌려준다. 따라서 이렇게 되며 이것을 가지고 HPA, Scheduler 에서 사용하기가 불가능해진다.

대신, kube-state-metrics 는 Prometheus 에서 스크랩이 가능하다. 따라서 Prometheus 를 이용한 모니터링을 구축하기 위해서는 반드시 kube-state-metrics 를 설치해주는 걸 권장한다.

kube-state-metrics 설치

Prometheus 설치의 시작은 kube-state-metrics 설치로부터 시작 된다. 이것이 없으면 Prometheus 설치/운영을 못하는 것은 아니지만 Kubernetes 의 컴포넌트의 자세한 정보를 가지고 올 수 없다.

설치는 kube-state-metrics 의 GitHub 를 이용하면 간단하게 해결된다. 한 가지 주의해야 할 것은 Kubernetes 버전을 확인해 지원되는 버전을 설치해야 한다는 것이다.

소스코드를 clone 한 후에 다음과 같이 설치를 해준다. Kubernetes 가 최신버전이라 별도의 작업은 필요 없다.

$ git clone https://github.com/kubernetes/kube-state-metrics.git
$ cd kube-state-metrics
$ kubectl apply -f examples/standard
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
serviceaccount/kube-state-metrics created
service/kube-state-metrics created
$ kubectl get po -n kube-system
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-578894d4cd-ks46s   1/1     Running   1          18h
calico-node-b5zb5                          1/1     Running   1          18h
calico-node-kvrjr                          1/1     Running   1          18h
coredns-66bff467f8-dp6m5                   1/1     Running   1          18h
coredns-66bff467f8-gqq6m                   1/1     Running   1          18h
etcd-kmaster                               1/1     Running   1          18h
kube-apiserver-kmaster                     1/1     Running   1          18h
kube-controller-manager-kmaster            1/1     Running   1          18h
kube-proxy-hcfkw                           1/1     Running   1          18h
kube-proxy-ssdbw                           1/1     Running   1          18h
kube-scheduler-kmaster                     1/1     Running   1          18h
kube-state-metrics-5c5cb55b4-gkh4m         1/1     Running   0          103s
metrics-server-5f4ffd464c-7dxxr            1/1     Running   1          18h

$ git clone https://github.com/kubernetes/kube-state-metrics.git

$ cd kube-state-metrics

$ kubectl apply -f examples/standard

clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created

clusterrole.rbac.authorization.k8s.io/kube-state-metrics created

deployment.apps/kube-state-metrics created

serviceaccount/kube-state-metrics created

service/kube-state-metrics created

$ kubectl get po -n kube-system

NAME READY STATUS RESTARTS AGE

calico-kube-controllers-578894d4cd-ks46s 1/1 Running 1 18h

calico-node-b5zb5 1/1 Running 1 18h

calico-node-kvrjr 1/1 Running 1 18h

coredns-66bff467f8-dp6m5 1/1 Running 1 18h

coredns-66bff467f8-gqq6m 1/1 Running 1 18h

etcd-kmaster 1/1 Running 1 18h

kube-apiserver-kmaster 1/1 Running 1 18h

kube-controller-manager-kmaster 1/1 Running 1 18h

kube-proxy-hcfkw 1/1 Running 1 18h

kube-proxy-ssdbw 1/1 Running 1 18h

kube-scheduler-kmaster 1/1 Running 1 18h

kube-state-metrics-5c5cb55b4-gkh4m 1/1 Running 0 103s

metrics-server-5f4ffd464c-7dxxr 1/1 Running 1 18h

정상적으로 설치가 되었다면 kube-system 네임스페이스에 pod 가 보인다.

Prometheus 설치

먼저 Prometheus 설치하는 방법에는 다양한 방법이 존재한다. helm 을 이용해도 되고 Prometheus Operator 를 이용해도 된다. 하지만 수동으로 설치를 해봐야 Kubernetes 에서 프로그램 배포를 위해서 필요한 것들이 무엇인지를 알게 된다. 따라서 여기서는 모든걸 수동으로 한번 해보도록 하겠다.

계획

일단, 계획이 필요한데 뭔가 대단한게 아니라 Prometheus 를 운영하는데 필요한 설정과 파일들을 어떻게 할 것인가에 대한 것이다.

먼저, Prometheus 를 위한 네임스페이스를 별도로 만든다. Kubernetes 는 네임스페이스 개념이 존재한다. 일종의 격리되는 그룹같은 것인데, Prometheus 를 위한 네임스페이스로 montoring 을 만들 것이다.

Prometheus 는 설정 파일과 데이터 저장 파일 크게 두가지로 나뉜다. 설정 파일은 그야말로 메트릭 수집과 운영에 대한 것으로 text 파일이며 데이터 저장 파일은 수집된 정보를 저장하는 TimeSeries 데이터 파일이다.

prometheus.yaml
tdb

Kubernetes 는 배포되는 프로그램의 각종 설정들을 pod 자체에 가지고 있게 하지 않는다. 이것은 pod 가 비상태 운영을 기본원칙으로 하기 때문이다. pod 는 언제든지 사라지고 만들어지고 해야 한다. pod 내에 배포되는 프로그램들의 설정파일을 pod 에 내장하게 되면 뭐하나 바꿀때마다 pod 를 손대야 한다.

ConfigMap 은 이렇게 Kubernetes 의 비상태 운영 원칙을 위한 것이다. 설정 맵은 그야말로 다양한 설정들의 내용을 Kubernetes 가 대신 가지고 있게 해준다. 예를들어 Prometheus 의 설정파일인 prometheus.yaml 은 text 파일인데, 이것을 ConfigMap 에 등록하고 Pod 생성시에 이것을 가지고와서 파일로 굽게 된다. 설정파일을 Kubernetest 가 가지고 있기 때문에 설정 편집을 위해서 pod 에 손을 댈 일은 없다.

데이터 저장 파일은 영구적인 저장 파일이다. Pod 에 내장되어 있다면 Pod 가 재시작될때마다 데이터 저장 파일은 삭제되었다 재 생성되는 과정을 반복할 것이다. 하지만 Pod 의 재시작과는 상관없이 데이터는 영구적으로 보존 되어야 하기 때문에 Pod 에 영구볼륨(Persistant Volume) 을 붙여서 데이터 저장 파일을 Pod 와 분리할 것이다.

추가로 Prometheus 나 exporter 들이 Kubernetes 의 API 에 접근할 수 있도록 계정과 권한도 부여할 것이다.

monitoring 네임스페이스 생성

Kubernetes 의 네임스페이스 생성은 아주 간단다. CLI 명령어로 간단하게 생성할 수 있다.

$ kubectl create ns monitoring
namespace/monitoring created

1 2	$ kubectl create ns monitoring namespace/monitoring created

ClusterRole 생성

Kubernetes 의 자원에 대한 접근은 API 를 통해서 이루어 진다. API 접근은 Kubernetes 의 보안 부분으로 엄격하게 통제되고 있는데, Prometheus 가 이 API 에 접근해 자원에 대한 메트릭을 생성해야 하기 때문에 이에 대한 접근 권한이 필요하게 된다.

Kubernetes 의 API 에 대한 권한은 Role 개념으로 정립되어 있으며 이 Role 에 기반한 퍼미션을 부여하고 이것을 다시 특정한 계정과 바인딩 함으로써 접근제어는 마무리 된다.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
  namespace: monitoring
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
  namespace: monitoring
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring

apiVersion: rbac.authorization.k8s.io/v1

kind: ClusterRole

metadata:

namespace: monitoring

rules:

- apiGroups: [""]

resources:

- nodes

- nodes/proxy

- services

- endpoints

- pods

verbs: ["get", "list", "watch"]

- apiGroups:

- extensions

resources:

- ingresses

verbs: ["get", "list", "watch"]

- nonResourceURLs: ["/metrics"]

verbs: ["get"]

---

apiVersion: v1

kind: ServiceAccount

metadata:

namespace: monitoring

---

apiVersion: rbac.authorization.k8s.io/v1

kind: ClusterRoleBinding

metadata:

namespace: monitoring

roleRef:

apiGroup: rbac.authorization.k8s.io

kind: ClusterRole

subjects:

- kind: ServiceAccount

namespace: monitoring

위와같이 ClusterRole 을 작성한 후에 적용해 준다.

$ kubectl apply -f prometheus-cluster-role.yaml
clusterrole.rbac.authorization.k8s.io/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created

$ kubectl apply -f prometheus-cluster-role.yaml

clusterrole.rbac.authorization.k8s.io/prometheus created

clusterrolebinding.rbac.authorization.k8s.io/prometheus created

Prometheus 를 위한 영구볼륨 생성

Kubernetes 에서 영구 볼륨은 Persistant Volume 과 Claim 개념으로 다루어 진다. PV 는 영구저장소 미디어 특성을 고려한 일종의 드라이버 개념이고 Claim 은 영구저장소 미디어와는 상관없는 추상적 개층으로 PV 와 연결되어 작동된다.

Prometheus 는 데이터베이스 파일을 영구적으로 저장할 필요성이 있기 때문에 Kubernetes 의 PV, PVC 를 이용할 필요가 있다.

먼저 다음과 같이 PV 를 만든다.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-pv
  namespace: monitoring
  labels:
    type: local
    app: prometheus
spec:
  capacity:
    storage: 2Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual
  hostPath:
    path: /opt/prometheus
    type: DirectoryOrCreate
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - knode

apiVersion: v1

kind: PersistentVolume

metadata:

namespace: monitoring

labels:

type: local

app: prometheus

spec:

capacity:

storage: 2Gi

accessModes:

- ReadWriteOnce

persistentVolumeReclaimPolicy: Retain

storageClassName: manual

hostPath:

path: /opt/prometheus

type: DirectoryOrCreate

nodeAffinity:

required:

nodeSelectorTerms:

- matchExpressions:

- key: kubernetes.io/hostname

operator: In

values:

- knode

중요한 것은 영구저장소를 이용할 Kubernetes 의 Node 를 지정해 주었고, 사용할 저장소는 디렉토리이며 없으면 생성하도록 했다. 거기다 ReclaimPolicy 를 Retain 으로 함으로써 PVC 와 연결이 해제되더라도 PV 는 그대로 데이터를 보존하도록 했다. 사용할 디렉토리 경로는 /opt/prometheus 로 지정 했다.

다음과 같이 생성해 준다.

$ kubectl apply -f prometheus-pv.yaml 
persistentvolume/prometheus-pv created
$ kubectl get pv -n monitoring
NAME            CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
prometheus-pv   2Gi        RWO            Retain           Available           manual                  64s

$ kubectl apply -f prometheus-pv.yaml

persistentvolume/prometheus-pv created

$ kubectl get pv -n monitoring

NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE

prometheus-pv 2Gi RWO Retain Available manual 64s

이제 PV 를 가져다 쓸 PVC 를 생성해야 한다.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-pvc
  namespace: monitoring
  labels:
    type: local
    app: prometheus
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 1Gi
  selector:
    matchLabels:
      app: prometheus
      type: local

apiVersion: v1

kind: PersistentVolumeClaim

metadata:

namespace: monitoring

labels:

type: local

app: prometheus

spec:

storageClassName: manual

accessModes:

- ReadWriteOnce

volumeMode: Filesystem

resources:

requests:

storage: 1Gi

selector:

matchLabels:

app: prometheus

type: local

PVC 를 생성할 때에는 PV 에서 가져다 쓸 용량도 함께 기재한다. 당연한 이야기지만 PV 보다 많은 용량은 허용되지 않는다. 한가지 주의해야 할 것은 PV 의 Label 과 StorageClass 를 맞춰야 한다. 이게 어긋날 경우에는 PVC 가 제대로 연결되지 않는다.

다음과 같이 생성해 준다.

$ kubectl apply -f prometheus-pvc.yaml 
persistentvolumeclaim/prometheus-pvc created
$ kubectl get pv,pvc -n monitoring
NAME                             CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                       STORAGECLASS   REASON   AGE
persistentvolume/prometheus-pv   2Gi        RWO            Retain           Bound    monitoring/prometheus-pvc   manual                  4m36s

NAME                                   STATUS   VOLUME          CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/prometheus-pvc   Bound    prometheus-pv   2Gi        RWO            manual         11s

$ kubectl apply -f prometheus-pvc.yaml

persistentvolumeclaim/prometheus-pvc created

$ kubectl get pv,pvc -n monitoring

NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE

persistentvolume/prometheus-pv 2Gi RWO Retain Bound monitoring/prometheus-pvc manual 4m36s

NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE

persistentvolumeclaim/prometheus-pvc Bound prometheus-pv 2Gi RWO manual 11s

정상적으로 생성이 되면 위와같이 PVC 의 Status 가 Bound 로 표시 된다.

Prometheus 를 위한 ConfigMap 생성

앞에서도 이야기 했지만, Kubernetes 에서 Pod 는 비상태여야 한다. Pod 는 Replica 정책에 의해서 항상 특정한 수를 유지하도록 하는데, 간혹 Pod 가 알수 없는 이유로 재생성되는 경우가 발생하는데 이럴때에 Pod 가 상태를 가지는 파일을 가지고 있게 되면 데이터 손실이 발생하게 된다.

그래서 각종 서버에 대한 설정파일들을 Kubernetes 에 ConfigMap 으로 등록하고 Pod가 생성될때에 이를 가져다 설정파일로 만들도록 권장하고 있다.

ConfigMap 을 만드는 방법에는 크게 두가지로 나뉜다. 첫번째는 ConfigMap 메니페스트 파일에 설정 내용을 모두 함께 기술하는 것과 서버 설정파일을 특정한 디렉토리에 만든 다음에 kubectl 명령어로 ConfigMap 을 만드는 것이다. 어떤 방법을 쓰던지 결과는 동일하다.

여기선 설정 파일을 특정 디렉토리 별도로 만들어서 kubectl 명령어를 이용해 ConfigMap 을 작성하는 것으로 한다.

Prometheus 를 위한 설정 파일은 prometheus.yml 파일이다. 단 하나의 파일로 모두 가능하지만 rules 같은 경우에는 별도의 파일로 작성해 include 문을 작성해 연결 시킬 수 있다. 여기서는 두개의 파일로 제작했다.

$ vim prometheus.rules
groups:
- name: container memory alert
  rules:
  - alert: container memory usage rate is very high( > 55%)
    expr: sum(container_memory_working_set_bytes{pod!="", name=""}) / sum (kube_node_status_allocatable_memory_bytes) * 100 > 55
    for: 1m
    labels:
      severity: fatal
    annotations:
      summary: High Memory Usage on {{ $labels.instance }}
      identifier: "{{ $labels.instance }}"
      description: "{{ $labels.job }} Memory Usage: {{ $value }}"
- name: container CPU alert
  rules:
  - alert: container CPU usage rate is very high( > 10%)
    expr: sum (rate (container_cpu_usage_seconds_total{pod!=""}[1m])) / sum (machine_cpu_cores) * 100 > 10
    for: 1m
    labels:
      severity: fatal
    annotations:
      summary: High Cpu Usage

$ vim prometheus.rules

groups:

- name: container memory alert

rules:

- alert: container memory usage rate is very high( > 55%)

expr: sum(container_memory_working_set_bytes{pod!="", name=""}) / sum (kube_node_status_allocatable_memory_bytes) * 100 > 55

for: 1m

labels:

severity: fatal

annotations:

summary: High Memory Usage on {{ $labels.instance }}

identifier: "{{ $labels.instance }}"

description: "{{ $labels.job }} Memory Usage: {{ $value }}"

- name: container CPU alert

rules:

- alert: container CPU usage rate is very high( > 10%)

expr: sum (rate (container_cpu_usage_seconds_total{pod!=""}[1m])) / sum (machine_cpu_cores) * 100 > 10

for: 1m

labels:

severity: fatal

annotations:

summary: High Cpu Usage

prometheus.rules 파일에 rules 만 정의했다. 내용을 보면 Prometheus 에 알람을 보내는 Rule 을 적은 것이다. 이 파일은 prometheus.yml 파일에 include 문을 통해서 포함 된다.

$ vim prometheus.yml
global:
  scrape_interval: 10s
  evaluation_interval: 10s
rule_files:
  - /etc/prometheus/prometheus.rules
alerting:
  alertmanagers:
  - scheme: http
    static_configs:
    - targets:
      - "alertmanager.monitoring.svc:9093"

scrape_configs:
  - job_name: 'kubernetes-apiservers'

    kubernetes_sd_configs:
    - role: endpoints
    scheme: https

    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

    relabel_configs:
    - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
      action: keep
      regex: default;kubernetes;https

  - job_name: 'kubernetes-nodes'

    scheme: https

    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

    kubernetes_sd_configs:
    - role: node

    relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - target_label: __address__
      replacement: kubernetes.default.svc:443
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics


  - job_name: 'kubernetes-pods'

    kubernetes_sd_configs:
    - role: pod

    relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      target_label: __address__
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: kubernetes_pod_name

  - job_name: 'kube-state-metrics'
    static_configs:
      - targets: ['kube-state-metrics.kube-system.svc:8080']

  - job_name: 'kubernetes-cadvisor'

    scheme: https

    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

    kubernetes_sd_configs:
    - role: node

    relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - target_label: __address__
      replacement: kubernetes.default.svc:443
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

  - job_name: 'kubernetes-service-endpoints'

    kubernetes_sd_configs:
    - role: endpoints

    relabel_configs:
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
      action: replace
      target_label: __scheme__
      regex: (https?)
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_service_name]
      action: replace
      target_label: kubernetes_name

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

$ vim prometheus.yml

global:

scrape_interval: 10s

evaluation_interval: 10s

rule_files:

- /etc/prometheus/prometheus.rules

alerting:

alertmanagers:

- scheme: http

static_configs:

- targets:

- "alertmanager.monitoring.svc:9093"

scrape_configs:

- job_name: 'kubernetes-apiservers'

kubernetes_sd_configs:

- role: endpoints

scheme: https

tls_config:

ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

relabel_configs:

- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]

action: keep

regex: default;kubernetes;https

- job_name: 'kubernetes-nodes'

scheme: https

tls_config:

ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

kubernetes_sd_configs:

- role: node

relabel_configs:

- action: labelmap

regex: __meta_kubernetes_node_label_(.+)

- target_label: __address__

replacement: kubernetes.default.svc:443

- source_labels: [__meta_kubernetes_node_name]

regex: (.+)

target_label: __metrics_path__

replacement: /api/v1/nodes/${1}/proxy/metrics

- job_name: 'kubernetes-pods'

kubernetes_sd_configs:

- role: pod

relabel_configs:

- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]

action: keep

regex: true

- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]

action: replace

target_label: __metrics_path__

regex: (.+)

- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]

action: replace

regex: ([^:]+)(?::\d+)?;(\d+)

replacement: $1:$2

target_label: __address__

- action: labelmap

regex: __meta_kubernetes_pod_label_(.+)

- source_labels: [__meta_kubernetes_namespace]

action: replace

target_label: kubernetes_namespace

- source_labels: [__meta_kubernetes_pod_name]

action: replace

target_label: kubernetes_pod_name

- job_name: 'kube-state-metrics'

static_configs:

- targets: ['kube-state-metrics.kube-system.svc:8080']

- job_name: 'kubernetes-cadvisor'

scheme: https

tls_config:

ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

kubernetes_sd_configs:

- role: node

relabel_configs:

- action: labelmap

regex: __meta_kubernetes_node_label_(.+)

- target_label: __address__

replacement: kubernetes.default.svc:443

- source_labels: [__meta_kubernetes_node_name]

regex: (.+)

target_label: __metrics_path__

replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

- job_name: 'kubernetes-service-endpoints'

kubernetes_sd_configs:

- role: endpoints

relabel_configs:

- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]

action: keep

regex: true

- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]

action: replace

target_label: __scheme__

regex: (https?)

- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]

action: replace

target_label: __metrics_path__

regex: (.+)

- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]

action: replace

target_label: __address__

regex: ([^:]+)(?::\d+)?;(\d+)

replacement: $1:$2

- action: labelmap

regex: __meta_kubernetes_service_label_(.+)

- source_labels: [__meta_kubernetes_namespace]

action: replace

target_label: kubernetes_namespace

- source_labels: [__meta_kubernetes_service_name]

action: replace

target_label: kubernetes_name

prometheus.yml 파일의 내용을 보면 어떤 정보를 수집할지에 대해서 기술하고 있다. 전체적으로 10s 단위로 메트릭 정보를 수집하도록 하고 있으며 kube-state-metrics 에 대해서는 도메인 호출을해서 스크랩을 하도록 하고 있다. 그밖에 각종 Kubernetes 컴포넌트들에 대해서도 Kubernetes 의 API 서버를 통해서 메트릭을 수집하도록 설정하고 있다.

이렇게 설정내용을 파일로 작성하였다면 kubectl 명령어를 이용해 ConfigMap 을 다음과 같이 생성할 수 있다.

$ ls
prometheus.rules  prometheus.yml
$ kubectl create configmap prometheus-config -n monitoring --from-file=./
configmap/prometheus-config created

$ ls

prometheus.rules prometheus.yml

$ kubectl create configmap prometheus-config -n monitoring --from-file=./

configmap/prometheus-config created

정상적으로 생성이 되었을 것이다.

이제 필요한 제반 사항들은 모두 만들어 졌다. 실제 Prometheus 배포로 Pod 를 생성해 보자.

Prometheus 를 위한 Deployment

Kubernetes 가 버전이 높아지면 Deployment 를 기반으로 여러개의 Pod 를 생성하도록 변경되었다. 여기서는 Prometheus 를 만들어야 하는데, 단순하게 Pod 만 생성하는게 아니라 Deployment 를 이용해서 Pod 를 생성 한다.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-deployment
  namespace: monitoring
  labels:
    app: prometheus-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-server
  template:
    metadata:
      labels:
        app: prometheus-server
    spec:
      serviceAccountName: prometheus
      containers:
        - name: prometheus
          image: prom/prometheus
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prometheus/"
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: prometheus-config-volume
              mountPath: /etc/prometheus/
            - name: prometheus-storage-volume
              mountPath: /prometheus/
      nodeSelector:
        kubernetes.io/hostname: knode
      volumes:
        - name: prometheus-config-volume
          configMap:
            defaultMode: 420
            name: prometheus-config

        - name: prometheus-storage-volume
          persistentVolumeClaim:
            claimName: prometheus-pvc

apiVersion: apps/v1

kind: Deployment

metadata:

namespace: monitoring

labels:

app: prometheus-server

spec:

replicas: 1

selector:

matchLabels:

app: prometheus-server

template:

metadata:

labels:

app: prometheus-server

spec:

serviceAccountName: prometheus

containers:

- name: prometheus

image: prom/prometheus

args:

- "--config.file=/etc/prometheus/prometheus.yml"

- "--storage.tsdb.path=/prometheus/"

ports:

- containerPort: 9090

volumeMounts:

- name: prometheus-config-volume

mountPath: /etc/prometheus/

- name: prometheus-storage-volume

mountPath: /prometheus/

nodeSelector:

kubernetes.io/hostname: knode

volumes:

- name: prometheus-config-volume

configMap:

defaultMode: 420

- name: prometheus-storage-volume

persistentVolumeClaim:

claimName: prometheus-pvc

지금까지 만들었던 제반사항들이 모두 포함되어 있다. ServiceAccount 의 경우에는 Deployment.spec.serviceAccountName 에 prometheus 로 지정해 ClusterBinding 으로 묶인 메트릭 수집을 위한 API 접근을 부여하고 있다. Volume 마운트 에서는 ConfigMap 을 마운트해 파일을 작성과 PVC 를 마운트해 Prometheus 의 데이터 저장 디렉토리로 마운트 해주고 있다.

다음과 같이 생성해 준다.

$ kubectl apply -f prometheus-deployment.yaml 
deployment.apps/prometheus-deployment created
$ kubectl get po -n monitoring
NAME                                     READY   STATUS   RESTARTS   AGE
prometheus-deployment-84468df555-5jdnq   0/1     Error    4          2m11s

$ kubectl apply -f prometheus-deployment.yaml

deployment.apps/prometheus-deployment created

$ kubectl get po -n monitoring

NAME READY STATUS RESTARTS AGE

prometheus-deployment-84468df555-5jdnq 0/1 Error 4 2m11s

처음 실행을 하면 위와 같이 에러가 발생 한다. 로그를 한번 보자.

$ kubectl logs prometheus-deployment-84468df555-5jdnq -n monitoring
level=error ts=2020-08-25T05:25:06.038Z caller=query_logger.go:87 component=activeQueryTracker msg="Error opening query log file" file=/prometheus/queries.active err="open /prometheus/queries.active: permission denied"
panic: Unable to create mmap-ed active query log

goroutine 1 [running]:
github.com/prometheus/prometheus/promql.NewActiveQueryTracker(0x7ffd40076e44, 0xc, 0x14, 0x30898a0, 0xc0005e28d0, 0x30898a0)
        /app/promql/query_logger.go:117 +0x4cd
main.main()
        /app/cmd/prometheus/main.go:374 +0x4f08

$ kubectl logs prometheus-deployment-84468df555-5jdnq -n monitoring

level=error ts=2020-08-25T05:25:06.038Z caller=query_logger.go:87 component=activeQueryTracker msg="Error opening query log file" file=/prometheus/queries.active err="open /prometheus/queries.active: permission denied"

panic: Unable to create mmap-ed active query log

goroutine 1 [running]:

github.com/prometheus/prometheus/promql.NewActiveQueryTracker(0x7ffd40076e44, 0xc, 0x14, 0x30898a0, 0xc0005e28d0, 0x30898a0)

/app/promql/query_logger.go:117 +0x4cd

main.main()

/app/cmd/prometheus/main.go:374 +0x4f08

“permission denied” 이 에러는 PV 에서 생성한 /opt/prometheus 디렉토리에 대한 권한이 없어서 나는 것이다. PV 를 생성한 Node 에 다음과 같이 퍼미션을 부여 한다.

# chmod 757 /opt/prometheus

1	# chmod 757 /opt/prometheus

위와같이 퍼미션을 조정해 주면 자동으로 Pod 가 재배포 되면서 정상화 된다.

접속을 위한 Service 배포

Kubernetes 는 기본적으로 Cluster 내에서의 접속만 허용 한다. 외부에서 접속이 되게 하기 위해서는 Service 배포를 통해서 NodPort 를 열거나 아니면 Port-Forward 를 설정을 해서 접속을 해야 한다.

여기선 Service 배포를 통해서 NodePort 를 배포해 준다.

$ vim prometheus-service.yml
apiVersion: v1
kind: Service
metadata:
  name: prometheus-service
  namespace: monitoring
  annotations:
      prometheus.io/scrape: 'true'
      prometheus.io/port:   '9090'

spec:
  selector:
    app: prometheus-server
  type: NodePort
  ports:
    - port: 8080
      targetPort: 9090
      nodePort: 30003

$ vim prometheus-service.yml

apiVersion: v1

kind: Service

metadata:

namespace: monitoring

annotations:

prometheus.io/scrape: 'true'

prometheus.io/port: '9090'

spec:

selector:

app: prometheus-server

type: NodePort

ports:

- port: 8080

targetPort: 9090

nodePort: 30003

Deployment 에서 Container 포트를 9090 으로 해줬다. Service 에서는 targetPort 로 Container 포트를 인식시켜주고 Service 에서 사용할 포트 8080 과 연결해준다. Cluster 내에서 8080 포트로 접속을 하면 Deployment 의 Container 에 9090 와 연결되어 응답이 오게 된다.

nodePort 를 지정해줘서 외부에서도 접속할 수 있도록 오픈해 준다.

다음과 같이 배포를 진행 한다.

$ kubectl apply -f prometheus-service.yaml
service/prometheus-service created

1 2	$ kubectl apply -f prometheus-service.yaml service/prometheus-service created

Node Exportor

Node Exportor 는 Kubernetes 의 Node 에 대한 정보를 수집해 준다. Kubernetes 는 모든 것을 Pod 로 작동되는데, Node 에 하나씩만 작동되어야 하며 Kubernetes 클러스터와는 별도로 Node 자체에 데몬(Daemon) 처럼 동작해야 한다. 이를 위해서 DaemonSet 오브젝트로 만들어야 한다.

$ vim prometheus-node-exporter.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    k8s-app: node-exporter
spec:
  selector:
    matchLabels:
      k8s-app: node-exporter
  template:
    metadata:
      labels:
        k8s-app: node-exporter
    spec:
      containers:
      - image: prom/node-exporter
        name: node-exporter
        ports:
        - containerPort: 9100
          protocol: TCP
          name: http
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: node-exporter
  name: node-exporter
  namespace: kube-system
spec:
  ports:
  - name: http
    port: 9100
    nodePort: 31672
    protocol: TCP
  type: NodePort
  selector:
    k8s-app: node-exporter

$ vim prometheus-node-exporter.yaml

apiVersion: apps/v1

kind: DaemonSet

metadata:

namespace: monitoring

labels:

k8s-app: node-exporter

spec:

selector:

matchLabels:

k8s-app: node-exporter

template:

metadata:

labels:

k8s-app: node-exporter

spec:

containers:

- image: prom/node-exporter

ports:

- containerPort: 9100

protocol: TCP

---

apiVersion: v1

kind: Service

metadata:

labels:

k8s-app: node-exporter

namespace: kube-system

spec:

ports:

- name: http

port: 9100

nodePort: 31672

protocol: TCP

type: NodePort

selector:

k8s-app: node-exporter

위 내용을 배포하게 되면 Node Exportor 가 DaemonSet 으로 동작한다. 그리고 Prometheus 는 정해진 시간마다 Node Exportor 에 접속해 메트릭을 수집하고 저장하게 된다.

마치며

Prometheus 는 Kubernetes 와 잘 작동 한다. Kubernetes 를 하면서 다양한 방법으로 설치를 진행할 수 있는데, 한번 쯤은 수동으로 모두 해보는 것을 권장하고 한번 공부해보길 권장한다. 그것만으로도 Kubernetes 에 대해서 상당히 많은 부분을 익힐 수가 있다.

그런 면에서 Prometheus 를 수동으로 모두 설치해보는 것은 가치가 있다.

5 comments

04/22/2021 - 10:55 ironjo

안녕하세요 작성자님 글을 따라 테스트하다
저는 prometheus-deployment.yaml 을 배포하니 Pending상태로 떨어집니다.
‘ 0/1 nodes are available: 1 node(s) didn’t match node selector. ‘이렇게 문구가 출력되는데 이 문제가
#kubectl get nodes 의 결과값으로 나오는 NAME값을 넣어줘야하는건가요?
혹시 아시면 답변 부탁드립니다.

응답
- 04/23/2021 - 01:12 admin
  
  kubectl describe, kubectl logs 명령어를 이용해서 상태를 봐야 합니다.
  
  응답
- 10/30/2021 - 16:53 솔다
  
  kubectl get node –show-labels을 보시고 ~~~#k8s.io/knode 레이블이 있는지 보셔야 합니다.
  저는 docker에서 제공하는 k8s쿠버네티스를 사용하기 때문에 노드 레이블이 저렇게 지정되어있지 않았습니다.
  그래서 제가 사용하고 있는 노드 레이블의 값으로 deployment에 있는 nodeSelector값을 변경해주었습니다.
  
  응답
  - 10/30/2021 - 16:53 솔다
    
    오타가 잇는데 –show-labels 입니다.
    
    응답
- 10/30/2021 - 17:01 솔솔
  
  kubectl get nodes –show-labels를 통해나온 label의 key:value가 prometheus-deployment.yaml에 정의된 nodeSelector key:value값이 있는지 봐야합니다.
  본문에는 knode로 되어있을 건데, 기본적으로 node label을 저렇게 따로 설정하지 않으면 해당 레이블에 맞는 노드 셀렉터가 없어 설치가 안됩니다.
  node의 label을 추가하시거나, 아니면 kubectl get nodes –show-labels명령어를 통해 prometheus-deployment.yaml에 적힌 nodeSelector key값과 일치한 value로 수정하시면 됩니다.
  저는 참고로 nodeSelector에 적힌 key값과 일치하는 nodelabel이 있어서 value만 수정하여 정상적으로 설치 되었습니다.
  
  응답

2025 8월
일	월	화	수	목	금	토
					1	2
3	4	5	6	7	8	9
10	11	12	13	14	15	16
17	18	19	20	21	22	23
24	25	26	27	28	29	30
31

Prometheus

metric-server, metric-state-server

kube-state-metrics 설치

Prometheus 설치

계획

monitoring 네임스페이스 생성

ClusterRole 생성

Prometheus 를 위한 영구볼륨 생성

Prometheus 를 위한 ConfigMap 생성

Node Exportor

마치며

5 comments

Post a comment Cancel reply