Building an MLOps Platform - 2. Minikube (GPU)

문주은 · January 16, 2024

1. Installing Minikube and Setting Up the Environment

1-1. Installing Minikube

# install
$ curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
$ sudo install minikube-linux-amd64 /usr/local/bin/minikube

# Install conntrack
$ sudo apt-get install -y conntrack

# Start the cluster
$ minikube start --driver=none

# Check that minikube is running normally
$ minikube status
minikube
type: Control Plane
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured

# Check the cluster (using 'minikube kubectl')
$ minikube kubectl -- get po -A

# With an alias, it can be used just like the regular kubectl command
$ alias kubectl="minikube kubectl --"

# Check the cluster (using 'kubectl')
$ kubectl get po -A
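
The status check above can also be scripted. The following is a minimal sketch, not part of the original setup: `minikube_healthy` is a hypothetical helper name, and it assumes the `minikube status` output has the same `key: value` layout shown above.

```shell
# minikube_healthy (hypothetical helper): reads `minikube status` output on stdin
# and succeeds only if host, kubelet, and apiserver all report "Running".
minikube_healthy() {
  awk -F': *' '
    /^(host|kubelet|apiserver):/ {
      seen++
      if ($2 !~ /^Running[ \t\r]*$/) bad = 1
    }
    END { if (bad || seen < 3) exit 1 }
  '
}

# Usage (requires a running cluster, so it is left commented out here):
#   minikube status | minikube_healthy && echo "minikube is healthy"
```

Reading from stdin keeps the helper independent of how minikube is invoked, so the same check works on saved output.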

1-2. addons

# Enable the minikube dashboard
$ minikube dashboard
* Verifying dashboard health ... 
* Launching proxy ... 
* Verifying proxy health ... 
* Opening http://127.0.0.1:45137/api/v1/namespaces/kubernetes-dashboard/services/http:kubernetes-dashboard:/proxy/ in your default browser... 
  http://127.0.0.1:45137/api/v1/namespaces/kubernetes-dashboard/services/http:kubernetes-dashboard:/proxy/

# List the minikube addons
$ minikube addons list
|         ADDON NAME          | PROFILE  |    STATUS    |           MAINTAINER           | 
|-----------------------------|----------|--------------|--------------------------------| 
| ambassador                  | minikube | disabled     | third-party (ambassador)       | 
| auto-pause                  | minikube | disabled     | google                         | 
| csi-hostpath-driver         | minikube | disabled     | kubernetes                     | 
| dashboard                   | minikube | enabled ✅   | kubernetes                     |
--> The dashboard addon must be in the enabled state.

1-3. Installing kubectl, kubelet, and kubeadm

$ sudo apt install kubectl kubelet kubeadm
$ curl -LO https://dl.k8s.io/release/v1.27.0/bin/linux/amd64/kubectl

1-4. Firewall

  • The dashboard is served on 127.0.0.1 only, so it needs to be set up to be reachable through an external IP
  • Configure the proxy
# Configure the proxy
$ kubectl proxy --address='0.0.0.0' --disable-filter=true 
W0427 08:31:50.459704 1015419 proxy.go:175] Request filter disabled, your proxy is vulnerable to XSRF attacks, please be cautious
Starting to serve on [::]:8001
  • Allow access to port '8001' from any external IP.
$ sudo ufw allow from any to any port 8001 proto tcp

$ sudo ufw status
Status: active
To                         Action      From
--                         ------      ----
22                         ALLOW       Anywhere
8001/tcp                   ALLOW       Anywhere
22 (v6)                    ALLOW       Anywhere (v6)
8001/tcp (v6)              ALLOW       Anywhere (v6)

1-5. k8s dashboard

http://127.0.0.1:45137/api/v1/namespaces/kubernetes-dashboard/services/http:kubernetes-dashboard:/proxy/
In this address, replace 127.0.0.1 and port 45137 with the external IP and the proxy port (8001).

->

http://192.168.0.37:8001/api/v1/namespaces/kubernetes-dashboard/services/http:kubernetes-dashboard:/proxy/

# Verify
$ curl http://192.168.0.37:8001/api/v1/namespaces/kubernetes-dashboard/services/http:kubernetes-dashboard:/proxy/
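
The host and port swap described in this section can be done mechanically. This is a small sketch under my own assumptions: `external_dashboard_url` is a hypothetical helper, and it only rewrites the leading `http://host:port/` part of the URL printed by `minikube dashboard`.

```shell
# external_dashboard_url (hypothetical helper)
#   $1: URL printed by `minikube dashboard`
#   $2: external IP or hostname of the server
#   $3: port that `kubectl proxy` listens on
external_dashboard_url() {
  # Replace only the leading scheme://host:port/ part; the rest of the path is kept as-is.
  printf '%s\n' "$1" | sed -E "s#^http://[^/]+/#http://$2:$3/#"
}

# Example using the values from this post:
external_dashboard_url \
  'http://127.0.0.1:45137/api/v1/namespaces/kubernetes-dashboard/services/http:kubernetes-dashboard:/proxy/' \
  192.168.0.37 8001
```

The pattern is anchored at the start of the string, so the later `http:kubernetes-dashboard:` segment in the path is left untouched.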


2. Installing nvidia-docker

2-1. Installing the NVIDIA Container Toolkit

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit nvidia-container-runtime 
$ sudo apt-get install -y nvidia-docker2
$ sudo systemctl restart docker

2-2. Verification

$ docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
$ docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi

2-3. Editing daemon.json

$ sudo vi /etc/docker/daemon.json

# daemon.json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
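
Before restarting Docker, it can help to confirm that the file really sets nvidia as the default runtime. The check below is only a sketch: `nvidia_is_default_runtime` is a hypothetical helper, and it does a plain text match rather than real JSON parsing, so it assumes the key and value sit on one line as in the file above.

```shell
# nvidia_is_default_runtime (hypothetical helper)
#   $1: path to daemon.json (defaults to /etc/docker/daemon.json)
# Succeeds if the file contains "default-runtime": "nvidia".
nvidia_is_default_runtime() {
  grep -Eq '"default-runtime"[[:space:]]*:[[:space:]]*"nvidia"' "${1:-/etc/docker/daemon.json}"
}

# Usage (left commented out because it depends on the local Docker setup):
#   nvidia_is_default_runtime && echo "nvidia is the default runtime"
```

After the restart in the next step, `docker info` should also report the default runtime, which is a more reliable confirmation than reading the file.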

2-4. Restarting Docker

$ sudo systemctl daemon-reload
$ sudo service docker restart

3. Installing the NVIDIA Device Plugin

3-1. Starting minikube

$ minikube start --driver=none --kubernetes-version=v1.22.0

$ minikube start --driver=none   --kubernetes-version=v1.21.7   --extra-config=apiserver.service-account-signing-key-file=/var/lib/minikube/certs/sa.key   --extra-config=apiserver.service-account-issuer=kubernetes.default.svc

$ minikube start --driver=none   --kubernetes-version=v1.22.3 --extra-config=apiserver.service-account-signing-key-file=/var/lib/minikube/certs/sa.key   --extra-config=apiserver.service-account-issuer=kubernetes.default.svc

3-2. Installing the NVIDIA device plugin

$ kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/master/nvidia-device-plugin.yml

3-3. Verification

(base) user@GPU10:~$ kubectl get pod -A | grep nvidia
kube-system                 nvidia-device-plugin-daemonset-9zxn6          1/1     Running   0             71s 
(base) user@GPU10:~$ kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
NAME       GPU
minikube   5
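
The GPU column above can be summed in a script, for example to fail fast when the device plugin has not registered any GPUs. This is a sketch with a hypothetical helper name, `total_allocatable_gpus`, and it assumes the two-column NAME/GPU layout produced by the custom-columns command above.

```shell
# total_allocatable_gpus (hypothetical helper): reads the NAME/GPU table on stdin
# and prints the sum of the GPU column. Non-numeric values such as <none> are skipped.
total_allocatable_gpus() {
  awk 'NR > 1 && $2 ~ /^[0-9]+$/ { sum += $2 } END { print sum + 0 }'
}

# Usage (requires a running cluster, so it is left commented out here):
#   kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu" \
#     | total_allocatable_gpus
```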

3-4. Creating gpu-container.yaml

Take care to use an image whose CUDA version matches the local GPU!

  • The local CUDA Version is 11.7, so the nvidia/cuda:11.6.2-runtime-ubuntu20.04 image is used
  • nvidia.com/gpu must be included in the container's 'resources.requests' and 'resources.limits' (under 'spec.containers') for the GPU to be usable inside the pod!!
$ vim gpu-container.yaml

# gpu-container.yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu
spec:
  containers:
  - name: gpu-container
    image: nvidia/cuda:11.6.2-runtime-ubuntu20.04
    command:
      - "/bin/sh"
      - "-c"
    args:
      - nvidia-smi && tail -f /dev/null
    resources:
      requests:
        nvidia.com/gpu: 3
      limits:
        nvidia.com/gpu: 3
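
The post stops at creating the manifest, so here is one possible way to apply it and read the `nvidia-smi` output from the pod. It is a sketch under assumptions: `run_gpu_pod` and the `KUBECTL` override variable are hypothetical names introduced for illustration, and the pod name `gpu` comes from the manifest above.

```shell
# KUBECTL is a hypothetical override so the steps can be dry-run (e.g. KUBECTL=echo);
# it defaults to the regular kubectl binary.
KUBECTL="${KUBECTL:-kubectl}"

# run_gpu_pod (hypothetical helper)
#   $1: manifest path (defaults to gpu-container.yaml)
run_gpu_pod() {
  manifest="${1:-gpu-container.yaml}"
  # Create the pod, wait until it is Ready, then print its logs.
  $KUBECTL apply -f "$manifest" &&
    $KUBECTL wait --for=condition=Ready pod/gpu --timeout=120s &&
    $KUBECTL logs gpu
}

# Usage (requires the cluster and device plugin from the previous steps):
#   run_gpu_pod gpu-container.yaml
```

Because the container runs `nvidia-smi` and then keeps itself alive with `tail -f /dev/null`, the logs should contain the `nvidia-smi` table if the GPUs were allocated.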


Other Commands

Service IP address of the k8s service

kubectl get svc -A | grep kubernetes

Connection information between the Kubernetes cluster and kubectl

cat ~/.kube/config

Setting the KUBECONFIG environment variable

export KUBECONFIG=~/.kube/config

sudo rm -rf /tmp/juju-mk
sudo rm -rf /tmp/minikube.

Logging in to the cluster

kubectl config use-context minikube

Checking the event log

kubectl get events -n kube-system


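[Checklist]

  • nvidia-docker is installed
  • The driver is installed
  • GPUs are allocated when Docker starts a container
  • When configuring nvidia-docker, the Docker runtime is configured
  • When configuring nvidia-docker, containerd is configured
  • The minikube runtime has been checked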


Reference

https://anencore94.github.io/2020/08/19/minikube-gpu.html
https://mlops-for-all.github.io/docs/setup-kubernetes/setup-nvidia-gpu/
