Installing and Testing Vitis AI

KiJungKong · January 14, 2024

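Vitis AI can only be installed on Ubuntu or Red Hat family distributions, so I installed it inside Docker on a WSL instance running Ubuntu.
To get GPU acceleration from the AMD RX 570 I had been using, I would have had to either wipe Windows and install Linux from scratch, or buy another SSD and install Linux on that, both of which were a hassle.
ROCm, AMD's GPGPU software stack, was released for Windows on July 27, 2023, but it is still so new that I could not find any way to connect it to WSL (a virtual machine), no matter how much I searched.
In any case, the latest ROCm does not support the RX 570. https://rocm.docs.amd.com/projects/radeon/en/latest/docs/prerequisites.html
So I had no choice but to swap in an NVIDIA GTX 960 that was lying around the house.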

Connecting an NVIDIA GPU to WSL

Assuming a suitable NVIDIA GPU driver is already installed on Windows, open WSL and run the commands below. They are copied from this link (CUDA Toolkit Archive | NVIDIA Developer) and install the CUDA Toolkit on WSL running Ubuntu.

wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.3.2/local_installers/cuda-repo-wsl-ubuntu-12-3-local_12.3.2-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-12-3-local_12.3.2-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-12-3-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-3

# Source: https://developer.nvidia.com/cuda-downloads

Once the installation is done, run the command below:

nvidia-smi -q

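The output looks like the following. The NVIDIA GeForce GTX 960 is detected correctly, and the CUDA Version field shows 12.3. Note that this field reports the highest CUDA version the driver supports; the installed toolkit version itself can be checked with nvcc --version.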

==============NVSMI LOG==============

Timestamp                                 : Sat Jan 13 15:21:43 2024
Driver Version                            : 546.33
CUDA Version                              : 12.3

Attached GPUs                             : 1
GPU 00000000:26:00.0
    Product Name                          : NVIDIA GeForce GTX 960
    Product Brand                         : GeForce
    Product Architecture                  : Maxwell
   
(rest omitted)

Source: CUDA Toolkit Archive | NVIDIA Developer
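
As a side note, a small filter can pull single fields out of the `nvidia-smi -q` log instead of scrolling through it. The sketch below is only an illustration: `nvsmi_field` is a hypothetical helper name, not part of any NVIDIA tool, and it assumes the `Key : Value` layout shown in the log above.

```shell
# Hypothetical helper: print the value of one "Key : Value" field from
# `nvidia-smi -q` style text read on stdin.
nvsmi_field() {
  awk -F' *: ' -v key="$1" '
    { sub(/^[ \t]+/, "", $1) }      # drop the indentation before the key
    $1 == key { print $2; exit }    # first match wins
  '
}

# Only query the GPU when nvidia-smi is actually available.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi -q | nvsmi_field "Product Name"
  nvidia-smi -q | nvsmi_field "CUDA Version"   # highest CUDA version the driver supports
fi

# The installed toolkit reports its own version through nvcc, if it is on PATH.
if command -v nvcc >/dev/null 2>&1; then
  nvcc --version
fi
```

If `nvcc` is not found right after installation, it may simply not be on PATH yet; on a default install it usually lives under /usr/local/cuda/bin.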

Installing Docker

Register Docker's official repository

# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
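
For reference, the `echo ... | sudo tee` command above just writes a single apt source line. Here is a minimal sketch of what that line looks like, assuming `amd64` and the Ubuntu 22.04 codename `jammy` as example values (`docker_apt_line` is a made-up helper name, not a Docker tool):

```shell
# Hypothetical helper: build the apt source line for Docker's repository.
docker_apt_line() {
  arch="$1"       # e.g. the output of: dpkg --print-architecture
  codename="$2"   # e.g. VERSION_CODENAME from /etc/os-release
  printf 'deb [arch=%s signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu %s stable\n' \
    "$arch" "$codename"
}

# Example with assumed values (amd64, Ubuntu 22.04 "jammy"):
docker_apt_line amd64 jammy
```

After registering the repository, `cat /etc/apt/sources.list.d/docker.list` should show a line of the same shape, with your own architecture and codename filled in.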

Install Docker

sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

Test that the installation works

❯ sudo docker run hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

Source: Install Docker Engine on Ubuntu | Docker Docs

Running docker without sudo

Create the docker group
```
sudo groupadd docker
```
Add the currently logged-in user to the docker group
```
sudo usermod -aG docker $USER
```
Restart WSL (from PowerShell)
```
wsl --shutdown
wsl
```
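
Before retrying docker, it can help to confirm that the new session actually picked up the group. A minimal sketch, assuming nothing beyond the standard `id` command (`session_in_group` is a hypothetical helper name):

```shell
# Hypothetical helper: succeed if the current session belongs to group $1.
# `id -nG` without a user name reports the groups of the running session,
# which only changes after logging in again (here: after restarting WSL).
session_in_group() {
  id -nG | tr ' ' '\n' | grep -qx -- "$1"
}

if session_in_group docker; then
  echo "this session is in the docker group"
else
  echo "this session is not in the docker group yet"
fi
```

If the second message still shows up after `usermod`, the WSL instance has most likely not been restarted.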

Check that docker works without sudo

❯ docker run hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

Source: Linux post-installation steps for Docker Engine | Docker Docs

Installing Vitis-AI

Move to the folder where you want to install it
```
cd ~/works
```

git clone

git clone https://github.com/Xilinx/Vitis-AI

Docker build (takes about 2 hours)

With an AMD GPU you can download and use a prebuilt image, but with an NVIDIA GPU the only option is to build the image yourself

The build is driven by docker_build.sh, which accepts the options listed in the table below

DOCKER_TYPE (-t)	TARGET_FRAMEWORK (-f)	Desired Environment
cpu	pytorch	PyTorch cpu-only
cpu	tf2	TensorFlow 2 cpu-only
cpu	tf1	TensorFlow 1.15 cpu-only
gpu	pytorch	PyTorch with AI Optimizer CUDA-gpu
gpu	tf2	TensorFlow 2 with AI Optimizer CUDA-gpu
gpu	tf1	TensorFlow 1.15 with AI Optimizer CUDA-gpu
rocm	pytorch	PyTorch with AI Optimizer ROCm-gpu
rocm	tf2	TensorFlow 2 with AI Optimizer ROCm-gpu

Since I will be using the GPU with TensorFlow 2 rather than PyTorch, the parameters are set as follows

cd ./Vitis-AI/docker
./docker_build.sh -t gpu -f tf2
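
Since the build takes hours, a quick sanity check of the option pair before starting may save some pain. The sketch below only mirrors the table above; `valid_build_combo` is a hypothetical helper, not something shipped with Vitis-AI, and it prints the command rather than running it.

```shell
# Hypothetical pre-flight check mirroring the option table above.
valid_build_combo() {
  case "$1:$2" in
    cpu:pytorch|cpu:tf2|cpu:tf1) return 0 ;;
    gpu:pytorch|gpu:tf2|gpu:tf1) return 0 ;;
    rocm:pytorch|rocm:tf2)       return 0 ;;
    *)                           return 1 ;;
  esac
}

docker_type=gpu
framework=tf2
if valid_build_combo "$docker_type" "$framework"; then
  # Print the command instead of starting the long build right away.
  echo "./docker_build.sh -t $docker_type -f $framework"
else
  echo "unsupported combination: -t $docker_type -f $framework" >&2
fi
```

For example, `valid_build_combo rocm tf1` fails, because the table lists no TensorFlow 1.15 image for ROCm.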

When it finishes, a result message like the following appears near the end

 => => naming to docker.io/xilinx/vitis-ai-tensorflow2-gpu:3.5.0.001-cc6f2308a           0.2s

Check that the Docker image was created

docker images

This prints a list like the one below. The first image, whose name matches the one shown at the end of the build step above, is the image we will use

REPOSITORY                        TAG                   IMAGE ID       CREATED          SIZE
xilinx/vitis-ai-tensorflow2-gpu   3.5.0.001-cc6f2308a   6d613b5c604a   18 minutes ago   20.2GB
xilinx/vitis-ai-gpu-tf2-base      latest                3d7be673215c   46 minutes ago   12.2GB
hello-world                       latest                d2c94e258dcb   8 months ago     13.3kB
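
The `name:tag` string needed in the next step can also be extracted from this listing instead of being copied by hand. A rough sketch, assuming the column layout shown above (`image_refs` is a hypothetical helper name):

```shell
# Hypothetical helper: turn `docker images` table text (stdin) into
# "repository:tag" lines for repositories that start with prefix $1.
image_refs() {
  awk -v prefix="$1" 'NR > 1 && index($1, prefix) == 1 { print $1 ":" $2 }'
}

# Only call docker when it is installed.
if command -v docker >/dev/null 2>&1; then
  docker images | image_refs "xilinx/vitis-ai-tensorflow2-gpu"
fi
```

With the listing above this would print xilinx/vitis-ai-tensorflow2-gpu:3.5.0.001-cc6f2308a. Docker's own `--format '{{.Repository}}:{{.Tag}}'` option to `docker images` should give a similar result without awk.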

Registering an alias

Register an alias to keep things simple

vi ~/.zshrc # use .bashrc if you are on bash rather than zsh

# vitis-ai
alias vitis-ai='cd ~/{install path}/Vitis-AI; ./docker_run.sh {image name}:{image tag}'

# Example
alias vitis-ai='cd ~/works/AMD/Vitis-AI; ./docker_run.sh xilinx/vitis-ai-tensorflow2-gpu:3.5.0.001-cc6f2308a'
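
As an alternative to the alias, a shell function with two variables makes it easier to change the image tag later. This is just a sketch under my own assumptions: the variable names `VITIS_AI_HOME` and `VITIS_AI_IMAGE` and the function name `vitis_ai` are made up, and the default values are the ones from the example above.

```shell
# Assumed locations; adjust both to your own setup.
VITIS_AI_HOME="${VITIS_AI_HOME:-$HOME/works/AMD/Vitis-AI}"
VITIS_AI_IMAGE="${VITIS_AI_IMAGE:-xilinx/vitis-ai-tensorflow2-gpu:3.5.0.001-cc6f2308a}"

# Hypothetical function equivalent of the alias. The subshell keeps the
# caller's working directory unchanged, and && skips docker_run.sh if cd fails.
vitis_ai() {
  ( cd "$VITIS_AI_HOME" && ./docker_run.sh "$VITIS_AI_IMAGE" )
}
```

Unlike the alias, this does not leave the terminal inside the Vitis-AI folder after the container exits, and switching to another image only means changing `VITIS_AI_IMAGE`.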

Running

Launch it with the alias created earlier

vitis-ai

After launching, a bit more setup runs, and if output like the following appears, it worked!

==========
== CUDA ==
==========

CUDA Version 11.8.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

Setting up {user name} 's environment in the Docker container...
usermod: no changes
Running as vitis-ai-user with ID 0 and group 0


==========================================

__      ___ _   _                   _____
\ \    / (_) | (_)            /\   |_   _|
 \ \  / / _| |_ _ ___ ______ /  \    | |
  \ \/ / | | __| / __|______/ /\ \   | |
   \  /  | | |_| \__ \     / ____ \ _| |_
    \/   |_|\__|_|___/    /_/    \_\_____|

==========================================

Docker Image Version: 3.5.0.001-cc6f2308a   (GPU)
Vitis AI Git Hash: cc6f2308a
Build Date: 2024-01-13
WorkFlow: tf2

vitis-ai-user@{user group}:/workspace$

Running the CIFAR10 Dataset Example

0. Install the Python library image-classifiers in Vitis-AI

Use tmux, or open two terminal windows connected to WSL, because this step alternates between two terminals

In one terminal, launch Vitis-AI

vitis-ai

Install image-classifiers

conda activate vitis-ai-tensorflow2

pip install image-classifiers

Once the installation finishes, list the running Docker containers from the other terminal

Note the container ID in the output

❯ docker ps -l
CONTAINER ID   IMAGE                                    COMMAND                  CREATED          STATUS          PORTS     NAMES
013e7039a8fd   xilinx/vitis-ai-tensorflow2-gpu:latest   "/opt/nvidia/nvidia_…"   54 seconds ago   Up 28 seconds             dazzling_bohr

Then commit the container with the command below, so that the installed package is saved into an image

docker commit -m "image-classifiers installed" {running container ID} {desired image name:tag}

# Example
docker commit -m "image-classifiers installed" 013e7039a8fd xilinx/vitis-ai-tensorflow2-gpu:latest
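
Copying the container ID by hand can be skipped, since `docker ps -lq` prints only the ID of the most recently created container. A small sketch under that assumption (`commit_latest` is a hypothetical helper name, and the image name in the example is a placeholder):

```shell
# Hypothetical helper: commit the most recently created container as image $1.
commit_latest() {
  image="$1"
  message="${2:-snapshot}"
  cid="$(docker ps -lq)"      # -l: latest created container, -q: print only the ID
  if [ -z "$cid" ]; then
    echo "no container found" >&2
    return 1
  fi
  docker commit -m "$message" "$cid" "$image"
}

# Example (image name and tag are placeholders):
# commit_latest xilinx/vitis-ai-tensorflow2-gpu:latest "image-classifiers installed"
```

This only does the right thing if the Vitis-AI container really is the latest one created; if another container was started in the meantime, pass the ID explicitly as in the command above.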

Re-register the alias

vi ~/.zshrc # use .bashrc if you are on bash rather than zsh

# vitis-ai
alias vitis-ai='cd ~/{install path}/Vitis-AI; ./docker_run.sh {image name}:{image tag}'

# Example
alias vitis-ai='cd ~/works/AMD/Vitis-AI; ./docker_run.sh xilinx/vitis-ai-tensorflow2-gpu:latest'

Downloading Vitis-AI-Tutorials

cd ~/works
git clone https://github.com/Xilinx/Vitis-AI-Tutorials.git

Switching branches

After cloning, the folder contains nothing but LICENSE.txt and README.md. To see the full list of branches, run the command below

git branch -a

From the list below, the one to switch to is remotes/origin/3.5

* (HEAD detached at origin/3.5)
  master
  remotes/origin/1.0
  remotes/origin/1.1
  remotes/origin/1.2
  remotes/origin/1.3
  remotes/origin/1.4
  remotes/origin/2.0
  remotes/origin/2.5
  remotes/origin/3.0
  remotes/origin/3.5
  remotes/origin/HEAD -> origin/master
  remotes/origin/master
(END)

Run the command below

git checkout remotes/origin/3.5

The Tutorials folder now shows up, as below

❯ ls
LICENSE.txt  README.md  Tutorials
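
Checking out remotes/origin/3.5 directly leaves the repository in a detached HEAD state, which is fine for just reading the tutorials. If a regular local branch is preferred, the branch can be chosen at clone time instead. A minimal sketch (`clone_branch` is a hypothetical wrapper name; the destination path in the example is an assumption):

```shell
# Hypothetical helper: clone only branch $2 of repository $1 into directory $3.
clone_branch() {
  git clone --quiet --branch "$2" --single-branch "$1" "$3"
}

# Example (needs network access, so left commented out):
# clone_branch https://github.com/Xilinx/Vitis-AI-Tutorials.git 3.5 ~/works/Vitis-AI-Tutorials
```

`--single-branch` skips the history of the other release branches, which is usually all that is needed for a single tutorial version.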

Copying the CIFAR10 Dataset example into Vitis-AI

The Vitis-AI folder that was cloned into the chosen install path is itself mounted at /workspace inside the Vitis-AI Docker container. /workspace is therefore a shared folder, and files can be moved between the PC and the container through it.

The commands below create a folder named tutorials inside the Vitis-AI folder and copy RESNET18 from Vitis-AI-Tutorials into it
```
mkdir -p Vitis-AI/tutorials
cp -r Vitis-AI-Tutorials/Tutorials/RESNET18 Vitis-AI/tutorials/
```

Running

conda activate vitis-ai-tensorflow2

cd /workspace/tutorials/RESNET18/files # your current directory
source run_all.sh run_clean_dos2unix
source run_all.sh cifar10_dataset
source run_all.sh run_cifar10_training

Running these gives the following output

... (omitted)

Epoch 32/50
156/156 [==============================] - 22s 138ms/step - loss: 0.0813 - accuracy: 0.9721 - val_loss: 0.1363 - val_accuracy: 0.9536
2024-01-13 02:27:32.944516: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 491520000 exceeds 10% of free system memory.
2024-01-13 02:27:39.837475: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 491520000 exceeds 10% of free system memory.
2024-01-13 02:27:40.441377: W tensorflow/tsl/framework/bfc_allocator.cc:296] Allocator (GPU_0_bfc) ran out of memory trying to allocate 356.00MiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.


Elapsed time for Keras training (s):  709.710958



[DB INFO] saving HDF5 model...


[DB INFO] plot model...


[DB INFO] Make Predictions with Float Model on CIFAR10...

1250/1250 [==============================] - 15s 12ms/step - loss: 0.0943 - accuracy: 0.9660
X_Train Model Loss is 0.09425502270460129
X_Train Model Accuracy is 0.9660000205039978
313/313 [==============================] - 4s 13ms/step - loss: 0.6779 - accuracy: 0.8284
X_Test Model Loss is 0.6778731346130371
X_Test Model Accuracy is 0.8284000158309937
313/313 [==============================] - 3s 9ms/step
              precision    recall  f1-score   support

    airplane       0.78      0.92      0.84      1000
  automobile       0.83      0.94      0.89      1000
        bird       0.78      0.81      0.79      1000
         cat       0.73      0.64      0.68      1000
        deer       0.82      0.81      0.81      1000
         dog       0.72      0.77      0.74      1000
        frog       0.91      0.86      0.88      1000
       horse       0.89      0.85      0.87      1000
        ship       0.94      0.83      0.88      1000
       truck       0.91      0.86      0.88      1000

    accuracy                           0.83     10000
   macro avg       0.83      0.83      0.83     10000
weighted avg       0.83      0.83      0.83     10000


[DB INFO] Generate Training Curves File...

dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])

[DB INFO] End of ResNet18 Training2 on CIFAR10...

Results like the above confirm that the example ran to completion