[Linux]#3 CUDA setting

Clay Ryu's sound lab · October 3, 2023
Framework

This post is my log of setting up NVIDIA driver 525 (or 535), CUDA toolkit 11.8 (or 12.2), and cuDNN 8.9.4.25 on Ubuntu 22.04.

The following posts helped me a lot:
https://evols-atirev.tistory.com/43
https://aeong-dev.tistory.com/1

CUDA

update

sudo apt update && sudo apt upgrade -y

remove the previous installation (optional)

If something goes wrong, remove everything and start again from the driver setup.

# delete driver
sudo apt remove --purge '^nvidia-.*'

# delete cuda related things
sudo apt remove --purge "*cublas*" "cuda*" "nsight*" 

# delete the local cuda files (root-owned, so sudo is needed)
sudo rm -rf /usr/local/cuda-11.8/

NVIDIA driver setup

sudo apt install nvidia-driver-525
sudo reboot
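After the reboot it is worth confirming that the driver is actually loaded before moving on to the toolkit. Here is a minimal sketch of that check; `check_driver` is a helper name I made up for this post, and the fallback messages are only my guesses at the likely cause.

```shell
# check_driver: print the loaded driver version, or a hint if it cannot be read.
# (check_driver is a hypothetical helper name, not part of any NVIDIA tool.)
check_driver() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: the driver package is probably not installed"
        return 0
    fi
    # Ask nvidia-smi only for the driver version, one line per GPU.
    nvidia-smi --query-gpu=driver_version --format=csv,noheader \
        || echo "nvidia-smi failed: the driver may not be loaded (did you reboot?)"
}

check_driver
```

If this prints a version starting with 525 (or 535), the driver step is done.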

cuda toolkit setup

You need to pick a toolkit version that your installed driver version supports.

Driver 535 release notes (example): https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-535-183-06/index.html

Toolkit archive: https://developer.nvidia.com/cuda-toolkit-archive
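The driver/toolkit matching above comes down to a version comparison, which can be sketched like this. `version_ge` and `check_toolkit` are hypothetical helper names, the driver version is just an example value, and the minimum driver numbers are what I read in the CUDA release notes, so treat them as assumptions and confirm them for your exact toolkit release.

```shell
# version_ge A B: succeed if dotted version A >= B (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_toolkit DRIVER TOOLKIT MIN_DRIVER: report whether DRIVER is new enough.
check_toolkit() {
    if version_ge "$1" "$3"; then
        echo "CUDA $2: ok with driver $1"
    else
        echo "CUDA $2: needs driver >= $3"
    fi
}

driver="525.105.17"   # example value; use the version your nvidia-smi reports

# Minimum Linux driver versions (assumptions, check the CUDA release notes).
check_toolkit "$driver" "11.8" "520.61.05"
check_toolkit "$driver" "12.2" "535.54.03"
```

With the example value, 11.8 passes and 12.2 does not, which is why the 525 driver is paired with toolkit 11.8 and the 535 driver with 12.2 in this post.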

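# only needed when working over ssh on a machine that runs a desktop session
# switch to text mode so nvidia-drm is not in use during setup
sudo systemctl isolate multi-user.target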

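# example for toolkit 11.8
# the runfile bundles driver 520.61.05; the driver is already installed,
# so deselect the Driver entry in the installer menu
$ wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
$ sudo sh cuda_11.8.0_520.61.05_linux.run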


$ vim ~/.bashrc
# add the following lines
export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"
$ source ~/.bashrc

# bring the graphical session (and nvidia-drm) back after setup
sudo systemctl start graphical.target
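If you redo this setup a few times, the same export lines tend to pile up in ~/.bashrc. Below is a small sketch of adding them only once. `append_once` is a hypothetical helper, and the demo writes to a scratch file from `mktemp` so it is safe to try; point it at ~/.bashrc when you are happy with it.

```shell
# append_once FILE LINE: add LINE to FILE unless that exact line is already there.
# (append_once is a hypothetical helper name used only in this post.)
append_once() {
    touch "$1"
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

demo_rc="$(mktemp)"   # scratch file standing in for ~/.bashrc

# Run the same additions twice to show that nothing is duplicated.
for _ in 1 2; do
    append_once "$demo_rc" 'export PATH="/usr/local/cuda/bin:$PATH"'
    append_once "$demo_rc" 'export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"'
done

grep -c 'cuda/bin' "$demo_rc"   # prints 1
```

After sourcing the real ~/.bashrc, `nvcc --version` should report the toolkit release you just installed.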

cuDNN setup

https://developer.nvidia.com/rdp/cudnn-archive

FileZilla

Use FileZilla to transfer the cuDNN archive downloaded from the site above to the server.

$ tar -xvf cudnn-linux-x86_64-8.9.4.25_cuda11-archive.tar.xz

# copy the headers and libraries from the extracted folder
# (-P keeps the libcudnn symlinks as symlinks instead of copying the library several times)
sudo cp cudnn-linux-x86_64-8.9.4.25_cuda11-archive/include/cudnn*.h /usr/local/cuda/include
sudo cp -P cudnn-linux-x86_64-8.9.4.25_cuda11-archive/lib/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*

or, for the CUDA 12 build (cuDNN 8.9.7.29):

sudo cp cudnn-linux-x86_64-8.9.7.29_cuda12-archive/include/cudnn*.h /usr/local/cuda/include
sudo cp -P cudnn-linux-x86_64-8.9.7.29_cuda12-archive/lib/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*

# check the installed cuDNN version
cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
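The grep above prints the three `#define` lines as they are. If you would rather see a single version string, something like the following sketch should work. `cudnn_version` is a hypothetical helper, and it assumes the header uses plain `#define CUDNN_MAJOR 8` style lines, as the cuDNN 8 headers do.

```shell
# cudnn_version HEADER: print MAJOR.MINOR.PATCHLEVEL read from cudnn_version.h.
# (cudnn_version is a hypothetical helper name used only in this post.)
cudnn_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { print major "." minor "." patch }' "$1"
}

header="/usr/local/cuda/include/cudnn_version.h"
if [ -r "$header" ]; then
    cudnn_version "$header"
else
    echo "no readable cudnn_version.h at $header"
fi
```

Note that the header only carries three numbers, so the archive name 8.9.4.25 shows up as 8.9.4 here.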

When Docker can't use the GPU

If you get the following error message:

docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

https://github.com/NVIDIA/nvidia-docker/issues/1243#issuecomment-694981577
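As far as I understand it, this error usually means Docker cannot find the NVIDIA Container Toolkit on the host; that is my assumption here, not a quote from the linked issue. A quick sketch for checking whether the pieces are visible on PATH (`check_gpu_docker_prereqs` is a hypothetical helper name):

```shell
# check_gpu_docker_prereqs: report whether docker and the NVIDIA container CLI
# (installed together with the NVIDIA Container Toolkit) are on PATH.
check_gpu_docker_prereqs() {
    for tool in docker nvidia-container-cli; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: missing"
        fi
    done
}

check_gpu_docker_prereqs
```

If `nvidia-container-cli` is missing, installing the toolkit and then restarting the Docker daemon is the first thing I would try. Once both are found, `docker run --rm --gpus all ubuntu nvidia-smi` is a common smoke test.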

"Failed to initialize NVML: Unknown Error" in Docker after a few hours

https://stackoverflow.com/questions/72932940/failed-to-initialize-nvml-unknown-error-in-docker-after-few-hours