This post is my log of setting up NVIDIA driver 525 (or 535), CUDA Toolkit 11.8 (or 12.2), and cuDNN 8.9.4.25 (8.9.7.29 with CUDA 12) on Ubuntu 22.04.
The following posts helped me a lot:
https://evols-atirev.tistory.com/43
https://aeong-dev.tistory.com/1
sudo apt update && sudo apt upgrade -y
If something goes wrong, just remove everything and start again from the driver installation.
# remove the driver
sudo apt remove --purge '^nvidia-.*'
# remove CUDA-related packages
sudo apt remove --purge "*cublas*" "cuda*" "nsight*"
# remove the locally installed toolkit folder (root-owned, so sudo is needed; adjust the version to yours)
sudo rm -rf /usr/local/cuda-11.8/
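Since I switched between 11.8 and 12.2, more than one /usr/local/cuda-X.Y folder can pile up. A minimal sketch of a helper (list_cuda_installs is my own, hypothetical name) that lists them and points out the cuda-uninstaller script that runfile installs normally place in each bin folder. It only prints; it deletes nothing.

```shell
#!/usr/bin/env bash
# Sketch: list CUDA toolkit folders under a prefix (default /usr/local).
# Only prints what it finds; removing them is still up to you.
list_cuda_installs() {
  local prefix="${1:-/usr/local}"
  local dir
  for dir in "$prefix"/cuda-*/; do
    # if nothing matches, the glob stays literal and fails the -d test
    [[ -d "$dir" ]] || continue
    dir="${dir%/}"
    if [[ -x "$dir/bin/cuda-uninstaller" ]]; then
      echo "$dir (uninstaller: $dir/bin/cuda-uninstaller)"
    else
      echo "$dir"
    fi
  done
}
```

For example, list_cuda_installs and then sudo /usr/local/cuda-11.8/bin/cuda-uninstaller (if it shows up in the list) is a gentler alternative to the plain rm -rf.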
sudo apt install nvidia-driver-525
sudo reboot
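After the reboot, nvidia-smi should work. Its header shows both the driver version and the highest CUDA version that driver supports, which is the number to keep in mind when picking a toolkit. A minimal sketch of pulling both out (smi_versions is a hypothetical name; it assumes the usual "Driver Version: ... CUDA Version: ..." header layout):

```shell
#!/usr/bin/env bash
# Sketch: read nvidia-smi output on stdin and print the driver and max supported CUDA versions.
# Assumes the header line looks like:
# | NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0 |
smi_versions() {
  sed -n 's/.*Driver Version: *\([0-9.]*\).*CUDA Version: *\([0-9.]*\).*/driver=\1 cuda=\2/p'
}
```

Usage: nvidia-smi | smi_versions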
You need to pick a toolkit version that is compatible with your driver version.
Example release notes for driver 535: https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-535-183-06/index.html
CUDA Toolkit archive: https://developer.nvidia.com/cuda-toolkit-archive
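The runfile name itself carries two versions: cuda_11.8.0_520.61.05_linux.run is toolkit 11.8.0 bundled with driver 520.61.05. A rough check I'd sketch (all helper names are hypothetical) treats that bundled driver as the minimum, so driver 525 passes for 11.8. This is stricter than NVIDIA's actual rules, since minor version compatibility can let an older driver run a newer 11.x or 12.x toolkit, so the release notes stay the authority.

```shell
#!/usr/bin/env bash
# Sketch: compare the installed driver with the driver bundled in a CUDA runfile name.
# Assumption: the bundled driver is treated as the minimum required (stricter than needed).

# version_ge A B: succeeds if version A >= version B (GNU sort -V does the version ordering)
version_ge() {
  [[ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" == "$2" ]]
}

# runfile_versions NAME: prints "<toolkit> <bundled driver>" for names like cuda_11.8.0_520.61.05_linux.run
runfile_versions() {
  local name
  name="$(basename "$1")"
  if [[ "$name" =~ ^cuda_([0-9.]+)_([0-9.]+)_linux\.run$ ]]; then
    echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"
  else
    return 1
  fi
}

# driver_ok_for_runfile INSTALLED_DRIVER RUNFILE: succeeds if INSTALLED_DRIVER >= bundled driver
driver_ok_for_runfile() {
  local bundled
  bundled="$(runfile_versions "$2")" || return 2
  bundled="${bundled#* }"   # keep only the driver part
  version_ge "$1" "$bundled"
}
```

Usage: driver_ok_for_runfile "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)" cuda_11.8.0_520.61.05_linux.run && echo "driver looks new enough"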
# only needed over ssh:
# stop the graphical session (nvidia-drm off) before setup
sudo systemctl isolate multi-user.target
# example
$ wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
# driver 525 is already installed, so deselect the bundled "Driver" entry in the installer menu
$ sudo sh cuda_11.8.0_520.61.05_linux.run
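If you'd rather skip the interactive menu, the runfile also accepts --silent and --toolkit, which installs only the toolkit and leaves the already-installed driver alone. A small wrapper sketch (install_toolkit_only and the DRY_RUN switch are my own, hypothetical names) that can print the command before running it:

```shell
#!/usr/bin/env bash
# Sketch: toolkit-only, non-interactive runfile install.
# --silent and --toolkit are runfile installer options; DRY_RUN=1 only prints the command.
install_toolkit_only() {
  local runfile="$1"
  local cmd=(sudo sh "$runfile" --silent --toolkit)
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Running DRY_RUN=1 install_toolkit_only cuda_11.8.0_520.61.05_linux.run first shows exactly what will be executed; drop DRY_RUN to actually install.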
$ vim ~/.bashrc
# add the following lines
export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
$ source ~/.bashrc
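Editing ~/.bashrc by hand is fine, but redoing the setup after a reinstall tends to duplicate these lines. A sketch of an idempotent version (add_cuda_env is a hypothetical helper; it assumes the toolkit is reachable through the /usr/local/cuda symlink):

```shell
#!/usr/bin/env bash
# Sketch: append the CUDA PATH/LD_LIBRARY_PATH exports to an rc file only if they are missing.
# Single quotes keep $PATH and ${LD_LIBRARY_PATH...} unexpanded, so they are evaluated when the rc file is sourced.
add_cuda_env() {
  local rc="${1:-$HOME/.bashrc}"
  local line
  local lines=(
    'export PATH="/usr/local/cuda/bin:$PATH"'
    'export LD_LIBRARY_PATH="/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"'
  )
  touch "$rc"
  for line in "${lines[@]}"; do
    # -x: whole-line match, -F: fixed string (no regex), -q: quiet
    grep -qxF -- "$line" "$rc" || printf '%s\n' "$line" >> "$rc"
  done
}
```

Usage: add_cuda_env && source ~/.bashrc (calling it twice leaves the file unchanged the second time).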
# after setup, bring the graphical session back (nvidia-drm on)
sudo systemctl start graphical.target
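To confirm the shell now finds the toolkit, nvcc --version prints a line like "Cuda compilation tools, release 11.8, V11.8.89". A tiny sketch for checking the release number (nvcc_release and check_nvcc are my own, hypothetical names):

```shell
#!/usr/bin/env bash
# Sketch: read `nvcc --version` output on stdin and print just the release number (e.g. 11.8).
nvcc_release() {
  sed -n 's/.*release \([0-9.]*\),.*/\1/p'
}

# check_nvcc EXPECTED: succeeds if the nvcc on PATH reports release EXPECTED
check_nvcc() {
  command -v nvcc >/dev/null 2>&1 || { echo "nvcc not on PATH" >&2; return 1; }
  [[ "$(nvcc --version | nvcc_release)" == "$1" ]]
}
```

Usage: check_nvcc 11.8 && echo "toolkit OK" (or check_nvcc 12.2 for the CUDA 12 setup).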
cuDNN archive: https://developer.nvidia.com/rdp/cudnn-archive
I used FileZilla to transfer the cuDNN files downloaded from the site to the server.
$ tar -xvf cudnn-linux-x86_64-8.9.4.25_cuda11-archive.tar.xz
# copy the headers and libraries from the extracted folder into the CUDA install
# (-P keeps the libcudnn symlinks as symlinks instead of full copies)
sudo cp cudnn-linux-x86_64-8.9.4.25_cuda11-archive/include/cudnn*.h /usr/local/cuda/include
sudo cp -P cudnn-linux-x86_64-8.9.4.25_cuda11-archive/lib/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
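or, for CUDA 12 with cuDNN 8.9.7.29 (after extracting cudnn-linux-x86_64-8.9.7.29_cuda12-archive.tar.xz the same way):
sudo cp cudnn-linux-x86_64-8.9.7.29_cuda12-archive/include/cudnn*.h /usr/local/cuda/include
sudo cp -P cudnn-linux-x86_64-8.9.7.29_cuda12-archive/lib/libcudnn* /usr/local/cuda/lib64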
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
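Since the two variants only differ in the folder name, here is a sketch of the copy step as one function (install_cudnn and the SUDO override are hypothetical, added so it can be tried on scratch folders). It uses cp -P so the libcudnn.so -> libcudnn.so.8 style symlinks in the tarball stay symlinks.

```shell
#!/usr/bin/env bash
# Sketch: copy cuDNN headers and libraries from an extracted tarball folder into a CUDA install.
# Assumptions: SRC is the extracted cudnn-linux-x86_64-<version>_cuda<N>-archive folder;
# CUDA_HOME defaults to /usr/local/cuda; SUDO defaults to "sudo" (set SUDO="" for folders you own).
install_cudnn() {
  local src="$1"
  local cuda_home="${2:-/usr/local/cuda}"
  local sudo_cmd="${SUDO-sudo}"
  if [[ ! -d "$src/include" || ! -d "$src/lib" ]]; then
    echo "not an extracted cuDNN folder: $src" >&2
    return 1
  fi
  # headers, then libraries with -P so symlinks are copied as symlinks
  $sudo_cmd cp -P "$src"/include/cudnn*.h "$cuda_home/include/" || return 1
  $sudo_cmd cp -P "$src"/lib/libcudnn* "$cuda_home/lib64/" || return 1
  # make them readable for every user
  $sudo_cmd chmod a+r "$cuda_home"/include/cudnn*.h "$cuda_home"/lib64/libcudnn*
}
```

Usage: install_cudnn cudnn-linux-x86_64-8.9.4.25_cuda11-archive (or the 8.9.7.29_cuda12 folder).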
# check the installed version
grep CUDNN_MAJOR -A 2 /usr/local/cuda/include/cudnn_version.h
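The grep prints the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL defines. A sketch that turns them into a single version string (cudnn_version is my own helper name; it assumes the cuDNN 8 header layout with one "#define NAME value" per line):

```shell
#!/usr/bin/env bash
# Sketch: print MAJOR.MINOR.PATCHLEVEL from cudnn_version.h (default path is where the headers were copied).
cudnn_version() {
  local header="${1:-/usr/local/cuda/include/cudnn_version.h}"
  awk '
    $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
    $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
    $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
    END {
      if (major == "") exit 1   # not a cuDNN version header
      print major "." minor "." patch
    }
  ' "$header"
}
```

For the 8.9.4.25 install this should print 8.9.4.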
When running a Docker container with --gpus, I got the following error message:
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
https://github.com/NVIDIA/nvidia-docker/issues/1243#issuecomment-694981577
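That error usually means Docker cannot find the NVIDIA Container Toolkit. A quick check I'd sketch (has_nvidia_hook is a hypothetical name; the assumption is that Docker's --gpus support looks for the nvidia-container-runtime-hook binary, or nvidia-container-toolkit on older setups, on PATH):

```shell
#!/usr/bin/env bash
# Sketch: succeed if an NVIDIA container hook binary is on PATH.
# Assumption: Docker's --gpus support depends on one of these binaries being installed.
has_nvidia_hook() {
  command -v nvidia-container-runtime-hook >/dev/null 2>&1 ||
    command -v nvidia-container-toolkit >/dev/null 2>&1
}
```

If it fails, the route NVIDIA's Container Toolkit install guide describes is installing the nvidia-container-toolkit package from NVIDIA's apt repository, running sudo nvidia-ctk runtime configure --runtime=docker, and then sudo systemctl restart docker. The linked comment may differ in details.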