nvidia-smi
$ nvidia-smi
Mon Jan 8 01:37:50 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 Off | 00000000:01:00.0 Off | N/A |
| 0% 37C P8 12W / 170W | 2MiB / 12288MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
nvcc --version
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
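The two outputs above are easier to compare if the version strings are pulled out of them. A minimal sketch, assuming the output formats shown above; the function names `driver_cuda_version` and `toolkit_cuda_version` are made up for illustration and are not part of any NVIDIA tool:

```shell
# Hypothetical helpers; both read text on stdin.

# Print the "CUDA Version" field from the nvidia-smi header, e.g. 12.2.
driver_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Print the release number from `nvcc --version` output, e.g. 11.5.
toolkit_cuda_version() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

# Usage on a machine with both tools installed:
#   drv=$(nvidia-smi | driver_cuda_version)
#   tk=$(nvcc --version | toolkit_cuda_version)
#   [ "$drv" = "$tk" ] || echo "driver supports CUDA $drv, toolkit is $tk"
```

Note that the value printed by nvidia-smi is the CUDA version the driver supports, while nvcc reports the toolkit that is actually installed.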
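The driver and the CUDA toolkit report different versions. nvidia-smi shows
CUDA Version: 12.2
Driver Version: 535.129.03
while nvcc is still at release 11.5, so download the runfile installer that matches CUDA 12.2: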
wget https://developer.download.nvidia.com/compute/cuda/12.2.2/local_installers/cuda_12.2.2_535.104.05_linux.run
sudo sh cuda_12.2.2_535.104.05_linux.run
x Existing package manager installation of the driver found. It is strongly x
x recommended that you remove this before continuing. x
x Abort x
x Continue
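Choose Abort, remove the existing driver completely, and then install the driver and the toolkit together from this one installer so that the versions match.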
lsmod | grep nvidia
$ lsmod | grep nvidia
nvidia_uvm 1511424 0
nvidia_drm 77824 0
nvidia_modeset 1302528 1 nvidia_drm
nvidia 56659968 2 nvidia_uvm,nvidia_modeset
drm_kms_helper 311296 1 nvidia_drm
drm 622592 4 drm_kms_helper,nvidia,nvidia_drm
sudo lsof /dev/nvidia*
$ sudo lsof /dev/nvidia*
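Nothing was printed, so no process is holding the NVIDIA device files. The nvidia-smi output above also shows no running processes, so that works as a check too.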
sudo rmmod nvidia_drm
sudo rmmod nvidia_modeset
sudo rmmod nvidia_uvm
sudo rmmod nvidia
$ sudo rmmod nvidia_drm
$ sudo rmmod nvidia_modeset
$ sudo rmmod nvidia_uvm
$ sudo rmmod nvidia
$ lsmod |grep nvidia
$
After unloading all the modules, lsmod confirms that none of them remain loaded.
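The unload order matters: a module can only be removed once nothing else is using it, which is why nvidia itself goes last. A minimal sketch for listing whatever NVIDIA modules are still loaded; `loaded_nvidia_modules` is a hypothetical helper and assumes the usual lsmod column layout shown above:

```shell
# Hypothetical helper: read `lsmod` output on stdin and print the names of
# loaded modules that start with "nvidia", one per line.
loaded_nvidia_modules() {
    awk '$1 ~ /^nvidia/ { print $1 }'
}

# Usage:
#   lsmod | loaded_nvidia_modules     # prints nothing once all are unloaded
```

Matching on the first column only avoids false hits such as drm_kms_helper, which merely lists nvidia_drm as a user.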
sudo apt --purge remove nvidia
sudo apt remove --purge nvidia*
sudo apt remove --purge nvidia
sudo apt remove --purge nvidia-
sudo apt remove --purge nvidia-*
sudo apt remove --purge libnvidia*
sudo apt autoremove
sudo apt autoclean
Those are the usual commands, but perhaps because I had already removed nvidia-toolkit earlier, only the two below had any effect.
sudo apt remove --purge nvidia\*
sudo apt remove --purge nvidia-\*
sudo dpkg -l | grep nvidia
$ sudo dpkg -l | grep nvidia
rc libnvidia-compute-510:amd64 510.108.03-0ubuntu0.22.04.1 amd64 NVIDIA libcompute package
rc libnvidia-compute-525:amd64 525.105.17-0ubuntu0.22.04.1 amd64 NVIDIA libcompute package
rc libnvidia-compute-535:amd64 535.129.03-0ubuntu0.22.04.1 amd64 NVIDIA libcompute package
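These leftovers must be purged as well (status rc means the package was removed but its configuration files remain).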
$ sudo apt-get remove --purge libnvidia-compute-510:amd64
$ sudo apt-get remove --purge libnvidia-compute-525:amd64
$ sudo apt-get remove --purge libnvidia-compute-535:amd64
$ sudo dpkg -l | grep nvidia
$
Confirmed that they are gone.
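Instead of typing each leftover package name by hand, the list can be filtered out of the dpkg output. A minimal sketch, assuming the `dpkg -l` column layout shown above; `residual_nvidia_packages` is a hypothetical helper name:

```shell
# Hypothetical helper: read `dpkg -l` output on stdin and print the names of
# packages in state "rc" whose name contains "nvidia".
residual_nvidia_packages() {
    awk '$1 == "rc" && $2 ~ /nvidia/ { print $2 }'
}

# Usage (review the list before purging anything):
#   dpkg -l | residual_nvidia_packages
#   dpkg -l | residual_nvidia_packages | xargs -r sudo apt-get remove --purge -y
```

The `-r` flag of GNU xargs skips the apt-get call entirely when the list is empty.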
sudo sh cuda_12.2.2_535.104.05_linux.run
$ sudo sh cuda_12.2.2_535.104.05_linux.run
x End User License Agreement x
x -------------------------- x
x x
x NVIDIA Software License Agreement and CUDA Supplement to x
x Software License Agreement. Last updated: October 8, 2021 x
x x
x The CUDA Toolkit End User License Agreement applies to the x
x NVIDIA CUDA Toolkit, the NVIDIA CUDA Samples, the NVIDIA x
x Display Driver, NVIDIA Nsight tools (Visual Studio Edition), x
x and the associated documentation on CUDA APIs, programming x
x model and development tools. If you do not agree with the x
x terms and conditions of the license agreement, then do not x
x download or use the software. x
x x
x Last updated: October 8, 2021. x
x x
x x
x Preface x
x -------
Starting the installation again.
x CUDA Installer x
x - [X] Driver x
x [X] 535.104.05 x
x + [X] CUDA Toolkit 12.2 x
x [X] CUDA Demo Suite 12.2 x
x [X] CUDA Documentation 12.2 x
x - [ ] Kernel Objects x
x [ ] nvidia-fs x
x Options x
x Install x
x x
x x
I am not sure what Kernel Objects / nvidia-fs is for, so I left it unchecked for now and selected Install.
$ sudo sh cuda_12.2.2_535.104.05_linux.run
===========
= Summary =
===========
Driver: Installed
Toolkit: Installed in /usr/local/cuda-12.2/
Please make sure that
- PATH includes /usr/local/cuda-12.2/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-12.2/lib64, or, add /usr/local/cuda-12.2/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.2/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall
Logfile is /var/log/cuda-installer.log
[ERROR]: Install of driver component failed. Consult the driver log at /var/log/nvidia-installer.log for more details.
[ERROR]: Install of 535.104.05 failed, quitting
(Answer: Continue installation)
ERROR: The Nouveau kernel driver is currently in use by your system. This driver is incompatible with the NVIDIA driver, and must be disabled before proceeding. Please consult the NVIDIA driver README and your Linux distribution's documentation for details on how to correctly disable the Nouveau kernel driver.
lsmod | grep -i nouveau
$ lsmod |grep -i nouveau
nouveau 2306048 0
i2c_algo_bit 16384 1 nouveau
drm_ttm_helper 16384 1 nouveau
ttm 86016 2 drm_ttm_helper,nouveau
drm_kms_helper 311296 1 nouveau
drm 622592 5 drm_kms_helper,drm_ttm_helper,ttm,nouveau
mxm_wmi 16384 1 nouveau
wmi 32768 3 wmi_bmof,mxm_wmi,nouveau
video 65536 1 nouveau
sudo vi /etc/modprobe.d/blacklist.conf
# For nvidia original driver
# disable nouveau driver
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
sudo update-initramfs -u
$ sudo update-initramfs -u
update-initramfs: Generating /boot/initrd.img-5.15.0-91-generic
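The blacklist entries can also be written without opening an editor. A minimal sketch, assuming a dedicated file is acceptable instead of editing blacklist.conf; the helper name `write_nouveau_blacklist` and the file name blacklist-nouveau.conf are made up for illustration. Note that the kernel parameter is spelled `modeset`.

```shell
# Hypothetical helper: write the nouveau blacklist entries to the file given
# as $1. Any *.conf file under /etc/modprobe.d/ is read by modprobe, so the
# dedicated file name used in the usage lines is only an assumption.
write_nouveau_blacklist() {
    cat > "$1" <<'EOF'
# For nvidia original driver
# disable nouveau driver
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
EOF
}

# Usage (root is needed to copy the file under /etc):
#   write_nouveau_blacklist /tmp/blacklist-nouveau.conf
#   sudo install -m 644 /tmp/blacklist-nouveau.conf /etc/modprobe.d/blacklist-nouveau.conf
#   sudo update-initramfs -u
```

Writing to a temporary file first makes it possible to inspect the result before it is installed system-wide.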
Then reboot.
Check that the nouveau kernel driver is no longer loaded:
$ lsmod | grep -i nouveau
$
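The empty output above can be turned into a scripted check before rerunning the installer. A minimal sketch; `nouveau_loaded` is a hypothetical helper that reads lsmod output on stdin:

```shell
# Hypothetical helper: read `lsmod` output on stdin; succeed (exit status 0)
# only if a module named exactly "nouveau" is listed in the first column.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Usage:
#   if lsmod | nouveau_loaded; then
#       echo "nouveau is still loaded; check the blacklist and reboot first"
#   fi
```

Comparing the first column exactly avoids matching modules such as ttm, which only list nouveau as a user.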
sudo sh cuda_12.2.2_535.104.05_linux.run
$ sudo sh cuda_12.2.2_535.104.05_linux.run
[sudo] password for d4r6j:
===========
= Summary =
===========
Driver: Installed
Toolkit: Installed in /usr/local/cuda-12.2/
Please make sure that
- PATH includes /usr/local/cuda-12.2/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-12.2/lib64, or, add /usr/local/cuda-12.2/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.2/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall
Logfile is /var/log/cuda-installer.log
sudo reboot
$ nvcc
Command 'nvcc' not found, but can be installed with:
sudo apt install nvidia-cuda-toolkit
nvcc is not on PATH yet, so locate the binary:
$ find / -name nvcc 2>/dev/null
/usr/local/cuda-12.2/bin/nvcc
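The installer summary asks for PATH and LD_LIBRARY_PATH to include the CUDA directories, which is what makes nvcc resolvable. The notes do not show how this was done, so the following is only a sketch of one way to do it; the variable name `CUDA_HOME` is a convention chosen here, not something the installer sets:

```shell
# Install prefix reported in the installer summary.
CUDA_HOME=/usr/local/cuda-12.2

# Prepend the CUDA bin directory to PATH unless it is already there.
case ":$PATH:" in
    *":$CUDA_HOME/bin:"*) ;;
    *) PATH="$CUDA_HOME/bin:$PATH" ;;
esac

# Same for the library directory; LD_LIBRARY_PATH may be unset or empty.
case ":${LD_LIBRARY_PATH:-}:" in
    *":$CUDA_HOME/lib64:"*) ;;
    *) LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
esac

export PATH LD_LIBRARY_PATH
```

Placing these lines in a shell startup file such as ~/.bashrc (assuming bash is the login shell) keeps the setting across sessions; the case guards stop the entries from piling up when the file is sourced repeatedly.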
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0