nvidia-smi vs. nvcc: version mismatch

d4r6j · January 7, 2024

Checks before reinstalling the nvidia driver

nvidia-smi

nvidia-smi

$ nvidia-smi
Mon Jan  8 01:37:50 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        Off | 00000000:01:00.0 Off |                  N/A |
|  0%   37C    P8              12W / 170W |      2MiB / 12288MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

nvcc

nvcc --version

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0

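The toolkit release reported by nvcc (11.5) does not match the CUDA version shown by nvidia-smi (12.2), which is the highest CUDA version the installed driver supports, so the toolkit needs to be replaced.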
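The comparison can also be scripted. The following is a minimal sketch, assuming the output formats shown in the two transcripts above; `smi_cuda_version` and `nvcc_release` are hypothetical helper names introduced here, not part of any NVIDIA tool.

```shell
# Extract "12.2" from the nvidia-smi header line ("CUDA Version: 12.2").
smi_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Extract "11.5" from the nvcc line "Cuda compilation tools, release 11.5, V11.5.119".
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p' | head -n 1
}

# Compare the two only when both tools are available on this machine.
if command -v nvidia-smi >/dev/null 2>&1 && command -v nvcc >/dev/null 2>&1; then
    driver_cuda=$(nvidia-smi | smi_cuda_version)
    toolkit_cuda=$(nvcc --version | nvcc_release)
    if [ "$driver_cuda" = "$toolkit_cuda" ]; then
        echo "match: $driver_cuda"
    else
        echo "mismatch: driver supports up to $driver_cuda, toolkit is $toolkit_cuda"
    fi
fi
```

A mismatch is not always a problem, since a newer driver can generally run an older toolkit. Here the goal is simply to bring both to 12.2.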

download cuda toolkit

CUDA Toolkit Archive - site

From nvidia-smi:
CUDA version : 12.2
Driver Version : 535.129.03

Next:

CUDA 12.2.2 - download site

wget https://developer.download.nvidia.com/compute/cuda/12.2.2/local_installers/cuda_12.2.2_535.104.05_linux.run

sudo sh cuda_12.2.2_535.104.05_linux.run
x Existing package manager installation of the driver found. It is strongly    x
x recommended that you remove this before continuing.                          x
x Abort                                                                        x
x Continue

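Choose Abort, remove the existing driver completely, and then install the driver and toolkit together from the runfile so that their versions match.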

delete

Step 1 : check module

lsmod | grep nvidia

$ lsmod | grep nvidia
nvidia_uvm           1511424  0
nvidia_drm             77824  0
nvidia_modeset       1302528  1 nvidia_drm
nvidia              56659968  2 nvidia_uvm,nvidia_modeset
drm_kms_helper        311296  1 nvidia_drm
drm                   622592  4 drm_kms_helper,nvidia,nvidia_drm

Step 2 : check for processes using the nvidia driver

sudo lsof /dev/nvidia*

$ sudo lsof /dev/nvidia*

Nothing was listed. The nvidia-smi output also shows no running processes, so that works as a check too.

Step 3 : remove module

sudo rmmod nvidia_drm
sudo rmmod nvidia_modeset
sudo rmmod nvidia_uvm
sudo rmmod nvidia

$ sudo rmmod nvidia_drm
$ sudo rmmod nvidia_modeset
$ sudo rmmod nvidia_uvm
$ sudo rmmod nvidia

$ lsmod |grep nvidia
$

After unloading all the modules, lsmod confirms that none of them remain loaded.
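The unload order matters because a module cannot be removed while another loaded module still depends on it. As a rough sketch, the helper below (a hypothetical name, `nvidia_unload_order`) reads lsmod output and prints the loaded nvidia modules in the same fixed order used in Step 3. It does not compute the order from the "Used by" column.

```shell
# Print the nvidia modules that are loaded, dependents first, from lsmod text on stdin.
# The fixed order mirrors the rmmod sequence in Step 3.
nvidia_unload_order() {
    loaded=$(awk '{print $1}')
    for mod in nvidia_drm nvidia_modeset nvidia_uvm nvidia; do
        if printf '%s\n' "$loaded" | grep -qx "$mod"; then
            echo "$mod"
        fi
    done
}

# Usage (not run here; needs root and an idle GPU):
# lsmod | nvidia_unload_order | while read -r mod; do sudo rmmod "$mod"; done
```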

Step 4 : Delete all nvidia packages

sudo apt --purge remove nvidia
sudo apt remove --purge nvidia*
sudo apt remove --purge nvidia
sudo apt remove --purge nvidia-

sudo apt remove --purge nvidia-*
sudo apt remove --purge libnvidia*
sudo apt autoremove
sudo apt autoclean

However, perhaps because I had already removed nvidia-toolkit earlier, only the following two commands had any effect.

sudo apt remove --purge nvidia\*
sudo apt remove --purge nvidia-\*

Step 5 : Check deletion of nvidia driver

sudo dpkg -l | grep nvidia

$ sudo dpkg -l | grep nvidia
rc  libnvidia-compute-510:amd64           510.108.03-0ubuntu0.22.04.1             amd64        NVIDIA libcompute package
rc  libnvidia-compute-525:amd64           525.105.17-0ubuntu0.22.04.1             amd64        NVIDIA libcompute package
rc  libnvidia-compute-535:amd64           535.129.03-0ubuntu0.22.04.1             amd64        NVIDIA libcompute package

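The rc status means the package was removed but its configuration files remain, so these leftovers must be purged as well.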

$ sudo apt-get remove --purge libnvidia-compute-510:amd64
$ sudo apt-get remove --purge libnvidia-compute-525:amd64
$ sudo apt-get remove --purge libnvidia-compute-535:amd64

$ sudo dpkg -l | grep nvidia
$

Confirmed that everything has been removed.
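With many leftovers, purging them one by one gets tedious. The sketch below is one way to collect them, assuming the `dpkg -l` column layout shown in Step 5; `rc_nvidia_packages` is a hypothetical helper name.

```shell
# Print package names whose dpkg state is "rc" and whose name contains "nvidia".
rc_nvidia_packages() {
    awk '$1 == "rc" && $2 ~ /nvidia/ {print $2}'
}

# Usage (not run here; needs root): purge every leftover in one pass.
# dpkg -l | rc_nvidia_packages | xargs -r sudo apt-get remove --purge -y
```

Note that the pattern is passed to awk rather than to the shell, which avoids the glob-expansion problem that an unquoted `nvidia*` can cause on the apt command line.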

Installation

sudo sh cuda_12.2.2_535.104.05_linux.run

$ sudo sh cuda_12.2.2_535.104.05_linux.run
x  End User License Agreement                                                  x
x  --------------------------                                                  x
x                                                                              x
x  NVIDIA Software License Agreement and CUDA Supplement to                    x
x  Software License Agreement. Last updated: October 8, 2021                   x
x                                                                              x
x  The CUDA Toolkit End User License Agreement applies to the                  x
x  NVIDIA CUDA Toolkit, the NVIDIA CUDA Samples, the NVIDIA                    x
x  Display Driver, NVIDIA Nsight tools (Visual Studio Edition),                x
x  and the associated documentation on CUDA APIs, programming                  x
x  model and development tools. If you do not agree with the                   x
x  terms and conditions of the license agreement, then do not                  x
x  download or use the software.                                               x
x                                                                              x
x  Last updated: October 8, 2021.                                              x
x                                                                              x
x                                                                              x
x  Preface                                                                     x
x  -------

Starting the installation again.

x CUDA Installer                                                               x
x - [X] Driver                                                                 x
x      [X] 535.104.05                                                          x
x + [X] CUDA Toolkit 12.2                                                      x
x   [X] CUDA Demo Suite 12.2                                                   x
x   [X] CUDA Documentation 12.2                                                x
x - [ ] Kernel Objects                                                         x
x      [ ] nvidia-fs                                                           x
x   Options                                                                    x
x   Install                                                                    x
x                                                                              x
x                                                                              x

I am not sure what Kernel Objects → nvidia-fs is for, so I left it unchecked and chose Install.

$ sudo sh cuda_12.2.2_535.104.05_linux.run
===========
= Summary =
===========

Driver:   Installed
Toolkit:  Installed in /usr/local/cuda-12.2/

Please make sure that
 -   PATH includes /usr/local/cuda-12.2/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-12.2/lib64, or, add /usr/local/cuda-12.2/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.2/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall
Logfile is /var/log/cuda-installer.log

Install failed

[ERROR]: Install of driver component failed. Consult the driver log at /var/log/nvidia-installer.log for more details.
[ERROR]: Install of 535.104.05 failed, quitting
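The message points at the driver log for details. A small sketch for pulling out just the error lines, assuming they contain the literal text ERROR; `installer_errors` is a hypothetical helper name, and the default path is the one named in the message.

```shell
# Print lines that mention ERROR from an installer log.
# Defaults to the driver log path named in the installer's error message.
installer_errors() {
    log=${1:-/var/log/nvidia-installer.log}
    grep 'ERROR' "$log"
}

# Usage (not run here): installer_errors
#                       installer_errors /var/log/cuda-installer.log
```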

Where is the "Nouveau kernel driver"?

(Answer: Continue installation)
ERROR: The Nouveau kernel driver is currently in use by your system.  This driver is incompatible with the NVIDIA driver, and must be disabled before proceeding.  Please consult the NVIDIA driver README and your Linux distribution's documentation for details on how to correctly disable the Nouveau kernel driver.

lsmod | grep -i nouveau

$ lsmod |grep -i nouveau
nouveau              2306048  0
i2c_algo_bit           16384  1 nouveau
drm_ttm_helper         16384  1 nouveau
ttm                    86016  2 drm_ttm_helper,nouveau
drm_kms_helper        311296  1 nouveau
drm                   622592  5 drm_kms_helper,drm_ttm_helper,ttm,nouveau
mxm_wmi                16384  1 nouveau
wmi                    32768  3 wmi_bmof,mxm_wmi,nouveau
video                  65536  1 nouveau

Add the nouveau driver to the blacklist

sudo vi /etc/modprobe.d/blacklist.conf

# For nvidia original driver
# disable nouveau driver
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off

sudo update-initramfs -u

$ sudo update-initramfs -u
update-initramfs: Generating /boot/initrd.img-5.15.0-91-generic

and reboot again.
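An alternative to editing blacklist.conf by hand is to keep the settings in a file of their own, since modprobe reads every .conf file under /etc/modprobe.d. This is a minimal sketch with only the two essential lines; the function name `nouveau_blacklist_conf` and the file name blacklist-nouveau.conf are assumptions. Note that the kernel parameter is spelled modeset.

```shell
# Emit the modprobe settings that keep nouveau from loading.
nouveau_blacklist_conf() {
    cat <<'EOF'
# disable nouveau so the NVIDIA driver can load
blacklist nouveau
options nouveau modeset=0
EOF
}

# Usage (not run here; needs root):
# nouveau_blacklist_conf | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
# sudo update-initramfs -u
# sudo reboot
```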

Check that the nouveau kernel driver is no longer loaded

$ lsmod | grep -i nouveau
$

Install again

sudo sh cuda_12.2.2_535.104.05_linux.run

$ sudo sh cuda_12.2.2_535.104.05_linux.run
[sudo] password for d4r6j:
===========
= Summary =
===========

Driver:   Installed
Toolkit:  Installed in /usr/local/cuda-12.2/

Please make sure that
 -   PATH includes /usr/local/cuda-12.2/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-12.2/lib64, or, add /usr/local/cuda-12.2/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.2/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall
Logfile is /var/log/cuda-installer.log

sudo reboot

$ nvcc
Command 'nvcc' not found, but can be installed with:
sudo apt install nvidia-cuda-toolkit

Check "nvcc"

$ find / -name nvcc 2>/dev/null
/usr/local/cuda-12.2/bin/nvcc
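The binary exists, but its directory is not on PATH, which is why the shell cannot find it. Following the installer summary, the two paths can be added as sketched below. `CUDA_HOME` is just a conventional variable name used here for readability.

```shell
# Paths taken from the installer summary; adjust if the toolkit lives elsewhere.
CUDA_HOME=/usr/local/cuda-12.2
# Prepend the toolkit directories, keeping any existing value after them.
export PATH="$CUDA_HOME/bin${PATH:+:$PATH}"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

To make this persistent, the same lines can be appended to a shell startup file such as ~/.bashrc.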

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
