[AI] RuntimeError: CUDA error: invalid device ordinal

Halo · April 24, 2025

🧾 Error details

A. Relevant code

if device >= 0:
    torch._C._cuda_setDevice(device)

B. Error log

RuntimeError: CUDA error: invalid device ordinal

CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

🤔 Cause

ordinal: a number that indicates position in a sequence (first, second, and so on)

The message says the device ordinal is invalid. Printing device shows 1, so it looks like _cuda_setDevice(1) is the call that fails.
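
To make the cause concrete: CUDA device ordinals are zero-based, so a machine with N visible GPUs accepts only 0 through N-1. The following is a minimal sketch of that rule in plain Python. The names `is_valid_ordinal` and `gpu_count` are hypothetical and used only for illustration; in a real session the count would come from `torch.cuda.device_count()`.

```python
def is_valid_ordinal(ordinal: int, gpu_count: int) -> bool:
    """Return True if `ordinal` names one of `gpu_count` visible GPUs.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    return 0 <= ordinal < gpu_count


# Assumption: this value stands in for torch.cuda.device_count().
gpu_count = 1

# With a single visible GPU, ordinal 0 passes the check and ordinal 1 does not.
for ordinal in (0, 1):
    print(ordinal, is_valid_ordinal(ordinal, gpu_count))
```

Under this rule, a device value of 1 on a single-GPU machine is out of range, which matches the error message.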


✅ Solution

Only one GPU is available, so the only valid device ordinal is 0. But in

torch._C._cuda_setDevice(device)

device was 1, which triggered the error.

device = _get_device_index(device)

The line above presumably returned 1 as the index.

So I changed the call as follows.

torch._C._cuda_setDevice(device-1 or 0)
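
One caveat with this expression: in Python, `device-1 or 0` falls back to 0 only when `device-1` is falsy, that is, exactly 0. For `device = 0` it evaluates to -1, which is not a valid ordinal either. Below is a hedged sketch comparing it with a clamped alternative. The function names `shift_with_or` and `shift_clamped` are made up for this comparison and are not PyTorch APIs.

```python
def shift_with_or(device: int) -> int:
    """The expression from the edit above: `or` replaces only a falsy result."""
    return device - 1 or 0


def shift_clamped(device: int) -> int:
    """Alternative sketch: shift down by one but never go below ordinal 0."""
    return max(device - 1, 0)


# Compare both variants for a few device values.
for device in (0, 1, 2):
    print(device, shift_with_or(device), shift_clamped(device))
```

Either variant still patches PyTorch's own source. If the 1 comes from calling code that requests `cuda:1`, changing that request to `cuda:0` would likely make the patch unnecessary.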