[AI] RuntimeError: CUDA error: invalid device ordinal

๊ผฌ๊ผฌ๋ฌดยท2025๋…„ 4์›” 24์ผ

Error

๋ชฉ๋ก ๋ณด๊ธฐ
4/5
post-thumbnail

๐Ÿงพ ์—๋Ÿฌ ๋‚ด์šฉ

A. The code in question

if device >= 0:
    torch._C._cuda_setDevice(device)

B. Error Log

RuntimeError: CUDA error: invalid device ordinal

CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

๐Ÿค” ์›์ธ

ordinal: an ordinal number (first, second, and so on)

์ ์ ˆํ•˜์ง€ ์•Š์€ device ์„œ์ˆ˜๋ž€๋‹ค. device๋ฅผ ์ถœ๋ ฅํ•ด๋ดค์„ ๋•Œ, 1์ด ๋‚˜์˜ค๋Š”๋ฐ _cuda_setDevice(1)์ด ์‹คํ–‰์ด ์•ˆ๋˜๋Š”๊ฒƒ ๊ฐ™๋‹ค.


✅ Solution

์‚ฌ์šฉ๊ฐ€๋Šฅํ•œ GPU์˜ ๊ฐœ์ˆ˜๊ฐ€ 1๊ฐœ์ธ๋ฐ

torch._C._cuda_setDevice(device)

์—ฌ๊ธฐ์„œ device๊ฐ€ 1์ด๋ผ์„œ ์—๋Ÿฌ๊ฐ€ ๋–ด๋‹ค.

device = _get_device_index(device)

์•„๋งˆ๋„ ์œ„์— ์ฝ”๋“œ์—์„œ ์ธ๋ฑ์Šค๋ฅผ 1์„ ๋ฐ˜ํ™˜ํ–ˆ๋‚˜๋ณด๋‹ค.

๋”ฐ๋ผ์„œ ์•„๋ž˜์™€ ๊ฐ™์ด ์ˆ˜์ •ํ•ด์ฃผ์—ˆ๋‹ค.

torch._C._cuda_setDevice(device-1 or 0)
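Python parses `device-1 or 0` as `(device - 1) or 0`, and `or` returns its right operand only when the left one is falsy. So the `0` fallback only triggers when `device` is 1, and the patch as a whole simply shifts every index down by one. The sketch below checks that arithmetic with a hypothetical `patched_index` function, next to an equally hypothetical `pick_device_index` that leaves valid indices alone.

```python
def patched_index(device: int) -> int:
    # Same expression as the patch. Python reads it as (device - 1) or 0,
    # and `or` only falls back to 0 when device - 1 is 0 (falsy).
    return device - 1 or 0


def pick_device_index(requested: int, device_count: int) -> int:
    # Hypothetical alternative: keep a valid index as-is, fall back to 0
    # otherwise. Assumes at least one GPU is visible (device_count >= 1).
    return requested if 0 <= requested < device_count else 0


for device in (0, 1, 2):
    print(device, "->", patched_index(device))
# 0 -> -1
# 1 -> 0
# 2 -> 1
```

In particular, a request for device 0 passes the `device >= 0` guard and then hands -1 to `_cuda_setDevice`, which is not a valid ordinal. Because the patch edits PyTorch's installed source, it also affects every program in that environment; having the calling code request device 0 (for example `"cuda:0"`) would likely fix the same problem without modifying the library.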