Created: December 4, 2021, 4:15 PM
GPUs are SIMD Engines Underneath
- instruction pipeline operates like a SIMD pipeline
- However, programming is done using threads, NOT SIMD instructions
How Can You Exploit Parallelism Here?
for (i = 0; i < N; i++)
    C[i] = A[i] + B[i];
Three options for exploiting the parallelism in the code above:
- Sequential (SISD)
- Data-Parallel (SIMD)
- Multithreaded (MIMD/SPMD)
1. Sequential (SISD)
![](https://media.vlpt.us/images/lsj8706/post/1dbb6bd8-8e24-48b9-a5ae-cba931912801/Untitled.png)
- Pipelined processor
- Out-of-order execution processor
- Independent instructions executed when ready
- Different iterations are present in the instruction window and can execute in parallel in multiple functional units
- the loop is dynamically unrolled by the hardware
- Superscalar or VLIW processor
2. Data-Parallel (SIMD)
![](https://media.vlpt.us/images/lsj8706/post/e7d01581-cc45-449e-9854-6b938f580e38/Untitled%201.png)
![](https://media.vlpt.us/images/lsj8706/post/293c9fc1-0cf3-49da-a126-1b48a4fcf485/Untitled%202.png)
- Each iteration is independent
- Programmer or compiler generates a SIMD instruction to execute the same instruction from all iterations across different data
- Best executed by a SIMD processor (vector, array)
3. Multithreaded
![](https://media.vlpt.us/images/lsj8706/post/c7f28f63-0fc0-42bb-b6e2-854f667cfb42/Untitled%203.png)
- Each iteration is independent
- Programmer or compiler generates a thread to execute each iteration. Each thread does the same thing (but on different data)
- Can be executed on a MIMD machine
- This particular model is also called SPMD (Single Program Multiple Data)
- Can be executed on a SIMT machine (Single Instruction Multiple Thread)
GPU is a SIMD (SIMT) Machine
- It is programmed using threads (SPMD programming model)
- Each thread executes the same code but operates on a different piece of data
- A set of threads executing the same instruction are dynamically grouped into a warp (wavefront) by the hardware
SPMD on SIMT Machine
![](https://media.vlpt.us/images/lsj8706/post/4acc7409-e2fe-435f-9569-fe34c9c82696/Untitled%204.png)
SIMD vs. SIMT Execution Model
- SIMD: A single sequential instruction stream of SIMD instructions → each instruction specifies multiple data inputs
- SIMT: Multiple instruction streams of scalar instructions → threads grouped dynamically into warps
- [LD, ADD, ST], NumThreads
- Advantages
- Can treat each thread separately
- Can group threads into warps flexibly
Fine-Grained Multithreading of Warps
- Assume a warp consists of 32 threads
- If you have 32K iterations, and 1 iteration/thread → 1K warps
- Warps can be interleaved on the same pipeline → fine-grained multithreading of warps
![](https://media.vlpt.us/images/lsj8706/post/c2edd1f1-f23a-46bf-a0b3-4586bbb4547b/Untitled%205.png)
Iter. 33·32 + 1 ⇒ 20·32 + 1, Iter. 34·32 + 2 ⇒ 20·32 + 2
Warps and Warp-Level FGMT
- Warp: A set of threads that execute the same instruction (on different data elements) → SIMT (Nvidia-speak)
- All threads run the same code
![](https://media.vlpt.us/images/lsj8706/post/181bad98-1706-4533-b678-0afcb4113634/Untitled%206.png)