컴퓨터구조

히치키치·2021년 9월 27일

1-7. The Power Wall

Power Trend

Clock rate와 power 함께 증가
최근에 성장세 멈춤
이유 : 상용 마이크로프로세서의 냉각문제로 사용 가능한 전력 한계 도달
CMOS IC technology
$POWER$ = $Capacitive$ $load$ X $Voltage^{2}$ X $Frequency$

Reducing Power

Power wall 더이상 voltage와 heat 줄이기 불가능

1-8. The switch from uniprocessors to multiprocessors

성장세 :
25%/year -> 52%/year (컴퓨터 구조 발전 + 효율적인 아이디어 등장) -> 22%/year ->3.5%/year (power, instruction level parallelism, memory latency 제한)
Chip에 한 개 보다 많은 processor 둬 multicore processor 실현
명령어 수준의 parallelism과 달리 parallel programming 요구됨
Hw가 여러개의 명령어를 한 번에 실행
Programmer로부터 숨겨져 있음
어려움
구현하는 프로그래밍, load를 공평하게 balancing하기, communication과 synchronization의 optimizing

1-9. Benchmarking the intel core i7

생략

1-10. Fallacies and pitfalls

함정 : 컴퓨터의 한 부분만 개선하고 그 개선된 양에 비례해 전체 성능 좋아지리라 기대하는 것
1. 설계원칙 3 : make the common case faster
2. 전체 성능 개선 정도는 해당 사건 발생 빈도와 관련됨
3.

오류 : 이용률이 낮은 컴퓨터는 전력 소모가 적다. (Low power at Idle)
1. I7 power benchmark

load	Power
100%	258w
50%	170w(66%)
10%	121(47%)

2. Google data center Load 주로 10~50%이며 전체 시간의 1%도 안되는 경우에만 100% 사용
3. load(부하)에 비례하는 power(전력소모) 가지는 processor 만들기

함정 :성능식의 일부를 성능 척도로 사용
1. MIPS : millions of instruction per second
2. MIPS 수치로 고려되지 못하는 것
컴퓨터 사이 ISA 차이 고려 X
명령들 사이 CPI/복잡도 고려 X
3.

1-11. Concluding Remarks

바탕이 되는 기술들이 발전됨에 따라 비용과 성능은 개선됨
hw와 sw에서 동시에 hierarchical layer of abstraction
ISA(Instruction Set Architecture)는 hw와 sw의 interface
excution time이 가장 적합한 성능 측정 수치
power (전력 소모)는 발전 저해 요소
parallelism을 이용해 성능 개선

2. Instruction

2-1. Hardware’s Operation

Arithmetic Operation

3개 operand : 2개 source와 1개 destination
하드웨어를 단순하게 하자 : 모든 명령어가 피연산자 3개 가짐
설계 원칙 1 : Simplicity favours regularity
Regularity는 implementation을 simpler하게 함
Simplicity는 lower cost로 higher performance를 가능하게 함

2-2. Hardware’s Operand

Register Operand

모든 산술연산은 register 사용
register는 각 32bit로 32개 존재
(Word = 32bit = 4byte)
설계원칙 2 : Smaller is faster
Register가 32개로 제한된 이유
이유 1 : register 갯수 많아짐 -> 전기 신호 더 멀리 전달해야 함 -> clock cycle 길어짐
이유 2 : 명령어 형식에서 register가 사용하는 bit 수와 연관 됨

Memory Operand

메모리에 array, structure, dynamic data 저장
메모리 <—데이터 전송 명령어 (load, store)—> 레지스터 -> 산술 연산
byte 주소 사용 & word(4byte)의 나란한 배열
(0,1,2,3,4) 아닌 (0,4,8,12,16) 사용
Big Endian: 제일 왼쪽 최상위(MSB) 바이트 주소를 워드 주소로 사용
Cf) Little Endian : 제일 오른쪽 최하위(LSB) 바이트 주소를 워드 주소로 사용

Memory vs Register

메모리보다 레지스터에 access 시간 더 빠름
메모리 데이터 사용은 load와 store 등의 추가적인 명령어 작성 필요
Compiler는 레지스터를 가능한 많이 사용해 효율성 높이기
register spilling : 자주 사용하지 않는 변수만 메모리에 넣기

Immediate Operand

Addi와 상수값 이용
Subi는 따로 존재 X -> 음의 상수값 사용
설계원칙 3 : make the common case faster
상수 피연산자를 자주 사용하니 상수 field 가지는 산술명령어 사용
매번 메모리에서 load 할 필요 없음

Constant Zero

상수 0
레지스터 간의 값 이동/복사
Add $t2 $s1 $zero # t2 <- s1 + 0

Signed and Unsigned Number

Unsigned Binary Integer

$X$ = $X_{n-1}2^{n-1}+X_{n-2}2^{n-2} + … +X_{1}2^{1}+X_{0}2^{0}$
32bit 사용
0 (X 모두 0) ~ 4,294,960,295 (X 모두 1)

2s Complement Signed Integer

$X$ = $-X_{n-1}2^{n-1}+X_{n-2}2^{n-2} + … +X_{1}2^{1}+X_{0}2^{0}$
32bit 사용
-2,147,483,648 ( $X_{n-1}=1$ , 나머지 0) ~ 2,147,483,648 ( $X_{n-1}=0$ , 나머지 1)
MSB인 31bit는 sign bit임
1 이면 0 또는 양수, 0 이면 음수
0 : 0000 0000 … 0000
-1 : 1111 1111 … 1111
가장 negative : 1000 0000 … 0000
가장 positive : 0111 1111 … 1111

Signed Negation

$X$ + $\bar{X}$ = -1
$-X= \bar{X}+1$ #complement -> add 1 -> 음수 값

Signed Extension

Numeric value 유지 & use more bit
Addi : extend immediate value
lb, lh : extend loaded byte/halfword
beq, bnq : extend displacement
Signed value : extend MSB(sign bit) to left
Unsigned value : extend with 0

히치키치

이전 포스트

컴퓨터 그래픽스

다음 포스트