5 - Memory Hierarchy

이태곤·2022년 12월 14일

computer architecture

Computer Architecture

목록 보기

13/13

1. Memory Hierarchy

목표 : 서로 다른 속도, 용량을 가진 디바이스들의 효율적인 구조화를 통해 사용자 입장에서 unlimited fast memory가 있는 것과 같은 illusion을 제공
Multiple levels of memory with different speeds and sizes

-> 디바이스 설계를 위한 H/W 기술과 Hierarchy를 효율적으로 동작시킬 수 있도록 Caching 필요!
Memory Technology
: Processor (CPU)에서 물리적, 논리적으로 멀리 떨어져 있을수록 속도가 느리고 용량이 빠르며 가격이 저렴해진다.
- Disk Memory : non-volatile, managed by OS
- Main Memory (DRAM) : Volatile, managed by OS
- Cache : Volatile, managed by H/W
- Register : Volatile, managed by H/W

2. Locality

캐시는 CPU 와 메인 메모리 사이의 속도 간극을 줄여주는 완충재역할을 한다.

많은 데이터가 있는 상대적으로 속도가 느리고 용량이 큰 디바이스로부터 자주, 곧 사용할 데이터들을 효율적으로 선별해서 상대적으로 속도가 빠르고 용량이 작은 디바이스로 데이터를 이동시킨다.
-> 도서관 : Main Memory
-> 서재 : SRAM (Cache)
-> 나의 책상 : Register

-> 효율적으로 데이터들을 선별하기 위해서 locality를 이용해야 한다.

Locality : locality를 활용해 어떤 데이터들을 캐쉬 메모리에 올릴지 효율적으로 결정할 수 있다.
- Temporal locality : 최근에 참조되었던 데이터는 다시 사용될 확률이 높다.
- Spatial locality : 참조한 데이터에 인접해있는 데이터들은 곧 사용될 확률이 높다.
  - Temporal locality : sum, i
  - Spatial locality : array 'a'
Qualitative Estimates of Locality

-> j 값, Column값을 루프마다 증가시켜 접근함으로써 Spatial locality 활용하여 데이터를 보다 빠르게 접근, 읽을 수 있으므로 시스템의 전체적인 성능을 증가시킬 수 있다.

-> i 값, row를 루프마다 증가시킴으로써 Spatial locality를 활용할 수 없으므로 데이터를 가져오는 시간이 증가하게되어 전체적인 시스템 성능효율이 저하될 수 있다.

3. Cache

Terminoloy
- Block (line) : Memory device 사이에서 데이터가 이동하는 최소단위
  -> 보통 메모리에 접근하여 얻을 수 있는 데이터는 1byte, 반면 Block은 보통 4bytes, 16bytes
- Hit : CPU가 요청한 데이터가 Cache Memory에 있을 경우
  -> Hit Time : time to access the upper level
- Miss : CPU가 요청한 데이터가 Cache Memory에 없을 경우
  -> Miss penalty : time to get block from lower level + time to replace in upper level
- Upper level : SRAM (Cache, small, fast, expensive)
- Lower level : DRAM (large, slow, cheap)
Benefits
- Reduction of memory bandwidth
- Cache is transparent
  -> 캐쉬 메모리를 추가하더라도 ISA를 변경할 필요없다.
  -> 사용자 입장에서는 단순히 캐쉬 메모리를 늘리더라도 컴퓨터 내부적인 변경사항을 전혀 신경쓰지 않고 기존과 완전히 동일하게 사용할 수 있다.
Cache Memory Concept : 캐쉬 메모리를 얼마나, 몇개 사용하느냐에 따라서 No cache, One level cache, Multi-level cache로 나눌 수 있다.
Basic Question about "Cache"

How to place the data in a cache?
: By direct-mapped cache, Associative cache. . .
Which data should be transferred from memory to cache?
: By locality
If cache is full, which data in the cache should be evicted?
: By replacement policy -> FIFO, LRU, LFU, Random
How do we know if the data is present or not in cache?
: By valid bit -> Decide whether cache hit or cache miss
How to handle when the desired data does not exist in cache?
: Cache miss
How to implement an efficient system using a cache?
: Improving performance -> Miss ratio↓, Hit raito↑
: Correctness -> 동일한 entity가 서로다른 memory hierarchy에 있을 때 생기는 Consistance 문제
- ex) Cache에 있는 entity값이 update되었지만 main memory에 있는 동일한 entity가 update되지 않을 경우 발생하는 consistance 문제!

General Cache Mechanics
- Request 14 -> 14 is in cache -> Hit
- Request 12 -> 12 is not in cache -> Miss -> 12 is fetched from main memory
  - Placement policy : 12 (block)이 cache로 이동 될 때 어디에 위치해야 하는지 결정
  - Replacement policy : Cache에서 어떤 block을 eviction할 것인지 결정

4. General Cache Organization (S, E, K)

-> S : Set의 갯수?
-> E : 1개의 SET에 있는 몇 개의 block?
-> K : block의 크기?

Cache Organization : Blocks or Lines
- Block size : cache와 memory 사이에서 데이터가 이동하는 단위
  - power of 2
  - 1 word (4B)
    -> 각 개별 메모리 주소 4개에 해당하는 data를 담고있다.
    -> Block consists of adjacent bytes by spatial locality
- Block Offset : block안에서 targeting 하는 데이터가 몇번째에 있는지?
  - log2(K) = offset bits, where k is the size of block
- Example 풀기!
Cache Organization : Cache Size
- Cache Size : the number of blocks
- Example
  - C = 32 KiB : If block size 64B -> 2^9 (512) Blocks
- Direct-mapped-Cache : 각각의 특정 메모리 주소가 들어갈 수 있는 cache 메모리의 slot이 정해져있을 경우
- Cache slot : memory block address mod # of cache blocks
- 장점
  - 메모리 주소가 mapping 되어질 Cache slot을 Simple하게 계산할 수 있다.
  - CPU가 특정 데이터를 요청했을 때 Cache 어디에 위치해있는지 빠르게 찾을 수 있다.
    -> 특정 메모리 주소를 Cache slot mapping 하는데 있어서 제약이 크고 간단하기 때문이다.
- 단점 : 비어있는 Cache slot이 있더라도 활용하기 어렵다.
Place Data in Cache
- Memory : 64B, Cache : 16B
- Block Num 하위 2bit는 index의 위치를 결정
- Example
  
  →
Place Data in Cache Collision
- Tag bit : Cache 크기의 제한 -> 서로다른 메모리주소들이 하나의 공통된 캐쉬 공간을 공유하고 있기 때문에 있는 데이터가 실제 메모리 어디에서 왔는지 알야아 한다.
- Tag = rest of address bits
Address breakdown
- Index : Cache에서 몇 번째 block?
- Tag : 해당 block 데이터가 실제 메모리주소 어디로 부터 왔는지?
- Offset : block안에서 몇 번째 byte의 데이터?
Cache Read
Direct-Mapped Cache Example I
- index : Block 갯수 2^10 -> 10 bits 필요
- Block offset : 4B -> 2 bits 필요
- Tag : 64 - 10 - 2
- Valid bit 1 = present
- Valid bit 0 = not present, initial value
Direct-Mapped Cache Example II
- 문제점 : 26, 18과 같이 동일한 인덱스를 갖고있는 서로다른 메모리들이 공통된 캐쉬공간 요청하면 Miss ratio↑, Temporal locality를 활용할 수 없다.
  -> Solution I : index에 여러개의 block을 넣을 수 있다.
  -> Solution II : block 사이즈를 늘림으로써 Miss 발생시에 인접한 데이터들도 함께 캐쉬에 load
  -> 목적 : Hit rate를 높여 성능을 개선하는 것!
Types of Cache Miss
- Cold start miss : block에 처음 접근할 때 발생
- Conflict miss : 동일한 index를 가진 서로 다른 메모리 주소가 하나의 block에 매핑되서 발생
  -> Direct-mapped caches have more conflic misses than E-way set associative, where E > 1
  -> Fully-associative has no conflict miss
- Capacity miss : cache의 크기가 충분히 크지 않을 때 발생
Associativity Cache
- Direct-mapped (One-way) : 하나의 index(set)에 하나의 block을 가질 수 있다.
  - Simple, Cheap, Fast
- Set associative (n-way) : 하나의 index (set)에 여러 개의 blocks (entries)들을 가질 수 있다.
  - More complicated hardware = more power consumed, slower than Direct-mapped
    -> Longer hit time, Higher H/W cost : 추가적인 H/W를 통해 index내에서 tag 비교를 통해 타겟 메모리 주소값을 찾고, 선택해야한다.
  - Decrease miss rate
- Fully associative : block들이 cache의 어디에나 위치할 수 있다.
  - Decrease miss rate; however, more power consumption
- Example
Direct Mapped Cache (E = 1)
- Cache block size : 8B
- Address of int : block offset is 4
E-way Set Associative Cache (E = 2)
- Valid bit과 Tag bit을 비교하여 match
- Address of short : block offset is 4
- No match -> replacement policy
4-way Set Associative Cache
- 4KB Cache, Block size 4B
- 4-way : 1,024 index → 256 index
  -> 하나의 set에 4개의 block
  -> Longer hit time
  -> Tag bit, Valid bit 비교하는 Multiplexer 등 추가적인 H/W 필요!
Cache Performance
- Miss Rate : Fraction of memory references not found in cache
- Hit Time : time to deliver a block from cache to processor
- Miss Penalty : Additional time due to miss
  - Memory access time
- Average Memory Access Time (AMAT)
  = Hit time + Miss Rate X Miss penalty
  - Example
Block Size Considerations
- Block size↑ -> Miss rate↓ by spatial locality
- Cache size 또한 동시에 증가시켜줘야 temporal locality를 활용해서 Miss rate↓
  -> block size를 늘려서 spatial locality를 증가시킴에따라 Cache에 담을 수 있는 block 갯수가 줄어듬
Write Handling in Cache : Consistency 문제를 해결하기 위한 방법
- Write-through cache : 데이터에 update가 일어나면 서로 다른 layer에 있는 각각의 디바이스에 속해있는 모든 데이터를 update
  
  -> Simple, Memory traffic 증가, Slow
  -> 한 번에 Update를 처리할 수 있도록 buffer를 이용하기도 한다.
- Write-back cache : block이 cache로부터 replace될 때 Dirty bit checking을 통하여 main memory의 데이터도 synchronization, 업데이트
  
  -> dirty bit을 통해 변경사항 여부를 알린다.
  -> Fast, Complex, Consistency문제를 완전히 해결할 수 없음
Cache Misses
: Cause stall the CPU pipeline
- Instruction cache miss
- Data cache miss
CPU time
- Memory stall cycles
- Example
Multilevel Cache
: Processors have more than 1 level of cache
- Example