Memory Virtualization: Paging(1)

3zu · May 3, 2022

Recap

Fragmentation: wasted space. Heavy fragmentation means low utilization.
- External: free gaps between allocated chunks.
  These gaps are chunks that are too small to satisfy a memory request. Even when the total free space is larger than the requested size, the space is so fragmented that no segment can be allocated.
- Internal: memory inside an allocated chunk that is not needed.
  This is space we cannot utilize. It has been handed to a process but is not used, and because it already belongs to that process, nothing else can use it.

Under segmentation, each process has a translation map made up of base/bound pairs, and each segment is mapped to a different region of the physical address space.

Paging

Segmentation splits the address space of a process.
It cuts the address space into variable-sized contiguous areas, and that is exactly what causes the fragmentation issue and makes memory allocation hard.

So,

what if we split the physical memory space instead of splitting the address space of the process?

Instead of cutting the process address space into segments such as code, data, heap, and stack, paging cuts the physical address space into small fixed-size frames.

Paging splits the address space into fixed-size units called pages (think of them as small fixed-size segments).
Segmentation, in contrast, split it into variable-size logical segments such as code, stack, and heap.

With paging, physical memory is also split into fixed-size slots called page frames.
The physical address space is divided into many frames, and the pages of a process are placed into those frames.

Each process has its own page table, which is used to translate virtual addresses into physical addresses.

Advantages of Paging

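Flexibility: paging supports the address space abstraction effectively.
Because small fixed-size units are used, fragmentation is reduced: external fragmentation disappears, and only some internal fragmentation remains inside partially used pages. Because the size is fixed, no bound register is needed. It is enough to check that an access falls inside a valid page.
In addition, no assumptions are needed about how the heap and stack grow and are used.
Simplicity: free-space management becomes easy.
Because fixed-size units are used to manage the free space of physical memory, there is no need to coalesce or compact.
A page in the address space and a page frame in physical memory have the same size, so keeping a free list and allocating from it is easy.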
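
The free-list point can be made concrete with a small sketch. Because every frame has the same size, any free frame satisfies any request, so the allocator can be a plain stack of frame numbers. The names `frame_allocator`, `frame_alloc`, and `frame_free`, and the choice of 8 frames, are hypothetical and only for illustration.

```c
#include <assert.h>

#define NUM_FRAMES 8   /* assumed: 128-byte memory / 16-byte frames */

/* A stack of free physical frame numbers. */
struct frame_allocator {
    int free_frames[NUM_FRAMES];
    int count;         /* number of free frames currently on the stack */
};

/* Mark every frame as free; frame 0 ends up on top of the stack. */
void frame_allocator_init(struct frame_allocator *fa) {
    fa->count = 0;
    for (int pfn = NUM_FRAMES - 1; pfn >= 0; pfn--)
        fa->free_frames[fa->count++] = pfn;
}

/* Return any free frame number, or -1 if physical memory is full. */
int frame_alloc(struct frame_allocator *fa) {
    if (fa->count == 0)
        return -1;
    return fa->free_frames[--fa->count];
}

/* Give a frame back. No coalescing with neighbours is needed. */
void frame_free(struct frame_allocator *fa, int pfn) {
    assert(pfn >= 0 && pfn < NUM_FRAMES && fa->count < NUM_FRAMES);
    fa->free_frames[fa->count++] = pfn;
}
```

Both allocation and release are constant-time here, which is the practical meaning of "no coalescing or compaction".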

The running example is a 128-byte physical memory with 16-byte page frames (8 frames),
together with a 64-byte address space with 16-byte pages (4 pages).

Address Translation

A virtual address consists of two components:

VPN: virtual page number
The VPN identifies the virtual page. The page frame number (PFN) that corresponds to the VPN is looked up and combined with the offset. The result is the physical address that is actually used.
Offset: offset within the page

In the 64-byte address space, virtual address 21 is 010101 in binary: the top 2 bits (01) are the VPN and the low 4 bits (0101) are the offset.
The VPN part is translated to a PFN, while the offset is used unchanged.
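
The split described above can be sketched in a few lines of C. This assumes the 64-byte address space with 16-byte pages from the example. The function names and the flat `page_table` array indexed by VPN are illustrative assumptions, not something the original specifies.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE   16u   /* bytes per page, as in the example */
#define OFFSET_BITS 4u    /* log2(PAGE_SIZE) */
#define NUM_PAGES   4u    /* 64-byte address space / 16-byte pages */

/* The upper bits of the virtual address select the page. */
uint32_t vpn_of(uint32_t vaddr) { return vaddr >> OFFSET_BITS; }

/* The lower bits are the offset inside the page. */
uint32_t offset_of(uint32_t vaddr) { return vaddr & (PAGE_SIZE - 1u); }

/* Translate with a flat table indexed by VPN; page_table[vpn] holds the PFN. */
uint32_t translate(const uint32_t page_table[NUM_PAGES], uint32_t vaddr) {
    uint32_t vpn = vpn_of(vaddr);
    assert(vpn < NUM_PAGES);   /* the address must lie inside the address space */
    return (page_table[vpn] << OFFSET_BITS) | offset_of(vaddr);
}
```

For virtual address 21 this gives VPN 1 and offset 5. With a made-up mapping in which VPN 1 lives in PFN 7, the physical address would be 7 * 16 + 5 = 117.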

Where Are Page Tables Stored?

A page table is needed to translate virtual addresses into physical addresses, so the page table itself occupies memory space.

Segments were described with base/bound pairs, but pages all have the same size, so there is no need to record a bound.
The smaller the page, the more pages are needed to cover memory, and if each process has a huge number of pages, a very large page table has to be allocated.
The size of the page table is determined by the number of address bits used for the VPN.

Consider a 32-bit address space with 4-KB pages.
Since 1 KB is 2^10 bytes, a 4-KB page is 2^12 bytes, so the offset takes 12 bits.
The remaining 32 - 12 = 20 bits are the VPN.

The number of page table entries depends only on the VPN.
The VPN is 20 bits, so the page table has 2^20 entries, and on a 32-bit system each page table entry is 32 bits (4 bytes).

So the size of the page table is

(2^20 entries) * (4 bytes per page table entry) = 4 MB

cf) 2^20 = 1 M (about one million), so 2^20 bytes = 1 MB
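
The same arithmetic can be written as a small helper, which makes it easy to try other page sizes. This is a sketch for a single-level (linear) page table only, and the function name `linear_page_table_bytes` is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of a linear page table.
 * addr_bits: width of a virtual address, page_size: bytes per page,
 * pte_bytes: bytes per page table entry. */
uint64_t linear_page_table_bytes(unsigned addr_bits, uint64_t page_size,
                                 uint64_t pte_bytes) {
    /* page_size must be a non-zero power of two */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    unsigned offset_bits = 0;
    while ((1ULL << offset_bits) < page_size)
        offset_bits++;                        /* log2(page_size) */

    unsigned vpn_bits = addr_bits - offset_bits;
    uint64_t entries = 1ULL << vpn_bits;      /* one entry per virtual page */
    return entries * pte_bytes;
}
```

Calling `linear_page_table_bytes(32, 4096, 4)` returns 4194304 bytes, which is the 4 MB computed above.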

Because there is no way to know in advance which pages a process will use, the full 4-MB page table has to be allocated up front. In other words, every process needs 4 MB for its own page table.

Even to run a tiny single process that uses only a few KB of memory, the OS needs a 4-MB page table to cover the entire 32-bit address space.

Whatever the process, its page table allocation must be complete before the process becomes ready to run.

How to Implement Paging?

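The base address of the page table is held in a control register called CR3 (on x86).
When a context switch brings in a new process, CR3 must be loaded with the page table base of that process. Otherwise the new process would use a page table that is only valid for the old process.

If the VPN leads to a page that is not mapped, the access traps and the process typically receives a segmentation fault.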

Each process has its own page table, and the page table holds PFNs.
The page table does not store the virtual page number. The VPN is implicit because it is used as the index into the table. Each entry holds the PFN and the permission information for that virtual page.
In the example the pages are 4 bytes, so the low 2 bits of the address must be reserved for the in-page offset.

Common Flags Of Page Table Entry

Valid Bit: indicates whether the translation is valid.
Protection Bit: indicates whether the page may be read, written, or executed.
Present Bit: indicates whether the page is in physical memory or on disk.
Dirty Bit: indicates whether the page has been modified since it was brought into memory.
Reference Bit (Accessed Bit): indicates whether the page has been accessed.
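
The flags can be pictured as bits in a page table entry. The sketch below uses an invented 32-bit layout (flags in the low bits, PFN in the upper 20 bits). The bit positions, the single write-permission bit, and the names `pte_access` and `access_result` are assumptions for illustration and do not match any particular architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PTE layout: flags in the low bits, PFN in bits 12..31. */
#define PTE_VALID     (1u << 0)
#define PTE_PRESENT   (1u << 1)
#define PTE_WRITE     (1u << 2)   /* protection: page may be written */
#define PTE_ACCESSED  (1u << 3)   /* reference bit */
#define PTE_DIRTY     (1u << 4)
#define PTE_PFN_SHIFT 12u

enum access_result { ACCESS_OK, FAULT_INVALID, FAULT_NOT_PRESENT, FAULT_PROTECTION };

/* Extract the page frame number from an entry. */
uint32_t pte_pfn(uint32_t pte) { return pte >> PTE_PFN_SHIFT; }

/* Check one access against an entry and update the accessed/dirty bits. */
enum access_result pte_access(uint32_t *pte, bool is_write) {
    assert(pte != NULL);
    if (!(*pte & PTE_VALID))             return FAULT_INVALID;      /* unmapped page */
    if (!(*pte & PTE_PRESENT))           return FAULT_NOT_PRESENT;  /* page is on disk */
    if (is_write && !(*pte & PTE_WRITE)) return FAULT_PROTECTION;   /* read-only page */
    *pte |= PTE_ACCESSED;
    if (is_write)
        *pte |= PTE_DIRTY;
    return ACCESS_OK;
}
```

A read of a valid, present page sets only the accessed bit, while a successful write also sets the dirty bit. A replacement policy can later use those two bits to pick a victim page.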

Paging: Too Slow

To find the right page table entry, the base address of the page table must be known, and it is held in the CR3 register.

For every memory reference, paging requires the system to perform an additional memory reference,
because the page table has to be accessed to turn the virtual address into a physical address.
Since this happens extremely often, help from the hardware is needed.

Consider the following code.

int array[1000];
...
for (i = 0; i < 1000; i++)
	array[i] = 0;

Compiled to assembly, the loop looks like this.
%edi holds the base address of the array and %eax holds the index i.

0x1024	movl $0x0, (%edi, %eax, 4)
0x1028	incl %eax
0x102c	cmpl $0x03e8, %eax
0x1030	jne 0x1024

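The assembly code above performs the following memory accesses.
The code region at the bottom of the trace shows the memory accesses made to fetch instructions. To execute an instruction, the address where it is stored has to be accessed.

The array region above it shows the addresses accessed through (%edi, %eax, 4). Each time %eax increases by 1, this address increases by 4 bytes.

The page table region above that shows which page table entries are accessed.
The instructions are adjacent, so 0x1024 through 0x1030 fall in the same page, whereas the array at 0x40000 is far away and therefore lives in a different page.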
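
To get a feel for the cost, the references made by the loop can be counted. The sketch assumes a linear page table and no TLB, so every instruction fetch and every data access needs exactly one extra page table access. The struct and function names are made up for this example.

```c
#include <assert.h>

#define FETCHES_PER_ITER 4   /* movl, incl, cmpl, jne */
#define STORES_PER_ITER  1   /* the movl writes one array element */

struct ref_count {
    long fetches;      /* instruction fetches */
    long stores;       /* explicit data accesses */
    long page_table;   /* extra accesses needed for translation */
    long total;
};

/* Count memory references for the given number of loop iterations. */
struct ref_count count_loop_refs(long iterations) {
    assert(iterations >= 0);
    struct ref_count rc;
    rc.fetches = FETCHES_PER_ITER * iterations;
    rc.stores = STORES_PER_ITER * iterations;
    rc.page_table = rc.fetches + rc.stores;   /* one lookup per reference */
    rc.total = rc.fetches + rc.stores + rc.page_table;
    return rc;
}
```

For the 1000 iterations of the loop this gives 5000 program references plus 5000 page table accesses, 10000 in total. Under these assumptions, translation doubles the memory traffic.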

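Paging solved the fragmentation problem caused by segmentation, but it introduced two new problems: translation is slow because it makes many memory references, and a page table has to be allocated even for memory that has not been requested yet (a form of fragmentation).

In the end, the fragmentation problem shows up again inside the page table. Even when a process uses very little memory, a huge page table that will mostly go unused still has to be allocated.
Allocating so many unused entries just to run one small process is wasteful and amounts to internal fragmentation.

To solve this, just as a single big chunk was earlier split into fixed-size pieces, the page table itself can be split into page-sized pieces, and only the pieces that are actually in use need to be allocated.
The internal fragmentation comes from the unused space in one large page table, so multi-level paging reduces the unused space.

The downside of multi-level paging is that several page tables have to be visited to obtain the final physical address.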
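
As a rough sketch of what "visiting several page tables" means, a 32-bit virtual address can be split into two indices and an offset. The 10/10/12 split below is an assumption (the original does not give one). It is the classic two-level layout for 4-KB pages, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DIR_BITS    10u   /* index into the top-level table (page directory) */
#define TABLE_BITS  10u   /* index into a second-level page table */
#define OFFSET_BITS 12u   /* offset inside a 4-KB page */

struct va_parts {
    uint32_t dir_index;
    uint32_t table_index;
    uint32_t offset;
};

/* Split a virtual address into the indices used at each level of the walk. */
struct va_parts split_two_level(uint32_t vaddr) {
    assert(DIR_BITS + TABLE_BITS + OFFSET_BITS == 32u);   /* widths must cover the address */
    struct va_parts p;
    p.dir_index = vaddr >> (TABLE_BITS + OFFSET_BITS);
    p.table_index = (vaddr >> OFFSET_BITS) & ((1u << TABLE_BITS) - 1u);
    p.offset = vaddr & ((1u << OFFSET_BITS) - 1u);
    return p;
}
```

With this layout each second-level table has 2^10 entries of 4 bytes, so it fits in exactly one 4-KB page, and second-level tables for unused regions never need to be allocated. The price is two table accesses per translation instead of one.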

Problems with page tables
1. extra memory access for address translation -> TLB
2. wasted space (fragmentation) in the page table -> smaller page tables

Translation Lookaside Buffers (TLB)

The TLB reduces the number of memory accesses made to the page table.

The TLB is part of the memory management unit (MMU). It caches the frequently used results of virtual-to-physical address translation.
Even when address translation is done in hardware, the page table has to be accessed very often, so a cache is used.

The results of translations are placed in a small buffer (the TLB). Whenever the CPU issues a logical address, the virtual page number is first looked up in the TLB, and on a hit the physical frame number stored in the TLB is used.
On a TLB miss, the page table has to be accessed to find the PFN (the miss penalty).

The VPN is extracted from the virtual address, and the TLB is checked for a PFN that corresponds to that VPN.

On a TLB hit the PFN is used as is, and on a TLB miss the page table is accessed to find the PFN.
The TLB is then updated with the result.
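
The hit/miss flow above can be sketched as a tiny fully associative TLB in software. Real TLBs are hardware. The 4-entry size, the round-robin replacement, and all names here are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 4u

struct tlb_entry {
    bool valid;
    uint32_t vpn;
    uint32_t pfn;
};

struct tlb {
    struct tlb_entry entries[TLB_ENTRIES];
    unsigned next;     /* round-robin victim index (assumed policy) */
    unsigned hits;
    unsigned misses;
};

/* Return the PFN for vpn: try the TLB first, fall back to the page table. */
uint32_t tlb_translate(struct tlb *t, const uint32_t *page_table, uint32_t vpn) {
    assert(t != NULL && page_table != NULL);
    for (unsigned i = 0; i < TLB_ENTRIES; i++) {
        if (t->entries[i].valid && t->entries[i].vpn == vpn) {
            t->hits++;                      /* TLB hit: no page table access */
            return t->entries[i].pfn;
        }
    }
    t->misses++;                            /* TLB miss: pay the miss penalty */
    uint32_t pfn = page_table[vpn];         /* the extra memory access */
    t->entries[t->next] = (struct tlb_entry){ true, vpn, pfn };
    t->next = (t->next + 1u) % TLB_ENTRIES; /* update the TLB with the result */
    return pfn;
}
```

Starting from a zero-initialized `struct tlb`, the first translation of a page counts as a miss and fills an entry, and a second translation of the same page counts as a hit.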

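The TLB is not main memory but dedicated hardware inside the CPU, so a lookup costs far fewer CPU cycles and is cheap.
A mode switch does not require flushing the TLB, because execution stays inside the same process.
A context switch, however, requires flushing the TLB, which makes it more expensive than a mode switch.
In the example, a total of three TLB misses occur.
The miss penalty of a TLB miss is the memory access needed to find the physical address.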

Locality

Temporal Locality

An instruction or data item that was accessed recently tends to be accessed again in the near future.

Spatial Locality

Memory near a recently accessed address tends to be accessed in the near future.
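
Spatial locality is what makes the TLB effective for array scans: consecutive elements share a page, so only the first access to each page misses. The sketch below counts misses for a sequential walk, assuming a cold TLB that is large enough to hold every page touched. The function name is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Count TLB misses for a sequential walk over n elements.
 * Assumes a cold TLB with enough entries, so a miss happens
 * only when the walk enters a new page. */
unsigned sequential_tlb_misses(uint32_t base, uint32_t elem_size,
                               uint32_t n, uint32_t page_size) {
    assert(page_size > 0);
    unsigned misses = 0;
    uint32_t last_vpn = 0;
    for (uint32_t i = 0; i < n; i++) {
        uint32_t vpn = (base + i * elem_size) / page_size;
        if (i == 0 || vpn != last_vpn)
            misses++;              /* first touch of this page */
        last_vpn = vpn;
    }
    return misses;
}
```

As an assumed setup, take a 10-element array of 4-byte integers starting at virtual address 100 with 16-byte pages. The walk touches three pages, so `sequential_tlb_misses(100, 4, 10, 16)` returns 3 and the other 7 accesses are hits. For the 1000-element array at 0x40000 with 4-KB pages, all elements fit in one page and the walk misses only once.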

Who Handles The TLB Miss?

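On CISC architectures (for example x86), a TLB miss is handled by hardware.
The hardware finds the right page table entry in the page table, extracts the translation, updates the TLB, and then retries the instruction.
This is a hardware-managed TLB.

On some RISC architectures (for example MIPS), a TLB miss is handled by software.
When a TLB miss occurs, the hardware raises an exception and invokes a trap handler, which handles the miss.
This is a software-managed TLB.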

TLB Issue: Context Switching

If the whole TLB is emptied on every context switch, TLB misses keep occurring afterward. To avoid this, processes can share the TLB.
If the TLB is simply shared, there is no way to tell which entry belongs to which process, so extra bits are needed to tell them apart.
The address space identifier (ASID) is used for this. These additional bits indicate which process an entry belongs to.
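
A minimal sketch of an ASID-tagged lookup: an entry only matches when both the VPN and the ASID match, so two processes can keep translations for the same VPN in the TLB at the same time. The 8-bit ASID width, the 4-entry size, and the names are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ASID_TLB_ENTRIES 4u

struct asid_tlb_entry {
    bool valid;
    uint8_t asid;     /* which address space this entry belongs to */
    uint32_t vpn;
    uint32_t pfn;
};

/* Look up (asid, vpn). On a hit, store the PFN in *pfn_out and return true. */
bool asid_tlb_lookup(const struct asid_tlb_entry tlb[ASID_TLB_ENTRIES],
                     uint8_t asid, uint32_t vpn, uint32_t *pfn_out) {
    assert(pfn_out != NULL);
    for (unsigned i = 0; i < ASID_TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].asid == asid && tlb[i].vpn == vpn) {
            *pfn_out = tlb[i].pfn;
            return true;
        }
    }
    return false;   /* miss: entries of other processes never match */
}
```

With entries for (ASID 1, VPN 10) and (ASID 2, VPN 10) both present, each process gets its own PFN for VPN 10, and a context switch only has to change the current ASID instead of flushing the TLB.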