10. Demand Paging (2)

초강송 · December 4, 2024

This post was written with reference to Professor 김상훈's operating systems lectures and lecture materials at Ajou University and to Operating Systems: Three Easy Pieces (https://pages.cs.wisc.edu/~remzi/OSTEP/).

Virtual Memory (VM)

Use a large, contiguous virtual address space for memory references
The CPU (and MMU) performs address translation at run time
- virtual address -> corresponding physical address
The CPU asks the operating system for help when necessary
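
The translation step can be sketched in a few lines. This is only an illustration under assumptions that are not in the lecture notes: a 4 KiB page size, a flat single-level page table modelled as a dict from virtual page number (VPN) to page frame number (PFN), and a hypothetical `PageFault` exception standing in for the trap that asks the OS for help.

```python
PAGE_SIZE = 4096  # assumed page size (4 KiB)


class PageFault(Exception):
    """Stands in for the trap raised when the MMU cannot translate."""


def translate(vaddr, page_table):
    """Translate a virtual address using a dict {VPN: PFN}."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in page_table:
        # No mapping: the hardware would trap and the OS would take over.
        raise PageFault(f"no mapping for virtual page {vpn}")
    return page_table[vpn] * PAGE_SIZE + offset


page_table = {0: 7, 1: 3}              # hypothetical mappings
paddr = translate(0x1234, page_table)  # VPN 1, offset 0x234 -> frame 3
print(hex(paddr))                      # 0x3234
```

The offset within the page is unchanged by translation; only the page number is replaced by the frame number.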

VM Advantages

Separate user logical memory from physical memory
- Abstract main memory into an extremely large, uniform array of storage
- Free programmers from the concerns of memory-storage limitations
Allow a process to run with only part of its memory resident in main memory
Allow address spaces to be shared by several processes
Less I/O needed to load or swap processes

Features and issues in VM

Shared Memory

Want to share data between processes
- To exchange data between processes
- To reduce memory footprint
OS can allow processes to share the same page by manipulating their page tables
- Load data on a page frame
- Set the PTEs of processes to point to the same page frame

In other words, as in the figure, two or more logical pages can be mapped to the same physical page frame. The OS records the number of logical pages that reference a page frame as that frame's ref count.
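
A minimal sketch of this sharing, assuming page tables are plain dicts and using hypothetical names (`Frame`, `map_page`) that do not come from the lecture: two PTEs point at the same frame object, and the frame tracks how many pages reference it.

```python
class Frame:
    """A physical page frame (hypothetical model)."""

    def __init__(self, data):
        self.data = data
        self.ref_count = 0  # number of logical pages mapped to this frame


def map_page(page_table, vpn, frame):
    """Point one PTE at the frame and record the extra reference."""
    page_table[vpn] = frame
    frame.ref_count += 1


shared = Frame(bytearray(b"hello"))
table_a, table_b = {}, {}
map_page(table_a, 4, shared)  # process A's page 4
map_page(table_b, 9, shared)  # process B's page 9, same frame

table_a[4].data[0:5] = b"HELLO"   # A writes through its mapping
print(bytes(table_b[9].data))     # b'HELLO': B sees A's write
print(shared.ref_count)           # 2
```

Note that the two processes may use different virtual page numbers for the same frame; only the PTE contents need to agree.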

Copy-on-Write

When the OS needs to copy a page from one address space to another, instead of copying it, the OS can map the page into the target address space and mark it read-only in both address spaces.
- Both only read the page: no further action is taken
- One tries to write: trap to the OS, and the OS allocates a new page

Take fork as an example. Creating the child's address space used to mean duplicating the parent's address space, but with shared memory this becomes much simpler, because the child can simply share the parent's mapping information. One thing needs care: a write by one process must not affect the other. That is, changes made by the parent must not become visible to the child. Copy-on-write is therefore applied as follows.

Instead of copying pages, create shared mappings to the same page frames in physical memory
Shared pages are protected as read-only
Writes generate a protection fault
OS copies the page, changes the page mapping, and restarts the write instruction

Let us walk through an example. Here process A has forked, creating process B. Process A and process B share the mapping information; that is, process A's page table is duplicated. (The number written on each physical frame in the right-hand figure is the number of pages mapped to that frame.)

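At this point the write permission is turned off for both process A and process B, because a change made by process A must not be reflected in process B. How, then, can the OS tell whether a page was originally writable and was write-protected by fork, or was never writable in the first place? This can be recorded in the page table, or, when the OS first reads the executable file and records the ranges of code, data, and so on, it can record the permission of each region in a separate data structure.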

Suppose process B now tries to change a value from x = 10 to x = 20. The write bit is off, so a page fault occurs. The OS then decides what to do using the valid bit, the PFN, the original permission, and the current permission. Here the mapping is correct, so the valid bit is set and the PFN is filled in properly. The current permission is R but the original permission is RW, so the OS copies the physical frame, updates the value to x = 20 in the copy, and maps the faulting page to the updated frame. It then sets the write bit in the PTE that holds this mapping and changes the ref_count of the original frame from 2 to 1.

What if process A also wants to change the value, from x = 10 to x = 30? The page is no longer shared with process B; process A is now the only process referencing this copy-on-write page. The OS therefore just sets the write bit in the PTE and lets process A write without any copy.
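
The walk-through above can be condensed into a small simulation. It is a sketch under assumptions, not the lecture's or any real kernel's implementation: page contents are modelled as a dict, the "original permission" is stored directly in a hypothetical `PTE` object, and `fork`, `write`, and `handle_protection_fault` are illustrative helper names.

```python
class Frame:
    def __init__(self, data):
        self.data = dict(data)   # page contents, modelled as a dict
        self.ref_count = 1


class PTE:
    def __init__(self, frame, writable, orig_writable):
        self.frame = frame
        self.writable = writable            # current permission
        self.orig_writable = orig_writable  # permission before fork


def fork(parent_table):
    """Share every frame with the child and write-protect both sides."""
    child_table = {}
    for vpn, pte in parent_table.items():
        pte.writable = False
        pte.frame.ref_count += 1
        child_table[vpn] = PTE(pte.frame, False, pte.orig_writable)
    return child_table


def handle_protection_fault(pte):
    if not pte.orig_writable:
        raise PermissionError("page was never writable")
    if pte.frame.ref_count > 1:
        # Still shared: drop one reference and switch to a private copy.
        pte.frame.ref_count -= 1
        pte.frame = Frame(pte.frame.data)
    # Sole owner (or fresh copy): just re-enable writes.
    pte.writable = True


def write(table, vpn, key, value):
    pte = table[vpn]
    if not pte.writable:
        handle_protection_fault(pte)
    pte.frame.data[key] = value   # the "restarted" write


table_a = {0: PTE(Frame({"x": 10}), True, True)}
table_b = fork(table_a)
write(table_b, 0, "x", 20)   # B faults, gets a private copy
write(table_a, 0, "x", 30)   # A faults, ref_count is 1, no copy needed
print(table_a[0].frame.data["x"], table_b[0].frame.data["x"])  # 30 20
```

The second write shows why the ref count matters: without it, the OS would copy a page that nobody else references any more.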

Demand zeroing of pages

Suppose a process calls malloc(1GB). Naively, one might think the OS finds that many pages in physical memory and hands them over. Such a naive implementation is costly, though, and it is especially wasteful when the process never actually uses the pages it was given.

Demand zeroing addresses this. With this technique the OS can serve the request with far less work.

For malloc(1GB) as above, the OS does not immediately allocate that much physical memory. Instead, it only marks the requested range in the page table (usually with the bits that mean "reserved for OS").

When the process later actually accesses that region, a trap occurs and the OS steps in. While handling the trap, the OS can look at the overall state and conclude, "This is space the process asked for earlier, but I have not backed it yet." If the access was a read, the OS does not allocate a new page; it maps the page to a zero page filled with zeros. If the access was a write, only then does the OS find a physical frame and map it.
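
The read and write cases can be sketched as follows. This is a simplified model under assumptions: a 4 KiB page size, reservations kept as a list of `range` objects rather than PTE bits, and a hypothetical `AddressSpace` class. The read-only protection of the zero page is implied rather than enforced here.

```python
PAGE_SIZE = 4096               # assumed page size
ZERO_PAGE = bytes(PAGE_SIZE)   # one shared, immutable page of zeros


class SegFault(Exception):
    """Access to a page the process never asked for."""


class AddressSpace:
    def __init__(self):
        self.reserved = []   # ranges of VPNs promised but maybe not backed
        self.frames = {}     # VPN -> private frame (bytearray)

    def reserve(self, first_vpn, count):
        """What malloc(1GB) costs here: one bookkeeping entry."""
        self.reserved.append(range(first_vpn, first_vpn + count))

    def _check_reserved(self, vpn):
        if not any(vpn in r for r in self.reserved):
            raise SegFault(f"page {vpn} was never requested")

    def read(self, vpn, offset):
        if vpn in self.frames:
            return self.frames[vpn][offset]
        self._check_reserved(vpn)
        return ZERO_PAGE[offset]   # read fault: serve from the zero page

    def write(self, vpn, offset, value):
        if vpn not in self.frames:
            self._check_reserved(vpn)
            # Write fault: only now is a zeroed frame really allocated.
            self.frames[vpn] = bytearray(PAGE_SIZE)
        self.frames[vpn][offset] = value


space = AddressSpace()
space.reserve(0, (1 << 30) // PAGE_SIZE)   # 1 GiB = 262144 pages
print(space.read(5, 0), len(space.frames))   # 0 0
space.write(5, 0, 42)
print(space.read(5, 0), len(space.frames))   # 42 1
```

After reserving 1 GiB, no frame is allocated until the first write, and even then only the single page that was touched.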

Where is the operating system in the memory?

In general, OSes split the virtual address space into two parts
- User address space and kernel address space
User processes share the same kernel address space

That is, the kernel-region mappings are identical in every process's page table.

Thrashing

Thrashing is a sharp drop in system performance caused by paging. It occurs under demand paging when the working sets of the processes are larger than physical memory.
- Working set: the set of pages that a process is actively using
- Most of the time is spent by the OS paging data back and forth from disk
- Possible solutions
  - Kill processes
  - Buy more memory

In the figure, the x-axis can also be read as the working sets of the processes.
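
To make the cliff concrete, here is a small simulation that counts page faults. It rests on assumptions not stated in this section: LRU replacement, 8 physical frames, and a hypothetical worst-case access pattern that cycles through the working set.

```python
from collections import OrderedDict


def count_faults(num_frames, accesses):
    """Count page faults under LRU replacement (assumed policy)."""
    resident = OrderedDict()   # pages in memory, least recently used first
    faults = 0
    for page in accesses:
        if page in resident:
            resident.move_to_end(page)        # mark as most recently used
        else:
            faults += 1
            if len(resident) >= num_frames:
                resident.popitem(last=False)  # evict the LRU page
            resident[page] = True
    return faults


def cyclic_accesses(working_set_pages, rounds):
    """Touch pages 0..n-1 in order, repeated `rounds` times."""
    return [p for _ in range(rounds) for p in range(working_set_pages)]


FRAMES = 8
fits = count_faults(FRAMES, cyclic_accesses(8, 100))
too_big = count_faults(FRAMES, cyclic_accesses(9, 100))
print(fits, too_big)   # 8 900
```

With a working set of 8 pages only the 8 initial faults occur across 800 accesses. With 9 pages, a single page more than memory can hold, every one of the 900 accesses faults, which is the behaviour the thrashing curve describes.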

Prepaging

Initiate paging in advance (aka prefetching)
- To avoid the large number of page faults when a process starts up
Deciding which pages and how many pages to bring in is important

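On a disk, the time to get ready for the data is longer than the time to transfer it. So if fetching one page takes 10 seconds (an illustrative figure), fetching two contiguous pages takes about 12 seconds rather than 20. From the OS's point of view, then, whenever it fetches a page it wants to bring in the adjacent pages with spatial locality in the same request.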
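
The figures in this paragraph (10 seconds for one page, about 12 for two contiguous pages) fit a simple cost model with a fixed preparation cost plus a per-page transfer cost. The 8 + 2 split below is an assumption chosen only to reproduce those two numbers; it is not given in the source.

```python
def fetch_time(num_pages, setup=8.0, transfer_per_page=2.0):
    """Time for one request that reads `num_pages` contiguous pages.

    `setup` models the preparation (positioning) cost paid once per
    request; both default values are assumed for illustration.
    """
    return setup + num_pages * transfer_per_page


one_page = fetch_time(1)            # 10.0
two_together = fetch_time(2)        # 12.0
two_separately = 2 * fetch_time(1)  # 20.0
print(one_page, two_together, two_separately)
```

Under this model the second page costs only the transfer time when it rides along with the first request, which is why prefetching neighbours is attractive when they are likely to be used.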

Buddy System Allocator

With paging, memory no longer has to be allocated physically contiguously. There are cases, however, where physically contiguous memory is required. For example, a device controller usually reads or writes data using a physical memory address and a data length. In effect, the device controller reasons, "I just need to read or write this many bytes starting from this address." If the OS keeps allocating physically contiguous pages for such requests, external fragmentation is bound to occur.

The Buddy System Allocator manages pages in units of $2^n$. A request for 10 pages is therefore served with $2^4 = 16$ pages, and a request for 6 pages with $2^3 = 8$ pages. If no 8-page chunk is available, a 16-page chunk is split into two 8-page chunks. These two 8-page chunks are called each other's buddies.
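
The rounding rule and the buddy relation can each be written as a one-liner. The XOR trick for locating a buddy is a common implementation technique rather than something stated in the lecture notes, and it assumes every chunk of $2^n$ pages starts at a page number that is a multiple of $2^n$; the function names are hypothetical.

```python
def order_for(num_pages):
    """Smallest n such that 2**n >= num_pages (for num_pages >= 1)."""
    return (num_pages - 1).bit_length()


def buddy_of(start_page, order):
    """Start page of the buddy of an aligned chunk of 2**order pages."""
    return start_page ^ (1 << order)


print(order_for(10), 2 ** order_for(10))   # 4 16
print(order_for(6), 2 ** order_for(6))     # 3 8
print(buddy_of(0, 3), buddy_of(8, 3))      # 8 0
```

Splitting the 16-page chunk at page 0 gives 8-page chunks at pages 0 and 8, and each one's buddy is the other, as the last line shows.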

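When a chunk (physically contiguous memory) is freed, the allocator checks whether the chunk's buddy is free or in use. If the buddy is also free, the two chunks are merged. The image below shows a 1-page chunk that was in use being freed and merged with its buddy. As the final state shows, chunks of the same size (4 pages here) are not merged unless they are buddies.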

In this way, allocations are sized as closely as possible to the request, and free buddies are merged to grow the free space. This cannot prevent external fragmentation completely, but it keeps it under control to some degree.
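
Putting split and merge together, a toy allocator might look like the sketch below. It is an illustration under assumptions, not how any particular kernel implements it: free chunks are kept in one set per order, chunk addresses are page numbers, the caller passes the order back to `free`, and the `BuddyAllocator` class name is hypothetical.

```python
class BuddyAllocator:
    def __init__(self, max_order):
        self.max_order = max_order
        # free_lists[k] holds start pages of free chunks of 2**k pages.
        self.free_lists = [set() for _ in range(max_order + 1)]
        self.free_lists[max_order].add(0)   # one big free chunk

    def alloc(self, num_pages):
        if num_pages < 1:
            raise ValueError("num_pages must be positive")
        order = (num_pages - 1).bit_length()   # round up to 2**order
        k = order
        while k <= self.max_order and not self.free_lists[k]:
            k += 1                             # look for a larger chunk
        if k > self.max_order:
            raise MemoryError("no free chunk is large enough")
        start = min(self.free_lists[k])
        self.free_lists[k].remove(start)
        while k > order:
            # Split: keep the lower half, free the upper half (its buddy).
            k -= 1
            self.free_lists[k].add(start + (1 << k))
        return start, order

    def free(self, start, order):
        while order < self.max_order:
            buddy = start ^ (1 << order)
            if buddy not in self.free_lists[order]:
                break                          # buddy in use: stop merging
            self.free_lists[order].remove(buddy)
            start = min(start, buddy)          # merged chunk starts lower
            order += 1
        self.free_lists[order].add(start)


allocator = BuddyAllocator(max_order=4)   # 16 pages in total
big = allocator.alloc(6)                  # (0, 3): 8 pages at page 0
small = allocator.alloc(1)                # (8, 0): 1 page at page 8
allocator.free(*small)                    # merges up to an 8-page chunk
allocator.free(*big)                      # merges back into 16 pages
print(big, small, allocator.free_lists[4])   # (0, 3) (8, 0) {0}
```

Freeing the 1-page chunk merges it with its free buddies up to 8 pages and then stops, because the 8-page buddy at page 0 is still in use; once that chunk is freed as well, the full 16-page chunk is restored.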