[ OS ] 09. Main Memory

38A·2023년 5월 28일

Operating System

목록 보기

8/9

🖥️ Background

Importance of memory management
- In multitasking system, several processes may exist in memory
- Memory is central of modern computer system
Memory structure
- A large array of bytes each with its own address
- Memory unit is interested only in the sequence of memory addresses

Basic Hardware

Protection from other processes
- Each process has a separate memory space.
- Can be implemented by base register and limit register
  → Why register? : memory에 넣으면 참조하는데 오래걸림. in CPU에 두고 빠르게 참조
- CPU H/W compares every address generated
- Base / limit registers can be loaded by privileged instruction

Address Binding

Most systems allow a user process to reside in any part of the physical memory
Representation of address
- Source program: symbolic address
  Ex) count (variable), function name
- Compiler
  - Symbol → relocatable address(지역적 주소) (binding)
    Ex) 14 bytes from beginning of the module
- Linkage editor or loader
  - Relocatable address → absolute address(전역적 주소, 절대적 주소) (binding)
    Ex) 0x74014
Address binding(변환과정): mapping one address space to another
→ 최종 binding을 의미
Compile ➔ linking ➔ loading ➔ execution (linking, loading을 OS에 따라 하나로 볼 수도 있음)
Address binding time
- Compile time: absolute code ( relocation X )
  - MS-DOS .com file
  - If starting location changes, the program should be recompiled
- Load time: relocatable code
  - If starting location changes, the program should be reloaded.
- Execution time → binding
  Ex) Systems that support logical addressing
  - Used by most general-purpose OS’s
  - Requires H/W support for address mapping

Logical vs. Physical Address Space

Logical address (in Process): address generated by CPU in common cases
Physical address (전역적 주소): address seen by memory unit
Logical address vs. physical address
- In compile-time and load-time address binding
  - Logical address = physical address
- In execution-time address binding
  - Logical (virtual) address $\neq$ physical address
Address spaces
- Logical address space: set of all logical addresses generated by a program
- Physical address space: set of all physical addresses corresponding to these logical addresses
Run-time address mapping is done by memory-management unit (MMU)
A simple method of address mapping using relocation register (base register)
- User program never sees physical addresses
  → Why? Process 별로 memory 영역이 다르기 때문 (독립적인 영역)

Dynamic Linking and Shared Library

→ 다시 load되지 않고 shared : 공유해서 사용시 주의!!

Static linking: all object modules are combined into the binary program image
Dynamic linking: linking is postponed until execution time
→ 불필요한 중복을 막고, 효율적으로
Ex) System libraries
- Stub: a small piece of code that indicates ...
  → 본체 대신 link, 필요할 때 load and link
  - How to locate the appropriate library routine
  - How to load the library if it is not ready
- After a library routine is dynamically linked, that routine is executed directly
  - stub replaces itself with address of the routine and execute it.
Advantages of dynamic linking
- Library code can be shared among processes.
  - Multiple versions of a library can coexist
- Library update doesn’t require re-linking.
Dynamic linking requires help from OS

🖥️ Swapping

When a process starts, it must be in memory for execution
However, a process can be swapped to a backing store and back into memory to continue execution
A process swapped-out will be swapped back into the same memory space it occupied previously
- Exception: execution-time binding → MMU relocation register만 바꿔주면 된다
Backing store: fast disk (usually, separated from file system → 다른 partition 필요)
- Should be large enough to accommodate copies of all memory images for all users
- Must provide direct access to memory images
Dispatching → context switcing에 무슨 일이 일어나야 하는지
- Firstly see if the process selected by scheduler is in memory.
- If not, it swaps out a process and swaps in the desired process
Context-switch time in swapping system
- Ex) size of a process: 100 MB → 쪼개서 time ⬇️
  Disk transfer rate: 50 MB/s → 보통 SSD는 10~30배
  Average latency: 8 ms → find data
  Then, swapping time: (100MB / 50MB/s + 8ms) * 2(swap out, in) = 4,016 ms
- Swapping time is affected by amount of memory swapped
  - We would need to swap only what is actually used.
Time slice should be long relative to swap time
→ swap time > time slice : CPU가 놀고, Disk만 돌게 된다
Many systems use modified versions of swapping
- Swapping is enabled only when memory usage exceeds some threshold

🖥️ Contiguous memory allocation

→ Contiguous(연속적인)
→ Externel fragmentation

Memory is usually divided into two partitions
- Resident OS
  - High (or low) memory with interrupt vector
- User processes
  - The other part of memory
Several user processes can reside in memory at the same time
→ How to allocate memory to processes waiting in input queue?

Memory Mapping and Protection

Memory mapping and protection are provided by relocation register and limit register
- Add memory access, MMU maps the logical address dynamically.
- Relocation and limit registers are loaded by dispatcher during context switching

Memory Allocation

Variable partition scheme (MVT) → Memory allocation 정책
- Initially all available memory forms a large block of free memory (hole)
- If a process arrives and needs memory, search a hole large enough for this process.
- If a process terminates, it releases its memory
  - Adjacent holes can be merged
- MVT: Multiprogramming with a Variable number of Tasks

Fragmentation

External fragmentation: free memory space broken into little pieces
→ Process memory 밖에 fragmentation
- Sometimes, it’s impossible to allocate large continuous memory although total size of free memory larger than the required memory
- 50-percent rule (analysis of first fit)
  - Given N allocated blocks, another 0.5N blocks will be lost to fragmentation. (1/3 of memory may be unusable)
- Fragmentation of main memory also affects backing store
Memory allocated to a process may slightly larger than requested memory.
Ex) allocating 18462 bytes(require) from hole whose size is 18464 bytes → 2 byte : internel framentation
- Reduce overhead to keep small holes.
- Generally, physical memory can be broken into fixed-size blocks.
Internal fragmentation
- Difference between the amounts of allocated memory and the requested memory
Remedy of external fragmentation: compaction
- Possible only if relocation is dynamic and is done at execution time
- Expensive
Alternative remedy: permit logical address space of the processes to be noncontiguous
- Paging → 일정한 단위로 쪼갬 → Externel fragmentation 해결, but internel fragmentation
- Segmentation

🖥️ Paging

→ Internel fragmentation

A memory-management scheme that permits the physical address space of a process to be noncontiguous.
Physical memory: consists of frames (fixed-size blocks) → location
Logical memory: consists of pages (fixed-size blocks) → mem content
Pages are mapped with frames through ⭐️ page table (mapping table)
Paging model of logical and physical memory
Logical address consists of a page number and a page offset → page 내에서 상대적 좌표
- Size of logical address space: 2 $^m$
- Page size: 2 $^n$ → Ex 256 → page offset : 8-bit
An example of paging
Paging hardware
Overhead of paging
- Page table → In memory, size가 상당히 크기 때문, lookup 해서 physical memory에 접근하기 때문에 memory 2번 access
- Internal fragmentation
  - In average, half page per process
    Note! In paging, no external fragmentation보다는 internel이 나음
In many systems ...
- Page size (=offset): 4 KB ~ 4 MB
  - 4KB = 4096 bytes(2 $^{12}$ ) = 12-bit (offset), 32-12 = 20-bit (page#) → 2 $^{20}$ = 1,000,000개의 table
  - 4MB = 2 $^{22}$ = 22-bit(offset), 32-22 = 10(page#) → 2 $^{10}$ = 1024개의 table
- Page table entry: 4 bytes (32 bits) → 20 + 12(속성 flag)

X86 Page-Table Entry

Frame Table

OS keeps information about frames in a frame table
- # of total fames
- Which frames are allocated
- Which frames are available
OS also maintains a list of free frames

Hardware Support

Most OS’s allocate a page table to each process
- Pointer to the page table is stored with register(PTBR) values (in PCB)
- In context-switching, dispatcher also loads the correct H/W page table
H/W implementation of page table
- A set of dedicated registers
- Efficiency is very important.
  Ex) DEC PDP-11: page table (8 entries) is kept in fast registers → not feasible if page table is large
Page-table base register (PTBR) → page table 시작 주소 저장
- Page table is stored in main memory
- PTBR points the page table (eg. CR3 register of x86)
  cf. Remaining problem: overhead to access page table in main memory (requires two accesses to access a byte.)
Translation look-aside buffer (TLB): small fast look up H/W cache
→ address translation을 위한 cache memory
- Associative, high-speed memory, consists of pairs (key(page #), value(frame #)
  - If page number of logical address is in TLB, delay is less than 10% of unmapped memory access
  - Otherwise (TLB miss), access page table in main memory
    - Insert (page number, frame number) into TLB
    - If TLB is full, OS select one for replacement
- Some TLB’s store address-space identifiers (ASID) in each entry (ex: MIPS)
  → Context switch시 page table 교체, TLB도 의미 X → ASID 사용해서 구분
  → page# + frame# + ASID
  - ASID identifies process
  - Processors w/o ASID (ex: x86) flush entire TLB on process switch
Paging using TLB

Associative Memory

Associative memory (like hash) – parallel search (H/W search, not S/W)
Address translation (p, d)
- If p is in associative register, get frame # out
- Otherwise, get frame # from page table in memory

Protection

Each frame has a protection bits
- Read-write / read only
- Attempt to write to a read-only page causes H/W trap
  → Extension: read-only, read-write, execute-only, ...
  - Combination of them
Each entry of page table has valid-invalid bit
- OS does not allows to access the page if it is invalid
Rarely does a process use all of its address range
- Page table can be wasting memory
  → some system provides page-table length register (PTLR) (valid한 table length 저장)
Valid-invalid bit in a page table

Shared Pages

Paging provides possibility to share common code
Ex) Process P1, P2, P3 runs the same text editor with different data (editor should be non-self-modifying code)

🖥️ Structure of page table

Hierarchical Paging

Motivation: If a system supports large logical address space, page table itself becomes significant overhead
Ex) 32-bit address space, page size is 4 KB(2 $^{12}$ )
size of page table = 2 $^{(32-12)}$ > 1 Million
- Difficult to allocate contiguous memory → page table은 연속적인 공간에 있어야함
Hierarchical paging
- Page table itself is also paged
  - Structure of logical address

Hashed Page Table

Use hashing to for page table
- Common approach for large address spaces (> 32bits) → 크기 때문에
- Each location of hash table consists of linked list of page table entries

🖥️ Segmentation

Memory from user’s view: collection of variable-size segments, with no necessary ordering
Ex) stack, math library, main program, ...
Segmentation: memory-management scheme that supports user view of memory
- Logical address space is a collection of segments
- Each segment has a name (number) and length
  Logical address = <segment-number, offset>
  Ex) C compiler might create segments such as ...
  - Code
  - Global variables, heap, stacks
  - Standard C library
Example

Hardware for Segmentation

2D address should be mapped 1D sequence of bytes
Segment table
- Consists of segment base and segment limit
  → base address and size→ memory mapping을 segment 단위로

Segmentation Architecture

Segment table – maps two-dimensional physical addresses; each table entry has:
- base – contains the starting physical address where the segments reside in memory
- limit – specifies the length of the segment
Segment-table base register (STBR) points to the segment table’s location in memory
Segment-table length register (STLR) indicates number of segments used by a program;
segment number s is legal if s < STLR
Protection and sharing is also possible for segmentation

🖥️ Example: Intel Pentium

Some architectures supports both paging and segmentation
- Intel x86: pure segmentation, segmentation with paging

Pentium Segmentation

Maximum size of a segment: 4 GB → 32-bit 머신이 가지는 가장 큰 address (2 $^{32}$ = 4G)
Maximum # of segments: 16 K (=16,384)
- 1st partition for private segments (8 K)
  → kept in local descriptor table (LDT) → process마다 8K segment table
- 2nd partition 8 K for shared segments (8 K)
  → kept in global descriptor table (GDT)
Pentium has ...
- Six segment registers (CS(Code), SS(Stack), DS(Data), ES(Extra), FS, GS)
  → segment #를 저장, offset은 pointer
- Six 8-byte microprogram registers to hold descriptors from LDT or GDT
  - Cache for LDT/GDT entry → segment table을 caching
Segment descriptor
- 64 bit information that describes attribute of a segment
Logical address: pair of (selector, offset)
- Selector: 16 bit number
  - Segment number (13 bits=8 k)
  - GDT or LDT (1 bit)
  - Protection (2 bits)
- Offset: 32-bit number

Pentium Paging

Pentium supports a page size of 4 KB or 4 MB
- Page directory entry includes page size flag
- 4 KB page: two-level paging (page size flag = 0)
  - PSE bit of CR4 register
- 4 MB page (page size flag = 1)