Lec 4-Virtual Memory(3)

Lee Nam Gil · May 19, 2025

Operating Systems


TLB: Translation Lookaside Buffer

Problem : Page Table is slow

Page tables are large, so they must live in RAM
- Each virtual memory access must be translated
- Each translation requires a page table lookup in memory
- The number of memory accesses is doubled: one read for the page table entry, one for the data itself

Caching

Cache page table entries (PTEs) in the CPU's MMU
The TLB stores recently used PTEs
The TLB is a small on-chip cache, so lookups are very fast
- Translations that hit the TLB do not need a page table lookup in memory

TLB Entry

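The TLB holds only some of the page table entries, not the whole table, so each entry must store its VPN as a tag
G : global bit : is the page shared by all address spaces (ASID ignored)?
ASID : address space ID
D : dirty bit : has the page been written to?
V : valid bit : does the entry hold a valid translation?
C : cache coherency bits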

TLB control flow

TLB Miss
- Load the page table entry from memory
- Add it to the TLB
- Retry the instruction

Comparison

Reading a 10KB array of 4-byte integers
4KB pages
Without TLB
- How many memory accesses are required?
  - 10KB / 4 bytes = 2560 integers
  - Each requires one page table lookup and one memory read
  - 2560 * 2 = 5120 reads
With TLB
- TLB starts off cold (empty)
- 2560 reads for the integers
- 3 page table lookups (the array spans 3 pages when it starts on a page boundary, and each page's entry is cached after its first miss)
- 2560 + 3 = 2563 reads
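
The arithmetic can be checked with a short simulation. It is a sketch under the same assumptions as the example (4-byte integers, array starting on a page boundary, single-level page table); the function name `count_reads` is made up for illustration.

```python
ARRAY_BYTES = 10 * 1024  # 10KB array
INT_BYTES = 4            # assumed integer size
PAGE_BYTES = 4 * 1024    # 4KB pages


def count_reads(use_tlb):
    """Count memory reads needed to scan the array once."""
    num_ints = ARRAY_BYTES // INT_BYTES
    cached_vpns = set()  # VPNs whose PTE is currently in the TLB
    reads = 0
    for i in range(num_ints):
        vpn = (i * INT_BYTES) // PAGE_BYTES
        if not use_tlb or vpn not in cached_vpns:
            reads += 1            # page table lookup in memory
            cached_vpns.add(vpn)
        reads += 1                # the read of the integer itself
    return reads


without_tlb = count_reads(use_tlb=False)  # 5120
with_tlb = count_reads(use_tlb=True)      # 2563
```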

Locality

Spatial locality
- If you access address x, you will access x+1 soon
- Most of the time, x and x+1 are in the same page
Temporal locality
- If you access address x, you will access x again soon
- The page containing x will still be in the TLB

Problems

TLB entries may not be valid after a context switch
Processes use the same VPNs, but the VPN-to-PFN mappings differ between processes

Solutions

Clear the TLB after each context switch
- Each process will start with a cold cache
Associate an ASID with each process
- An ASID is similar to a PID in the kernel
- The CPU compares the ASID of the active process with the ASID stored in each TLB entry
- If they don't match, the TLB entry is treated as invalid
- Why not use the PID itself?
  - An ASID usually has fewer bits than a PID (for example, 8 bits), so it fits in a TLB entry
  - Processes that are given the same ASID share the same TLB entries, so their memory mappings are shared
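
A minimal sketch of ASID tagging, assuming a TLB keyed by the pair (ASID, VPN). The class `AsidTLB` and its methods are hypothetical names; the point is only that a context switch changes the active ASID without flushing anything.

```python
class AsidTLB:
    """Toy TLB (hypothetical) whose entries are tagged with an ASID."""

    def __init__(self):
        self.entries = {}         # (asid, vpn) -> pfn
        self.current_asid = None

    def context_switch(self, asid):
        # No flush: entries of other address spaces stay cached
        self.current_asid = asid

    def insert(self, vpn, pfn):
        self.entries[(self.current_asid, vpn)] = pfn

    def lookup(self, vpn):
        # An entry only matches if its ASID equals the active one; None means miss
        return self.entries.get((self.current_asid, vpn))


tlb = AsidTLB()
tlb.context_switch(asid=1)
tlb.insert(vpn=5, pfn=100)

tlb.context_switch(asid=2)
miss_for_other = tlb.lookup(5)   # None: same VPN, different address space
tlb.insert(vpn=5, pfn=200)

tlb.context_switch(asid=1)
still_cached = tlb.lookup(5)     # 100: the first process does not start cold
```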

Replacement Policies

FIFO
Random
LRU: Least Recently Used
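
As an illustration of LRU, here is a small fixed-capacity TLB built on Python's `collections.OrderedDict`. The class name `LruTLB` and the capacity of 2 are arbitrary choices for the sketch; real TLBs usually approximate LRU in hardware rather than tracking exact order.

```python
from collections import OrderedDict


class LruTLB:
    """Toy TLB (hypothetical) that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # VPN -> PFN, oldest use first

    def lookup(self, vpn):
        if vpn not in self.entries:
            return None                  # miss
        self.entries.move_to_end(vpn)    # mark as most recently used
        return self.entries[vpn]

    def insert(self, vpn, pfn):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        self.entries[vpn] = pfn


tlb = LruTLB(capacity=2)
tlb.insert(1, 10)
tlb.insert(2, 20)
tlb.lookup(1)       # VPN 1 becomes the most recently used
tlb.insert(3, 30)   # TLB is full, so VPN 2 is evicted
```

FIFO would drop the `move_to_end` calls, and Random would pick any key to evict.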

Hardware vs Software management

Hardware TLB
- Pros
  - Easier to program: less work for kernel developers because the CPU does most of the work
    - CPU manages all TLB entries
- Cons
  - The page table format is fixed by the CPU, so complex management is difficult
  - Limited ability to modify the TLB replacement policy
Software TLB
- Pros
  - Greater flexibility: no predefined data structure for the page table
  - Free to implement any TLB replacement policy
- Cons
  - More work for kernel developers
    - The CPU does not
      - Try to read the page table
      - Add or remove entries from the TLB
- TLB Miss
  - The CPU does not handle the miss itself; it raises an exception
  - The OS then handles the exception
    - Insert the PTE into the TLB
    - Tell the CPU to retry the previous instruction
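
The software-managed miss path can be mimicked with a Python exception standing in for the hardware trap. Everything here (`TLBMiss`, `cpu_translate`, `os_tlb_miss_handler`, the dict-based page table) is a hypothetical model, not a real kernel interface.

```python
PAGE_SIZE = 4096


class TLBMiss(Exception):
    """Stands in for the exception the CPU raises on a TLB miss."""

    def __init__(self, vpn):
        super().__init__(f"TLB miss on VPN {vpn}")
        self.vpn = vpn


tlb = {}              # VPN -> PFN, the only structure the "CPU" reads
page_table = {2: 9}   # owned by the OS; any data structure would do


def cpu_translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in tlb:
        raise TLBMiss(vpn)   # the CPU does not walk the page table itself
    return tlb[vpn] * PAGE_SIZE + offset


def os_tlb_miss_handler(miss):
    # The OS looks up the PTE in its own format and inserts it into the TLB
    tlb[miss.vpn] = page_table[miss.vpn]


def access(vaddr):
    while True:
        try:
            return cpu_translate(vaddr)
        except TLBMiss as miss:
            os_tlb_miss_handler(miss)   # then loop to retry the instruction


paddr = access(2 * PAGE_SIZE + 4)
```

Because only the handler touches `page_table`, the OS could swap the dict for a hash table or tree without the "CPU" noticing.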

Advanced Page Tables

Problem: Page Table size

Example
- 32-bit system with 4KB pages
- $2^{32} / 2^{12} = 2^{20}$ entries * 4 bytes per entry = 4MB per page table
- Most entries are invalid

Solution 1: Bigger pages

Increase the size of pages
32-bit system with 4MB pages
$2^{32} / 2^{22} = 1024$ entries * 4 bytes per entry = 4KB page table
Problems
- Increased internal fragmentation
- Many programs need much less than 4MB, so most of each page is wasted
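
Both table sizes follow from the same formula, shown here as a quick check. The helper name `linear_page_table_bytes` is made up, and the 4-byte PTE is the size assumed in these notes.

```python
def linear_page_table_bytes(address_bits, page_bytes, pte_bytes=4):
    """Size of a single-level page table covering the whole address space."""
    num_pages = 2 ** address_bits // page_bytes
    return num_pages * pte_bytes


small_pages = linear_page_table_bytes(32, 4 * 1024)         # 4KB pages -> 4MB table
large_pages = linear_page_table_bytes(32, 4 * 1024 * 1024)  # 4MB pages -> 4KB table
```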

Solution 2: Alternate Data structures

Hash table or Tree
Why is this often not feasible?
- It can be done if the TLB is software managed
- If the TLB is hardware managed, the OS must use the page table format specified by the CPU

Solution 3: Inverted Page Tables

Flip the table

Map physical pages to virtual pages
Pros
- There is only one physical memory
  -> only one inverted page table is needed for the whole system
Cons
- Lookups are more expensive
  - The table is indexed by PFN, so it must be scanned to locate a given VPN
- What about shared memory?
  - One physical page may be mapped by several VPNs, so each entry needs a list of VPNs
    -> Searching takes even more time
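
A rough sketch of why lookups are expensive: the table has one slot per physical frame, so finding a VPN means scanning it. The class `InvertedPageTable` is hypothetical, and tagging each slot with a process ID is an assumption added here so that two processes can use the same VPN.

```python
class InvertedPageTable:
    """Toy inverted page table (hypothetical): one slot per physical frame."""

    def __init__(self, num_frames):
        # Index = PFN, value = (pid, vpn) of the page stored in that frame
        self.frames = [None] * num_frames

    def map(self, pfn, pid, vpn):
        self.frames[pfn] = (pid, vpn)

    def lookup(self, pid, vpn):
        # Linear scan: cost grows with the amount of physical memory
        for pfn, owner in enumerate(self.frames):
            if owner == (pid, vpn):
                return pfn
        return None   # not resident


ipt = InvertedPageTable(num_frames=8)
ipt.map(pfn=3, pid=1, vpn=42)
ipt.map(pfn=5, pid=2, vpn=42)
frame_for_p1 = ipt.lookup(pid=1, vpn=42)   # 3
frame_for_p2 = ipt.lookup(pid=2, vpn=42)   # 5
```

Real designs usually hash (pid, vpn) to avoid the full scan; the scan is kept here to match the cost described above.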

Solution 4: Multi-Level Page Tables

Split the linear page table into a tree of sub-tables

Branches of the tree that are empty can be pruned
Space / Time tradeoff
- Pruning reduces the size of the table (Save space)
- Tree must be traversed to translate virtual address (Increased access time)

Example

16KB address space
64-byte pages, 14-bit virtual address: 8 bits for the VPN, 6 bits for the offset
How many entries?
- $2^8=256$ entries
256 entries, each 4 bytes
-> 1KB linear page table
The 1KB linear page table can be divided into 16 sub-tables of 64 bytes (one page each)
- Each sub-table holds 16 page table entries
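
With 16 sub-tables of 16 entries each, the 8-bit VPN splits into a 4-bit directory index and a 4-bit sub-table index. The sketch below walks such a two-level table; the function names and the use of `None` for a pruned sub-table are illustrative assumptions.

```python
OFFSET_BITS = 6   # 64-byte pages
INDEX_BITS = 4    # 16 entries per sub-table, 16 sub-tables


def split(vaddr):
    """Split a 14-bit virtual address into (directory index, table index, offset)."""
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    pt_index = (vaddr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    pd_index = (vaddr >> (OFFSET_BITS + INDEX_BITS)) & ((1 << INDEX_BITS) - 1)
    return pd_index, pt_index, offset


def translate(page_directory, vaddr):
    pd_index, pt_index, offset = split(vaddr)
    sub_table = page_directory[pd_index]
    if sub_table is None:
        raise LookupError("sub-table pruned: address not mapped")
    pfn = sub_table[pt_index]
    if pfn is None:
        raise LookupError("invalid PTE: address not mapped")
    return (pfn << OFFSET_BITS) | offset


# Only 1 of the 16 sub-tables is allocated; the other 15 are pruned
page_directory = [None] * 16
page_directory[0] = [None] * 16
page_directory[0][1] = 23                 # VPN 1 -> PFN 23

vaddr = (0 << 10) | (1 << 6) | 5          # directory 0, entry 1, offset 5
paddr = translate(page_directory, vaddr)  # 23 * 64 + 5 = 1477
```

The extra indexing step in `translate` is the time cost; the 15 missing sub-tables are the space saved.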

32-bit x86 Two Level Page Table

64-bit x86 Four Level Page Table

Hybrid approach

IA-32 architecture
- Supports both pure segmentation and segmentation with paging
  - Each segment can be up to 4GB
  - Up to 16K segments per process
  - Divided into two partitions
    - First partition: up to 8K segments private to the process, described in the LDT (Local Descriptor Table)
    - Second partition: up to 8K segments shared among all processes, described in the GDT (Global Descriptor Table)

CPU generates a logical address (selector, offset)
Segmentation unit generates a linear address (virtual address) using the selector
MMU (paging unit) translates the linear address into a physical address

Descriptor Table

GDTR points to the GDT
Loading an LDT descriptor into LDTR makes LDTR point to the LDT
Use the selector to find the segment descriptor and get the segment's starting (base) address
Add the offset to the base to get the linear address
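
The selector-to-linear-address steps can be sketched as follows. An IA-32 selector is 16 bits: a 13-bit index (hence up to 8K descriptors per table), a table-indicator bit that chooses GDT or LDT, and a 2-bit privilege level. The descriptor tables below are simplified to dicts holding only a base address, which is an assumption for illustration; real descriptors are 8 bytes and also carry a limit and access bits.

```python
def decode_selector(selector):
    """Split a 16-bit segment selector into (index, use_ldt, rpl)."""
    index = selector >> 3                    # bits 15..3: descriptor index
    use_ldt = bool((selector >> 2) & 1)      # bit 2: 0 = GDT, 1 = LDT
    rpl = selector & 0b11                    # bits 1..0: requested privilege level
    return index, use_ldt, rpl


def logical_to_linear(selector, offset, gdt, ldt):
    index, use_ldt, _rpl = decode_selector(selector)
    table = ldt if use_ldt else gdt
    base = table[index]                      # starting address of the segment
    return (base + offset) & 0xFFFFFFFF      # 32-bit linear address


# Hypothetical tables: descriptor index -> segment base
gdt = {1: 0x00400000}
ldt = {0: 0x10000000}

linear_from_gdt = logical_to_linear(1 << 3, 0x1234, gdt, ldt)        # 0x00401234
linear_from_ldt = logical_to_linear((0 << 3) | 0b100, 0x10, gdt, ldt)  # 0x10000010
```

The resulting linear address would then go through paging to become a physical address.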
