Understanding xv6: OS Organization

1231 · March 28, 2024

Understanding_xv6


Kernel Organization:

Monolithic kernel:
The entire operating system resides in the kernel, so the implementations of all system calls run in supervisor mode.

Downsides of a monolithic kernel:
The interfaces between different parts of the operating system are often complex, so it is easy for a developer to make a mistake.
In a monolithic kernel every mistake is critical: an error in supervisor mode can make the whole kernel fail, and the machine may have to be rebooted.

Microkernel:
The amount of operating system code that runs in supervisor mode is minimized, which reduces the risk of mistakes in the kernel.
(OS services that run as processes are called servers.)

In a microkernel, the file system can be implemented as a user-mode server.
When an application such as the shell wants to read or write a file, it sends a message to the file server through the kernel interface and waits for a response.
The kernel interface consists of a few low-level functions for starting applications, accessing hardware, and sending messages to other user processes.

xv6 is implemented as a monolithic kernel, like most Unix operating systems.

The first address space:

Address space layout:

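A process's address space contains user memory starting at virtual address zero.
Instructions come first, followed by global variables, then the user stack, and finally a heap area that the process can expand as needed (the memory that malloc() hands out).
The maximum address is 2^38 - 1 (0x3fffffffff); MAXVA is defined as 2^38.
xv6 uses the code in a trampoline page to transition into the kernel and back; the trapframe is needed to save and restore the process's user context.
(MAXVA, the trampoline page, and the trapframe page belong to the RISC-V version of xv6. The boot sequence described in the rest of this post is the x86 version.)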

Each process has two stacks: a user stack and a kernel stack.
User stack: used while the process executes user instructions.
Kernel stack: used while the process is in the kernel for a system call or an interrupt.

How the first address space is organized:
The first step in providing isolation is setting up the kernel to run in its own address space.

1. The boot loader:

BIOS: when a PC boots, it starts executing a program called the BIOS (Basic Input/Output System).
Its job is to load the boot loader from the boot sector (the first 512-byte sector of the boot disk) into memory at 0x7c00 and then transfer control to that code.

Boot loader: occupies memory from 0x7c00 through 0x7dff. It loads the xv6 kernel from disk into memory and then transfers control to the kernel. It comprises two source files, bootasm.S and bootmain.c.

When the boot loader starts, the processor is simulating an Intel 8088: 16-bit registers and 20-bit memory addresses.
-> The segment registers CS (Code Segment), DS (Data Segment), SS (Stack Segment), and ES (Extra Segment) supply the additional address bits.
ex) (CS << 4) + offset = 20-bit memory address
The BIOS does not guarantee anything about the contents of %ds, %es, and %ss, so the first order of business after disabling interrupts is to set %ax to zero and then copy that zero into %ds, %es, and %ss.

start:
  cli                         # BIOS enabled interrupts; disable

  # Zero data segment registers DS, ES, and SS.
  xorw    %ax,%ax             # Set %ax to zero
  movw    %ax,%ds             # -> Data Segment
  movw    %ax,%es             # -> Extra Segment
  movw    %ax,%ss             # -> Stack Segment
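
The segment arithmetic above can be written out as a short C sketch. The function name real_mode_addr is hypothetical and does not appear in xv6; it only reproduces the (segment << 4) + offset rule.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode address formation: the 16-bit segment value is shifted
 * left by four bits and added to the 16-bit offset. The sum can reach
 * 0x10FFEF, which needs one bit more than the 8088's 20 address lines.
 * real_mode_addr is a hypothetical helper, not an xv6 function. */
uint32_t real_mode_addr(uint16_t segment, uint16_t offset)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;
    assert(linear <= 0x10FFEFu);  /* largest sum two 16-bit inputs can form */
    return linear;
}
```

With every segment register zeroed, as the boot loader does, real_mode_addr(0, 0x7c00) is simply 0x7c00, so offsets can be read as physical addresses.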

To stay compatible with software written for older processors such as the 8088 with its 20-bit addresses, the A20 line is disabled by default. It can be enabled through the keyboard controller.
If the second bit of the keyboard controller's output port is low, the 21st physical address bit is always cleared; if it is high, the 21st bit acts normally. The boot loader must enable the 21st address bit using I/O to the keyboard controller on ports 0x64 and 0x60.

  # Physical address line A20 is tied to zero so that the first PCs
  # with 2 MB would run software that assumed 1 MB.  Undo that.
seta20.1:
  inb     $0x64,%al               # Read the keyboard controller status
  testb   $0x2,%al                # Input buffer full: controller still busy?
  jnz     seta20.1                # Yes: keep waiting

  movb    $0xd1,%al               # 0xd1 -> port 0x64
  outb    %al,$0x64               # Command: write to the output port

seta20.2:
  inb     $0x64,%al               # Wait for not busy
  testb   $0x2,%al
  jnz     seta20.2

  movb    $0xdf,%al               # 0xdf -> port 0x60
  outb    %al,$0x60               # New output port value; enables A20
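
The effect of the A20 gate can be modelled in a few lines of C. This is only a sketch of the behaviour described above, not hardware-accurate code, and a20_filter is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define A20_BIT (1u << 20)   /* physical address bit 20, i.e. the 21st bit */

/* Model of the A20 gate: while the gate is disabled, address bit 20 is
 * forced to zero, so addresses just above 1 MB wrap around to low memory.
 * a20_filter is a hypothetical helper used only for illustration. */
uint32_t a20_filter(uint32_t linear, bool a20_enabled)
{
    assert(linear <= 0x10FFEFu);  /* real mode cannot form larger addresses */
    return a20_enabled ? linear : (linear & ~A20_BIT);
}
```

With the gate disabled, address 0x100000 aliases address 0, which is why the boot loader has to enable A20 before the kernel can be loaded at 1 MB.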

The BIOS starts the processor in real mode.
Protected mode allows 32-bit addresses.
In protected mode, a segment register holds an index into the GDT.
GDT (Global Descriptor Table): on x86, segmentation is controlled through tables of descriptors. Each table entry specifies a base physical address, a maximum virtual address called the limit, and permission bits for the segment.

Protected mode is enabled by setting bit 0 (CR0_PE) in register %cr0.
Enabling protected mode does not immediately change how the processor translates logical to physical addresses; it is only when one loads a new value into a segment register that the processor reads the GDT and changes its internal segmentation settings. (The ljmp instruction does this for %cs by supplying a new code segment selector.)

  # Switch from real to protected mode.  Use a bootstrap GDT that makes
  # virtual addresses map directly to physical addresses so that the
  # effective memory map doesn't change during the transition.
  lgdt    gdtdesc                 # Load the global descriptor table register
  movl    %cr0, %eax
  orl     $CR0_PE, %eax           # Set the protected-mode enable bit
  movl    %eax, %cr0

//PAGEBREAK!
  # Complete the transition to 32-bit protected mode by using a long jmp
  # to reload %cs and %eip.  The segment descriptors are set up with no
  # translation, so that the mapping is still the identity mapping.
  ljmp    $(SEG_KCODE<<3), $start32   # %cs = selector SEG_KCODE<<3, %eip = start32
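
To make the selector and descriptor ideas concrete, here is a simplified C sketch. Real x86 descriptors pack their fields into 8 bytes together with permission bits; struct seg_desc, make_selector, and seg_translate are hypothetical names that keep only the base and limit. The sketch assumes the xv6 values SEG_KCODE = 1 and SEG_KDATA = 2.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified descriptor: only the fields the translation needs. */
struct seg_desc {
    uint32_t base;    /* start of the segment */
    uint32_t limit;   /* highest valid offset within the segment */
};

/* A selector holds the GDT index in bits 3..15; the low three bits are
 * the table indicator and the requested privilege level. */
uint16_t make_selector(unsigned index, unsigned rpl)
{
    assert(index < 8192 && rpl < 4);
    return (uint16_t)((index << 3) | rpl);
}

/* Translate an offset through the GDT entry named by the selector.
 * Returns 0 and stores the linear address in *out, or returns -1 if
 * the offset lies beyond the segment limit. */
int seg_translate(const struct seg_desc *gdt, uint16_t selector,
                  uint32_t offset, uint32_t *out)
{
    const struct seg_desc *d = &gdt[selector >> 3];
    if (offset > d->limit)
        return -1;
    *out = d->base + offset;
    return 0;
}
```

Under these assumptions make_selector(1, 0) is 0x08, the value of SEG_KCODE<<3 used by ljmp. With a base of zero and a 4 GB limit, the translation leaves every offset unchanged, which is the identity mapping the bootstrap GDT provides.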

start32, the first 32-bit code, initializes the data segment registers with SEG_KDATA and then sets up a stack in unused memory so that the C code in bootmain.c can run.
The stack grows down from 0x7c00 ($start) toward 0x0000, away from the boot loader.
Finally, the boot loader calls the C function bootmain, whose job is to load the kernel from disk into memory and transfer control to it. bootmain returns only if something has gone wrong. In that case the code writes a few words to port 0x8a00. On real hardware nothing is connected to that port, so the writes do nothing; under a simulator (Bochs) they transfer control back to the simulator. The boot loader then loops forever.

  # Set up the stack pointer and call into C.
  movl    $start, %esp
  call    bootmain

  # If bootmain returns (it shouldn't), trigger a Bochs
  # breakpoint if running under Bochs, then loop.
  movw    $0x8a00, %ax            # 0x8a00 -> port 0x8a00
  movw    %ax, %dx
  outw    %ax, %dx
  movw    $0x8ae0, %ax            # 0x8ae0 -> port 0x8a00
  outw    %ax, %dx
spin:
  jmp     spin

bootmain.c expects to find the kernel image, an ELF binary, on disk starting at the second sector.
It reads the first 4096 bytes of the image, which contain the ELF header, into memory at 0x10000 and then checks that this is an ELF executable.
It then loads each program segment from disk with readseg() and, when a segment is larger in memory than in the file, zeroes the remainder of the segment with stosb(). Finally, it calls the entry point given in the ELF header.

void
bootmain(void)
{
  struct elfhdr *elf;
  struct proghdr *ph, *eph;
  void (*entry)(void);
  uchar* pa;

  elf = (struct elfhdr*)0x10000;  // scratch space

  // Read 1st page off disk
  readseg((uchar*)elf, 4096, 0);

  // Is this an ELF executable?
  if(elf->magic != ELF_MAGIC)
    return;  // let bootasm.S handle error

  // Load each program segment (ignores ph flags).
  ph = (struct proghdr*)((uchar*)elf + elf->phoff);
  eph = ph + elf->phnum;
  for(; ph < eph; ph++){
    pa = (uchar*)ph->paddr;
    readseg(pa, ph->filesz, ph->off);
    if(ph->memsz > ph->filesz)
      stosb(pa + ph->filesz, 0, ph->memsz - ph->filesz);
  }

  // Call the entry point from the ELF header.
  // Does not return!
  entry = (void(*)(void))(elf->entry);
  entry();
}
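
The per-segment work in the loop, copying filesz bytes and then zero-filling up to memsz, can be tried in user space with a sketch like the following. load_segment is a hypothetical stand-in for the readseg() and stosb() pair: it reads from an in-memory image instead of the disk and assumes filesz never exceeds memsz.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy filesz bytes of a segment out of the image, then zero the rest
 * of the segment up to memsz. The image is an ordinary byte array here,
 * standing in for the disk that readseg() reads in bootmain(). */
void load_segment(unsigned char *dst, const unsigned char *image,
                  size_t off, size_t filesz, size_t memsz)
{
    assert(filesz <= memsz);                  /* assumption of this sketch */
    memcpy(dst, image + off, filesz);         /* the readseg() part */
    memset(dst + filesz, 0, memsz - filesz);  /* the stosb() part */
}
```

The zero-filled tail is what gives uninitialized global variables their initial value of zero without storing those zeros in the kernel image.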

The kernel has been compiled and linked to run at virtual addresses starting at 0x80100000 (as kernel.asm shows).
The paging hardware is not enabled yet. Once it is, virtual address 0x80100000 will map to physical address 0x00100000.
kernel.ld sets the load addresses in the ELF headers so that the boot loader places the kernel in physical memory starting at 0x00100000.
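
The relationship between the link address and the load address is a fixed offset of KERNBASE. A minimal sketch of that arithmetic, with hypothetical function names v2p and p2v chosen for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define KERNBASE 0x80000000u   /* first kernel virtual address */

/* Kernel virtual address -> physical address (subtract KERNBASE). */
uint32_t v2p(uint32_t va)
{
    assert(va >= KERNBASE);
    return va - KERNBASE;
}

/* Physical address -> kernel virtual address (add KERNBASE). */
uint32_t p2v(uint32_t pa)
{
    assert(pa <= 0xFFFFFFFFu - KERNBASE);  /* result must fit in 32 bits */
    return pa + KERNBASE;
}
```

For instance, v2p(0x80100000) gives 0x00100000, the physical address at which the boot loader placed the kernel.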

2. entry

The kernel starts executing at entry.

First, set the page directory:
entry loads the physical address of entrypgdir into register %cr3.
entrypgdir is a page table that maps virtual addresses starting at 0x80000000 (KERNBASE) to physical addresses starting at 0x0.
The entry page table is defined in main.c.
-> Entry 0 maps virtual addresses 0:0x400000 to physical addresses 0:0x400000. This mapping is required as long as entry is executing at low addresses; it is removed later, once the kernel is running at high addresses. Entry 512 maps virtual addresses KERNBASE:KERNBASE+0x400000 to physical addresses 0:0x400000, where the boot loader placed the kernel's instructions and data. This restricts the kernel's instructions and data to 4 MB.

Second, enable paging:
setting the CR0_PG flag in register %cr0 turns on the paging hardware, after which the kernel can use high addresses.

Third, set up the stack pointer:
the assembler directive .comm reserves KSTACKSIZE bytes of uninitialized storage for the symbol stack, and %esp is set to the top of that region.

Fourth, jump to main and switch to executing at high addresses.

# Entering xv6 on boot processor, with paging off.
.globl entry
entry:
  # Turn on page size extension for 4Mbyte pages
  movl    %cr4, %eax
  orl     $(CR4_PSE), %eax
  movl    %eax, %cr4
  # Set page directory
  movl    $(V2P_WO(entrypgdir)), %eax  # V2P_WO subtracts KERNBASE to get the physical address
  movl    %eax, %cr3
  # Turn on paging.
  movl    %cr0, %eax
  orl     $(CR0_PG|CR0_WP), %eax
  movl    %eax, %cr0

  # Set up the stack pointer.
  movl $(stack + KSTACKSIZE), %esp  # stack grows down

  # Jump to main(), and switch to executing at
  # high addresses. The indirect call is needed because
  # the assembler produces a PC-relative instruction
  # for a direct jump.
  mov $main, %eax
  jmp *%eax

.comm stack, KSTACKSIZE
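
The two-entry page directory can be modelled in C to check the mapping described above. This is a user-space sketch under the assumption that only 4 MB (PTE_PS) entries are present; init_entry_pgdir and translate_4mb are hypothetical names, while the flag values are the standard x86 bit positions.

```c
#include <assert.h>
#include <stdint.h>

#define KERNBASE   0x80000000u
#define PDXSHIFT   22          /* each directory entry covers 4 MB */
#define NPDENTRIES 1024
#define PTE_P      0x001u      /* present */
#define PTE_W      0x002u      /* writable */
#define PTE_PS     0x080u      /* entry maps a 4 MB page */

/* Fill a directory the way entrypgdir is described: entry 0 and entry
 * KERNBASE>>PDXSHIFT both map the first 4 MB of physical memory. */
void init_entry_pgdir(uint32_t pgdir[NPDENTRIES])
{
    for (int i = 0; i < NPDENTRIES; i++)
        pgdir[i] = 0;
    pgdir[0] = 0 | PTE_P | PTE_W | PTE_PS;
    pgdir[KERNBASE >> PDXSHIFT] = 0 | PTE_P | PTE_W | PTE_PS;
}

/* Translate a virtual address through 4 MB entries. Returns 0 and
 * stores the physical address in *pa, or returns -1 if unmapped. */
int translate_4mb(const uint32_t pgdir[NPDENTRIES], uint32_t va, uint32_t *pa)
{
    uint32_t pde = pgdir[va >> PDXSHIFT];
    if (!(pde & PTE_P))
        return -1;
    assert(pde & PTE_PS);  /* this sketch handles only 4 MB entries */
    *pa = (pde & 0xFFC00000u) | (va & 0x003FFFFFu);
    return 0;
}
```

In this model both 0x00100000 and 0x80100000 translate to physical 0x00100000, so entry keeps running correctly before and after the jump to high addresses. Any address at or above KERNBASE+0x400000 is unmapped, which is the 4 MB limit mentioned above.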

Creating the first process

main initializes several devices and subsystems and then calls userinit() to create the first process.
userinit() first calls allocproc(), which scans the process table for a slot in state UNUSED and marks it EMBRYO.

allocproc() also sets up the new process with a specially prepared kernel stack and set of kernel registers that cause it to 'return' to user space when it first runs. It does so by arranging for the process to execute forkret and then trapret.
forkret() returns to whatever address is next on the stack once the saved context has been popped, and allocproc() placed the address of trapret() there. trapret() restores the user registers from the values stored at the top of the kernel stack.
userinit() writes values at the top of the kernel stack that look like saved user registers.
These values form a struct trapframe, which stores the user registers.
allocproc():

  // Set up new context to start executing at forkret,
  // which returns to trapret.
  sp -= 4;  // push: make room for the return address
  *(uint*)sp = (uint)trapret;

  sp -= sizeof *p->context;
  p->context = (struct context*)sp;
  memset(p->context, 0, sizeof *p->context);
  p->context->eip = (uint)forkret;
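
The stack that allocproc() prepares can be reproduced in user space to see where each piece lands. The sketch below uses a byte array as the kernel stack and hypothetical types and names (context_sketch, trapframe_sketch, build_kstack). The 19-word trap frame is an assumption about the size of the x86 xv6 struct trapframe, and addresses are modelled as plain 32-bit values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KSTACKSIZE 4096

/* Same field order that swtch pops: %edi, %esi, %ebx, %ebp, then %eip. */
struct context_sketch { uint32_t edi, esi, ebx, ebp, eip; };
/* Placeholder for struct trapframe; 19 words is an assumption. */
struct trapframe_sketch { uint32_t words[19]; };

struct kstack_layout {
    size_t tf_off;   /* offset of the trap frame */
    size_t ret_off;  /* offset of the word holding trapret's address */
    size_t ctx_off;  /* offset of the saved context */
};

/* Build the initial kernel stack from the top down, as allocproc() does. */
struct kstack_layout build_kstack(unsigned char *kstack,
                                  uint32_t trapret_addr, uint32_t forkret_addr)
{
    struct kstack_layout l;
    struct context_sketch ctx;
    size_t sp = KSTACKSIZE;

    sp -= sizeof(struct trapframe_sketch);  /* leave room for the trap frame */
    l.tf_off = sp;

    sp -= sizeof(uint32_t);                 /* push trapret's address */
    memcpy(kstack + sp, &trapret_addr, sizeof trapret_addr);
    l.ret_off = sp;

    sp -= sizeof ctx;                       /* push a zeroed context */
    memset(&ctx, 0, sizeof ctx);
    ctx.eip = forkret_addr;                 /* swtch's ret jumps here */
    memcpy(kstack + sp, &ctx, sizeof ctx);
    l.ctx_off = sp;

    assert(l.ctx_off + sizeof ctx == l.ret_off);
    return l;
}
```

Reading the layout upward from ctx_off gives the order of events on first run: swtch pops four registers and returns to forkret, forkret's own return consumes the trapret word, and trapret then finds %esp pointing at the trap frame.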

The first process is going to execute a small program (initcode.S).
It needs memory to hold that program.
userinit() calls setupkvm() to create a page table for the process that, at first, maps only the memory the kernel uses.
The initial contents of the process's user memory are the compiled form of initcode.S. The linker embeds that binary in the kernel and defines two symbols, _binary_initcode_start and _binary_initcode_size, that give its location and size.
userinit() copies that binary into the new process's memory by calling inituvm(), which allocates one page of physical memory, maps virtual address zero to that memory, and copies the binary to that page.
userinit():

  initproc = p;
  if((p->pgdir = setupkvm()) == 0)
    panic("userinit: out of memory?");
  inituvm(p->pgdir, _binary_initcode_start, (int)_binary_initcode_size);

Then userinit() sets up the trap frame with the initial user-mode state.

Running the first process

After calling userinit(), main calls mpmain(), which starts running processes.
mpmain() calls scheduler(), which looks for a RUNNABLE process and runs it.

scheduler():
It sets the per-CPU variable proc to the selected process (here, initproc).
switchuvm() tells the hardware to start using the selected process's page table. Changing page tables while executing in the kernel works because setupkvm() gives every process's page table identical mappings for kernel code and data.
swtch() performs a context switch to the target process's kernel thread. The current context is not a process but a special per-CPU scheduler context, so scheduler() tells swtch() to save the current hardware registers in per-CPU storage (cpu->scheduler) rather than in any process's kernel thread context.
proc.c scheduler()

void
scheduler(void)
{
  struct proc *p;
  struct cpu *c = mycpu();
  c->proc = 0;
....
      // Switch to chosen process.  It is the process's job
      // to release ptable.lock and then reacquire it
      // before jumping back to us.
      c->proc = p;
      switchuvm(p);
      p->state = RUNNING;

      swtch(&(c->scheduler), p->context);
      switchkvm();
....
}
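
The elided part of scheduler() scans the process table for a RUNNABLE entry. A simplified user-space sketch of that selection step follows. struct proc_sketch and pick_runnable are hypothetical, and the sketch always scans from the start of the table, whereas the loop in scheduler() carries on from the entry after the one it just ran.

```c
#include <assert.h>
#include <stddef.h>

enum procstate { UNUSED, EMBRYO, SLEEPING, RUNNABLE, RUNNING, ZOMBIE };

/* Reduced stand-in for struct proc: only what the scan needs. */
struct proc_sketch {
    enum procstate state;
    int pid;
};

/* Scan the table once. Mark the first RUNNABLE entry as RUNNING and
 * return it, or return NULL if nothing is ready to run. */
struct proc_sketch *pick_runnable(struct proc_sketch *table, size_t n)
{
    assert(table != NULL);
    for (size_t i = 0; i < n; i++) {
        if (table[i].state == RUNNABLE) {
            table[i].state = RUNNING;
            return &table[i];
        }
    }
    return NULL;
}
```

At boot the only RUNNABLE entry is the one userinit() created, so the first successful scan selects initproc.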

swtch() loads the saved registers of the target kernel thread, including the stack pointer. The final ret instruction pops the target process's %eip from the stack, finishing the context switch.
swtch.S

swtch:
  movl 4(%esp), %eax    # first argument: where to save the old context pointer
  movl 8(%esp), %edx    # second argument: context to switch to

  # Save old callee-saved registers
  pushl %ebp
  pushl %ebx
  pushl %esi
  pushl %edi

  # Switch stacks
  movl %esp, (%eax)
  movl %edx, %esp

  # Load new callee-saved registers
  popl %edi
  popl %esi
  popl %ebx
  popl %ebp
  ret

allocproc() already set p->context->eip to forkret, so the ret at the end of swtch() starts executing forkret().
forkret() runs some initialization that can only be done in the context of a regular process with its own kernel stack, so it cannot be called from main.
forkret() then returns. allocproc() arranged for the next word on the stack to be the address of trapret(), so execution continues there.
When trapret() starts, %esp points at the process's trap frame (p->tf).
trapret() uses pop instructions to restore registers from the trap frame, just as swtch() did with the kernel context:
trapasm.S

  # Call trap(tf), where tf=%esp
  pushl %esp
  call trap
  addl $4, %esp    # pop the argument; %esp points at the trap frame again

  # Return falls through to trapret...
.globl trapret
trapret:
  popal            # restore the general-purpose registers
  popl %gs
  popl %fs
  popl %es
  popl %ds         # restore %gs, %fs, %es, %ds
  addl $0x8, %esp  # skip two fields, trapno and errcode
  iret             # pops %eip, %cs, %eflags, %esp, and %ss from the stack

The process now begins executing at tf->eip, which userinit() set to virtual address 0, the first instruction of initcode.S.
At this point, %eip holds zero and %esp holds 4096.

inituvm() set up the process's page table so that virtual address zero refers to the physical memory allocated for this process, and set a flag (PTE_U) that tells the paging hardware to allow user code to access that memory. (allocuvm() does the same for memory added later through sbrk().) The fact that userinit() set up the low bits of %cs to run the process's user code at CPL=3 means that the user code can only use pages with PTE_U set, and cannot modify sensitive hardware registers such as %cr3.
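
The interaction between CPL and the page table flags can be summarized as a small predicate. This is a simplified model rather than the full x86 rule set. access_allowed is a hypothetical name, the flag values are the standard x86 bit positions, and the sketch assumes CR0_WP is set (as entry does), so writes to read-only pages fail at every privilege level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_P 0x001u   /* present */
#define PTE_W 0x002u   /* writable */
#define PTE_U 0x004u   /* accessible from user mode */

/* Decide whether an access through this page table entry is permitted.
 * cpl is the current privilege level: 0 for the kernel, 3 for user code. */
bool access_allowed(uint32_t pte, unsigned cpl, bool is_write)
{
    assert(cpl <= 3);
    if (!(pte & PTE_P))
        return false;               /* page not mapped */
    if (cpl == 3 && !(pte & PTE_U))
        return false;               /* user code needs PTE_U */
    if (is_write && !(pte & PTE_W))
        return false;               /* read-only page (CR0_WP assumed set) */
    return true;
}
```

Kernel pages are mapped without PTE_U, so in this model access_allowed(PTE_P | PTE_W, 3, false) is false: the same page table serves both modes, and the flag is what keeps user code out of kernel memory.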

initcode.S invokes the exec() system call.
It pushes three values on the stack: $argv, $init, and $0.
It then sets %eax to SYS_exec and executes int $T_SYSCALL.
exec() replaces initcode with the program named by $init, which is /init.
init creates a new console device file if needed and then opens it as file descriptors 0, 1, and 2. It then loops: it starts a console shell, handles orphaned zombies until the shell exits, and repeats.
initcode.S

.globl start
start:
  pushl $argv
  pushl $init
  pushl $0  // where caller pc would be
  movl $SYS_exec, %eax
  int $T_SYSCALL

# for(;;) exit();
exit:
  movl $SYS_exit, %eax
  int $T_SYSCALL
  jmp exit

# char init[] = "/init\0";
init:
  .string "/init\0"

# char *argv[] = { init, 0 };
.p2align 2
argv:
  .long init
  .long 0

init.c

// init: The initial user-level program

#include "types.h"
#include "stat.h"
#include "user.h"
#include "fcntl.h"

char *argv[] = { "sh", 0 };

int
main(void)
{
  int pid, wpid;

  if(open("console", O_RDWR) < 0){
    mknod("console", 1, 1);
    open("console", O_RDWR);
  }
  dup(0);  // stdout
  dup(0);  // stderr

  for(;;){
    printf(1, "init: starting sh\n");
    pid = fork();
    if(pid < 0){
      printf(1, "init: fork failed\n");
      exit();
    }
    if(pid == 0){
      exec("sh", argv);
      printf(1, "init: exec sh failed\n");
      exit();
    }
    while((wpid=wait()) >= 0 && wpid != pid)
      printf(1, "zombie!\n");
  }
}
