[OSTEP] Virtualization: Process

Soeng_dev · December 29, 2020

Virtualization

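The OS seemingly runs many programs at the same time. In fact, it runs processes one at a time on a single CPU (or a few CPUs) and switches between them quickly. This technique is called time sharing. Its cost is performance: the more processes share the CPU, the more slowly each one runs.
To virtualize the CPU, the OS needs both low-level machinery, called mechanisms (e.g., the context switch), and high-level intelligence, called policies (e.g., a scheduling policy that decides which process runs next).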

Concept of process

- Definition of Process

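The abstraction of a running program.
A process is described by its machine state. This is everything the program can read or update while it runs: its memory (address space), its registers (including the program counter and stack pointer), and I/O information such as the list of files it has open.
Modern OSes commonly provide the following process APIs:
: Create, Destroy, Wait, Status, and Miscellaneous control (ways to deal with a process other than just killing it, e.g., suspending it and resuming it later)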

- Process Creation

Load the code and static data (e.g., initialized variables) into the address space of the process
(Early OSes loaded everything eagerly, all at once, before running the program. Modern OSes load pieces of code or data lazily, only as they are needed during execution.)
Allocate memory for the run-time stack and the heap
Perform other initialization tasks, particularly I/O setup (in UNIX, each process starts with three open file descriptors: standard input, output, and error)
Start the program running at its entry point, main(). The OS transfers control of the CPU to the newly created process.

- Process State

Running

The process is running on a processor, i.e., it is executing instructions.

Ready

The process is ready to run, but the OS has chosen not to run it at this moment.

Blocked

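The process has performed an operation that keeps it from running until some other event takes place.
A typical case is a process that initiates an I/O request. It becomes blocked, and meanwhile another process can use the CPU, so resources are used more efficiently.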

Scheduled / Descheduled

Moving a process from ready to running is called scheduling it. Moving it from running to ready is called descheduling it.

ZOMBIE (final) state

The process has exited but has not been cleaned up yet. Its parent can still examine its return code (e.g., with wait()).
This is called the zombie state in UNIX-based systems.

How to implement a process

- Data Structure

Process list

A list of all current processes, also known as the task list

PCB

Process Control Block: the per-process structure that stores information about one process (also called a process descriptor)

- Process Optimization

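Two policies:
When the current process issues an I/O, switch to another process so the CPU stays busy.
When the I/O completes, run the process that issued it again immediately.
Both aim at CPU utilization, which is key to efficiency (less total time to finish all the processes). The first keeps the CPU busy during I/O. The second gets a process that has just finished its I/O back on the CPU quickly. That process can then finish sooner or issue its next I/O sooner, and either way other processes get the CPU in the meantime.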

- Process control in UNIX

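In UNIX shells, certain keystrokes send a specific signal to the currently running process for convenience. Ctrl-C sends SIGINT, which normally terminates the process, and Ctrl-Z sends SIGTSTP, which pauses it. The signals subsystem delivers such external events to processes, and a process can use the signal() system call to catch them.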

User

On systems where many people use the machine at the same time, the ability to send signals must be restricted. Otherwise users could arbitrarily control each other's processes.
So the OS uses the concept of a user. A user has full control over their own processes, which prevents malicious signals from others.
A system also generally needs someone who can administer it. In UNIX-based systems this is the superuser, or root user, who can control any process.

Process APIs

Three APIs (system calls) are used to create new processes and manage them:
fork(), exec(), wait()

- fork

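fork() creates a new process (the child process), which is an almost exact copy of the calling process (the parent). The child does not start at main(). It starts running as if it had just returned from the fork() call itself. fork() returns 0 in the child and the child's PID in the parent.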

- exec

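exec() is used when you want a process to run a different program.
It loads the code (and static data) of the named executable and overwrites the current code segment (and static data) with it. The heap, stack, and other parts of the address space are re-initialized, and then the new program starts running.
It does not create a new process. It transforms the current one, so a successful exec() never returns.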

- wait

wait() and waitpid() make the calling process (usually the parent) wait until another process (usually a child) finishes. wait() waits for any child, while waitpid() can wait for a specific one.

- Reason for using fork, exec, wait

Why is process creation split into fork() and exec()?

Creating the process and choosing the program it runs are separate steps. This lets the shell alter the new process's environment in between, which enables various features (e.g., changing file descriptors).
The shell's ability to run code after fork() but before exec() is also what makes redirection and useful tools such as pipes possible (ch 5, p.6).
For redirection, the shell closes the child's standard output and opens a file before calling exec(). Anything the child then writes to stdout (e.g., with printf) goes to that file or file descriptor instead.
Not everyone agrees with this design, though.
For example, a recent paper by systems researchers from Microsoft, Boston University, and ETH in Switzerland details some problems with fork() and advocates other, simpler process-creation APIs such as spawn().
(spawn reference posts: https://ohgyun.com/453 , https://bit.ly/3SGtRgZ)

Why use wait() and waitpid()?

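After fork(), the CPU scheduler decides whether the parent or the child runs first. The scheduler is complex, so we usually cannot predict its choice, and the output of such a program is not deterministic. wait() and waitpid() make it deterministic, because the parent does not continue until the child has finished. They also let the parent collect the child's exit status, which cleans up the zombie child.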

- Useful Linux shell commands for managing processes

ps 

allows you to see which processes are running

top

displays the processes on the system and how much of its resources (including CPU) each one is using

kill, killall

send arbitrary signals to processes, e.g., to kill them (kill selects processes by PID, killall by name)

CPU meters

to get a quick, at-a-glance sense of the load on your system

Background Knowledge

- CPU 

Central processing unit

- Register

A component of the CPU: a small, fast temporary workspace the CPU uses while executing instructions (e.g., the program counter and stack pointer)

- 'Array of pointers' vs 'Pointer to an array'

Array of pointers

An array whose elements are pointers to the data type
[data type] *[array name][number of elements], e.g. int *arr[3]

Pointer to an array

A single pointer to an array of the data type
[data type] (*[pointer name])[number of elements], e.g. int (*ptr)[3]

The way UNIX manages file descriptors

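When open() needs a new file descriptor, UNIX picks the first free one, searching upward from zero (STDIN_FILENO = 0, STDOUT_FILENO = 1, STDERR_FILENO = 2). So if a process closes STDOUT_FILENO while descriptor 0 is still in use, the next open() returns descriptor 1, and the opened file becomes the process's standard output.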

Kernel pipes

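A pipe is a kernel-managed in-memory buffer that behaves like a queue (FIFO). Bytes written into one end come out of the other end in the same order.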
