System Call - Thread

Hyungseop Lee · December 4, 2023

[INU, 2-2, TA] Open Source Software Design

Processes and Threads

Process : an instance of a running (or suspended) program
A single process may have multiple threads of execution
- Each thread has its own function calls & local variables,
  so each thread needs its own program counter and stack
- Threads are so-called light-weight processes that share access to a single memory space.
  (A thread does not need a memory space of its own.)

Single-threaded vs. multi-threaded processes

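Global data and file descriptors are shared by all threads.
Each thread, however, has its own registers and its own stack.
The kernel region is the same in both cases; the user region is what differs.
In a multi-threaded process, the stack area holds a separate stack for each thread,
because each thread makes its own sequence of function calls.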

Why use threads, not processes?

Multiple processes could be used for parallel work, so why use multiple threads instead?
- Why not processes :
  A process is created with fork(),
  and the overhead is high (duplicating a process costs time and memory).
  Also, once the new process exists, the two processes cannot easily exchange data (they need some form of IPC).
- Why threads :
  Several threads exist at the same time inside one process
  and share a single memory space, so sharing data (global, static) is easy.
  Creating a thread essentially adds one more stack, which saves time and memory.

Multi-threading

Modern machines are often multi-processor machines with several CPU cores.
Running a thread on each CPU lets the work proceed in parallel.
- If several threads share one CPU, they run by time sharing, so the total execution time is not reduced,
- but if each CPU runs one thread, there is no time sharing and the work is truly parallel.

POSIX threads

Defines the APIs (functions) needed for multi-thread programming on Unix-like operating systems.

Pthread API

Thread Management (create, exit, join, detach)

Creating and Destroying Threads :
Using Pthreads :
gcc -o <threaded_program> <threaded_program.c> -lpthread : "compile and link against the pthread library"

pthread_create(3) : Create a new thread of control
- void* (*start_routine)(void*) :
  A function pointer.
  The argument is a void*, so a pointer to any type can be passed in; cast it to the real type before use.
  The return value is also a void*, so a pointer to any type can be returned as well.
- error codes :
  Returns 0 for success, else error code (returned directly, not through errno).
- pthread_t type :
  Opaque : there is no need to know which underlying type pthread_t is built on.
  Rather than inspecting how pthread_t is laid out,
  use the helper functions provided for it (pthread_self(), pthread_equal()).
  - pthread_self(3), pthread_equal(3) :
    pthread_self() returns the calling thread's ID; pthread_equal() compares two thread IDs.
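As a minimal sketch of how these pieces fit together (not the post's original code): the function below creates a thread, passes it a pointer through the void* argument, and lets the new thread compare its own ID with the creator's ID using pthread_self() and pthread_equal(). The names is_caller and child_is_distinct are hypothetical helpers made up for this illustration.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical start routine: arg points at the creating thread's ID. */
static void *is_caller(void *arg) {
    const pthread_t *caller = (const pthread_t *)arg;  /* cast void* back to the real type */
    /* pthread_equal() returns nonzero when both IDs name the same thread. */
    int same = pthread_equal(*caller, pthread_self());
    return (void *)(intptr_t)same;                     /* small integer carried in the void* result */
}

/* Returns 1 if the new thread has a different ID than its creator,
 * 0 if the IDs compare equal, and -1 if a pthread call fails. */
int child_is_distinct(void) {
    pthread_t caller = pthread_self();
    pthread_t tid;
    void *ret;

    /* pthread_create() returns 0 on success, an error code otherwise. */
    if (pthread_create(&tid, NULL, is_caller, &caller) != 0)
        return -1;
    /* Joining keeps `caller` alive until the new thread is done with it. */
    if (pthread_join(tid, &ret) != 0)
        return -1;
    return (intptr_t)ret == 0;
}
```

Calling child_is_distinct() from main should report that the two IDs differ, without ever looking inside pthread_t.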
pthread_exit(3) : Thread termination
A thread terminates in one of four ways:
1. The thread function performs return with a return value.
2. The thread calls pthread_exit() with a return value.
3. Any of the threads calls exit() or the main thread performs a return :
  If any thread calls exit(), the whole process terminates,
  and every thread in it terminates with it.
4. The thread is canceled using pthread_cancel(3) :
  One thread can terminate another thread.
pthread_join(3) : waiting for threads
To wait for several threads, call pthread_join() once for each thread.
pthread_detach(3) :
When the specified thread terminates, its resources are released immediately, without another thread having to join it.
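The following sketch, written for illustration and not taken from the post, shows a thread handing back a value with pthread_exit(), the creator collecting it with pthread_join(), and a second thread being detached instead of joined. The names square, join_square, background, and spawn_detached are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>

/* Terminates through pthread_exit(); the value reaches whoever joins. */
static void *square(void *arg) {
    intptr_t n = (intptr_t)arg;
    pthread_exit((void *)(n * n));   /* same effect here as: return (void *)(n * n); */
    return NULL;                     /* not reached */
}

/* Creates a thread, waits for it, and returns the value it exited with
 * (-1 if a pthread call fails). */
long join_square(long n) {
    pthread_t tid;
    void *ret;
    if (pthread_create(&tid, NULL, square, (void *)(intptr_t)n) != 0)
        return -1;
    if (pthread_join(tid, &ret) != 0)
        return -1;
    return (long)(intptr_t)ret;
}

static void *background(void *arg) {
    (void)arg;                       /* unused */
    return NULL;
}

/* Creates a thread and detaches it; returns 0 on success.
 * A detached thread must not be joined afterwards. */
int spawn_detached(void) {
    pthread_t tid;
    if (pthread_create(&tid, NULL, background, NULL) != 0)
        return -1;
    return pthread_detach(tid);
}
```

To wait for several threads, the pthread_join() call in join_square() would be repeated once per thread ID.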

Examples

pthread_create(3)

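pthread_create example 1 :
➡️ The main thread calls neither pthread_exit(3) nor pthread_join(3), so
after running printf("I am a main thread\n");
it reaches return 0; right away and the process terminates.
As a result, the expected "Thread created!\n" from myfunc() is not printed (the process ends before the new thread gets to run).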

pthread_exit(3)

pthread_create example 1 + pthread_exit(3) :
Calling pthread_exit() in main ends only the main thread instead of the whole process,
so the other thread keeps running.

pthread_join(3)

pthread_create example 1 + pthread_join(3) :
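A rough reconstruction of the join variant, assuming the shape described in the text (myfunc() and the two printed strings come from the post; create_and_join and the thread_ran flag are hypothetical additions so the result can be checked):

```c
#include <pthread.h>
#include <stdio.h>

static int thread_ran;   /* hypothetical flag, set by the new thread */

static void *myfunc(void *arg) {
    (void)arg;           /* unused */
    printf("Thread created!\n");
    thread_ran = 1;
    return NULL;
}

/* Returns 1 if myfunc() ran to completion before this function returned,
 * -1 if a pthread call fails. */
int create_and_join(void) {
    pthread_t tid;
    thread_ran = 0;
    if (pthread_create(&tid, NULL, myfunc, NULL) != 0)
        return -1;
    printf("I am a main thread\n");
    /* Block until myfunc() has finished, so the caller cannot end first. */
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return thread_ran;
}
```

Both lines are printed on every run, but which one appears first is up to the scheduler.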
pthread_create example 2 :
Because the result depends on the scheduler,
the relative order in which the main thread and the newly created thread run cannot be predicted.
global variable sharing example :
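A small sketch of global variable sharing, written for this note and not copied from the post: one thread writes a global, and the creator reads it after joining. The names shared_value, writer, and read_after_join are hypothetical.

```c
#include <pthread.h>

static int shared_value;   /* global: visible to every thread in the process */

static void *writer(void *arg) {
    (void)arg;             /* unused */
    shared_value = 42;     /* written by the new thread */
    return NULL;
}

/* Returns the value the new thread stored in the global (-1 on failure). */
int read_after_join(void) {
    pthread_t tid;
    shared_value = 0;
    if (pthread_create(&tid, NULL, writer, NULL) != 0)
        return -1;
    /* After pthread_join() returns, the thread's writes are visible here. */
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return shared_value;
}
```

No copying or message passing is involved: both threads simply use the same variable.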

Optimization Option (volatile)

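Optimization Option example :
Compiling with -O2 (optimization level 2) made every counter print as 0.
Why?
The program runs as multiple threads on a multi-core machine.
Reading from disk is very slow, and even reading from memory costs the CPU time,
so values are kept close to each core, in registers and caches.
The decisive choice here is made by the compiler, not the OS: at -O2 it may assume that no other thread changes counter or is_running, load each one once, and keep working on the copy held in a register.
Registers are private to each thread, so an update made by one thread never reaches the copy another thread is using.
(Each core also has its own L1 cache and lower cache levels are often shared, but the hardware keeps the caches coherent, so the caches themselves do not hide the updates.)
Declaring counter and is_running as volatile addresses this:
it tells the compiler not to optimize away accesses to those variables, so every read and write goes to memory.
Note that volatile only restricts the compiler; it does not make accesses atomic or ordered between threads.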
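The post uses volatile; as an alternative that the C11 standard defines for cross-thread access, the sketch below uses <stdatomic.h> instead. This is a swapped-in technique, not the author's code. The variable names counter and is_running come from the post, while count_until_stopped and run_until are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool is_running;   /* stop flag shared between threads */
static atomic_long counter;      /* progress counter shared between threads */

static void *count_until_stopped(void *arg) {
    (void)arg;                   /* unused */
    /* Every iteration performs a real load of is_running, even at -O2. */
    while (atomic_load(&is_running))
        atomic_fetch_add(&counter, 1);
    return NULL;
}

/* Lets a worker count until it has reached at least `target`, then stops it.
 * Returns the final counter value, or -1 if a pthread call fails. */
long run_until(long target) {
    pthread_t tid;
    atomic_store(&counter, 0);
    atomic_store(&is_running, true);
    if (pthread_create(&tid, NULL, count_until_stopped, NULL) != 0)
        return -1;
    while (atomic_load(&counter) < target) {
        /* busy-wait: the worker's updates become visible to this thread */
    }
    atomic_store(&is_running, false);   /* the worker observes this and exits */
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return atomic_load(&counter);
}
```

With plain int variables and -O2, either loop could legally be compiled into one that never re-reads memory; the atomic operations rule that out.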
Passing data to the thread handler :
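One common way to pass data, sketched here under assumed names (struct job, add_job, and add_in_thread are hypothetical, not from the post), is to hand the handler a pointer to a struct and cast it back inside the thread:

```c
#include <pthread.h>

/* Hypothetical argument bundle: inputs and a slot for the result. */
struct job {
    int a;
    int b;
    int sum;
};

static void *add_job(void *arg) {
    struct job *j = (struct job *)arg;   /* cast void* back to the real type */
    j->sum = j->a + j->b;
    return NULL;
}

/* Adds a and b inside a new thread; returns -1 if a pthread call fails. */
int add_in_thread(int a, int b) {
    struct job j = { a, b, 0 };
    pthread_t tid;
    if (pthread_create(&tid, NULL, add_job, &j) != 0)
        return -1;
    /* j lives on this stack frame, so join before returning. */
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return j.sum;
}
```

The struct must outlive the thread that uses it, which is why the sketch joins before add_in_thread() returns.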

Critical Section, Mutex, Synchronization

Critical section problem :
Contrary to expectation, the output was not 10,000 but a smaller number.
The two threads are not synchronized, so they update the global variable cnt at the same time and some increments are lost.
Critical section problem + mutex :
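A minimal sketch of the mutex fix, assuming two threads that each increment cnt 5,000 times (the split is an assumption; the post only gives the expected total of 10,000). The names cnt_lock, increment, and count_with_mutex are hypothetical.

```c
#include <pthread.h>

#define NUM_THREADS 2
#define INCREMENTS_PER_THREAD 5000   /* assumption: 2 x 5,000 = 10,000 */

static int cnt;                      /* shared counter from the text */
static pthread_mutex_t cnt_lock = PTHREAD_MUTEX_INITIALIZER;

static void *increment(void *arg) {
    (void)arg;                       /* unused */
    for (int i = 0; i < INCREMENTS_PER_THREAD; i++) {
        pthread_mutex_lock(&cnt_lock);     /* enter critical section */
        cnt++;                             /* only one thread at a time here */
        pthread_mutex_unlock(&cnt_lock);   /* leave critical section */
    }
    return NULL;
}

/* Returns the final value of cnt, or -1 if a thread could not be created. */
int count_with_mutex(void) {
    pthread_t tids[NUM_THREADS];
    int created = 0;
    cnt = 0;
    for (; created < NUM_THREADS; created++)
        if (pthread_create(&tids[created], NULL, increment, NULL) != 0)
            break;
    for (int t = 0; t < created; t++)      /* join each thread that was started */
        pthread_join(tids[t], NULL);
    return created == NUM_THREADS ? cnt : -1;
}
```

Because every cnt++ happens while the lock is held, no increment is lost and the total is always 10,000.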
Caution when using a mutex : do not lock and unlock too often.
single thread -> about 11s
4 threads -> about 59s
The single-threaded version was much faster.
Why did single-thread programming run faster?
➡️ Taking a lock adds overhead and delay.
Acquiring the lock 1,000,000,000 times inside the for loop therefore cancels out the benefit of multi-threading and makes the program slower.
Instead, each thread should declare a local variable part_sum, accumulate its own share of the range in the for loop,
and take the lock only after the loop, when adding part_sum to the global variable sum (the critical section).
This keeps overhead and delay to a minimum. With that change:

single thread -> 1.234s
4 threads -> 0.334s
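The part_sum approach described above can be sketched roughly as follows. This is an illustration under assumptions, not the post's code: it sums the integers 1..n, and struct range, sum_range, parallel_sum, and MAX_THREADS are hypothetical names; only sum and part_sum come from the text.

```c
#include <pthread.h>

#define MAX_THREADS 16   /* hypothetical upper bound for this sketch */

static long long sum;    /* global total, protected by sum_lock */
static pthread_mutex_t sum_lock = PTHREAD_MUTEX_INITIALIZER;

struct range {
    long long begin;     /* first value to add (inclusive) */
    long long end;       /* last value to add (inclusive) */
};

static void *sum_range(void *arg) {
    const struct range *r = (const struct range *)arg;
    long long part_sum = 0;              /* thread-local: no lock needed */
    for (long long i = r->begin; i <= r->end; i++)
        part_sum += i;
    pthread_mutex_lock(&sum_lock);       /* one lock per thread, not per iteration */
    sum += part_sum;
    pthread_mutex_unlock(&sum_lock);
    return NULL;
}

/* Sums 1..n using nthreads threads; returns -1 if a thread cannot be created. */
long long parallel_sum(long long n, int nthreads) {
    pthread_t tids[MAX_THREADS];
    struct range ranges[MAX_THREADS];
    int created = 0;

    if (nthreads < 1) nthreads = 1;
    if (nthreads > MAX_THREADS) nthreads = MAX_THREADS;
    long long chunk = n / nthreads;
    sum = 0;

    for (; created < nthreads; created++) {
        ranges[created].begin = created * chunk + 1;
        /* The last thread also takes whatever is left over from the division. */
        ranges[created].end = (created == nthreads - 1) ? n : (created + 1) * chunk;
        if (pthread_create(&tids[created], NULL, sum_range, &ranges[created]) != 0)
            break;
    }
    for (int t = 0; t < created; t++)
        pthread_join(tids[t], NULL);
    return created == nthreads ? sum : -1;
}
```

Each thread touches the mutex exactly once, so the lock cost no longer grows with the number of loop iterations.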