[python] GIL

alirz-pixel · January 17, 2023


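In Python (specifically CPython, the reference implementation), multithreaded code effectively runs as if it were single-threaded: only one thread executes Python code at any given moment.
This is because of the GIL (Global Interpreter Lock), and the GIL is easy to understand once you look at why it was introduced.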

Background of the GIL

Python's GIL is closely tied to reference counting, the mechanism used for garbage collection.

Reference counting is a way of controlling when an object's memory is released: each object stores the number of references that point to it.
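As a rough illustration of the counter described above, the sketch below watches an object's reference count change with `sys.getrefcount`. The `Node` class and `refcount_trace` function are hypothetical names made up for this example. The absolute numbers depend on the interpreter version, so only the differences between the three readings are meaningful.

```python
import sys


class Node:
    """Hypothetical placeholder object used only for this illustration."""


def refcount_trace():
    obj = Node()
    # Absolute values vary between CPython versions; compare the readings
    # with each other rather than relying on a specific number.
    base = sys.getrefcount(obj)
    alias = obj                          # a second name now refers to the object
    after_alias = sys.getrefcount(obj)
    del alias                            # dropping the name releases that reference
    after_del = sys.getrefcount(obj)
    return base, after_alias, after_del


if __name__ == "__main__":
    print(refcount_trace())
```

Binding a second name raises the count by one, and deleting that name lowers it again. Once the last reference disappears, the count reaches 0 and the object can be freed.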

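When the reference count drops to 0, the garbage collector frees the object's memory. If this process ran in a multithreaded environment without any safeguard, race conditions would occur. (That is, an object's memory could be freed while something still references it, or an object that nothing references could fail to be freed, causing a memory leak.)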

To prevent this, Python introduced the GIL.
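The race described above can be sketched without real threads. The following is a hand-written simulation, not CPython's actual implementation: it replays one unlucky interleaving of two "threads" that each try to add a reference, and then the same two updates in the serialized order that a single global lock would enforce. Both function names are assumptions made for this example.

```python
def simulate_unsynchronized_incref(refcount):
    """Replay one bad interleaving of two threads that each add a reference.

    Each simulated thread does load -> add 1 -> store, and the steps are
    interleaved on purpose so that one update is lost.
    """
    a_loaded = refcount          # thread A reads the count
    b_loaded = refcount          # thread B reads the same, now stale, count
    refcount = a_loaded + 1      # thread A writes its result back
    refcount = b_loaded + 1      # thread B overwrites A's update
    return refcount


def simulate_serialized_incref(refcount):
    """Same two increments, but each finishes before the next one starts."""
    for _ in range(2):
        loaded = refcount
        refcount = loaded + 1
    return refcount


if __name__ == "__main__":
    print(simulate_unsynchronized_incref(1))  # one increment is lost
    print(simulate_serialized_incref(1))      # both increments are kept
```

In the unsynchronized case the count ends up one lower than the real number of references, so a later decrement could bring it to 0 and free an object that is still in use. A lost decrement would have the opposite effect and leak the object.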

Speed Comparison: Multithreading vs. Single Threading

import time
import random
from threading import Thread

ITERATION = 500000000


def get_max(n):
    # Build n random floats and return the largest (pure CPU-bound work).
    return max([random.random() for _ in range(n)])


def single_test(n):
    single_tot = 0
    for i in range(n):
        s_beg = time.time()
        get_max(ITERATION)
        s_end = time.time()
        single_time = s_end - s_beg
        single_tot += single_time
        print(f"{i + 1}: single thread elapsed time: {single_time}")
    print("=" * 14)
    print(f"single thread mean elapsed time: {single_tot / n}")


def multi_test(n):
    multi_tot = 0
    for i in range(n):
        t1 = Thread(target=get_max, args=(ITERATION // 2,))
        t2 = Thread(target=get_max, args=(ITERATION // 2,))
        m_beg = time.time()
        t1.start()
        t2.start()
        t1.join()
        t2.join()
        m_end = time.time()
        multi_time = m_end - m_beg
        multi_tot += multi_time
        print(f"{i + 1}: multi thread elapsed time: {multi_time}")
    print("=" * 14)
    print(f"multi thread mean elapsed time: {multi_tot / n}")


if __name__ == '__main__':
    n = 10
    single_test(n)
    multi_test(n)

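The code above compares single-threaded and multithreaded speed by finding the largest value in a list of 500,000,000 random numbers. In the multithreaded test, each of the two threads handles half of the values.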

1: single thread elapsed time: 32.11880588531494
2: single thread elapsed time: 32.080703258514404
3: single thread elapsed time: 32.19195246696472
4: single thread elapsed time: 32.4189031124115
5: single thread elapsed time: 32.26317024230957
6: single thread elapsed time: 32.31819009780884
7: single thread elapsed time: 31.88320779800415
8: single thread elapsed time: 32.69598054885864
9: single thread elapsed time: 32.43097686767578
10: single thread elapsed time: 32.49436163902283
==============
single thread mean elapsed time: 32.289625191688536

1: multi thread elapsed time: 32.71271061897278
2: multi thread elapsed time: 32.40934228897095
3: multi thread elapsed time: 32.62379765510559
4: multi thread elapsed time: 32.60519814491272
5: multi thread elapsed time: 32.5047767162323
6: multi thread elapsed time: 32.71178865432739
7: multi thread elapsed time: 32.3445520401001
8: multi thread elapsed time: 32.58146667480469
9: multi thread elapsed time: 32.85973906517029
10: multi thread elapsed time: 32.27901744842529
==============
multi thread mean elapsed time: 32.56323893070221

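The results show that the multithreaded version is actually slower (a mean of about 32.56 s versus 32.29 s).
Because of the GIL, the two threads did not run in parallel, and the overhead of pointless context switching was added on top, which made it slower.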
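For readers who want to try this without waiting several minutes, here is a scaled-down sketch of the same comparison. It is not the benchmark used for the numbers above: the helper names `find_max` and `timed_run` and the problem size are assumptions chosen so the script finishes quickly, and the timings will vary by machine and Python build. It also prints `sys.getswitchinterval()`, the interval after which CPython asks the running thread to give up the GIL when other threads are waiting.

```python
import random
import sys
import time
from threading import Thread


def find_max(n):
    # Pure-Python CPU-bound work: under the GIL only one thread at a time
    # can execute this bytecode.
    return max(random.random() for _ in range(n))


def timed_run(total_items, num_threads):
    """Split total_items across num_threads threads; return elapsed seconds."""
    chunk = total_items // num_threads
    threads = [Thread(target=find_max, args=(chunk,)) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    total = 2_000_000  # assumed small size so the script finishes quickly
    print(f"switch interval: {sys.getswitchinterval()} s")
    for workers in (1, 2, 4):
        print(f"{workers} thread(s): {timed_run(total, workers):.3f} s")
```

On a build with the GIL, adding threads should not be expected to shorten the elapsed time for this kind of workload, because the threads take turns rather than running side by side.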

When Multithreading Is Faster

From the discussion so far, it might seem that multithreading in Python is simply slow.
Fortunately, there are cases where multithreading is faster.

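A CPU-bound program like the one above was slow because it gets no parallelism, but multithreading is faster for programs that sleep or are I/O-bound.
This is because an I/O-bound program usually spends more time waiting for I/O events to occur than handling them, and a thread that is waiting releases the GIL so that other threads can run.
(That said, I/O-bound programs are arguably better written as asynchronous programs than as multithreaded ones.)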
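The sketch below illustrates the waiting-dominated case. It uses `time.sleep` as a stand-in for an I/O wait, which is an assumption made for this example rather than real network or disk access, and compares running the waits one after another, in threads, and with `asyncio`. The constants `DELAY` and `TASKS` and all function names are made up for illustration.

```python
import asyncio
import time
from threading import Thread

DELAY = 0.1   # assumed stand-in for one I/O wait, in seconds
TASKS = 4     # number of simulated I/O operations


def fake_io():
    time.sleep(DELAY)  # a sleeping thread releases the GIL


def run_sequential():
    start = time.perf_counter()
    for _ in range(TASKS):
        fake_io()
    return time.perf_counter() - start


def run_threaded():
    threads = [Thread(target=fake_io) for _ in range(TASKS)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


async def _gather_sleeps():
    # All waits are scheduled on one thread and overlap in the event loop.
    await asyncio.gather(*(asyncio.sleep(DELAY) for _ in range(TASKS)))


def run_async():
    start = time.perf_counter()
    asyncio.run(_gather_sleeps())
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential: {run_sequential():.2f} s")
    print(f"threaded:   {run_threaded():.2f} s")
    print(f"asyncio:    {run_async():.2f} s")
```

The sequential version has to pay for every wait in turn, while the threaded and asynchronous versions overlap the waits. The asynchronous version does so on a single thread, which is one reason it is often preferred for I/O-bound work.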
