파이썬 메소드 성능 비교

GGob2._.·2023년 9월 17일

algorithm

목록 보기

46/55

해당 내용은 코딩 테스트 합격자 되기의 저자께서 작성한 코드를 기반으로 작성했습니다.
출처

1. append() vs +

리스트에 항목을 추가할 때, append()와 단순 + 연산의 동작 및 시간 복잡도에 대해 알아본다.

append(): 리스트 끝에 새 요소를 추가하는 연산, 시간 복잡도는 O(1)
+: 두 리스트를 합치는 연산으로, 시간 복잡도는 O(n)
리스트 크기에 비례하게 시간이 소요되며, 반복문 안에서 사용하는 경우
새 리스트를 매번 생성하기 때문에, 시간 소요가 큼

아래는 10,000만큼의 길이를 갖는 리스트 생성 비교 코드다.

import time

num_elements = 10000

# append() 사용
start_time = time.time()
lst_append = []
for i in range(num_elements):
    lst_append.append(i)
append_time = time.time() - start_time

# + 연산자 사용
start_time = time.time()
lst_plus = []
for i in range(num_elements):
    lst_plus = lst_plus + [i]
plus_time = time.time() - start_time

print(f"Using append() took: {append_time:.6f} seconds")# 
print(f"Using + operator took: {plus_time:.6f} seconds")# 

-------
Using   append() took: 0.002993 seconds
Using + operator took: 0.306183 seconds

2. for loop vs list comprehension

리스트를 생성하거나 초기화할 때 사용하는 for loop와 list comprehension의 시간 복잡도 및 성능에 대해 비교한다.

for loop: 기본적인 반복문으로, 반복적인 메서드 호출을 통해 연산을 수행하며, 일반적인 시간 복잡도는 O(n)
list comprehension: for loop와 비슷하게 반복연산을 수행하지만,
내부 로직의 최적화가 더 잘 되어있어 더 빠른 연산이 가능하다.
시간 복잡도는 O(n)로 같지만, 상수 시간 요소가 for loop보다 작다.

아래는 10,000,000까지 수의 제곱 수를 갖는 리스트 생성 비교 코드다.

import time

num_elements = 10000000

# for 루프 사용
start_time = time.time()
squared_numbers = []
for i in range(num_elements):
    squared_numbers.append(i * i)
for_loop_time = time.time() - start_time

# list comprehension 사용
start_time = time.time()
squared_numbers_comprehension = [i * i for i in range(num_elements)]
list_comprehension_time = time.time() - start_time

print(f"Using for loop took: {for_loop_time:.6f} seconds") 
print(f"Using list comprehension took: {list_comprehension_time:.6f} seconds") 

-----------
Using           for loop took: 1.523899 seconds
Using list comprehension took: 0.899564 seconds

3. list pop() vs deque pop()

list와 deque의 pop() 연산 수행 속도 및 시간 복잡도를 비교한다.

list.pop(0): 리스트의 첫 번째 요소를 제거하고, 나머지 요소들을 한 칸씩 앞으로 이동한다. 시간 복잡도는 O(n)이지만, 반복적으로 수행할 경우 O(n^2)
deque.popleft(): deque 라이브러리에서 제공하는 기능으로, 상수 시간 O(1)에 첫 번째 요소를 제거할 수 있다. 반복적으로 수행 할 경우 O(n)

아래 코드는 100,000개의 요소를 가진 list와 deque에서 pop()연산을 비교한다.

import time
from collections import deque

num_elements = 100000

# 리스트에서 pop(0) 사용
lst = list(range(num_elements))
start_time = time.time()
while lst:
    lst.pop(0)
list_pop_time = time.time() - start_time

# deque에서 popleft() 사용
dq = deque(range(num_elements))
start_time = time.time()
while dq:
    dq.popleft()
deque_popleft_time = time.time() - start_time

print(f"Using list's pop(0) took: {list_pop_time:.6f} seconds")
print(f"Using deque's popleft() took: {deque_popleft_time:.6f} seconds")

-------------
Using     list's pop(0) took: 1.441148 seconds
Using deque's popleft() took: 0.014957 seconds

4. list.sort() vs heapq.heappush()

정렬된 list를 만들 때, list.sort()와 heapq.heappush() 연산의 수행 속도 및 시간 복잡도를 비교한다.

list.sort(): 리스트에 요소를 추가한 다음, 매번 정렬을 수행해야 한다. append() 연산의 경우 O(1)의 시간 복잡도를 가지지만, sort() 연산은 O(n log n)의 시간 복잡도를 가진다. 전체적으로는 O(n^2 log n)
heapq.heappush(): heapq 라이브러리는 이진 힙 자료구조를 사용하여 요소를 추가하더라도 힙 속성을 유지한다. heappush() 연산의 시간 복잡도는 O(n log n)이므로, 전체적으로 O(n log n)

아래 코드는 10,000개의 요소를 생성하여 리스트에 넣고 정렬하는 연산을 비교한다.

import heapq
import time
import random

num_elements = 10000
lst = []
heap = []

# 일반 리스트 사용
start_time = time.time()
for _ in range(num_elements):
    val = random.randint(1, num_elements)
    lst.append(val)
    lst.sort()
list_time = time.time() - start_time

# heapq 사용
start_time = time.time()
for _ in range(num_elements):
    val = random.randint(1, num_elements)
    heapq.heappush(heap, val)
heap_time = time.time() - start_time

print(f"Insertion & sort in list took: {list_time:.6f} seconds")
print(f"Insertion using heapq took: {heap_time:.6f} seconds")

------------
Insertion & sort in list took: 0.428877 seconds
Insertion using    heapq took: 0.007989 seconds

5. list in vs set in

리스트와 set 자료구조 내에 특정 인자가 확인하는 in() 연산의 성능 및 시간 복잡도를 비교한다.

list in: 리스트에서 in연산자를 사용하는 경우 O(n)의 시간 복잡도를 가진다. 이때, 각 요소를 처음부터 순차적으로 탐색하여 값을 검색한다.
set in: 집합에서 in연산자를 사용하는 경우 일반적으로 O(1)의 시간 복잡도를 가진다. 이는 집합이 해시 테이블 기반의 자료구조이기 때문에, 요소의 해시 값을 사용하여 빠른 탐색이 가능하기 때문이다.

아래 코드는 1,000,000개의 요소에서 특정 값을 탐색하는 연산을 비교한다.

import time

# 데이터 준비
num_elements = 1000000
test_value = num_elements + 1  # 이 값은 리스트와 집합에 없음.

lst = list(range(num_elements))
s = set(lst)

# 리스트에서 in 테스트
start_time = time.time()
result_list = test_value in lst
end_time = time.time()
list_time = end_time - start_time

# 집합에서 in 테스트
start_time = time.time()
result_set = test_value in s
end_time = time.time()
set_time = end_time - start_time

print(f"List membership test took: {list_time:.6f} seconds")
print(f"Set membership test took: {set_time:.6f} seconds")


----------------
List membership test took: 0.015955 seconds
Set  membership test took: 0.000000 seconds

6. String += vs "".join()

문자열을 합칠 때 사용하는 += 연산과 "".join() 연산의 성능 및 시간 복잡도를 비교한다.

+=: 문자열 결합을 위해 주로 사용되는 연산이다. 문자열은 변경이 불가능한 객체이므로, 매번 += 연산을 수행할 때 마다 새로운 문자열 객체가 생성되고, 이전 문자열의 내용과 새로운 문자열의 내용이 복사된다. 시간 복잡도 O(n^2)를 가진다.
''.join(): 문자열 리스트를 한 번에 결합하는 방법을 제공한다. 내부적으로 더 효율적인 메모리 관리를 수행하며, 문자열을 한 번에 합치기 때문에 O(n)의 시간 복잡도를 가진다.

문자열이 큰 경우 ''.join() 메소드가 훨씬 빠른 성능을 보인다.

아래 코드는 [abcd] 문자열을 1,000,000번 반복한 객체를 다른 문자열과 합치는 경우의 연산을 비교한다.

import time

num_elements = 1000000
strings = ["abcd"] * num_elements

# += 연산자 사용
start_time = time.time()
result_str = ""
for s in strings:
    result_str += s
plus_equals_time = time.time() - start_time

# join() 메서드 사용
start_time = time.time()
result_str = "".join(strings)
join_time = time.time() - start_time

print(f"Using += operator took: {plus_equals_time:.6f} seconds")
print(f"Using join() method took: {join_time:.6f} seconds")

------------------
Using     += operator took: 0.514103 seconds
Using join() operator took: 0.020838 seconds