list_data_extension

heyme · August 24, 2023

When you add items to a list, the way you add them matters more and more for speed as the data grows.
I tested four ways of adding elements to a list and measured how long each one takes.

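The worst is concatenating a list onto a list with +; it takes absurdly long.
Each list1 + list2 builds a new list and copies every element of list1, so the cost grows with list1, and repeating it in a loop takes O(n^2) overall.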

list1 = list1 + list2

Next is appending with +=. It is far faster, though I don't prefer it. // behaves like extend

list1 += list2
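
The difference between the two forms above can be checked directly. This is a minimal sketch (the names alias and alias3 are only for illustration): + builds a new list object, while += extends the existing one in place.

```python
list2 = [4, 5]

# Case A: "+" creates a brand-new list and copies both operands into it.
list1 = [1, 2, 3]
alias = list1                      # second name for the same list object
list1 = list1 + list2              # list1 is rebound to a new object
plus_made_new_object = list1 is not alias

# Case B: "+=" mutates the existing list, like list3.extend(list2).
list3 = [1, 2, 3]
alias3 = list3
list3 += list2                     # same object, grown in place
inplace_kept_object = list3 is alias3

print(plus_made_new_object, inplace_kept_object)  # True True
print(alias, alias3)                              # [1, 2, 3] [1, 2, 3, 4, 5]
```

Because + copies the whole left operand every time, doing it once per element in a loop is what produces the quadratic cost.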

Then there is append! It is a solid default, but I tend to avoid it when optimizing.

for x in list2:
    list1.append(x)

Finally, work out the length of the list in advance and assign each value to its slot.

list1 = [0] * len(list2)
# list1 = [0 for _ in list2]
for i, x in enumerate(list2):
    list1[i] = x

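I usually reach for append, but with very large inputs I saw the run time grow considerably.
Instead of append, sizing the list in advance and filling it in with a for loop or similar can be the better approach, for the following reasons.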

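The following is GPT's answer:
Memory allocation: If the size of the list is defined in advance, the memory is allocated once. With append(), on the other hand, additional memory allocation may be needed as the list grows. This can cause memory reallocation and data copying to happen multiple times.

Index access: When the list size is known, data is assigned directly to a preallocated memory location, so insertion is very fast. With append(), the data has to be added to the end of the list.

In practice, append() is very efficient at adding an element to the end of a list. Internally a list is implemented as a dynamic array that keeps spare capacity, so most append() calls need no additional memory allocation. The problem arises when the list's allocated capacity is exceeded. In that case a new, larger block of memory must be allocated and the existing data copied to the new location. Such reallocations are rare, but each one costs extra time.

If the list size is known and set in advance, these reallocations do not occur. So the act of adding data to the end of the list is not itself slow. The difference comes from memory reallocation.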
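
The claim that reallocations are rare can be observed with a small sketch. It assumes CPython, where sys.getsizeof reports a list's allocated size including its spare capacity; this is an implementation detail, not a language guarantee. The helper name count_resizes is made up for this illustration.

```python
import sys


def count_resizes(n):
    """Append n items and count how often the list's allocated size changes."""
    items = []
    last_size = sys.getsizeof(items)
    resizes = 0
    for i in range(n):
        items.append(i)
        size = sys.getsizeof(items)
        if size != last_size:      # the backing array was grown
            resizes += 1
            last_size = size
    return resizes


n = 100_000
resizes = count_resizes(n)
print(f"{n} appends triggered {resizes} size changes")
```

The number of size changes should come out far smaller than the number of appends, which is why append stays cheap on average.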

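# TEST CODE
import random
from time import perf_counter  # monotonic, high-resolution timer


n = 100000
list_param = random.sample(range(n), n)  # n distinct integers in random order


# Case 1: concatenation with +, which copies the whole list on every iteration
list_case_1 = []
start1 = perf_counter()
for item in list_param:
    list_case_1 = list_case_1 + [item]
print(f"Duration for 1 case : {(perf_counter() - start1):.6f}s")
#print(list_case_1)

# Case 2: in-place +=, which extends the existing list
list_case_2 = []
start2 = perf_counter()
for item in list_param:
    list_case_2 += [item]
print(f"Duration for 2 case : {(perf_counter() - start2):.6f}s")
#print(list_case_2)

# Case 3: append one element at a time
list_case_3 = []
start3 = perf_counter()
for item in list_param:
    list_case_3.append(item)
print(f"Duration for 3 case : {(perf_counter() - start3):.6f}s")
#print(list_case_3)

# Case 4: preallocate, then assign by index (allocation happens before the timer starts)
list_case_4 = [0] * len(list_param)
start4 = perf_counter()
for i, item in enumerate(list_param):
    list_case_4[i] = item
#print(list_case_4)
print(f"Duration for 4 case : {(perf_counter() - start4):.6f}s")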

Duration for 1 case : 29.881827s
Duration for 2 case : 0.019768s
Duration for 3 case : 0.016583s
Duration for 4 case : 0.020160s
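
Single-run timings like the ones above are noisy, especially for cases that finish in a few hundredths of a second. A hedged sketch of the same comparison using the standard timeit module follows. The function names and the small default n are choices made for this illustration, not part of the original test.

```python
import timeit


def by_concat(src):
    out = []
    for x in src:
        out = out + [x]        # new list every iteration
    return out


def by_iadd(src):
    out = []
    for x in src:
        out += [x]             # in-place extend
    return out


def by_append(src):
    out = []
    for x in src:
        out.append(x)
    return out


def by_prealloc(src):
    out = [0] * len(src)       # allocation is included in the timing here
    for i, x in enumerate(src):
        out[i] = x
    return out


def benchmark(n=2_000, repeat=5):
    """Return the best-of-repeat wall time in seconds for each strategy."""
    src = list(range(n))
    timings = {}
    for fn in (by_concat, by_iadd, by_append, by_prealloc):
        runs = timeit.repeat(lambda fn=fn: fn(src), number=1, repeat=repeat)
        timings[fn.__name__] = min(runs)
    return timings


timings = benchmark()
for name, seconds in timings.items():
    print(f"{name:12s} {seconds:.6f}s")
```

Taking the minimum over several runs reduces the influence of background noise. Raising n makes the gap between by_concat and the other three grow quickly.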

Example 1)

def datagen(frames, mels):
    img_batch, mel_batch, frame_batch, coords_batch = [], [], [], []

    if args.box[0] == -1:
        if not args.static:
            face_det_results = face_detect(frames) # BGR2RGB for CNN face detection
        else:
            face_det_results = face_detect([frames[0]])
    elif args.box[0] == 0:
    
        if not args.static:
            #boxes = np.load('/root/share/xxxx/face_detect_box_image.npy')
            #boxes = np.load('/root/share/xxxx/face_detect_box_thbb.npy')
            boxes = np.load('/root/share/xxxx/face_det_80s_box.npy')
            
            boxes_results = [0]*len(boxes)
            for i, box  in enumerate(boxes):
                #boxes_results.append(np.array(box))
                #boxes_results[i] = np.array(box)
                boxes_results[i] = box
                #print('frame : ', f) #f[y1: y2, x1:x2])
                #print('boxes : ', int(box[0])) #(y1, y2, x1, x2))
                #results[i] = [x2, x1, y2, y1] #[x1, y1, x2, y2]
            #print(results)

Example 2)

    def load_data(self, input, sr, state):
        cnt = input.size//(1*sr)
        global xx
        xx = state
        wav = input
        input_data = [None] * (cnt - 1)  # preallocate one slot per segment; [] * n would stay empty
        for idx in range(cnt-1): # if xx==0.5 else cnt):
            if state == 0.5:
                data = wav[int((idx)*sr +(0.5*sr)):int((idx+1)*sr + (0.5*sr))]
            else:
                data = wav[idx*sr:(idx+1)*sr]
            mel_data = torch.FloatTensor(self.mel_spectrogram(data, 128, sr))
            #input_data.append(mel_data.unsqueeze(0))
            input_data[idx] = mel_data.unsqueeze(0)
        return torch.cat(input_data, dim=0).unsqueeze(0)
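
One point worth checking when preallocating as in Example 2: multiplying an empty list never creates slots, so index assignment into it fails. A minimal standalone sketch, with variable names chosen only for illustration:

```python
n = 3
empty = [] * n          # repeating an empty list still yields an empty list
slots = [None] * n      # n placeholder slots that can be assigned by index

try:
    empty[0] = "x"      # there is no index 0 to assign to
    assignment_failed = False
except IndexError:
    assignment_failed = True

for i in range(n):
    slots[i] = i * i    # works: every index 0..n-1 exists

print(len(empty), assignment_failed, slots)  # 0 True [0, 1, 4]
```

Using None as the placeholder also makes any slot that was never filled easy to spot later.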
