18. Yolo v1평가지표 설계 + Train/val 수행 (4) - 인공지능 고급(시각) 강의 예습

안상훈·2024년 6월 22일

CNN ICT 이노베이션 스퀘어 PyTorch 딥러닝

개요

1. 평가지표 개요

2. 평가지표 코드구현

3. 훈련/검증

4. 결과 분석

인공지능-시각

목록 보기

22/54

개요

본 블로그 포스팅은 수도권 ICT 이노베이션 스퀘어에서 진행하는 인공지능 고급-시각 강의의 CNN알고리즘 강좌 커리큘럼 중 일부 항목에 대한 예습 자료입니다..

1. 평가지표 개요

Yolo v1논문을 보면 위 사진처럼 참 다양한 성능 결과치때문에 머리가 아파오기 시작한다.

이중 그림으로 되어 있는 원형 그래프 부터 이해를 시작하도록 하자.

해당 그래프를 이해하려면 Image Detection Task에서 정의 하는 Error Type Definitions에 대해 개념학습이 필요하다.
위 그림에서 설명하는 6가지 에러 요인이며, 각 내용은 아래와 같다.

1) Cls (Classification) : 잘못된 클래스로 예측
2) Loc (Localization) : B_Box의 위치가 많이 어긋남
3) Cls+Loc (Classification and Localization) : 1), 2) 오류 둘다 발생
4) Duplicate (중복) : 동일객체를 중복탐지함
5) Bkgd (Background) : 없는데 있다 탐지한 것
6) Missed (누락) : 탐지를 못함

이 Error 항목에 대한 이해를 선행한 후 아래의 원형 그래프를 본다면 어느정도 성능을 가늠할 수 있을 것이다.
그래서 위 그래프를 요약하자면 Yolo v1는 Fast R-CNN 대비
Detection Error에 해당하는
3(Sim), 4(Other), 5(Bkgd)
항목이 개선된 모델이라고 생각할 수 있다.

다음으로 Yolo v1의 평가 지표 중 mAP 항목에 대한 정의는 아래와 같다.

mAP는 위 그림에서 설명하고 있는 Precision(정밀도)를 사용한 데이터셋에 대해서 평균을 때린 것이다.

좀더 정확하게는 모든 클래스에서 출력된 AP를 다시한번 mean을 수행한 것이라 보면 되는데
아무튼 평균이다.

주요 평가지표인 정밀도(Precision)와
재현율(Recall)의 차이점을 본다면

Precision : 모델이 예측한 값들 중 실제 정답인 비율 $\rightarrow$ 모델의 예측이 얼마나 정확한가?

Recall : 실제 정답 항목 중 모델이 정답을 내놓은 비율 $\rightarrow$ 모델이 얼마나 놓치지 않고 예측했는가?

이렇게 볼 수 있다.

필자는 위 평가표의 항목 중
1) IOU
2) IOU를 응용해 측정하는 Precision, Recall
3) 클래스 분류 정확도 $\rightarrow$ Top1 error

3가지 평가지표를 선정하여 Yolo v1 훈련/검증 시 매 epoch마다 학습 진행률을 체크하고자 한다.

2. 평가지표 코드구현

1) compute_iou 함수

def compute_iou(box1, box2):
    # 각 바운딩 박스의 좌표값 [x_1, y_1, x_2, y_2] 생성
    box1_x1 = box1[..., 0] - box1[..., 2] / 2
    box1_y1 = box1[..., 1] - box1[..., 3] / 2
    box1_x2 = box1[..., 0] + box1[..., 2] / 2
    box1_y2 = box1[..., 1] + box1[..., 3] / 2

    box2_x1 = box2[..., 0] - box2[..., 2] / 2
    box2_y1 = box2[..., 1] - box2[..., 3] / 2
    box2_x2 = box2[..., 0] + box2[..., 2] / 2
    box2_y2 = box2[..., 1] + box2[..., 3] / 2

    # 교차 영역의 좌표 계산
    inter_x1 = torch.max(box1_x1, box2_x1)
    inter_y1 = torch.max(box1_y1, box2_y1)
    inter_x2 = torch.min(box1_x2, box2_x2)
    inter_y2 = torch.min(box1_y2, box2_y2)

    # 교차 영역의 면적계산 -> 교차가 없을 때는 0으로 음수값 방지
    inter_area = (inter_x2 - inter_x1).clamp(0) * (inter_y2 - inter_y1).clamp(0)

    # 각 바운딩 박스의 면적 계산
    box1_area = (box1_x2 - box1_x1) * (box1_y2 - box1_y1)
    box2_area = (box2_x2 - box2_x1) * (box2_y2 - box2_y1)

    # 합집합 영역 계산
    union_area = box1_area + box2_area - inter_area

    return inter_area / union_area

위 함수는 인공지능 고급(시각) 강의 예습 - 19. Yolo v1 (2) Loss 함수 설계에서 설명했으니 넘어가도록 하겠다.
이렇게 쓰는 이유는 이 포스트만 봐서는 뒤 이어 진행되는 평가지표를 설계하는데 사용된 Pythonic한 코드 이해가 어렵기에
해당 포스트를 꼭 참조해 달라는 필자의 권유라 생각해주기 바란다.

2) 평가지표 함수

def cal_acc_func(output, label, iou_th=0.5, S=7, B=2, C=20,):
    # 모델의 출력 결과물과 정답지는 [Batch_size, S, S, B*5+C]
    # 4차원 데이터임을 잊지말자

    # B_Box정보만 추려서 [Batch_size, S, S, B, 5] 5차원으로
    pred_boxes = output[..., :B*5].view(-1, S, S, B, 5)
    target_boxes = label[..., :B*5].view(-1, S, S, B, 5)

    # 다시 위에서 [x, y, w, h]정보만 추리고
    # [Batch_size * S * S * B, 4] 2차원으로 변환
    pred_box = pred_boxes[..., :4].reshape(-1, 4)
    target_box = pred_boxes[..., :4].reshape(-1, 4)

    # YoloLoss()함수에서 사용하지 않은 IOU계산을 여기에서 하네
    ious = compute_iou(pred_box, target_box)
    
    # iou성능 값이 각 GridCell 별로 해서
    # [Batch_size * S * S * B] 벡터값으로 나왔으니
    # mean을 때려서 스칼라 값으로 평균을 냄
    iou_score = ious.mean().item()
    
    # B_Box의 CS정보가 따지고 보면 IOU값에 해당하니
    # 해당 값으로 Output값을 Iou_th(0.5)를 기준으로 [0 or 1]
    # 매트릭스로 변환 -> 따라서 [Batch_size, S, S, B]
    pred_cs = (pred_boxes[..., 4].reshape(-1) > iou_th).float()
    target_cs = target_boxes[..., 4].reshape(-1)
    # Target_CS정보는 원래 [0 or 1]정보만 포함하니 추출만 진행

    # CS 정보를 기반으로 True Positve, False (Positive, Negative) 계산
    TP = (pred_cs * target_cs).sum().item()
    FP = (pred_cs * (1 - target_cs)).sum().item()
    FN = ((1 - pred_cs) * target_cs).sum().item()

    # 정밀도(precision), 재현율(recall) 계산
    precision = TP / (TP + FP) if (TP + FP) > 0 else 0
    recall = TP / (TP + FN) if (TP + FN) > 0 else 0

    # 클래스 확률 정보(Class Probability) 추출 후 차원변환
    # [Batch_size, S, S, C] -> [Batch_size * S * S, C]
    pred_cp = output[..., B*5:].view(-1, C)
    target_cp = label[..., B*5:].view(-1, C)

    # Class확률정보에서 가장 높은 값을 갖는 idx 정보 추출
    pred_classes = torch.argmax(pred_cp, dim=1)
    target_classes = torch.argmax(target_cp, dim=1)
    # 추출 결과물은 [Batch_size * S * S] 1차원이 된다.

    # target_classes에서 객체가 존재하는 Grid cell만 선별하여
    # 마스크를 만들고 이를 필터링으로 쓴다.
    non_zero_mask = target_classes != 0
    f_pred_classes = pred_classes[non_zero_mask]
    f_target_classes = target_classes[non_zero_mask]

    # Top-1 Error 계산
    top1_error = (~f_pred_classes.eq(f_target_classes)).float().mean().item()

    return iou_score, precision, recall, top1_error

음.. 그림을 그리면서 중요 항목만 설명하자면

Output(모델 출력)값과 target(데이터셋 라벨)값은
2회의 차원 변환을 수행해서
2차원에 마지막 차원이 [x, y, w, h]인 Tensor로 바꿔준다

여기서 1번째 차원변환은 view()이지만
2번째 차원변환은 reshape()인 이유는

view()는 메모리의 연속성이 있는 자료형에 대한 차원변환이고
reshape()는 메모리의 연속성이 없는 자료형도 차원변환을 해주기 때문이다.

음.. 쉽게 설명하자면 view()은 처음 딱 한번만 적용할 수 있는 차원변환이라 보면 된다.

다음으로 객체가 존재할 때 해당 객체의 종류를 판별하는 classification과정에 대한 코드 설명이다.
차원변환 $\rightarrow$ torch.argmax(dim=1)
과정으로 최대값에 대한 idx정보만을 추출한다.
이때 grid_cell에 객체가 아에 없는 경우는
torch.argmax(dim=1)을 수행 시 0이란 값이
출력되기에 해당 정보는 마스크를 만들어서 필터링 해줘야한다.

위 두 중요한 항목에 대한 개념설명을 진행했으니
이제 Yolo v1에 대하여
Pascal VOC 2007데이터셋을 기반으로
훈련/검증 작업을 수행하도록 하겟다.

3. 훈련/검증

1) Yolo v1 모델 GPU로 이전 및 검토

from torchsummary import summary #설계한 모델의 요약본 출력 모듈

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = Yolov1()
model.to(device) #모델을 GPU로
summary(model, input_size=(3, 448, 448), device=device.type)

2) 하이퍼 파라미터 설정

Yolo v1논문에서 작업한 버전은

optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=0.0005)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[75, 90], gamma=0.1)

위 방식으로 SGD를 사용하고 총 epoch에 대해서 75%, 90%일때 Learning Rate값을 감소시키는 scheduler를 적용했다.

하지만 필자의 경우

from torch import optim

#LossFn, Optimizer, scheduler 정의
criterion = YoloLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

위 코드로 optimizer, scheduler 를 설계했다.

3) train, evaluate 함수 설계

from tqdm import tqdm #훈련 진행상황 체크

#tqdm 시각화 도구 출력 사이즈 조절 변수
epoch_step = 5

def model_train(model, data_loader, 
                loss_fn, optimizer_fn, scheduler_fn, 
                processing_device, epoch):
    model.train()  # 모델을 훈련 모드로 설정

    global epoch_step

    # loss와 accuracy를 계산하기 위한 임시 변수를 생성
    run_size, run_loss, = 0, 0
    #Yolo v1의 성능지표는 아래의 4개 항목으로 출력된다.
    #이때 성능지표는 이동평균 필터로 계산한다.
    avg_iou, avg_precision, avg_recall, avg_top1_error = 0, 0, 0, 0
    n = 0  # 값의 개수

    # 특정 에폭일 때만 tqdm 진행상황 바 생성
    if (epoch + 1) % epoch_step == 0 or epoch == 0:
        progress_bar = tqdm(data_loader)
    else:
        progress_bar = data_loader

    for image, label in progress_bar:

        # 입력된 데이터를 먼저 GPU로 이전하기
        image = image.to(processing_device)
        label = label.to(processing_device)

        # 전사 과정 수행
        output = model(image)

        loss = loss_fn(output, label)

        #backward과정 수행
        optimizer_fn.zero_grad()
        loss.backward()
        optimizer_fn.step()

        # 스케줄러 업데이트
        scheduler_fn.step()

        #현재까지 수행한 loss값을 얻어냄
        run_loss += loss.item() * image.size(0)
        run_size += image.size(0)

        #성능지표 계산하기
        iou_score, precision, recall, top1_error = cal_acc_func(output, label)

        # 이동평균 필터로 성능지표 계산하기
        n += 1
        avg_iou = avg_iou + (iou_score - avg_iou) / n
        avg_precision = avg_precision + (precision - avg_precision) / n
        avg_recall = avg_recall + (recall - avg_recall) / n
        avg_top1_error = avg_top1_error + (top1_error - avg_top1_error) / n

        #tqdm bar에 추가 정보 기입
        if (epoch + 1) % epoch_step == 0 or epoch == 0:
            desc = (f"[훈련중] 로스값: {run_loss / run_size:.4f}, "
                    f"IOU값: {avg_iou:.4f}, "
                    f"정밀도: {avg_precision:.4f}, "
                    f"재현율: {avg_recall:.4f}, "
                    f"Top-1_err: {avg_top1_error:.4f}")
            progress_bar.set_description(desc)

    # avg_accuracy = correct / len(data_loader.dataset)
    avg_loss = run_loss / len(data_loader.dataset)
    avg_KPI = [avg_iou, avg_precision, avg_recall, avg_top1_error]
    return avg_loss, avg_KPI

def model_evaluate(model, data_loader, loss_fn, 
                   processing_device, epoch):
    model.eval()  # 모델을 평가 모드로 전환 -> dropout 기능이 꺼진다
    # batchnormalizetion 기능이 꺼진다.
    global epoch_step

    # 초기값 설정
    run_loss = 0
    avg_iou, avg_precision, avg_recall, avg_top1_error = 0, 0, 0, 0
    n = 0  # 값의 개수

    # gradient 업데이트를 방지해주자
    with torch.no_grad():

        # 특정 에폭일 때만 tqdm 진행상황 바 생성
        if (epoch + 1) % epoch_step == 0 or epoch == 0:
            progress_bar = tqdm(data_loader)
        else:
            progress_bar = data_loader

        for image, label in progress_bar:  # 이때 사용되는 데이터는 평가용 데이터
            # 입력된 데이터를 먼저 GPU로 이전하기
            image = image.to(processing_device)
            label = label.to(processing_device)

            # 평가 결과를 도출하자
            output = model(image)
            loss = loss_fn(output, label)

            # 배치의 실제 크기에 맞추어 손실을 계산
            run_loss += loss.item() * image.size(0)

            # 성능 지표 계산
            iou_score, precision, recall, top1_error = cal_acc_func(output, label)
            
            # 이동평균 계산
            n += 1
            avg_iou = avg_iou + (iou_score - avg_iou) / n
            avg_precision = avg_precision + (precision - avg_precision) / n
            avg_recall = avg_recall + (recall - avg_recall) / n
            avg_top1_error = avg_top1_error + (top1_error - avg_top1_error) / n

        #tqdm bar에 추가정보 기입은 eval은 뺀다... 귀찮음

        # accuracy = correct / len(data_loader.dataset)
        avg_loss = run_loss / len(data_loader.dataset)
        avg_KPI = [avg_iou, avg_precision, avg_recall, avg_top1_error]
        return avg_loss, avg_KPI

두 함수에서 출력되는 훈련/검증의 성과지표는
Loss, IOU, Precision, Recall, Top1 error
5가지로 좀 많기에
Loss를 제외한 항목은 KPI라는 리스트를 만들어서
return하게 만들었다.

4) 실행코드

# 학습과 검증 손실 및 정확도를 저장할 리스트
his_loss = []
his_KPI = []
num_epoch = 50

for epoch in range(num_epoch):
    # 훈련 손실과 훈련 성과지표를 반환 받습니다.
    train_loss, train_KPI = model_train(model, train_loader, 
                                        criterion, optimizer, scheduler, 
                                        device, epoch)

    # 검증 손실과 검증 성과지표를 반환 받습니다.
    test_loss, test_KPI = model_evaluate(model, test_loader, 
                                         criterion, device, epoch)

    # 손실과 성능지표를 리스트에 저장
    his_loss.append((train_loss, test_loss))
    his_KPI.append((train_KPI, test_KPI))

    # epoch가 특정 배수일 때만 출력하기
    if (epoch + 1) % epoch_step == 0 or epoch == 0:
        print(f"epoch {epoch+1:03d}, Training loss: {train_loss:.4f}")
        print(f"Test loss: {test_loss:.4f}")
        print(f"Train KPI: IOU: {train_KPI[0]:.4f}, "+
              f"Precision: {train_KPI[1]:.4f}, "+
              f"Recall: {train_KPI[2]:.4f}, "+ 
              f"Top-1_err: {train_KPI[3]:.4f}")
        print(f"Test KPI: IOU: {test_KPI[0]:.4f}, "+
              f"Precision: {test_KPI[1]:.4f}, "+
              f"Recall: {test_KPI[2]:.4f}, "+
              f"Top-1_err: {test_KPI[3]:.4f}")

epoch를 50으로 설정하고 5의 배수 epoch만 결과값을 출력하게 코드를 작성했는데도 꽤 많은 양의 데이터가 출력되서 난감하다...

4. 결과 분석

평가지표 5개에 대한 그래프를 그리는 코드는 아래와 같다.

import numpy as np
import matplotlib.pyplot as plt
#histroy는 [train, test] 순임
#KPI는 [iou, precision, recall, top1_error] 순임
np_his_loss = np.array(his_loss)
np_his_KPI = np.array(his_KPI)

# his_loss에서 손실 데이터 추출
train_loss, val_loss = np_his_loss[..., 0], np_his_loss[..., 1]

# his_KPI에서 각 성능 지표 추출
train_iou, val_iou = np_his_KPI[..., 0, 0], np_his_KPI[..., 1, 0]
train_precision, val_precision = np_his_KPI[..., 0, 1], np_his_KPI[..., 1, 1]
train_recall, val_recall = np_his_KPI[..., 0, 2], np_his_KPI[..., 1, 2]
train_top1_errors, val_top1_errors = np_his_KPI[..., 0, 3], np_his_KPI[..., 1, 3]

# 2x3 플롯 설정
plt.figure(figsize=(10, 5))

# Train-Val Loss
plt.subplot(1, 2, 1)
plt.plot(train_loss, label='Train Loss')
plt.plot(val_loss, label='Val Loss')
plt.xlabel('Training Epochs')
plt.ylabel('Loss')
plt.legend()
plt.title('Train-Val Loss')

# Train-Val Top-1 Error
plt.subplot(1, 2, 2)
plt.plot(train_top1_errors, label='Train Top-1 Error')
plt.plot(val_top1_errors, label='Val Top-1 Error')
plt.xlabel('Training Epochs')
plt.ylabel('Top-1 Error')
plt.legend()
plt.title('Train-Val Top-1 Error')

plt.tight_layout()
plt.show()


plt.figure(figsize=(16, 6))

# IOU
plt.subplot(1, 3, 1)
plt.plot(train_iou, label='Train IOU')
plt.plot(val_iou, label='Val IOU')
plt.xlabel('Training Epochs')
plt.ylabel('IOU')
plt.legend()
plt.title('IOU')

# Precision
plt.subplot(1, 3, 2)
plt.plot(train_precision, label='Train Precision')
plt.plot(val_precision, label='Val Precision')
plt.xlabel('Training Epochs')
plt.ylabel('Precision')
plt.legend()
plt.title('Precision')

# Recall
plt.subplot(1, 3, 3)
plt.plot(train_recall, label='Train Recall')
plt.plot(val_recall, label='Val Recall')
plt.xlabel('Training Epochs')
plt.ylabel('Recall')
plt.legend()
plt.title('Recall')

plt.tight_layout()
plt.show()