AI_Tech부스트캠프 week6...[1]CV 기초대회(7) Training Process

Leejaegun·2024년 9월 14일

NaverAIBoostCamp

AI_tech_CV트랙 여정

목록 보기

17/74

1. 학습 파이프라인 전체 개요

1.1 전체적인 프로세스

① 학습 Task 선언

무엇을 학습할지?
데이터를 구성하고 Dataset class 와 DataLoader를 선언
적절한 Model 선별

② 학습 방법 선정

손실함수 선언(Loss)
학습의 최적화 방법(optimizer)
학습률 조정 방법(Learning Rate Scheduler)

③ 학습 설계

어떻게, 얼마나 학습할지?
-> 기능들을 모아 하나의 Trainig Loop 설계
몇번의 Loop 를 학습할지에 대한 Step 와 Iteration 선정

2. 학습 Task 선언

2.1 Dataset 과 DataLoader 선언

Dataset과 DataLoader

데이터를 처리하고 모델 학습에 사용하기 위한 핵심 Class
데이터를 Load, 전처리 그리고 배치 처리 등을 용이하게 하기 위해 사용
데이터를 처리하는 코드를 단순하게 line by line으로 학습 Loop 에 직접구현가능

2.2 Dataset

기본적인 torch Dataset 클래스를 상속받아 다양한 soruce로부터 데이터를 로드하고 전처리하느 방법을 정의
-> 필요에 따라 Custom

import os
import pandas as pd
from torchvision.io import read_image

class CustomImageDataSet(Dataset):
	def __init__(self, annotations_file, img_dir, transform = None, target_transform = None):
    	self.img_labels = pd.read_csv(annotations_file)
        self.img_dir = img_dir
        self.transform = transform
        self.target_transform = target_transform
   
	def __len__(self): # 데이터셋의 전체 아이템수 반환
    	return len(self.img_labels)
        
    def __getitem__(self): ## 주어진 인덱스에 해당하는 데이터를 학습에 적합한 형태로 변환하고 제공
    	img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx,0])
        image = read_image(img_path)
        label = self.img_labels.iloc[idx,1]
        if self.transform:
        	image = self.transform(image)
        if self.target_transform:
        	label = self.target_transform(label)
       	return image, label

2.3 DataLoader

Dataset으로부터 데이터를 Batch 단위로 로드하는 역할
이를 통해 모델을 배치단위로 진행가능 -> 메모리사용 최적화
Dataset은 재료, DataLoader는 운송수단.

from torch.utils.data import DataLoader

train_dataset =  CustomDataset(train_data, train=True)
test_dataset = CustomDataset(test_data, train= False)

train_dataloader = DataLoader(
			train_dataset,
            batch_size = 64,
            shuffle = True,
            num_workers =0, #데이터를 불러오는데 사용하하는 작업자수(Cpu 코어)
            drop_last = True #마지막 배치의 크기가 batch_size에서 정한 것보다 작을경우 버릴지 말지 결정.
            )


test_dataloader = DataLoader(
			test_dataset,
            batch_size = 64,
            shuffle =False,
            num_workers = 0,
            drop_last = False       
            )

2.4 Model선언

앞서 정의한 Dataset과 상호적으로 입력과 정답을 조절

class SimpleCNN(nn.Module):
	def __init__(self, num_classes : int):
    	super(SimpleCNN, self).__init__() #nn.Module 상속받음 -> 
        self.conv1 = nn.Conv2d(3,32,kernel_size = 3, padding = 1)
        self.conv2 = nn.Conv2d(32,64,kernel_size = 3, padding = 1)
        self.conv3 = nn.Conv2d(64,128,kernel_size = 3, padding = 1)
        self.pool = nn.MaxPool2d(2,2)
        self.fc1 = nn.Linear(128*4*4,512)
        self.fc2 = nn.Linear(512, num_classes)
        self.relu = nn.ReLU()
        
	def forward(self, x:torch.Tensor) ->  torch.Tensor:
    	x = self.pool(self.relu(self.conv1(x))
        x = self.pool(self.relu(self.conv2(x))
        x = self.pool(self.relu(self.conv3(x))
        x = torch.flatten(x,1)
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return x

3. 학습 방법 선정

3.1 손실 함수(Loss Function)선택

손실함수(loss function)

모델의 현재 상태에서의 예측값과 실제 값 사이의 차이를 수치적으로 나타내는 함수, 이 손실값을 최소화 하는 방향으로 모델 파라미터 조정.
손실함수를 선택할 때는, 문제의 유형과 특성에 맞는 함수 선택해야함.

CrossEntropyLoss

다중 클래스 분류문제에 적합

import torch.nn as nn

loss_fn = nn.CrossEntopyLoss(
	weight = None, # 각 클래스에 대한 가중치를 조절
    ignore_index = -100, #label중 특정 인덱스를 무시하고 싶을때 사용(ex)패딩값 무시.)
    reduction = "mean", #배치크기로 계산되어진 손실값 어떻게 다룰지
    label_smoothing = 0.0 #label의 0,1 값을 완화아여 모델이 너무 확신하지 않도록 만듦.
)

loss = loss_fn(predicts, labels)

Custom Loss

CrossEntropy Loss 외에도, 어떻게 학습을 하고 싶은지에 따라 Custom 한 Loss 구성

내가 하고자 하는 Task 에 맞는 데이터와 모델이 있을때, 어떻게ㅐ 학습할지에 따라 Loss 구성하고 선택.

Optimizer

모델의 학습 과정에서 손실함수를 최소화 하는 모델 파라미터를 찾는 과정을 자동화

import torch.optim as optim
model = CustomModel()

optimizer = optim.Adam(
	model.parameter(), #학습가능한 모든 파라키터들을 리스트 형태로 반환
    lr = 0.001,
    betas = (0.9,0.999), #모멘텀으로 0.9는 1차 모멘텀 0.999는 2차 모멘텀
    weight_decay = 0.01 #가중치 감소_L2규제.
    )

🤔Adam이 optimizer 최고 일까?
SGD가 더 낫다고 하는 논문들이 있다.
① Adam vs. SGD: Closing the generalization gap on image classification
https://www.opt-ml.org/papers/2021/paper53.pdf

② SGD better
https://ar5iv.labs.arxiv.org/html/2010.05627

3.2 학습률 조정(Learning Scheduler)

Learning Rate Scheduler

학습률은 모델의 파라미터 업데이트 크기 결정
너무 높은 학습률은 불안정하게 만들고 , 너무 낮은 학습률은 수렴속도를 느리게함.
스케쥴러(scheduler)는 초기에 높은 학습률을 설정하여 빠르게 학습하고, 이후 학습률을 낮춰 수렴과정을 안정화.
torch는 다양한 스케쥴러를 가지고 있음.

from torch.optim.lr_scheduler import (
	StepLR,
    ExponentialLR,
    CosineAnnealingLR,
    ReduceLROnPlateau
)

가장 많이 쓰는 위 4가지에 대해서 알아보자.
① StepLR

학습률을 일정 주기마다 감소.
모델이 빠르게 학습 수렴할때 사용.

② ExponentialLR

학습률을 일정 주기마다 지수적으로 감소
학습률을 초기에 부드럽게 감소

③ CosineAnnealingLR

학습률을 Cosine 형태로 하고 초기에 높은 학습률 -> 낮은 학습률 ->다시 높은 학습률 로 loop를 돎.
여러 주기에 걸쳐 학습률 조정하면서 장기간에 걸쳐서 하고자 할대 사용

④ ReduceLROnPlateau

성능개선이 이루어지지 않을 때 학습률 감소시키는 방향.

✍️ 학습률에 대해서 코드로 보고 싶으면 여기 들어가셈.

https://www.kaggle.com/code/isbhargav/guide-to-pytorch-learning-rate-scheduling

4. 학습 설계

4.1 모델 학습 루프 설계

Training Loop

모델 성능을 향상시키기 위해 앞서 구성한 요소들을 연결하고 반복적으로 수행할 수 있도록 구성

훈련 에폭 함수

def train_epoch(self)-> float:
	self.model.train()
    total_loss = 0.0
    progress_bar = tqdm(self.train_loader , desc = "Training",leave =False) #desc(description) :진행바 앞에 설명 추가하는 옵션
    #leave : 진행바가 완료된 후에도 화면에 남을지 여부,
    for images, targets in progress_bar:
    	images, targets = images.to(self.device), targets.to(self.device)
        self.optimizer.zero_grad()
        outputs = self.model(images)
        loss = self.loss_fn(outputs, targets)
        loss.backward()
        self.optimizer.step()
        self.scheduler.step()
        total_loss += loss.item()
        progress_bar.set_postfix(loss = loss.item()) #set_postfix는 진행바 끝에 추가적인 정보 보여줌. , .item()하면 tensor값을 파이썬 float 값으로 바꿔줌.
   return total_loss / len(self.train_loader)

검증함수(Validate)

def validate(self) ->float:
	self.model.eval() #모델을 평가모드로 설정
    
    total_loss = 0.0
    progress_bar = tqdm(self.val_loader, desc ="Validation", leave = False)
    
    with torch.no_grad(): #model forward간에 발생하는 gradient 계산 비활성화.
    	for images, targets in progress_bar:
        	images, targets = images.to(self.device), targets.to(self.device)
            outputs =  self.model(images)
            loss = self.loss_fn(outputs, targets)
            total_loss += loss.item()
            progress_bar.set_postfix(loss = loss.item())
    
    return total_loss/len(self.val_loader)

전체 훈련 과정 관리 함수

def train(self) -> None:
	for epoch in range(self.epochs):
    	print(f"Epoch {epoch+1}/{self.epochs}")
    
    	train_loss =self.train_epoch()
        val_loss = self.validate()
        
        print(
        	f"Epoch {epoch+1}",
            f"Train_loss : {train_loss:.4f}",
            f"Validate_loss : {val_loss:.4f}
        )
        
        self.save_model(epoch, val_loss)
        self.scheduler.step()