[YOLO] #3. Studying the YOLOv8 Code

임소현 · June 27, 2023


YOLO v8 github : https://github.com/ultralytics/ultralytics
train code (YOLO v5) github : https://github.com/ultralytics/yolov5/blob/master/train.py

The links above are the GitHub repositories for the YOLOv8 and v5 code. v5 and v8 have a lot in common, so I plan to look through both versions as I study.

train.md

This is the part of the docs that explains the model's train mode. There are broadly three ways to set up a model.

YAML file -> build a new model
Build a new model from YAML and start training from scratch
yolo detect train data=coco128.yaml model=yolov8n.yaml epochs=100 imgsz=640

model = YOLO('yolov8n.yaml')  # build a new model from YAML

.pt file -> load an already pretrained model
Start training from a pretrained *.pt model
yolo detect train data=coco128.yaml model=yolov8n.pt epochs=100 imgsz=640

model = YOLO('yolov8n.pt')  # load a pretrained model (recommended for training)

YAML & .pt files -> build from YAML and use transferred weights

model = YOLO('yolov8n.yaml').load('yolov8n.pt')  # build from YAML and transfer weights

#train mode
Build a new model from YAML, transfer pretrained weights to it and start training
yolo detect train data=coco128.yaml model=yolov8n.yaml pretrained=yolov8n.pt epochs=100 imgsz=640

model.train(data='coco128.yaml', epochs=100, imgsz=640)

#Using GPUs

```python
from ultralytics import YOLO

# Load a model
model = YOLO('yolov8n.pt')  # load a pretrained model (recommended for training)

# Train the model with 2 GPUs
model.train(data='coco128.yaml', epochs=100, imgsz=640, device=[0, 1])
```

#Resuming training from a saved checkpoint

from ultralytics import YOLO

# Load a model
model = YOLO('path/to/last.pt')  # load a partially trained model

# Resume training
model.train(resume=True)

ULTRALYTICS > ultralytics > yolo > engine > model.py

This file defines the class used to create YOLO objects. It shows what mechanism is at work when we created a model with YOLO() earlier.
The most important piece for understanding the code is TASK_MAP, the dictionary that lists the YOLOv8 tasks.

TASK_MAP = {
    'classify': [
        ClassificationModel, yolo.v8.classify.ClassificationTrainer, yolo.v8.classify.ClassificationValidator,
        yolo.v8.classify.ClassificationPredictor],
    'detect': [
        DetectionModel, yolo.v8.detect.DetectionTrainer, yolo.v8.detect.DetectionValidator,
        yolo.v8.detect.DetectionPredictor],
    'segment': [
        SegmentationModel, yolo.v8.segment.SegmentationTrainer, yolo.v8.segment.SegmentationValidator,
        yolo.v8.segment.SegmentationPredictor],
    'pose': [PoseModel, yolo.v8.pose.PoseTrainer, yolo.v8.pose.PoseValidator, yolo.v8.pose.PosePredictor]}

There are four tasks in total: classify, detect, segment, and pose. Each task also has its own trainer, validator, and predictor defined.
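
To make the lookup concrete for myself, here is a minimal sketch of how a table shaped like TASK_MAP can be read. The Dummy* classes and the components_for helper are stand-ins I made up so the snippet runs without ultralytics installed; the only thing taken from the dictionary above is the [model, trainer, validator, predictor] ordering.

```python
# Stand-in classes (hypothetical); the real ones live inside ultralytics.
class DummyDetectionModel: ...
class DummyDetectionTrainer: ...
class DummyDetectionValidator: ...
class DummyDetectionPredictor: ...

# Same shape as the dictionary above: task -> [model, trainer, validator, predictor]
TASK_MAP = {
    'detect': [DummyDetectionModel, DummyDetectionTrainer,
               DummyDetectionValidator, DummyDetectionPredictor],
}

def components_for(task):
    """Return the four classes registered for a task, keyed by role."""
    if task not in TASK_MAP:
        raise KeyError(f"unknown task '{task}', expected one of {list(TASK_MAP)}")
    model_cls, trainer_cls, validator_cls, predictor_cls = TASK_MAP[task]
    return {'model': model_cls, 'trainer': trainer_cls,
            'validator': validator_cls, 'predictor': predictor_cls}

parts = components_for('detect')
print(parts['trainer'].__name__)
```

So index 0 is the model class and index 1 is the trainer class, which is the position to watch when reading the rest of model.py.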

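The most important methods in the class definition are _new and _load. _new builds a new model, while _load creates a model by loading existing pretrained weights. The model class is picked according to the task, and the cfg that was passed in (the model configuration) is handed on to it as an argument.
_load distinguishes between weight files that end in .pt and everything else. For anything other than .pt, ckpt is set to None. After the model is created, the overrides dictionary stores 'model': weights and 'task': task.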
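
The branching described above can be sketched roughly like this. It is not the library code: describe_model_source is a hypothetical helper that loads nothing and only reports which path would be taken, whether a checkpoint would be kept, and what would be recorded in overrides. Passing task in directly is also a simplification on my part.

```python
from pathlib import Path

def describe_model_source(model, task=None):
    """Report the route a model argument would take, without loading anything."""
    suffix = Path(model).suffix
    if suffix == '.yaml':
        route = 'new'        # build a fresh model from a config file
        has_ckpt = False
    elif suffix == '.pt':
        route = 'load'       # pretrained PyTorch checkpoint
        has_ckpt = True
    else:
        route = 'load'       # any other format: ckpt stays None
        has_ckpt = False
    overrides = {'model': model, 'task': task}
    return route, has_ckpt, overrides

for name in ('yolov8n.yaml', 'yolov8n.pt', 'yolov8n.onnx'):
    print(name, describe_model_source(name, task='detect'))
```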

# Train the model with 2 GPUs
model.train(data='coco128.yaml', epochs=100, imgsz=640, device=[0, 1])

Next is the code for the model training step. The usage example shows that the function accepts many parameters such as data, epochs, imgsz, and device. In the code these are received as **kwargs. At first I read train as running the classify task first out of the four, but on a second look model.py has no part that fixes the task; it is decided by the guess_model_task function in tasks.py, which guesses it. train then creates the trainer class object for that task and runs the training.
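
As a small sketch of the **kwargs idea, the snippet below merges call-time arguments over stored defaults. collect_train_args is a name I invented for illustration, and the 'mode' key and the data check are my assumptions about what such a merge would reasonably do, not a copy of the real train function.

```python
def collect_train_args(overrides, **kwargs):
    """Merge call-time keyword arguments into a copy of the stored overrides."""
    merged = dict(overrides)   # copy so the stored defaults stay untouched
    merged.update(kwargs)      # values passed at call time win
    merged['mode'] = 'train'
    if not merged.get('data'):
        raise ValueError("a dataset is required, e.g. data='coco128.yaml'")
    return merged

stored = {'model': 'yolov8n.pt', 'task': 'detect'}
args = collect_train_args(stored, data='coco128.yaml', epochs=100, imgsz=640, device=[0, 1])
print(args)
```

The point is that the caller can pass any mix of data, epochs, imgsz, device and so on, and the function does not need a separate named parameter for each one.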

ULTRALYTICS > ultralytics > nn > tasks.py

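This is the tasks file containing the code that builds the models referenced in the task map dictionary mentioned earlier. Models for all four tasks, each based on BaseModel, can be created here.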

class DetectionModel(BaseModel)

This is the model class for the detection task. The models for all four tasks inherit from the BaseModel class.

    def __init__(self, cfg='yolov8n.yaml', ch=3, nc=None, verbose=True):  # model, input channels, number of classes; cfg defaults to yolov8n
        super().__init__()
        self.yaml = cfg if isinstance(cfg, dict) else yaml_model_load(cfg)  # use cfg as-is if it is a dict, otherwise load the YAML into one

        # Define model
        ch = self.yaml['ch'] = self.yaml.get('ch', ch)  # input channels: read the value and store it
        if nc and nc != self.yaml['nc']:  # nc: number of classes
            LOGGER.info(f"Overriding model.yaml nc={self.yaml['nc']} with nc={nc}")
            self.yaml['nc'] = nc  # override yaml value
        self.model, self.save = parse_model(deepcopy(self.yaml), ch=ch, verbose=verbose)  # model, savelist
        self.names = {i: f'{i}' for i in range(self.yaml['nc'])}  # default names dict: a label name for each class
        self.inplace = self.yaml.get('inplace', True)

        # Build strides
        m = self.model[-1]  # Detect(): the detect head, i.e. the last layer of the model
        if isinstance(m, (Detect, Segment, Pose)):  # m is one of these three heads
            s = 256  # 2x min stride
            m.inplace = self.inplace
            forward = lambda x: self.forward(x)[0] if isinstance(m, (Segment, Pose)) else self.forward(x)  # Segment/Pose: keep only the first output
            m.stride = torch.tensor([s / x.shape[-2] for x in forward(torch.zeros(1, ch, s, s))])  # forward output -> channels x grid x grid; stride = s / grid height
            self.stride = m.stride
            m.bias_init()  # only run once
        else:  # m is not a Detect, Segment or Pose head
            self.stride = torch.Tensor([32])  # default stride for i.e. RTDETR

        # Init weights, biases
        initialize_weights(self)  # initialize the weight values
        if verbose:
            self.info()
            LOGGER.info('')
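
The "Build strides" part becomes easier to read with numbers plugged in. This is plain-Python arithmetic only, with no real forward pass: the feature-map heights [32, 16, 8] are values I am assuming for a three-scale head fed a 256x256 dummy input, and strides_from_feature_maps is a helper name I made up.

```python
def strides_from_feature_maps(input_size, feature_heights):
    """stride = input height / feature-map height, one value per output scale."""
    return [input_size / h for h in feature_heights]

s = 256                        # dummy input size used in the code above
feature_heights = [32, 16, 8]  # assumed output heights of the three scales
strides = strides_from_feature_maps(s, feature_heights)
print(strides)  # [8.0, 16.0, 32.0]

# With those strides, a 640x640 training image maps to these grid sizes.
grids = [int(640 // st) for st in strides]
print(grids)  # [80, 40, 20]
```

Under these assumptions the largest stride is 32, so s = 256 is a small input that still divides evenly at every scale.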

The detection layer is the last layer of the model, which may be why I found it hard to follow, so I decided to look at the code for the other task models first.

class ClassificationModel(BaseModel)

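Hmm.. this code is hard too. The important function used in __init__ is _from_detection_model, whose arguments default to nc=1000 and cutoff=10. cutoff seems to mean that only the layers up to the backbone are used, but I'm not sure exactly. Also, a variable c is assigned Classify(ch, nc), a classification head built from the channel count and the number of classes. The code for the Classify class is as follows.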

class Classify(nn.Module):
    """YOLOv8 classification head, i.e. x(b,c1,20,20) to x(b,c2)."""

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1):  # ch_in, ch_out, kernel, stride, padding, groups
        super().__init__()
        c_ = 1280  # efficientnet_b0 size
        self.conv = Conv(c1, c_, k, s, p, g)
        self.pool = nn.AdaptiveAvgPool2d(1)  # to x(b,c_,1,1)
        self.drop = nn.Dropout(p=0.0, inplace=True)
        self.linear = nn.Linear(c_, c2)  # to x(b,c2)

    def forward(self, x):
        """Performs a forward pass of the YOLO model on input image data."""
        if isinstance(x, list):
            x = torch.cat(x, 1)
        x = self.linear(self.drop(self.pool(self.conv(x)).flatten(1)))
        return x if self.training else x.softmax(1)

At last, a deep learning layer structure I recognize. After defining the convolution, pooling, dropout, and linear layers, the forward pass runs convolution -> pooling -> flatten -> dropout -> linear.
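
To follow that chain without running the model, here is a rough shape trace in plain Python. classify_shapes and softmax are helpers I wrote for this note, and the input of 256 channels at 20x20 is an assumed example. The trace relies on the conv being 1x1 with stride 1, which matches the default arguments k=1, s=1 in the class above.

```python
import math

def classify_shapes(b, c1, h, w, c2, c_=1280):
    """Return the tensor shape after each stage of the head (1x1 conv, stride 1)."""
    return [
        ('input',   (b, c1, h, w)),
        ('conv',    (b, c_, h, w)),  # k=1, s=1 keeps the spatial size
        ('pool',    (b, c_, 1, 1)),  # AdaptiveAvgPool2d(1)
        ('flatten', (b, c_)),
        ('dropout', (b, c_)),        # dropout never changes the shape
        ('linear',  (b, c2)),
    ]

def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

for name, shape in classify_shapes(b=2, c1=256, h=20, w=20, c2=1000):
    print(f'{name:8s}{shape}')

# Outside training, forward() applies softmax so each row sums to 1.
probs = softmax([2.0, 1.0, 0.1])
print(probs)
```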
(I still don't really get what's going on, though..)
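
Going back to the cutoff argument of _from_detection_model: if it works as a slice index, which is how I read it, then cutoff=10 keeps layers 0 to 9. The sketch below uses a plain list of layer names; the first ten follow the backbone of a YOLOv8-style config as I understand it, while the head entries are shortened and hypothetical, as is the keep_backbone helper.

```python
# Layer names only, no real modules. Indices 0-9 play the role of the backbone.
layers = ['Conv', 'Conv', 'C2f', 'Conv', 'C2f', 'Conv', 'C2f', 'Conv', 'C2f', 'SPPF',
          'Upsample', 'Concat', 'C2f', 'Detect']  # shortened, hypothetical head

def keep_backbone(layers, cutoff=10):
    """Keep only the first `cutoff` layers, dropping everything after them."""
    return layers[:cutoff]

backbone = keep_backbone(layers)
print(len(backbone), backbone[-1])
```

Read this way, the classification model reuses the detection backbone and swaps the rest for the Classify head.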

ULTRALYTICS > ultralytics > yolo > v8 > classify > train.py

train.py defines the trainer class whose object is created and stored as the trainer once the ClassificationModel above has been defined. Training starts when model.py creates the trainer object and then calls its train function.

Hmm.. I think I've been studying this a bit wrong. A task just means a kind of job the YOLO model can do, so studying only the detection-related code should be enough. I'll go through it again step by step next time.
