Understanding MMDetection Config! (4)

먼지감자 · October 2, 2021

In this post, assuming a COCO-format dataset is already prepared, we will walk through how to train with your own config while swapping out the model, backbone, scheduling, and so on.
We will also cover how to visualize mmdetection training with wandb.

1. Cloning mmdetection

First, clone the mmdetection repository.

git clone https://github.com/open-mmlab/mmdetection.git

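2. Creating the config

In the previous post, I said that a config is broadly divided into datasets, model, schedule, and runtime. The config we build here is organized the same way.
First, create a new folder under mmdetection/configs (in this post, mmdetection/configs/bcp2).

We will build our files using configs/_base_ as a reference.

1. dataset.py
Since we are doing object detection on COCO-format data, copy coco_detection.py as-is and paste it into the new folder as dataset.py. Then change the parts marked with comments.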

# dataset settings
dataset_type = 'CocoDataset'
data_root = '/opt/ml/detection/dataset'  ## dataset location
classes = ("General trash", "Paper", "Paper pack", "Metal", "Glass",
           "Plastic", "Styrofoam", "Plastic bag", "Battery", "Clothing")  ## class definitions

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(512, 512), keep_ratio=True),  ## changed image size
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(512, 512),  ## changed image size
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=4,  ## batch size per GPU, changed from 2 to 4
    workers_per_gpu=2,  # same as the number of workers you set when creating a data loader
    train=dict(
        type=dataset_type,
        ann_file=data_root + '/new_train.json',  ## train annotation file location
        img_prefix=data_root,  ## data root location
        classes=classes,  ## added classes
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + '/new_val.json',  ## validation annotation file location
        img_prefix=data_root,  ## data root location
        classes=classes,  ## added classes
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + '/test.json',  ## test annotation file location
        img_prefix=data_root,  ## data root location
        classes=classes,  ## added classes
        pipeline=test_pipeline))

evaluation = dict(interval=1, metric='bbox')
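
Because the names in classes are matched against the category names stored in the annotation file, a typo in either place can quietly leave a class without any training boxes. The sketch below is one way to compare the two before training. It uses only the standard library; the helper names compare_categories and check_annotation_file are made up for this post and are not part of mmdetection.

```python
import json


def compare_categories(coco, classes):
    """Compare category names in a loaded COCO dict with the config's classes.

    Returns (missing, extra): names listed in `classes` but absent from the
    annotation file, and names in the annotation file not listed in `classes`.
    """
    names = {cat["name"] for cat in coco["categories"]}
    wanted = set(classes)
    return sorted(wanted - names), sorted(names - wanted)


def check_annotation_file(ann_path, classes):
    """Load a COCO-format JSON file and run the same comparison on it."""
    with open(ann_path, encoding="utf-8") as f:
        return compare_categories(json.load(f), classes)


# Tiny in-memory example (hypothetical data, not the real dataset).
demo = {"categories": [{"id": 0, "name": "Paper"}, {"id": 1, "name": "Metal"}]}
print(compare_categories(demo, ("Paper", "Glass")))  # (['Glass'], ['Metal'])
```

On the real dataset you would call check_annotation_file(data_root + '/new_train.json', classes) and expect two empty lists.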

2. schedule_1x.py
Likewise, copy the scheduler from _base_.
_1x means 12 epochs, _2x means 24 epochs, and _20e means 20 epochs.

# optimizer
optimizer = dict(type='SGD', lr=0.002, momentum=0.9, weight_decay=0.0001)  ## lowered lr
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',  # which scheduler to use
    warmup='linear',  # which warmup to use
    warmup_iters=500,  # number of warmup iterations
    warmup_ratio=0.001,  # warmup starts from lr * warmup_ratio
    step=[8, 11])  # epochs at which the lr is stepped down
runner = dict(type='EpochBasedRunner', max_epochs=12)
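
To get a feel for what these numbers do, here is a small sketch that approximates the learning rate produced by policy='step' with warmup='linear'. It is my own reimplementation of the idea, not the library code. It assumes a decay factor of 0.1 at each step (the gamma argument, which is not set in the config above) and a 0-based epoch counter.

```python
def lr_at(epoch, cur_iter, base_lr=0.002, steps=(8, 11), gamma=0.1,
          warmup_iters=500, warmup_ratio=0.001):
    """Approximate learning rate for a step schedule with linear warmup.

    epoch    : 0-based epoch index
    cur_iter : global iteration count since the start of training
    gamma    : assumed decay factor applied once per step that has been reached
    """
    # Step decay: multiply by gamma once for every milestone already reached.
    regular_lr = base_lr * gamma ** sum(epoch >= s for s in steps)
    if cur_iter < warmup_iters:
        # Linear warmup: ramp from regular_lr * warmup_ratio up to regular_lr.
        k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
        return regular_lr * (1 - k)
    return regular_lr


for epoch, it in [(0, 0), (0, 250), (0, 500), (8, 10000), (11, 10000)]:
    print(epoch, it, lr_at(epoch, it))
```

With the values in this post, the rate starts near lr * warmup_ratio, reaches 0.002 after 500 iterations, and is reduced at epochs 8 and 11.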

3. default_runtime.py
Likewise, copy default_runtime as-is.
If you add WandbLoggerHook to hooks and set project, entity, and name, you can visualize the training process in wandb.

checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
    interval=500,
    hooks=[
        dict(type='TextLoggerHook', interval=500),
        dict(type='WandbLoggerHook',interval=1000,
            init_kwargs=dict(
                project='PROJECT name',
                entity='ENTITY name',
                name='name shown on the RUN for each experiment'
            ),
            )
    ])
# yapf:enable
custom_hooks = [dict(type='NumClassCheckHook')]

dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
# To run both train and validation in every epoch, use workflow = [('train', 1), ('val', 1)]

4. model -> cascade_rcnn_r50_fpn
For the model, copy the one you want to use from configs/_base_/models as-is. In this post we use cascade_rcnn_r50_fpn.
Here, change every num_classes to match the number of classes in your dataset.

# model settings
model = dict(
    type='CascadeRCNN',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[.0, .0, .0, .0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)),
    roi_head=dict(
        type='CascadeRoIHead',
        num_stages=3,
        stage_loss_weights=[1, 0.5, 0.25],
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=[
            dict(
                type='Shared2FCBBoxHead',
                in_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=10,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.1, 0.1, 0.2, 0.2]),
                reg_class_agnostic=True,
                loss_cls=dict(
                    type='CrossEntropyLoss',
                    use_sigmoid=False,
                    loss_weight=1.0),
                loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
                               loss_weight=1.0)),
            dict(
                type='Shared2FCBBoxHead',
                in_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=10,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.05, 0.05, 0.1, 0.1]),
                reg_class_agnostic=True,
                loss_cls=dict(
                    type='CrossEntropyLoss',
                    use_sigmoid=False,
                    loss_weight=1.0),
                loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
                               loss_weight=1.0)),
            dict(
                type='Shared2FCBBoxHead',
                in_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=10,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.033, 0.033, 0.067, 0.067]),
                reg_class_agnostic=True,
                loss_cls=dict(
                    type='CrossEntropyLoss',
                    use_sigmoid=False,
                    loss_weight=1.0),
                loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))
        ]),
    # model training and testing settings
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=0,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_pre=2000,
            max_per_img=2000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=[
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.5,
                    neg_iou_thr=0.5,
                    min_pos_iou=0.5,
                    match_low_quality=False,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                pos_weight=-1,
                debug=False),
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.6,
                    neg_iou_thr=0.6,
                    min_pos_iou=0.6,
                    match_low_quality=False,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                pos_weight=-1,
                debug=False),
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.7,
                    neg_iou_thr=0.7,
                    min_pos_iou=0.7,
                    match_low_quality=False,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                pos_weight=-1,
                debug=False)
        ]),
    test_cfg=dict(
        rpn=dict(
            nms_pre=1000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            score_thr=0.05,
            nms=dict(type='nms', iou_threshold=0.5),
            max_per_img=100)))
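
num_classes appears once in each of the three bbox heads, so it is easy to miss one when editing by hand. As an alternative, the sketch below walks a nested dict/list structure and overwrites every num_classes key it finds. The helper set_num_classes and the toy_model dict are made up for illustration; the real model dict above has the same nesting but many more keys.

```python
def set_num_classes(cfg, num_classes):
    """Recursively overwrite every 'num_classes' key in nested dicts and lists.

    Returns the number of keys that were updated.
    """
    changed = 0
    if isinstance(cfg, dict):
        for key, value in cfg.items():
            if key == "num_classes":
                cfg[key] = num_classes  # replacing a value while iterating is safe
                changed += 1
            else:
                changed += set_num_classes(value, num_classes)
    elif isinstance(cfg, (list, tuple)):
        for item in cfg:
            changed += set_num_classes(item, num_classes)
    return changed


# Toy stand-in for the cascade head structure (three stages, COCO's 80 classes).
toy_model = dict(
    roi_head=dict(
        bbox_head=[dict(num_classes=80), dict(num_classes=80), dict(num_classes=80)]))
updated = set_num_classes(toy_model, 10)
print(updated, toy_model)
```

If the count returned is not 3 for a Cascade R-CNN model, something in the structure is probably different from what you expect.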

5. final.py
Once all four configs are written, we need a config file that combines them. Create a final.py file, using the existing mmdetection/configs/cascade_rcnn/cascade_rcnn_r50_fpn_1x_coco.py as a reference.

_base_ = [
    '../_base_/models/cascade_rcnn_r50_fpn.py',
    '../_base_/datasets/coco_detection.py',
    '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
]

You can see that it bundles the model, dataset, scheduler, and runtime configs together. Copy it as-is and paste it into final.py. Since final.py sits in the same folder as the other config files, change the paths as follows.

_base_ = [
    'cascade_rcnn_r50_fpn.py',
    'dataset.py',
    'schedule_1x.py',
    'default_runtime.py'
]

Train
For training, we use the train.py that mmdetection already provides.
Move into the mmdetection directory with cd, then
start training with python tools/train.py configs/bcp2/final.py.

Changing the backbone

If you open mmdet/models/backbones/__init__.py, you can see every model that can be used as a backbone. The detailed implementation is in each model's own .py file.
Inside each of those files, the backbone model is defined and registered as a module with @BACKBONES.register_module().
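
The registration mechanism is what lets a config refer to a backbone simply by its type string. The toy sketch below shows the general pattern. The Registry class and TinyBackbone here are written from scratch for illustration and are much simpler than the actual implementation in mmcv.

```python
class Registry:
    """Toy registry: maps class names to classes and builds them from dicts."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self):
        # Used as a decorator: stores the class under its own name.
        def decorator(cls):
            self._modules[cls.__name__] = cls
            return cls
        return decorator

    def build(self, cfg):
        # Look up the class named by cfg['type'] and pass the rest as kwargs.
        args = dict(cfg)
        cls = self._modules[args.pop("type")]
        return cls(**args)


BACKBONES = Registry("backbone")


@BACKBONES.register_module()
class TinyBackbone:
    """Hypothetical backbone used only for this example."""

    def __init__(self, depth=18):
        self.depth = depth


backbone = BACKBONES.build(dict(type="TinyBackbone", depth=50))
print(type(backbone).__name__, backbone.depth)
```

This is why a type that is not registered in the installed package cannot be used from a config, no matter what the config file says.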

For the cascade_rcnn model used above, cascade_rcnn_r50_fpn.py defines the backbone as ResNet.
Let's replace it with Swin Transformer, which has achieved state-of-the-art results.
You could edit the model file directly, but in this post we override it from final.py.

Open mask_rcnn_swin-t-p4-w7_fpn_1x_coco.py under mmdetection/configs/swin, copy the model section, and paste it into final.py. Comment out type. _delete_=True means the existing backbone definition is discarded and SwinTransformer is used instead.
The same key can be used when overriding optimizer, lr_config, and so on from final.py, whenever you want to reset what the base config declared before replacing it. Without _delete_=True, the optimizer arguments declared in final.py are merged into the optimizer already declared in the base config file.

# final.py
_base_ = [
    'cascade_rcnn_r50_fpn.py',
    'dataset.py',
    'schedule_1x.py',
    'default_runtime.py'
]

# overriding the backbone from final.py
pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth'  # noqa
model = dict(
    # type='MaskRCNN',  ## commented out because the model here is cascade_rcnn
    backbone=dict(
        _delete_=True,  ## drop the previous ResNet backbone and use Swin instead; this key also works for other configs such as lr
        type='SwinTransformer',
        embed_dims=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=7,
        mlp_ratio=4,
        qkv_bias=True,
        qk_scale=None,
        drop_rate=0.,
        attn_drop_rate=0.,
        drop_path_rate=0.2,
        patch_norm=True,
        out_indices=(0, 1, 2, 3),
        with_cp=False,
        convert_weights=True,
        init_cfg=dict(type='Pretrained', checkpoint=pretrained)),
    neck=dict(in_channels=[96, 192, 384, 768])
        # the previous in_channels were [256, 512, 1024, 2048]; channel counts differ per backbone.
)
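
To make the effect of _delete_=True in the override above more concrete, here is a toy merge function that mimics the behavior described in this post: nested dicts are merged key by key unless the overriding dict carries _delete_=True, in which case it replaces the base dict entirely. This is a simplified sketch written for this post, not the library's merge code, and the dicts are trimmed-down stand-ins.

```python
import copy


def merge(base, override):
    """Toy config merge with support for the _delete_ key."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict):
            value = dict(value)
            delete = value.pop("_delete_", False)
            if not delete and isinstance(merged.get(key), dict):
                # Default behavior: merge into the existing dict.
                merged[key] = merge(merged[key], value)
                continue
        # _delete_=True, or nothing to merge with: replace outright.
        merged[key] = copy.deepcopy(value)
    return merged


base = dict(
    backbone=dict(type="ResNet", depth=50),
    neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048], out_channels=256))

with_delete = merge(base, dict(
    backbone=dict(_delete_=True, type="SwinTransformer", embed_dims=96),
    neck=dict(in_channels=[96, 192, 384, 768])))
without_delete = merge(base, dict(
    backbone=dict(type="SwinTransformer", embed_dims=96)))

print(with_delete["backbone"])     # only the Swin keys remain
print(without_delete["backbone"])  # the stale ResNet key 'depth' is still there
```

In the second case the leftover depth key would be handed to the new backbone together with the Swin arguments, which is the kind of problem _delete_=True is meant to avoid. The neck, by contrast, is merged on purpose so that only in_channels changes.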

Save the file and run training again.

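If the mmdet installed in your conda environment is the copy that came installed on the server rather than your local clone, changes made to the local mmdet are not picked up, so you need to uninstall it and install it again.
From the parent folder of mmdetection,
run pip uninstall mmdet,
go back into the repository with cd mmdetection,
and reinstall the local mmdet with pip install -v -e . before training.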
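
If you are not sure which copy of mmdet Python is actually picking up, a quick check like the one below can help. It uses only the standard library, and locate_package is a helper name made up for this post. After an editable install, the reported path would normally point into your cloned mmdetection folder rather than into site-packages.

```python
import importlib.util


def locate_package(name):
    """Return the file path a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # package is not installed in this environment
    return spec.origin


print(locate_package("mmdet"))
```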

That covers how to train with mmdetection while changing the model, dataset, scheduler, and runtime configs.
More details can be found in the official documentation.

먼지감자

ML/AI Engineer

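2 comments

HeungJun Kim

April 14, 2022

Thanks for the clear write-up on MMDetection, I have been reading along!
Until now I had been copying mmdet into my conda environment with cp, but it turns out pip install -v -e . is all you need!
Thank you.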