[Boostcamp AI Tech Det] week10 (2022.03.21) Overview, two-stage detectors

redgreen · March 21, 2022

Boostcamp AI Tech, 3rd cohort

1. History

2. Evaluation

mAP(mean Average Precision)
: The mean of the AP values computed for each class

Precision
: $\frac{TP}{\text{All Detections}}$

Recall
: $\frac{TP}{\text{All Ground Truths}}$

PR Curve
: A graph of cumulative precision against cumulative recall, computed after sorting all predictions by confidence in descending order

AP(Average Precision)

: The area under the PR curve (a value between 0 and 1)
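The PR curve and AP described above can be sketched in a few lines of Python. This is a minimal illustration, not a benchmark implementation: the function name `average_precision` and its input format are hypothetical, `num_gt` is assumed to be greater than zero, and the all-point interpolation used for the area is an assumption (benchmarks differ in how they integrate the curve).

```python
def average_precision(detections, num_gt):
    """detections: list of (confidence, is_true_positive) pairs for one class.
    num_gt: number of ground-truth boxes for that class (assumed > 0)."""
    # Sort predictions by confidence, highest first.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)

    tp = fp = 0
    curve = []  # (cumulative recall, cumulative precision) after each prediction
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        curve.append((tp / num_gt, tp / (tp + fp)))

    # Area under the curve. Assumption: all-point interpolation, where the
    # precision at each recall level is the maximum precision found at that
    # recall level or any higher one.
    ap = 0.0
    prev_recall = 0.0
    for i, (recall, _) in enumerate(curve):
        max_precision = max(p for _, p in curve[i:])
        ap += (recall - prev_recall) * max_precision
        prev_recall = recall
    return ap, curve


ap, curve = average_precision([(0.9, True), (0.8, False), (0.7, True)], num_gt=2)
print(curve)
print(ap)
```

Averaging the AP returned for each class gives the mAP.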

mAP(mean Average Precision)
: $mAP = \frac{1}{n}\sum\limits_{k=1}^{n} AP_k$
$AP_k = \text{the AP of class } k$
$n = \text{the number of classes}$

IOU(Intersection Over Union)

: $IOU = \frac{\text{overlapping region}}{\text{combined region}}$
: A detection is judged True when its IOU with a ground-truth box is at or above a threshold (e.g., 0.5)
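As a small worked example of the IOU formula, the sketch below assumes axis-aligned boxes given as `(x1, y1, x2, y2)` corner coordinates. That box format and the function name `iou` are choices made here for illustration.

```python
def iou(box_a, box_b):
    """IOU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    # Corners of the overlapping region (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter  # combined region
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes sharing a 1x1 corner: IOU = 1 / (4 + 4 - 1).
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```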

mAP is reported together with the IOU threshold that was used, as follows:

mAP50

mAP60

mAP70
...

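FPS (Frames Per Second)
: The number of images that can be processed per second. The higher the value, the faster the model

FLOPs (Floating Point Operations)

: A metric used to estimate how fast a model runs
: The number of arithmetic operations (+, *, -, etc.)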
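To make the operation count concrete, here is a rough sketch for a single convolution layer. It assumes the common convention of counting one multiply and one add per kernel weight per output position and ignores bias and activation. Conventions differ between tools, so treat the result as an estimate; the function name is hypothetical.

```python
def conv2d_flops(c_in, c_out, kernel, h_out, w_out):
    """Approximate FLOPs of one square-kernel conv layer (bias ignored)."""
    # One multiply-accumulate per kernel weight, per output channel,
    # per output position.
    macs = c_in * kernel * kernel * c_out * h_out * w_out
    # Assumption: 1 MAC is counted as 2 operations (a multiply and an add).
    return 2 * macs


# 3x3 conv, 3 -> 64 channels, 224x224 output map.
print(conv2d_flops(3, 64, 3, 224, 224))
```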

3. Library

MMDetection
: An open-source object detection library based on PyTorch
: Belongs to a family of related libraries that includes MMCV, MMClassification, MMDetection3D, and MMSegmentation

Detectron2
: A library from Facebook AI Research
: Provides object detection and segmentation algorithms

YOLOv5
: Models pretrained on the COCO dataset

EfficientDet
: A detection model from Google Research & Brain, built by applying EfficientNet
: Available in TensorFlow and PyTorch

4. Models

4.1 R-CNN

1. Input image
2. Extract region proposals (selective search): 2,000 RoIs
3. Warp (resize every RoI to the same size)
4. CNN (AlexNet): extract a 4096-dim feature
5. An SVM predicts the class and confidence score
6. A regressor on the CNN feature predicts the bbox

Training

Ground truth: positive samples

IoU < 0.3: negative samples

Positive samples 32, negative samples 96

Hard negative mining:
Hard negative: false positive
A method that takes the samples that are hard to identify as background and forces them into the negative samples of the next batch
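The sampling rules above can be sketched as follows. This is only an illustration under stated assumptions: the helper names, the `score_thresh` cutoff used to decide what counts as a hard negative, and the rule "hard negatives first, then random easy negatives" are all hypothetical choices, not the exact procedure of the R-CNN paper.

```python
import random


def negatives_by_iou(max_ious, neg_thresh=0.3):
    """Indices of proposals whose best IoU with any ground truth is below neg_thresh."""
    return [i for i, v in enumerate(max_ious) if v < neg_thresh]


def mine_hard_negatives(negative_ids, object_scores, score_thresh=0.5):
    """Negatives that the current model still scores as objects (false positives).
    score_thresh is a hypothetical cutoff."""
    return [i for i in negative_ids if object_scores[i] >= score_thresh]


def build_batch(positive_ids, negative_ids, hard_negative_ids,
                num_pos=32, num_neg=96, seed=0):
    """Pick up to num_pos positives and num_neg negatives for the next batch."""
    rng = random.Random(seed)
    pos = rng.sample(positive_ids, min(num_pos, len(positive_ids)))
    # Hard negatives are forced into the batch first.
    hard = hard_negative_ids[:num_neg]
    hard_set = set(hard)
    # Remaining slots are filled with randomly chosen easy negatives.
    easy_pool = [i for i in negative_ids if i not in hard_set]
    easy = rng.sample(easy_pool, min(num_neg - len(hard), len(easy_pool)))
    return pos, hard + easy


max_ious = [0.1, 0.2, 0.6, 0.05]   # best IoU of each proposal with any GT box
scores = [0.9, 0.1, 0.8, 0.7]      # current model's object score per proposal
neg = negatives_by_iou(max_ious)
hard = mine_hard_negatives(neg, scores)
print(build_batch([2], neg, hard, num_pos=1, num_neg=2))
```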

Limitations of R-CNN

The CNN input size is fixed, so each region has to be cropped or warped to a fixed size

Every RoI passes through the CNN, so a single image requires 2,000 CNN forward passes

4.2 SPPNet

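The CNN runs once, and the 2,000 regions are extracted afterward

The RoIs from a region proposal method (e.g., selective search) are located on the feature map produced by the CNN

Warping is replaced by spatial pyramid pooling

Spatial Pyramid Pooling

Extracts the 2,000 RoIs from the CNN feature map, applies pooling at several sizes to each RoI's features, and concatenates the results to produce an input of the same size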

ex. Adjusting the pooling window size and stride to the size of each RoI yields features of the same size.

Reference blog
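The idea that regions of different sizes end up with the same feature length can be sketched in plain Python. The pyramid levels `(1, 2, 4)` are an illustrative assumption, and the sketch pools a single-channel 2D map with max pooling; the function name is hypothetical.

```python
import math


def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Max-pool a 2D feature map (list of rows) into n x n bins for each level
    and concatenate the results into one fixed-length list."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for n in levels:
        for i in range(n):
            # Bin boundaries scale with the input size, which plays the role
            # of adjusting window size and stride to the RoI.
            r0, r1 = (i * h) // n, math.ceil((i + 1) * h / n)
            for j in range(n):
                c0, c1 = (j * w) // n, math.ceil((j + 1) * w / n)
                out.append(max(fmap[r][c]
                               for r in range(r0, r1)
                               for c in range(c0, c1)))
    return out


small = [[r * 5 + c for c in range(5)] for r in range(4)]   # 4 x 5 region
large = [[r * 7 + c for c in range(7)] for r in range(9)]   # 9 x 7 region
print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))
```

Both regions produce 1 + 4 + 16 = 21 values per channel, so the following FC layer sees a fixed-size input.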

4.3 Fast R-CNN

CNN - RoI Projection - RoI Pooling - FC - prediction

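Uses VGG-16

RoIs obtained from the original image are mapped onto the feature map through RoI projection

RoI Pooling
: Similar to the spatial pyramid pooling in SPPNet
: The difference is that it pools at a single size only

The RoI pooling output is fed to FC layers to predict the class and bbox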
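RoI projection followed by single-level RoI pooling can be sketched as below. The stride of 16, the 2x2 output grid, and the rounding used for projection are assumptions chosen to keep the example small, and the feature map is a single-channel list of rows.

```python
import math


def roi_pool(fmap, roi, out_size=2, stride=16):
    """roi = (x1, y1, x2, y2) in image pixels, assumed to lie inside the image.
    Returns an out_size x out_size grid of max-pooled values."""
    # RoI projection: image coordinates -> feature-map cells.
    fx1, fy1 = roi[0] // stride, roi[1] // stride
    fx2 = min(len(fmap[0]), max(fx1 + 1, math.ceil(roi[2] / stride)))
    fy2 = min(len(fmap), max(fy1 + 1, math.ceil(roi[3] / stride)))
    h, w = fy2 - fy1, fx2 - fx1

    pooled = []
    for i in range(out_size):
        r0 = fy1 + (i * h) // out_size
        r1 = fy1 + math.ceil((i + 1) * h / out_size)
        row = []
        for j in range(out_size):
            c0 = fx1 + (j * w) // out_size
            c1 = fx1 + math.ceil((j + 1) * w / out_size)
            row.append(max(fmap[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


fmap = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 map of a 64x64 image
print(roi_pool(fmap, (0, 0, 64, 64)))    # whole image
print(roi_pool(fmap, (16, 16, 48, 48)))  # central region
```

Unlike the pyramid in SPPNet, only one grid size is used, so the output is a single `out_size x out_size` block per channel.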

Uses a multi-task loss

Classification (cross entropy)

BB regressor (Smooth L1): less sensitive to outliers
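A short sketch shows why Smooth L1 is less sensitive to outliers than a squared loss: it is quadratic near zero but grows only linearly for large errors. The switch point of 1 follows the Fast R-CNN definition; the function names are chosen here for illustration.

```python
def smooth_l1(x):
    """Smooth L1: 0.5 * x^2 if |x| < 1, otherwise |x| - 0.5."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5


def l2(x):
    """Squared loss with the same 0.5 factor, for comparison."""
    return 0.5 * x * x


for err in (0.5, 3.0, 10.0):
    print(err, smooth_l1(err), l2(err))
```

For an error of 10, the squared loss is 50 while Smooth L1 is 9.5, so a single badly regressed box dominates the gradient far less.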

Dataset construction

IoU > 0.5: positive samples

0.1 < IoU < 0.5: negative samples

positive samples 25%, negative samples 75%

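Sampling:

R-CNN stores every RoI found in the images and samples from that pool, so a batch contains RoIs from different images

Fast R-CNN builds a batch from the RoIs of only a small number of images (the paper uses N = 2)

Computation and memory are shared within a batch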

4.4 Faster R-CNN

Reference blog

Replaces selective search with an RPN (Region Proposal Network)

CNN (run only once), then the RPN computes the RoIs using anchor boxes

Anchor box
: For each cell of the feature map obtained from the CNN ($w \times h$ cells), $k$ anchor boxes are created
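Anchor generation can be sketched as follows. The stride of 16, the three scales, and the three aspect ratios (giving $k = 9$) are the commonly cited Faster R-CNN defaults and are used here as assumptions; the function name is hypothetical.

```python
def generate_anchors(fmap_w, fmap_h, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchors in image coordinates,
    len(scales) * len(ratios) anchors per feature-map cell."""
    anchors = []
    for row in range(fmap_h):
        for col in range(fmap_w):
            # Center of this cell in image coordinates.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    # ratio = height / width; the area stays at scale ** 2.
                    w = scale / ratio ** 0.5
                    h = scale * ratio ** 0.5
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors


anchors = generate_anchors(fmap_w=3, fmap_h=2)
print(len(anchors))  # w * h * k = 3 * 2 * 9
```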

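RPN
: Predicts whether each anchor box contains an object ($class$) and fine-tunes its position (regression)
: A 1x1 conv is applied to the feature map to obtain the class score
: For object presence, the number of channels is set to 2 (True/False) x 9 (anchor boxes)
: For the bbox position, the number of channels is set to 4 (w, h, x, y) x 9 (anchor boxes)
: The RPN returns the class score together with the bbox regression values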

NMS (Non-Maximum Suppression)
: Used to remove near-duplicate RPN proposals
: Sort the proposals by class score
: Proposals whose IoU with a higher-scoring proposal is 0.7 or above are treated as duplicates and removed
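The NMS steps above can be sketched as a greedy loop. The `(score, box)` input format and the helper names are choices made for this illustration; boxes are assumed to be `(x1, y1, x2, y2)`.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(proposals, iou_thresh=0.7):
    """proposals: list of (score, box). Returns the proposals that survive."""
    # Sort by class score, highest first.
    remaining = sorted(proposals, key=lambda p: p[0], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        # Drop every proposal that overlaps the kept one too much.
        remaining = [p for p in remaining if iou(best[1], p[1]) < iou_thresh]
    return kept


proposals = [
    (0.9, (0, 0, 10, 10)),
    (0.8, (0, 0, 10, 9)),     # IoU 0.9 with the first box: suppressed
    (0.7, (20, 20, 30, 30)),  # no overlap: kept
]
print([score for score, _ in nms(proposals)])
```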

Training

IoU > 0.7 or highest IoU with GT: positive samples

IoU < 0.3: negative samples

All other anchors are not used as training data

IoU > 0.5: positive samples → 32

IoU < 0.5: negative samples → 96

A mini-batch is built from the 128 samples

4-step alternating training was used, but it is very complex, so Approximate Joint Training is used these days

Loss
: $p^*_i$: an indicator of whether the $i$-th anchor box contains an object (contains an object: 1, does not: 0)
: Regression is performed only for anchor boxes with $p^*_i = 1$, that is, those that contain an object
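For reference, the RPN loss in the Faster R-CNN paper can be written as below. The symbols $p_i$ (predicted object probability of anchor $i$), $t_i$ and $t^*_i$ (predicted and target box offsets), $N_{cls}$, $N_{reg}$, and $\lambda$ do not appear in the notes above and follow the paper's notation.

$$L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p^*_i) + \lambda \frac{1}{N_{reg}} \sum_i p^*_i \, L_{reg}(t_i, t^*_i)$$

The factor $p^*_i$ in the second term is what switches the regression loss off for anchors with $p^*_i = 0$.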