AI_Tech BoostCamp week8...[1] Object Detection overview

Leejaegun · September 30, 2024

NaverAIBoostCamp

AI_tech CV track journey


0. Intro

1. Object Detection

1.1 Task

Classification: the task of assigning a single label to the entire image
Object Detection: the task of identifying multiple objects in an image and localizing each one with a bounding box

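Semantic Segmentation: dividing the image into regions by class at the pixel level. Objects of the same class are not distinguished from one another.
Instance Segmentation: semantic segmentation combined with the per-object view of object detection. It separates the regions and also tells which object each region belongs to, so even objects of the same class are distinguished.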

Object Detection History

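This Object Detection lecture series covers all of these models. I will also attach the related papers, so if you want to study the topic properly, it is worth reviewing those papers separately.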

1.2 Evaluation

1.2.1 Performance

mAP (mean Average Precision): the mean of the per-class AP values

Concepts needed to compute mAP
① Confusion Matrix

② Precision & Recall

Precision = $\frac{TP}{TP+FP} =\frac{TP}{All\ Detection}$

👉 Of everything the model predicted, the fraction that is actually correct

Recall = $\frac{TP}{TP+FN} = \frac{TP}{All\ Ground\ truths}$

👉 Of all ground truths, the fraction that was actually detected
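As a quick illustration, the two formulas translate directly into code. This is a minimal sketch: the function names are hypothetical, and returning 0.0 when the denominator is zero is an assumed convention that the lecture does not specify.

```python
def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP): fraction of all detections that are correct."""
    detections = tp + fp
    # Assumed convention: no detections at all -> precision 0.0
    return tp / detections if detections > 0 else 0.0


def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN): fraction of all ground truths that were found."""
    ground_truths = tp + fn
    # Assumed convention: no ground truths at all -> recall 0.0
    return tp / ground_truths if ground_truths > 0 else 0.0


# Hypothetical counts: 8 correct boxes, 2 false alarms, 4 missed objects.
print(precision(tp=8, fp=2))  # 0.8
print(recall(tp=8, fn=4))     # 0.666...
```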

③ PR curve (Precision-Recall Curve)
Sort all predictions by confidence score in descending order, compute the cumulative TP and FP at each position, and plot Recall on the x-axis against Precision on the y-axis.
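The procedure can be sketched in Python as follows. It assumes each prediction has already been labeled TP or FP (via the IoU check described later) and that the total number of ground-truth boxes `num_gt` is known; `pr_curve` is a hypothetical helper name, not a library function.

```python
from typing import List, Tuple


def pr_curve(scores: List[float], is_tp: List[bool], num_gt: int) -> Tuple[List[float], List[float]]:
    """Return (recalls, precisions), one point per prediction in confidence order."""
    # Visit predictions from the highest confidence score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    recalls: List[float] = []
    precisions: List[float] = []
    cum_tp = cum_fp = 0
    for i in order:
        # Accumulate TP / FP counts as we move down the ranked list.
        if is_tp[i]:
            cum_tp += 1
        else:
            cum_fp += 1
        # Precision is relative to detections so far; recall to all ground truths.
        precisions.append(cum_tp / (cum_tp + cum_fp))
        recalls.append(cum_tp / num_gt)
    return recalls, precisions


recalls, precisions = pr_curve(
    scores=[0.6, 0.9, 0.5, 0.8, 0.7],
    is_tp=[True, True, False, False, True],
    num_gt=4,
)
print(recalls)     # [0.25, 0.25, 0.5, 0.75, 0.75]
print(precisions)  # [1.0, 0.5, 0.666..., 0.75, 0.6]
```

Recall never decreases along the ranked list, while precision can go up and down, which is why the raw PR curve looks jagged.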

④ AP(Average Precision)

Draw the PR curve and compute the area under it.
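The lecture only says that AP is the area under the PR curve; how that area is computed varies by benchmark. The sketch below uses all-point interpolation (the convention PASCAL VOC adopted from 2010), in which each precision value is first replaced by the maximum precision at any higher recall; COCO instead samples 101 recall points. Treat it as one possible implementation rather than the lecture's definition.

```python
from typing import List


def average_precision(recalls: List[float], precisions: List[float]) -> float:
    """Area under the PR curve using all-point interpolation.

    `recalls` and `precisions` are the cumulative values obtained after
    sorting detections by confidence (so recall is non-decreasing).
    """
    # Sentinel points at recall 0 and recall 1.
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Precision envelope: make precision non-increasing from right to left.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas; segments where recall does not change add nothing.
    ap = 0.0
    for i in range(1, len(r)):
        ap += (r[i] - r[i - 1]) * p[i]
    return ap


print(average_precision([0.25, 0.25, 0.5, 0.75, 0.75], [1.0, 0.5, 2 / 3, 0.75, 0.6]))  # 0.625
```

Because the curve is padded with precision 0 at recall 1, ground truths that are never detected (recall stuck at 0.75 here) directly lower the AP.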

mAP (mean Average Precision): the mean of the computed AP values.
$mAP = \frac{1}{n} \displaystyle \sum_{k=1}^{n} AP_k$
$AP_k$ = the AP of class $k$
$n$ = the number of classes
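Given per-class AP values, mAP is simply their arithmetic mean, exactly as in the formula above. A minimal sketch; the class names and AP values are made up for illustration, and the function name is hypothetical.

```python
from typing import Dict


def mean_average_precision(ap_per_class: Dict[str, float]) -> float:
    """mAP = (1/n) * sum of AP_k over the n classes."""
    if not ap_per_class:
        raise ValueError("need AP values for at least one class")
    return sum(ap_per_class.values()) / len(ap_per_class)


# Hypothetical per-class AP values.
print(mean_average_precision({"person": 0.7, "car": 0.5, "dog": 0.6}))
```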

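⑤ IoU (Intersection over Union)

: Unlike classification, detection also predicts bounding boxes, so we must also determine whether each predicted box actually matches a ground-truth box.
In other words, IoU is a metric that measures how much the predicted bounding box overlaps the ground truth bounding box: $IoU = \frac{Area\ of\ Overlap}{Area\ of\ Union}$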
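The overlap ratio can be computed as below. This sketch assumes boxes are given as corner coordinates `(x1, y1, x2, y2)` with `x1 < x2` and `y1 < y2`; datasets and libraries differ (some use `(x, y, w, h)`), so convert first if needed. The `Box` alias and `iou` name are illustrative.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) corner format


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width/height are clamped to 0 when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    # Union = both areas minus the doubly counted intersection.
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 / 7 ≈ 0.142857
```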

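mAP in Object Detection

In object detection you will see variants such as
mAP50,
mAP60, and so on.
The number is the IoU (Intersection over Union) threshold: in mAP50 a prediction counts as a TP only if its IoU with a ground-truth box is at least 50% (0.5), and in mAP60 it must be at least 60%. A higher threshold demands more precise localization before a box counts as correct.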
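To see how the IoU threshold turns predictions into TPs and FPs, here is a simplified greedy matcher for a single image and a single class: predictions are visited in descending confidence, and each may claim at most one still-unmatched ground truth whose IoU meets the threshold. This is a sketch under those assumptions; official evaluators (e.g. PASCAL VOC, COCO) add details such as "difficult" or crowd regions, and `label_detections` is a hypothetical name. Boxes are assumed to be `(x1, y1, x2, y2)`.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_detections(
    pred_boxes: List[Box],
    scores: List[float],
    gt_boxes: List[Box],
    iou_thr: float = 0.5,
) -> List[bool]:
    """Mark each prediction as TP (True) or FP (False) for one image and one class."""
    is_tp = [False] * len(pred_boxes)
    matched = [False] * len(gt_boxes)
    # Higher-confidence predictions get the first chance to claim a ground truth.
    order = sorted(range(len(pred_boxes)), key=lambda i: scores[i], reverse=True)
    for i in order:
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gt_boxes):
            if matched[j]:
                continue  # each ground truth may be matched at most once
            overlap = iou(pred_boxes[i], gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        # The match only counts if it clears the threshold (0.5 for mAP50, 0.6 for mAP60).
        if best_j >= 0 and best_iou >= iou_thr:
            is_tp[i] = True
            matched[best_j] = True
    return is_tp


gt = [(0, 0, 10, 10)]
print(label_detections([(0, 0, 10, 5.5)], [0.9], gt, iou_thr=0.5))  # [True]  (IoU = 0.55)
print(label_detections([(0, 0, 10, 5.5)], [0.9], gt, iou_thr=0.6))  # [False]
# Duplicate detection: the second box finds no unmatched ground truth.
print(label_detections([(0, 0, 10, 10), (0, 0, 10, 9)], [0.9, 0.8], gt))  # [True, False]
```

The same box is a TP under mAP50 but an FP under mAP60, which is exactly why the two numbers differ.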

1.2.2 Speed

① FPS (Frames Per Second): the number of frames processed per second
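In practice FPS is measured by timing inference over many frames and dividing the frame count by the elapsed wall-clock time. A rough sketch, where `infer` stands for any per-frame inference function (a hypothetical placeholder, not a specific library API):

```python
import time
from typing import Callable, Sequence


def measure_fps(infer: Callable[[object], object], frames: Sequence[object], warmup: int = 2) -> float:
    """Frames processed per second of wall-clock time."""
    # A few warm-up calls so one-time setup cost does not skew the timing.
    for frame in frames[:warmup]:
        infer(frame)
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start
    # FPS = number of frames / seconds spent processing them.
    return len(frames) / elapsed if elapsed > 0 else float("inf")


# Dummy "model" that just does some arithmetic per frame.
print(measure_fps(lambda frame: sum(range(1000)), list(range(100))))
```

On a GPU, kernels run asynchronously, so with PyTorch you would call `torch.cuda.synchronize()` before reading the clock; otherwise the measured FPS is too optimistic.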

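② FLOPs (Floating Point Operations)
The number of floating-point operations a model performs. It measures how much computation the model needs and is therefore used as a metric of how fast the model runs.
For example,

suppose we multiply a 3x2 matrix by a 2x3 matrix. The result is 3x3, and each entry needs 2 multiplications and 1 addition, so there are 3x3x2 = 18 multiplications
and 3x3x1 = 9 additions.
Therefore FLOPs = 18 + 9 = 27.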
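The same counting rule generalizes to any matrix product: multiplying an m x k matrix by a k x n matrix gives m x n outputs, each a dot product of length k that needs k multiplications and k - 1 additions. A small sketch (the function name is hypothetical):

```python
def matmul_flops(m: int, k: int, n: int) -> int:
    """FLOPs for multiplying an (m x k) matrix by a (k x n) matrix.

    Each of the m * n output entries is a dot product of length k:
    k multiplications and k - 1 additions.
    """
    multiplications = m * n * k
    additions = m * n * (k - 1)
    return multiplications + additions


print(matmul_flops(3, 2, 3))  # 18 + 9 = 27
```

Conventions vary in practice: many papers and profiling tools approximate a matrix product as 2mnk FLOPs (36 here) or report multiply-accumulates (MACs, mnk = 18 here) instead, so check which convention a reported number uses.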

1.3 Library

① MMDetection

MMDetection is an open-source framework that supports a wide variety of object detection algorithms.

② Detectron2

Detectron2 is an object detection framework developed by Facebook AI Research.

③ YOLOv5

YOLOv5 is a framework specialized in real-time object detection. (YOLO in particular stands out as a 1-stage detector.)

④ EfficientDet

EfficientDet is an object detection model from Google Research & Brain, built by applying EfficientNet to object detection.
EfficientDet is available in TensorFlow, and PyTorch-based implementations also exist on GitHub.

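2. Characteristics of the Object Detection Domain

There is no unified library, so choosing the right detection library at the start matters, and coding skill is very important.
The engineering side is heavy.
The pipelines are complex.
High performance requires heavy models.
Resolution strongly affects performance, so input images tend to be large.
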
3. References
1) Kedar Potdar, “A Convolutional Neural Network based Live Object Recognition System as Blind Aid”
2) OCR, https://nanonets.com/blog/deep-learning-ocr/
3) CCTV, https://www.youtube.com/watch?v=0hWW6FVcFAo&t=3s
4) Global wheat detection, https://www.kaggle.com/c/global-wheat-detection/overview
5) Vinbigdata, https://www.kaggle.com/c/vinbigdata-chest-xray-abnormalities-detection
6) Ze Liu, “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows”
7) FPS, https://www.techsmith.com/blog/frame-rate-beginners-guide/
8) MMDetection, https://github.com/open-mmlab/mmdetection
9) Chaoxu Guo, “AugFPN: Improving Multi-scale Feature Learning for Object Detection”
10) Detectron, https://github.com/facebookresearch/detectron2
11) YOLOv5, https://github.com/ultralytics
12) Google Research, Brain Team, “EfficientDet: Scalable and Efficient Object Detection”
13) Alexey Bochkovskiy, Chien-Yao Wang, Hong-Yuan Mark Liao, “YOLOv4: Optimal Speed and Accuracy of Object Detection”