15. Object Detection
- Approaches
  - Two-stage model: consists of a region proposal module and a recognition module
    - R-CNN, Fast R-CNN, Faster R-CNN
  - One-stage model: removes the proposal-generating module and predicts object classes and positions directly
15.1. Two-stage model
(Reference: zzwon1212 - R-CNN)
(Reference: zzwon1212 - Fast R-CNN)
15.1.3. Mask R-CNN
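- RoIAlign: RoIPool rounds (quantizes) the RoI and its bins to the feature-map grid and takes the max over each bin. RoIAlign instead keeps the exact floating-point coordinates, uses bilinear interpolation to estimate the feature values at a few regularly spaced sample points in each bin, and then aggregates them (max or average). This removes the misalignment caused by quantization, which matters for pixel-accurate masks.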
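A minimal NumPy sketch of the RoIAlign step, for a single-channel feature map. Assumptions (not from the notes): the RoI is given in feature-map coordinates as `(y1, x1, y2, x2)`, feature values sit at integer grid points (some implementations add a half-pixel offset), each bin is sampled at 2×2 regularly spaced points, and the samples are averaged (Mask R-CNN allows max or average). `bilinear_interpolate` and `roi_align` are hypothetical helper names, not a library API.

```python
import numpy as np


def bilinear_interpolate(feat: np.ndarray, y: float, x: float) -> float:
    """Bilinearly interpolate a 2D feature map `feat` (H x W) at real-valued (y, x)."""
    h, w = feat.shape
    # Clamp the sample point into the valid range of the grid.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    # Weighted sum of the four neighbouring grid values.
    return ((1 - dy) * (1 - dx) * feat[y0, x0] + (1 - dy) * dx * feat[y0, x1]
            + dy * (1 - dx) * feat[y1, x0] + dy * dx * feat[y1, x1])


def roi_align(feat, roi, out_size=2, samples=2):
    """RoIAlign on one channel.

    roi = (y1, x1, y2, x2) in feature-map coordinates, kept as floats (no rounding).
    Each of the out_size x out_size bins is sampled on a samples x samples grid,
    and the interpolated values are averaged.
    """
    y1, x1, y2, x2 = roi
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            vals = []
            for sy in range(samples):
                for sx in range(samples):
                    # Regularly spaced sample points inside bin (i, j), no quantization.
                    y = y1 + (i + (sy + 0.5) / samples) * bin_h
                    x = x1 + (j + (sx + 0.5) / samples) * bin_w
                    vals.append(bilinear_interpolate(feat, y, x))
            out[i, j] = np.mean(vals)
    return out
```

Because bilinear interpolation reproduces a linear function exactly, `roi_align(np.arange(16.0).reshape(4, 4), (0.5, 0.5, 2.5, 2.5))` returns each bin's value at its center, `[[5, 6], [9, 10]]`; RoIPool would first round the RoI to whole cells and lose that sub-cell offset.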

- Stage 1: Training RPN (Region Proposal Network)


- For each position in the last shared conv feature map, define a set of k candidate bounding boxes, called anchors, centered at that position with different scales and aspect ratios.
- For each anchor, the RPN predicts 2 scores (object vs. not object) and 4 box-regression values that refine the anchor's coordinates.
- Apply an n×n convolution (n = 3 in the Faster R-CNN paper) to the conv feature map to get a new feature map with 256 channels.
- Apply two separate 1×1 convolutions to the new feature map: one gives a cls feature map with 2k channels, the other a reg feature map with 4k channels.
- Each position in the cls and reg feature maps therefore holds the scores and coordinates of the k anchors at that position.
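A NumPy sketch of the anchor and RPN-head bookkeeping, with the convolutions replaced by random arrays. Assumptions (not from the notes): a feature stride of 16; anchors with areas 128², 256², 512² and aspect ratios 1:2, 1:1, 2:1 (so k = 9, the Faster R-CNN paper's setting); cls/reg channels grouped per anchor; and the Faster R-CNN (tx, ty, tw, th) offset parameterization for the 4 coordinates. `make_anchors`, `flatten_head` and `decode_boxes` are hypothetical helper names.

```python
import numpy as np


def make_anchors(feat_h, feat_w, stride=16, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Anchors as (x1, y1, x2, y2) image-pixel boxes, shape (feat_h * feat_w * k, 4).

    k = len(scales) * len(ratios). ratio = height / width; each anchor has area scale**2.
    Rows are ordered by feature-map position (row-major), then by anchor index.
    """
    base = []
    for s in scales:
        for r in ratios:
            w, h = s / np.sqrt(r), s * np.sqrt(r)
            base.append([-w / 2, -h / 2, w / 2, h / 2])
    base = np.array(base)  # (k, 4), centered at the origin
    # Image-space center of every feature-map cell.
    cy, cx = np.meshgrid((np.arange(feat_h) + 0.5) * stride,
                         (np.arange(feat_w) + 0.5) * stride, indexing="ij")
    centers = np.stack([cx, cy, cx, cy], axis=-1).reshape(-1, 1, 4)  # (H*W, 1, 4)
    return (centers + base[None]).reshape(-1, 4)


def flatten_head(cls_map, reg_map, k):
    """Turn the (2k, H, W) cls map and (4k, H, W) reg map into per-anchor rows.

    Assumes channels are grouped per anchor (channel = a * 2 + c, resp. a * 4 + j),
    so row r here matches row r of make_anchors.
    """
    _, h, w = cls_map.shape
    scores = cls_map.reshape(k, 2, h, w).transpose(2, 3, 0, 1).reshape(-1, 2)
    deltas = reg_map.reshape(k, 4, h, w).transpose(2, 3, 0, 1).reshape(-1, 4)
    return scores, deltas


def decode_boxes(anchors, deltas):
    """Apply (tx, ty, tw, th) offsets to anchors (Faster R-CNN parameterization):
    x = xa + tx * wa, y = ya + ty * ha, w = wa * exp(tw), h = ha * exp(th)."""
    wa = anchors[:, 2] - anchors[:, 0]
    ha = anchors[:, 3] - anchors[:, 1]
    xa = anchors[:, 0] + 0.5 * wa
    ya = anchors[:, 1] + 0.5 * ha
    tx, ty, tw, th = deltas.T
    x, y = xa + tx * wa, ya + ty * ha
    w, h = wa * np.exp(tw), ha * np.exp(th)
    return np.stack([x - w / 2, y - h / 2, x + w / 2, y + h / 2], axis=1)


# Toy head outputs for a 2 x 3 feature map with k = 9 anchors per position.
k, H, W = 9, 2, 3
anchors = make_anchors(H, W)
rng = np.random.default_rng(0)
scores, deltas = flatten_head(rng.normal(size=(2 * k, H, W)),
                              0.1 * rng.normal(size=(4 * k, H, W)), k)
proposals = decode_boxes(anchors, deltas)
print(anchors.shape, scores.shape, proposals.shape)  # (54, 4) (54, 2) (54, 4)
```

The main design point is ordering: row r of `anchors`, `scores` and `deltas` must refer to the same (position, anchor) pair, which is why the head maps are reshaped to (H, W, k, ·) before flattening. Real implementations may lay out the 2k channels differently (e.g. all k background scores before the k object scores), in which case the reshape has to change accordingly.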
15.2. One-stage model
(Reference: zzwon1212 - YOLO)
15.2.2. SSD


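- Main idea
  - Each cell in an earlier conv layer has a small receptive field, i.e. it sees only a narrow region of the image, so the earlier (higher-resolution) feature maps are used to detect small objects.
  - Each cell in a later conv layer has a large receptive field, i.e. it sees a wide region of the image, so the later (lower-resolution) feature maps are used to detect large objects.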
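To make the multi-scale idea concrete, a small Python sketch of SSD300's prediction layers and default-box scales. It uses the scale rule from the SSD paper, s_k = s_min + (s_max - s_min)(k - 1)/(m - 1) with s_min = 0.2 and s_max = 0.9, together with the SSD300 feature-map sizes and boxes per cell; implementations often tweak these values (e.g. a smaller scale on conv4_3), so treat the numbers as illustrative. `default_box_scales` and `ssd300_summary` are hypothetical helper names.

```python
# (layer name, feature-map side length, default boxes per cell) for SSD300.
SSD300_LAYERS = [("conv4_3", 38, 4), ("fc7", 19, 6), ("conv8_2", 10, 6),
                 ("conv9_2", 5, 6), ("conv10_2", 3, 4), ("conv11_2", 1, 4)]


def default_box_scales(m, s_min=0.2, s_max=0.9):
    """Default-box scale (fraction of image size) for each of m feature maps."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]


def ssd300_summary(image_size=300):
    """Per layer: (name, grid size, cell stride in px, square-box side in px); plus total box count."""
    scales = default_box_scales(len(SSD300_LAYERS))
    rows, total = [], 0
    for (name, size, per_cell), s in zip(SSD300_LAYERS, scales):
        rows.append((name, size, round(image_size / size), round(s * image_size)))
        total += size * size * per_cell
    return rows, total


rows, total = ssd300_summary()
for name, size, stride, box in rows:
    # Earlier layers: fine grid + small boxes; later layers: coarse grid + large boxes.
    print(f"{name:>9}: {size:2d}x{size:<2d} grid, ~{stride:3d}px per cell, ~{box:3d}px boxes")
print("total default boxes:", total)  # 8732
```

The printout mirrors the main idea: conv4_3 predicts on a 38×38 grid with roughly 60 px boxes (small objects), while conv11_2 predicts from a single cell with roughly 270 px boxes (large objects).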
- Loss
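For reference, the SSD paper (Liu et al., 2016) defines the loss as a weighted sum of a softmax confidence loss and a Smooth L1 localization loss, averaged over the N default boxes matched to a ground-truth box (the loss is set to 0 when N = 0). Here $x_{ij}^p \in \{0, 1\}$ indicates that default box $i$ is matched to ground-truth box $j$ of class $p$, $c$ are the class scores, $l$ the predicted offsets, $g$ the ground-truth boxes and $d$ the default boxes; the weight $\alpha$ is set to 1 in the paper.

$$
\begin{aligned}
L(x, c, l, g) &= \frac{1}{N}\Big(L_{conf}(x, c) + \alpha\, L_{loc}(x, l, g)\Big) \\
L_{loc}(x, l, g) &= \sum_{i \in Pos}\ \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k}\, \mathrm{smooth}_{L1}\big(l_i^m - \hat{g}_j^m\big) \\
\hat{g}_j^{cx} &= \frac{g_j^{cx} - d_i^{cx}}{d_i^{w}}, \quad \hat{g}_j^{cy} = \frac{g_j^{cy} - d_i^{cy}}{d_i^{h}}, \quad \hat{g}_j^{w} = \log\frac{g_j^{w}}{d_i^{w}}, \quad \hat{g}_j^{h} = \log\frac{g_j^{h}}{d_i^{h}} \\
L_{conf}(x, c) &= -\sum_{i \in Pos} x_{ij}^{p} \log \hat{c}_i^{p} - \sum_{i \in Neg} \log \hat{c}_i^{0}, \qquad \hat{c}_i^{p} = \frac{\exp(c_i^{p})}{\sum_{p'} \exp(c_i^{p'})}
\end{aligned}
$$

Class 0 is background. Because most default boxes are negatives, the paper applies hard negative mining: negatives are sorted by confidence loss and only the highest ones are kept, so that the ratio of negatives to positives is at most 3:1.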



- Results

- Accuracy: Fast R-CNN < Faster R-CNN < SSD

- FPS: Faster R-CNN < SSD < YOLO

📙 Lecture