# 15. Object Detection
- Approaches
  - Two-stage model
    - Consists of a region proposal module followed by a recognition module.
    - Examples: R-CNN, Fast R-CNN, Faster R-CNN
  - One-stage model
    - Removes the proposal-generating module and predicts object positions directly.
 
## 15.1. Two-stage model
(See: zzwon1212 - R-CNN)
(See: zzwon1212 - Fast R-CNN)
### 15.1.3. Mask R-CNN
- RoIAlign
  - Instead of simply taking the max over a uniformly split, coordinate-rounded RoI (as RoIPool does), use bilinear interpolation to better estimate the feature values in each cell.
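
The interpolation step can be made concrete with a small NumPy sketch. This is a minimal illustration, not the Mask R-CNN implementation: the helper names `bilinear_sample` and `roi_align`, the 2×2 output size, the 2×2 sample points per cell, and the choice to average (rather than max) the samples are all assumptions made for the example.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly interpolate a 2-D feature map at a fractional (y, x)."""
    h, w = feat.shape
    # Clamp the sample point so its four neighbours stay inside the map.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feat[y0, x0] + dx * feat[y0, x1]
    bottom = (1 - dx) * feat[y1, x0] + dx * feat[y1, x1]
    return (1 - dy) * top + dy * bottom

def roi_align(feat, roi, out_size=2, samples=2):
    """Pool one RoI (y1, x1, y2, x2 in feature-map coordinates).

    Each output cell averages samples x samples interpolated points;
    the RoI boundaries are never rounded to integer positions.
    """
    y1, x1, y2, x2 = roi
    cell_h = (y2 - y1) / out_size
    cell_w = (x2 - x1) / out_size
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            vals = []
            for a in range(samples):
                for b in range(samples):
                    # Regularly spaced sample points inside cell (i, j).
                    sy = y1 + (i + (a + 0.5) / samples) * cell_h
                    sx = x1 + (j + (b + 0.5) / samples) * cell_w
                    vals.append(bilinear_sample(feat, sy, sx))
            out[i, j] = np.mean(vals)
    return out

# Toy feature map with feat[y, x] = 5*y + x, and an RoI with fractional edges.
feat = np.arange(25, dtype=float).reshape(5, 5)
pooled = roi_align(feat, roi=(0.5, 0.5, 3.5, 3.5))
```

Because the toy map is linear in `y` and `x`, each pooled value equals the map evaluated at the cell centre, which makes the effect of interpolating at non-integer positions easy to check by hand.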

- Stage 1: Training the RPN (Region Proposal Network)
  - For each position in the last shared conv feature map, place a set of k candidate bounding boxes, called anchors.
  - For each anchor, predict 2 scores (object or not) and 4 box coordinates.
  - Apply an n×n convolution to the conv feature map to get a new feature map with 256 channels.
  - Apply two separate 1×1 convolutions to the new feature map to get a cls feature map with 2k channels and a reg feature map with 4k channels.
  - Each position in the cls and reg feature maps holds the scores and coordinates for the k anchors at that position.
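
The shapes in the steps above can be traced with a small NumPy sketch. It is only an illustration under stated assumptions: the helpers `conv2d_same` and `make_anchors` are hypothetical, the weights are random rather than trained, n = 3, the toy backbone has 8 channels, the stride is 16, and the three scales and three ratios (so k = 9) are typical values rather than something these notes specify. The anchor-major channel layout used when regrouping the outputs is also an assumption.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)

def conv2d_same(x, w):
    """Naive 'same' convolution. x: (C_in, H, W), w: (C_out, C_in, n, n), n odd."""
    n = w.shape[-1]
    pad = n // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    windows = sliding_window_view(xp, (n, n), axis=(1, 2))  # (C_in, H, W, n, n)
    return np.einsum("chwij,ocij->ohw", windows, w)

def make_anchors(h, w, stride, scales, ratios):
    """Return (h*w*k, 4) anchors as (x1, y1, x2, y2) in image pixels."""
    shapes = []
    for s in scales:
        for r in ratios:  # r = height / width, area stays s*s
            bw, bh = s / np.sqrt(r), s * np.sqrt(r)
            shapes.append([-bw / 2, -bh / 2, bw / 2, bh / 2])
    shapes = np.array(shapes)                       # (k, 4)
    # Centre of every feature-map cell, mapped back to image coordinates.
    cx = (np.arange(w) + 0.5) * stride
    cy = (np.arange(h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)                    # each (h, w)
    centres = np.stack([cx, cy, cx, cy], axis=-1)   # (h, w, 4)
    return (centres[:, :, None, :] + shapes).reshape(-1, 4)

C, H, W = 8, 4, 5                                   # toy shared conv feature map
scales, ratios = (128, 256, 512), (0.5, 1.0, 2.0)   # assumed anchor shapes
k = len(scales) * len(ratios)                       # anchors per position

feat = rng.standard_normal((C, H, W))
w_mid = rng.standard_normal((256, C, 3, 3)) * 0.01  # n x n conv, n = 3
w_cls = rng.standard_normal((2 * k, 256, 1, 1)) * 0.01
w_reg = rng.standard_normal((4 * k, 256, 1, 1)) * 0.01

mid = np.maximum(conv2d_same(feat, w_mid), 0)       # (256, H, W) after ReLU
cls_map = conv2d_same(mid, w_cls)                   # (2k, H, W)
reg_map = conv2d_same(mid, w_reg)                   # (4k, H, W)
anchors = make_anchors(H, W, stride=16, scales=scales, ratios=ratios)

# Regroup channels so every anchor gets its own 2 scores and 4 offsets.
scores = cls_map.transpose(1, 2, 0).reshape(-1, 2)  # (H*W*k, 2)
deltas = reg_map.transpose(1, 2, 0).reshape(-1, 4)  # (H*W*k, 4)
```

With these toy sizes there are 4 × 5 × 9 = 180 anchors, and `scores`, `deltas`, and `anchors` all have 180 rows, one per anchor.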
 
 
## 15.2. One-stage model
(See: zzwon1212 - YOLO)
### 15.2.2. SSD


- Main idea
  - Each cell in an earlier conv layer sees a narrow region of the image, so it can detect small objects.
  - Each cell in a later conv layer sees a wider region of the image, so it can detect large objects.
  - SSD therefore makes predictions from several feature maps at different depths.
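
How much of the image a cell "sees" is its receptive field, which grows with depth. A minimal sketch of the standard recurrence follows; the function name `receptive_fields` and the toy backbone (two 3×3 convs plus a 2×2 stride-2 pool, repeated three times) are assumptions for illustration and do not describe SSD's actual VGG-based layers.

```python
def receptive_fields(layers):
    """layers: list of (kernel_size, stride) pairs, in order.

    Returns the receptive field (in input pixels) of one cell after each layer.
    """
    rf, jump = 1, 1  # jump = spacing in input pixels between adjacent cells
    out = []
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # a wider kernel widens the view by whole jumps
        jump *= stride             # striding spreads neighbouring cells apart
        out.append(rf)
    return out

# Hypothetical toy backbone: [conv3x3, conv3x3, pool2x2/2] repeated three times.
block = [(3, 1), (3, 1), (2, 2)]
rfs = receptive_fields(block * 3)
print(rfs)  # [3, 5, 6, 10, 14, 16, 24, 32, 36]
```

In this toy stack, a cell after the first block covers a 6×6 patch of the input while a cell after the third block covers 36×36, which matches the intuition that early maps suit small objects and late maps suit large ones.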
 
 
- Loss
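
For orientation, and assuming the notation of the SSD paper rather than anything defined in these notes, the training objective is usually written as a weighted sum of a classification term and a box-regression term:

```latex
L(x, c, l, g) = \frac{1}{N} \Bigl( L_{\text{conf}}(x, c) + \alpha \, L_{\text{loc}}(x, l, g) \Bigr)
```

Here $N$ is the number of default boxes matched to a ground-truth box, $x$ encodes the matching, $c$ are the class confidences, $l$ the predicted boxes, and $g$ the ground-truth boxes. $L_{\text{conf}}$ is a softmax cross-entropy over classes, $L_{\text{loc}}$ is a Smooth L1 loss on the box offsets, and $\alpha$ balances the two terms.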



 
- Results
  - Accuracy: Fast R-CNN < Faster R-CNN < SSD
  - FPS: Faster R-CNN < SSD < YOLO
 
 

📙 Lecture