# 15. Object Detection
- Approaches
  - Two-stage model
    - Consists of a region proposal module and a recognition module.
    - R-CNN, Fast R-CNN, Faster R-CNN
  - One-stage model
    - Removes the proposal-generating module and predicts object positions directly.
## 15.1. Two-stage model
(See: zzwon1212 - R-CNN)
(See: zzwon1212 - Fast R-CNN)
### 15.1.3. Mask R-CNN
- RoIAlign
Instead of simply taking the max over a uniformly split (and coordinate-rounded) RoI, RoIAlign uses bilinear interpolation to better estimate the feature value at each cell.
![](https://velog.velcdn.com/images/zzwon1212/post/d39b572a-63b5-435f-9a84-586de5b54bee/image.png)
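To make the interpolation step concrete, here is a minimal pure-Python sketch of RoIAlign on a single-channel feature map. The function names `bilinear_sample` and `roi_align`, the 2×2 output size, and the choice of 2×2 sample points per cell are assumptions for illustration, not details taken from the lecture.

```python
import math


def bilinear_sample(feature, y, x):
    """Bilinearly interpolate a 2D feature map (list of rows) at a fractional (y, x)."""
    h, w = len(feature), len(feature[0])
    # Clamp the point so that its four neighbours stay inside the map.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature, roi, out_size=2, samples=2):
    """Pool an RoI into out_size x out_size cells without rounding its coordinates.

    roi is (y1, x1, y2, x2) in feature-map coordinates and may be fractional.
    """
    ry1, rx1, ry2, rx2 = roi
    cell_h = (ry2 - ry1) / out_size
    cell_w = (rx2 - rx1) / out_size
    output = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            values = []
            # Regularly spaced sample points inside the cell (samples x samples).
            for a in range(samples):
                for b in range(samples):
                    sy = ry1 + (i + (a + 0.5) / samples) * cell_h
                    sx = rx1 + (j + (b + 0.5) / samples) * cell_w
                    values.append(bilinear_sample(feature, sy, sx))
            # Aggregate the samples; this sketch averages them (max is the other common choice).
            row.append(sum(values) / len(values))
        output.append(row)
    return output


# Toy feature map whose value equals its column index.
feature = [[float(x) for x in range(4)] for _ in range(4)]
pooled = roi_align(feature, roi=(0.0, 0.0, 2.0, 2.0))
```

Because the RoI boundaries and sample points are never rounded to integer cells, small shifts of the RoI change the pooled values smoothly, which is the property RoIAlign is after.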
- Stage 1: Training RPN (Region Proposal Network)
![](https://velog.velcdn.com/images/zzwon1212/post/c0f8f0ef-c83d-45c6-a99e-87916872b746/image.png)
![](https://velog.velcdn.com/images/zzwon1212/post/7103ce6a-b822-4e4a-9b3f-539517da33ba/image.png)
- For each position in the last shared conv feature map, assume a set of candidate bounding boxes called anchors.
- For each anchor, predict 2 scores (object or not) and 4 box coordinates.
- Apply an n×n convolution to the conv feature map to get a new feature map with 256 channels.
- Apply two separate 1×1 convolutions to the new feature map to get a cls feature map with 2k channels and a reg feature map with 4k channels, where k is the number of anchors per position.
- Each position in the cls and reg feature maps holds the scores and coordinates for the k anchors at that position.
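The anchor bookkeeping above can be sketched in a few lines of pure Python. The helper names `make_anchors` and `rpn_head_shapes` are hypothetical, and the stride of 16, the scales (128, 256, 512), and the aspect ratios (0.5, 1, 2) are assumed defaults that give k = 9. The sketch also assumes the n×n convolution is padded so that the spatial size is preserved.

```python
def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (x1, y1, x2, y2) in image coordinates, k per feature-map position."""
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Centre of this feature-map cell projected back onto the image.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    # ratio = height / width; the area stays scale ** 2.
                    w = scale / ratio ** 0.5
                    h = scale * ratio ** 0.5
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def rpn_head_shapes(feat_h, feat_w, k, mid_channels=256):
    """Shapes (channels, height, width) of the RPN head outputs described above."""
    return {
        "intermediate": (mid_channels, feat_h, feat_w),  # after the n x n conv
        "cls": (2 * k, feat_h, feat_w),                  # object / not-object per anchor
        "reg": (4 * k, feat_h, feat_w),                  # 4 box coordinates per anchor
    }


anchors = make_anchors(feat_h=2, feat_w=3)
k = len(anchors) // (2 * 3)  # anchors per position
shapes = rpn_head_shapes(2, 3, k)
```

Reading the cls map at one position gives 2k numbers and the reg map gives 4k numbers, which line up one-to-one with the k anchors generated for that position.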
## 15.2. One-stage model
(See: zzwon1212 - YOLO)
### 15.2.2. SSD
![](https://velog.velcdn.com/images/zzwon1212/post/7951f393-3b2c-4a14-8121-69c7dfd03a01/image.png)
![](https://velog.velcdn.com/images/zzwon1212/post/ee199b65-9a9c-4de2-b8ef-43932e396f51/image.png)
- Main idea
  - Each cell in an earlier conv layer sees a narrow region of the image, so it is suited to detecting small objects.
  - Each cell in a later conv layer sees a wider region of the image, so it is suited to detecting large objects.
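The "narrow versus wide range" claim can be checked by computing the receptive field of one cell after each layer. The sketch below uses the standard recurrence for stacked convolution and pooling layers; the function name `receptive_fields` and the toy layer stack are assumptions for illustration, not the actual SSD backbone.

```python
def receptive_fields(layers):
    """Receptive field size (in input pixels) of one cell after each layer.

    layers is a list of (kernel_size, stride) pairs, applied in order.
    """
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent cells
    sizes = []
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
        sizes.append(rf)
    return sizes


# Hypothetical stack: 3x3 convs (stride 1) with 2x2 poolings (stride 2) in between.
toy_layers = [(3, 1), (3, 1), (2, 2), (3, 1), (3, 1), (2, 2), (3, 1)]
print(receptive_fields(toy_layers))  # [3, 5, 6, 10, 14, 16, 24]
```

The sizes only grow with depth, so predictions made from earlier feature maps cover small image regions while predictions from later feature maps cover large ones.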
- Loss
![](https://velog.velcdn.com/images/zzwon1212/post/ab13e05e-b847-4795-b654-272c9c4ee727/image.png)
![](https://velog.velcdn.com/images/zzwon1212/post/9ebae906-b479-4cbe-abc4-495cafc20b98/image.png)
![](https://velog.velcdn.com/images/zzwon1212/post/03c673c1-85bf-46e8-b983-398beeda84dd/image.png)
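For readers who cannot see the figures, the overall objective defined in the SSD paper is restated below. The notation follows the paper and may differ slightly from the figures.

$$
L(x, c, l, g) = \frac{1}{N}\left( L_{conf}(x, c) + \alpha \, L_{loc}(x, l, g) \right)
$$

Here $N$ is the number of default boxes matched to a ground-truth box, $L_{conf}$ is a softmax loss over the class confidences $c$, $L_{loc}$ is a Smooth L1 loss between the predicted box $l$ and the ground-truth box $g$ computed over matched boxes only, and $\alpha$ weights the localization term.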
- Results
![](https://velog.velcdn.com/images/zzwon1212/post/0a79af43-c8a4-435e-89de-d84f1437894a/image.png)
- Accuracy: Fast R-CNN < Faster R-CNN < SSD
![](https://velog.velcdn.com/images/zzwon1212/post/f49d89d7-0b32-48cf-a895-141a78608a7f/image.png)
- FPS: Faster R-CNN < SSD < YOLO
![](https://velog.velcdn.com/images/zzwon1212/post/2fa747ca-4f05-4eea-b241-357e97192d65/image.png)
📙 Lecture