These are my notes from the Full Stack Deep Learning lectures.
Deep network with 8 layers
First convnet winner
Innovated with ReLU and Dropout
Heavy data augmentation (flip, scale, etc.)
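As a rough illustration of the flip and scale augmentations mentioned above, here is a minimal sketch in plain Python. The image is assumed to be a list of pixel rows, and the helper names (`horizontal_flip`, `scale_nearest`, `augment`), the 50% flip probability, and the scale factors are all made up for this example rather than taken from the lecture.

```python
import random


def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]


def scale_nearest(image, factor):
    """Resize an image by `factor` using nearest-neighbour sampling."""
    height, width = len(image), len(image[0])
    new_h = max(1, round(height * factor))
    new_w = max(1, round(width * factor))
    return [
        [
            image[min(height - 1, int(r / factor))][min(width - 1, int(c / factor))]
            for c in range(new_w)
        ]
        for r in range(new_h)
    ]


def augment(image, rng):
    """Randomly flip, then randomly rescale (illustrative choices only)."""
    if rng.random() < 0.5:
        out = horizontal_flip(image)
    else:
        out = [row[:] for row in image]
    factor = rng.choice([0.5, 1.0, 2.0])
    return scale_nearest(out, factor)


if __name__ == "__main__":
    sample = [[1, 2], [3, 4]]
    print(augment(sample, random.Random(0)))
```

Each call to `augment` yields a slightly different training example from the same source image, which is the point of augmentation: more variety without more labeled data.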
Classification: given an image, output the class of the object
Localization: Do classification, but also highlight where the object is in the image
Detection: given an image, output every object's class and location
Segmentation: label every pixel in the image as belonging to an object or the background
Using networks for Localization
- output bounding box coordinates (x1, y1, x2, y2) as well as the class of an object
- Use the same network that produces the class output, and have its final stage also predict the coordinates.
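The idea of one network with two outputs can be sketched as follows. This is only a toy, assuming the shared convnet has already reduced the image to a feature vector; the two linear "heads", their weights, and the function names are hypothetical and chosen purely for illustration.

```python
import math


def linear(weights, bias, x):
    """Apply a dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def localize(features, class_head, box_head):
    """Run both heads on the same shared features.

    class_head and box_head are (weights, bias) pairs; the box head
    has four outputs, read here as (x1, y1, x2, y2).
    """
    class_probs = softmax(linear(class_head[0], class_head[1], features))
    box = linear(box_head[0], box_head[1], features)
    return class_probs, box


if __name__ == "__main__":
    features = [1.0, 2.0]  # stand-in for convnet features
    class_head = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
    box_head = ([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 2.0]], [0.0, 0.0, 0.0, 0.0])
    print(localize(features, class_head, box_head))
```

In a real model both heads would be trained jointly, typically with a classification loss on the class output and a regression loss on the four coordinates.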
Using networks for Detection
- We don't know in advance how many objects are in the image, so the approach used for Localization can't be applied
- Solution: slide a classifier over the image(at multiple scales)
- VERY computationally expensive, but there are solutions!
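To make the cost concrete, here is a minimal sketch of a multi-scale sliding-window detector. It assumes "multiple scales" is implemented by varying the window size (resizing the image instead would be equivalent in spirit), and `classify` is a hypothetical callback standing in for the trained classifier; the default sizes, stride, and threshold are arbitrary.

```python
def sliding_windows(image_w, image_h, window_sizes, stride):
    """Yield square boxes (x1, y1, x2, y2) for every window size and position."""
    for size in window_sizes:
        for y in range(0, image_h - size + 1, stride):
            for x in range(0, image_w - size + 1, stride):
                yield (x, y, x + size, y + size)


def detect(image_w, image_h, classify, window_sizes=(32, 64), stride=16, threshold=0.5):
    """Run the classifier on every window and keep confident hits.

    `classify(box)` is assumed to return a (label, score) pair.
    """
    detections = []
    for box in sliding_windows(image_w, image_h, window_sizes, stride):
        label, score = classify(box)
        if score >= threshold:
            detections.append((box, label, score))
    return detections


if __name__ == "__main__":
    def fake_classify(box):
        # Pretend there is a cat in exactly one window.
        return ("cat", 0.9) if box == (16, 16, 48, 48) else ("background", 0.0)

    print(len(list(sliding_windows(64, 64, (32, 64), 16))), "classifier calls")
    print(detect(64, 64, fake_classify))
```

Even this tiny 64x64 example needs 10 classifier calls; the count grows quickly with image size, smaller strides, and more scales, which is why this approach is so expensive.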
YOLO(You Only Look Once)
1. Put a fixed grid over an image, and within the grid find objects
2. Output class and box coordinates
3. Run non-maximum suppression
- nice & fast, and is in active development!
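The grid lookup in step 1 and the non-maximum suppression in step 3 can be sketched as below. This is not the YOLO implementation itself: the function names, the per-class suppression, and the 0.5 IoU threshold are assumptions for illustration. Detections are assumed to be `(box, label, score)` tuples with boxes given as `(x1, y1, x2, y2)`.

```python
def grid_cell(cx, cy, image_w, image_h, grid_size):
    """Return the (row, col) of the grid cell containing a box centre."""
    col = min(grid_size - 1, int(cx / image_w * grid_size))
    row = min(grid_size - 1, int(cy / image_h * grid_size))
    return row, col


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Keep the highest-scoring box and drop same-class boxes that overlap it."""
    remaining = sorted(detections, key=lambda d: d[2], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [
            d for d in remaining
            if d[1] != best[1] or iou(d[0], best[0]) < iou_threshold
        ]
    return kept


if __name__ == "__main__":
    dets = [
        ((0, 0, 10, 10), "dog", 0.9),
        ((1, 1, 11, 11), "dog", 0.8),    # near-duplicate of the first box
        ((50, 50, 60, 60), "dog", 0.7),  # a separate object
    ]
    print(non_max_suppression(dets))
```

Suppression is needed because neighbouring grid cells often predict overlapping boxes for the same object; only the most confident one should survive.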
Microsoft COCO: Common Objects in Context
- YOLO's performance is evaluated on this dataset.
- 330,000 images & 1.5 million object instances & 80 categories & some captions
The methods so far look at every part of the image.
R-CNN(Region-CNN)
Faster R-CNN
- Uses a convnet for the Region Proposal Network
Mask R-CNN
- Each region goes through not only the classification step but also a segmentation step