[U] Week 3 Day 13

나며기 · February 3, 2021

Boostcamp AI Tech

Lecture Review

[DAY 13] Convolutional Neural Networks

[DLBasic] CNN - What Is Convolution?

Convolutional Neural Networks
- A CNN consists of convolution layers, pooling layers, and fully connected layers.
  - Convolution and pooling layers: feature extraction
  - Fully connected layer: decision making (e.g., classification)
Stride
Padding
Stride & Padding
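
How stride and padding change the spatial size of a convolution's output can be sketched with the usual output-size formula, floor((input + 2 * padding - kernel) / stride) + 1. The function name and the example sizes below are illustrative assumptions, not values from the lecture.

```python
def conv_output_size(input_size: int, kernel_size: int,
                     stride: int = 1, padding: int = 0) -> int:
    """Output length of a convolution along one spatial dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# Hypothetical example: a length-7 input and a 3-wide kernel.
print(conv_output_size(7, 3))                       # stride 1, no padding
print(conv_output_size(7, 3, stride=2))             # stride 2 skips positions
print(conv_output_size(7, 3, padding=1))            # padding 1 keeps the size
print(conv_output_size(7, 3, stride=2, padding=1))  # both combined
```

With a 3-wide kernel, a padding of 1 keeps the output the same size as the input at stride 1, which is why that combination is so common.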

[DLBasic] Modern CNN - The Importance of 1x1 Convolution

ILSVRC (ImageNet Large-Scale Visual Recognition Challenge)
- Classification / Detection / Localization / Segmentation
- 1,000 different categories
- Over 1 million images
- Training set: 456,567 images
AlexNet
- Key ideas
  - Rectified Linear Unit (ReLU) activation
  - GPU implementation (2 GPUs)
  - Local response normalization, Overlapping pooling
  - Data augmentation
  - Dropout
- ReLU Activation
  - Preserves properties of linear models
  - Easy to optimize with gradient descent
  - Good generalization
  - Overcomes the vanishing gradient problem
VGGNet
- Increasing depth with 3 × 3 convolution filters (with stride 1)
- 1x1 convolution for fully connected layers
- Dropout (p=0.5)
- VGG16, VGG19
- Why 3 × 3 convolution?
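
One common answer to the question above is that two stacked 3x3 layers cover the same 5x5 receptive field as a single 5x5 layer while using fewer parameters. The sketch below works through that arithmetic; the channel count of 128 and the helper names are assumptions chosen for illustration, and bias terms are ignored.

```python
def conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Number of weights in a square convolution layer (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def receptive_field(num_layers: int, kernel_size: int = 3) -> int:
    """Receptive field of stacked stride-1 convolutions with equal kernels."""
    return 1 + num_layers * (kernel_size - 1)


channels = 128  # hypothetical channel count, same for input and output

two_3x3 = 2 * conv_params(3, channels, channels)
one_5x5 = conv_params(5, channels, channels)

print(receptive_field(2))  # 5: two 3x3 layers see a 5x5 window
print(two_3x3)             # 294912
print(one_5x5)             # 409600
```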
GoogLeNet
- GoogLeNet won the ILSVRC in 2014
  - It combined network-in-network (NiN) with inception blocks.
- Inception blocks
  - What are the benefits of the inception block?
    - It reduces the number of parameters.
  - How?
    - Recall how the number of parameters is computed.
    - 1x1 convolution can be seen as channel-wise dimension reduction.
- Benefit of 1x1 convolution
  - 1x1 convolution reduces the number of parameters by about 30%!
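
The parameter count of a convolution layer is kernel height x kernel width x input channels x output channels, so shrinking the channel dimension with a 1x1 convolution before a 3x3 convolution cuts the total. Here is a minimal sketch of that calculation; the channel sizes (128 reduced to 32) are assumed example values, and bias terms are ignored.

```python
def conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Number of weights in a square convolution layer (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


# Hypothetical sizes: 128 input channels, 128 output channels,
# with an optional 1x1 reduction down to 32 channels in between.
direct = conv_params(3, 128, 128)
reduced = conv_params(1, 128, 32) + conv_params(3, 32, 128)

print(direct)   # 147456
print(reduced)  # 40960 = 4096 (1x1) + 36864 (3x3)
```

The input and output shapes of the block are unchanged; only the intermediate channel count, and with it the number of weights, is smaller.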
Quiz
- Which CNN architecture has the least number of parameters?
  1. AlexNet (8-layers) (60M)
  2. VGGNet (19-layers) (110M)
  3. GoogLeNet (22-layers) (4M)
- The answer is GoogLeNet.
ResNet
- Deeper neural networks are hard to train.
  - Overfitting is usually caused by an excessive number of parameters.
  - But, not in this case.
- Add an identity map (skip connection)
- Add an identity map after nonlinear activations
- Batch normalization after convolutions
- Performance increases while parameter size decreases.
DenseNet
- DenseNet uses concatenation instead of addition.
- Dense Block
  - Each layer concatenates the feature maps of all preceding layers.
  - The number of channels increases geometrically.
- Transition Block
  - BatchNorm -> 1x1 Conv -> 2x2 AvgPooling
  - Dimension reduction
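
The difference between ResNet's addition and DenseNet's concatenation can be sketched on plain Python lists standing in for channel vectors. The `layer` function is a hypothetical stand-in for a convolution plus activation, not anything from the lecture.

```python
def layer(features):
    """Toy stand-in for conv + activation: doubles every value."""
    return [2.0 * v for v in features]


def residual_block(features):
    """ResNet style: add the input back, so the channel count is unchanged."""
    return [f + x for f, x in zip(layer(features), features)]


def dense_layer(features):
    """DenseNet style: append the new features to everything seen so far."""
    return features + layer(features)


x = [1.0, 2.0]
print(residual_block(x))                # [3.0, 6.0]
print(dense_layer(x))                   # [1.0, 2.0, 2.0, 4.0]
print(len(dense_layer(dense_layer(x)))) # 8
```

Because this toy layer outputs as many channels as it receives, every dense layer doubles the channel count, which is why a transition block that reduces dimensions is needed between dense blocks.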
Summary
- VGG: repeated 3x3 blocks
- GoogLeNet: 1x1 convolution
- ResNet: skip-connection
- DenseNet: concatenation

[DLBasic] Computer Vision Applications

Fully Convolutional Network
- This is what an ordinary CNN looks like.
- This is a fully convolutional network.
- Transforming fully connected layers into convolution layers enables a classification net to output a heat map.
- While FCN can run with inputs of any size, the output dimensions are typically reduced by subsampling.
- So we need a way to connect the coarse output to the dense pixels.
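
Why a fully connected layer can be swapped for a convolution layer is easiest to see by counting: a dense layer over a flattened H x W x C feature map has exactly as many weights as a convolution whose kernel covers the whole H x W map. The sketch below checks this and shows how a larger input then yields a spatial heat map; all sizes and function names are assumptions for illustration.

```python
def fc_params(height: int, width: int, channels: int, num_classes: int) -> int:
    """Weights of a dense layer applied to a flattened feature map."""
    return height * width * channels * num_classes


def conv_params(kernel_h: int, kernel_w: int, in_ch: int, out_ch: int) -> int:
    """Weights of a convolution layer (bias ignored)."""
    return kernel_h * kernel_w * in_ch * out_ch


def heatmap_size(feature_size: int, kernel_size: int) -> int:
    """Output size of a stride-1, unpadded convolution along one dimension."""
    return feature_size - kernel_size + 1


# Hypothetical classifier head: 4x4 feature map, 16 channels, 10 classes.
print(fc_params(4, 4, 16, 10))    # 2560
print(conv_params(4, 4, 16, 10))  # 2560, same weights reshaped

print(heatmap_size(4, 4))   # 1: original input size gives one prediction
print(heatmap_size(10, 4))  # 7: a larger input gives a 7x7 heat map
```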
Deconvolution (conv transpose)
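
A transposed convolution maps a small feature map back to a larger one. The sketch below computes only the shapes, assuming no dilation and no extra output padding: output = (input - 1) * stride - 2 * padding + kernel. Function names and sizes are illustrative assumptions.

```python
def conv_output_size(input_size: int, kernel_size: int,
                     stride: int = 1, padding: int = 0) -> int:
    """Output length of an ordinary convolution along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


def conv_transpose_output_size(input_size: int, kernel_size: int,
                               stride: int = 1, padding: int = 0) -> int:
    """Output length of a transposed convolution (no dilation, no output padding)."""
    return (input_size - 1) * stride - 2 * padding + kernel_size


down = conv_output_size(7, 3)               # 7 -> 5
up = conv_transpose_output_size(down, 3)    # 5 -> 7
print(down, up)

print(conv_transpose_output_size(2, 3, stride=2))  # 2 -> 5
```

The transposed convolution restores the spatial size, not the original values, so it is not an exact inverse of the convolution.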
Detection
- R-CNN
  1. takes an input image,
  2. extracts around 2,000 region proposals (using Selective search),
  3. computes features for each proposal (using AlexNet), and then
  4. classifies with linear SVMs.
- SPPNet
  - In R-CNN, there are usually over 2,000 crops/warps per image, so the CNN must run more than 2,000 times (59s/image on CPU).
  - However, in SPPNet, CNN runs once.
- Fast R-CNN
  1. Takes an input image and a set of bounding boxes.
  2. Generates a convolutional feature map.
  3. For each region, gets a fixed-length feature from ROI pooling.
  4. Produces two outputs: class and bounding-box regressor.
- Faster R-CNN
  - Faster R-CNN = Region Proposal Network + Fast R-CNN
- YOLO
  - YOLO (v1) is an extremely fast object detection algorithm.
    - baseline: 45fps / smaller version: 155fps
  - It simultaneously predicts multiple bounding boxes and class probabilities.
    - No explicit bounding box sampling (compared with Faster R-CNN)
  - Given an image, YOLO divides it into an SxS grid.
    - If the center of an object falls into a grid cell, that grid cell is responsible for detecting it.
  - Each cell predicts B bounding boxes (B=5).
    - Each bounding box predicts
      - box refinement (x / y / w / h)
      - confidence (of objectness)
  - Each cell predicts C class probabilities.
  - In total, the output is a tensor of size SxSx(B*5+C).
    - SxS: Number of cells of the grid
    - B*5: B bounding boxes with offsets (x,y,w,h) and confidence
    - C: Number of classes
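
The output tensor size described above can be worked out directly. In this sketch B=5 follows the notes, while the grid size S=7 and class count C=20 are assumed example values, and the function name is hypothetical.

```python
def yolo_output_shape(S: int, B: int, C: int) -> tuple:
    """Shape of the prediction tensor: S x S cells, each with B boxes
    (x, y, w, h, confidence) plus C class probabilities."""
    return (S, S, B * 5 + C)


shape = yolo_output_shape(S=7, B=5, C=20)  # S and C are assumed values
print(shape)  # (7, 7, 45)

# Total number of predicted values for one image.
total = shape[0] * shape[1] * shape[2]
print(total)  # 2205
```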

[DLBasic] CNN - Classifying Dog Breeds

Self-study guide
- Write the whole pipeline, from downloading the Dog breed dataset to creating the Dataloader, by modifying the existing CNN model file (.py).

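[DLBasic] CNN - Building Your Own Dataset

Self-study guide
- Each team picks a topic for the data it will collect.
- Download relevant data through Google.
- Group data of the same class into its own folder.
- Delete any data that turns out to be unrelated, or create a new class and collect it separately.
- Build a CNN model and train it.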

Further Question

Some of the modern CNN networks covered in class are available as pre-trained models in the PyTorch library. How can they be loaded through PyTorch?

Peer Session Summary

Lecture Review and Q&A

[DLBasic] CNN - What Is Convolution?
[DLBasic] Modern CNN - The Importance of 1x1 Convolution
[DLBasic] Computer Vision Applications
[DLBasic] CNN - Classifying Dog Breeds
[DLBasic] CNN - Building Your Own Dataset

Assignment Progress and Results

[DLBasic] CNN Assignment

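The assignment was about CNNs (Convolutional Neural Networks), and I solved it without much difficulty.

Overall Thoughts

Studying theory is fun, but I enjoy myself most and learn the most when I am solving a problem with a clear goal in mind.
That is why I was so glad to see the self-study exercises mentioned in today's lectures.
I think playing around with PyTorch through those exercises will get me comfortable with it quickly.
Of course, there is now more to do, from data collection to preprocessing to modeling, but I think I can get it done somehow if I make use of next week.

I look forward to a version of myself tomorrow that has grown beyond today. See you tomorrow.

Thank you for reading!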
