CNN - Machine Learning for Visual Understanding 2

zzwon1212 · July 11, 2024

5. Convolutional Neural Networks

  • Main assumption: thanks to the two properties below, the amount of computation drops sharply compared to FC layers.

    • Spatial locality
      Each filter looks at nearby pixels only.

    • Positional invariance
      Same filters are applied to all locations in the image.
      → In contrast, for something like a chest X-ray the lungs always appear in roughly the same position, so the filters do not need to look everywhere. Exploiting domain knowledge like this can reduce computation and bring other benefits.

  • Same padding
    "첫 픽셀이 필터의 정중앙에 위치하게 하려면?"으로 접근하면 계산이 쉬워진다.

    • $F = 3 \Rightarrow P = 1$
    • $F = 5 \Rightarrow P = 2$
    • $F = 7 \Rightarrow P = 3$
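
A minimal PyTorch sketch checking the $P = (F - 1) / 2$ rule above (the channel counts and input size are arbitrary, just for illustration):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)  # (batch, channels, H, W)

# "Same" padding with stride 1: P = (F - 1) / 2 keeps the spatial size unchanged.
for F, P in [(3, 1), (5, 2), (7, 3)]:
    conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=F, stride=1, padding=P)
    print(F, P, tuple(conv(x).shape))  # spatial size stays 32 x 32
```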

8. CNN Case Studies

  • AlexNet & ZFNet
  • VGGNet
    • 3×33 \times 3 conv filters only.
      • A stack of $n$ $3 \times 3$ conv (stride 1) layers has the same effective receptive field as one $(2n + 1) \times (2n + 1)$ conv layer.
      • $O(n^2)$ (one large filter) vs. $O(n)$ (stacked $3 \times 3$) parameters, where $n$ is the receptive field size.
        • With 1 layer of $7 \times 7$ filters: $1 \times (7 \times 7) \times C^2 = 49C^2$
        • With 3 layers of $3 \times 3$ filters: $3 \times (3 \times 3) \times C^2 = 27C^2$ (see the parameter-count sketch after this list)
        • The difference becomes more dramatic for larger receptive fields.
      • With more layers we also get more nonlinearities, and thus better expressive power for modeling more complicated relationships.
    • Most memory is consumed in the early conv layers.
    • Most parameters are in the late FC layers.
  • GoogLeNet (see: zzwon1212 - GoogLeNet (1 x 1 convolution and Inception network))
    • Multiple receptive field sizes for conv ($1 \times 1$, $3 \times 3$, $5 \times 5$). Better performance.
    • Improved efficiency by using the Inception module (with $1 \times 1$ conv) and by avoiding expensive FC layers (see the $1 \times 1$ bottleneck sketch after this list).
  • ResNet (see: zzwon1212 - ResNet (Deep Residual Learning for Image Recognition))
  • Inception-v2, 3, 4
  • ResNeXt
  • DenseNet
  • MobileNets
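
A quick PyTorch sketch reproducing VGGNet's $49C^2$ vs. $27C^2$ weight count from above; the channel count $C = 64$ is arbitrary, and biases are disabled so that only conv weights are counted:

```python
import torch.nn as nn

C = 64  # arbitrary channel count, just for illustration

# One 7x7 conv layer: 1 * (7*7) * C^2 weights.
single = nn.Conv2d(C, C, kernel_size=7, padding=3, bias=False)

# Three stacked 3x3 conv layers: 3 * (3*3) * C^2 weights,
# same 7x7 effective receptive field, plus extra nonlinearities.
stacked = nn.Sequential(
    nn.Conv2d(C, C, 3, padding=1, bias=False), nn.ReLU(),
    nn.Conv2d(C, C, 3, padding=1, bias=False), nn.ReLU(),
    nn.Conv2d(C, C, 3, padding=1, bias=False), nn.ReLU(),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(single), 49 * C**2)   # 200704 200704
print(count(stacked), 27 * C**2)  # 110592 110592
```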
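
Similarly, a sketch of how the $1 \times 1$ conv bottleneck used inside the Inception module cuts the cost of a $5 \times 5$ conv; the channel sizes (256 → 32 → 64) are made up for illustration, not GoogLeNet's actual configuration:

```python
import torch.nn as nn

count = lambda m: sum(p.numel() for p in m.parameters())

# Naive 5x5 conv straight on 256 input channels.
naive = nn.Conv2d(256, 64, kernel_size=5, padding=2, bias=False)

# A 1x1 conv first reduces the channels to 32, then the 5x5 conv runs on the cheaper input.
bottleneck = nn.Sequential(
    nn.Conv2d(256, 32, kernel_size=1, bias=False),
    nn.Conv2d(32, 64, kernel_size=5, padding=2, bias=False),
)

print(count(naive))       # 5*5*256*64 = 409600
print(count(bottleneck))  # 1*1*256*32 + 5*5*32*64 = 59392
```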

9 & 10. Video Classification

  • Tasks

    • Video Classification
      • Objects + Actions
    • Video Retrieval
      • The query may be in various forms (e.g. text, another video, user).
      • Understanding of topicality rather than fine-grained action recognition.
    • Video Recommendation
      • Personalized
    • Video Question & Answering
    • Video Prediction & Generation
      • Video prediction: Given a video clip, conditionally generate the next frames.
      • Unconditional video generation: Given a random seed, generate a plausible video.
    • Video Compression & Learning from Compressed Videos
  • Challenges

    • Storage cost
    • Processing
      • Decompression
      • N (# of frames) times the inference cost, even with a pre-trained model
      • More complex modeling
    • Labeling cost
    • Hard to scale (in terms of # videos, length, resolution)
    • Copyright
      • Most large video datasets provide only video features.
    • Additional (time) dimension
    • Capturing long context
    • Temporal resolution (frame rate) may differ across videos.
  • First Ideas in Action Recognition

    • Single Frame
    • Multiple Frames
      • frame-level features to video-level classes (see the fusion sketch after this list)
        • score fusion
        • feature fusion
  • Action Recognition Models Overview

  • Two-Stream Models
    Explicitly separate appearance (spatial stream) and motion (temporal stream, via optical flow) during training (see the sketch after this list).

    • Limitations
      • false label assignment
      • high storage cost (for pre-computed optical flow)
      • no end-to-end training
  • 3D Convolution (see the sketch after this list)
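
For the frame-level fusion ideas above, a minimal sketch assuming a toy per-frame feature extractor and mean pooling over frames (all shapes and the linear head are made up):

```python
import torch
import torch.nn as nn

B, T, D, K = 2, 16, 512, 10         # batch, frames, feature dim, classes
frame_feats = torch.randn(B, T, D)  # per-frame features from a 2D CNN backbone

classifier = nn.Linear(D, K)

# Score fusion: classify every frame, then average the per-frame scores.
score_fused = classifier(frame_feats).mean(dim=1)    # (B, K)

# Feature fusion: pool the frame features first, then classify once.
feature_fused = classifier(frame_feats.mean(dim=1))  # (B, K)
```

With a single linear head the two coincide; with a deeper, nonlinear head they generally differ.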
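
A sketch of the two-stream idea under simplified assumptions: the spatial stream takes a single RGB frame, the temporal stream takes a stack of L optical-flow fields (2L channels, x and y), and the class scores are averaged (late fusion). The tiny backbone here is a placeholder, not the original architecture:

```python
import torch
import torch.nn as nn

def tiny_cnn(in_ch, num_classes=10):
    # Placeholder backbone; the actual two-stream paper uses much deeper CNNs.
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(32, num_classes),
    )

L = 10
spatial_stream = tiny_cnn(in_ch=3)       # appearance: one RGB frame
temporal_stream = tiny_cnn(in_ch=2 * L)  # motion: L stacked optical-flow fields

rgb = torch.randn(4, 3, 224, 224)
flow = torch.randn(4, 2 * L, 224, 224)   # optical flow is pre-computed offline

scores = (spatial_stream(rgb) + temporal_stream(flow)) / 2  # late (score) fusion
```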
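
And for 3D convolution, a short contrast with 2D conv (shapes chosen arbitrarily): the kernel now also slides along the time axis, so a clip is convolved jointly over time and space.

```python
import torch
import torch.nn as nn

clip = torch.randn(1, 3, 16, 112, 112)  # (batch, channels, frames, H, W)

# A 3x3x3 kernel convolves over time and space at once.
conv3d = nn.Conv3d(3, 64, kernel_size=3, stride=1, padding=1)
print(conv3d(clip).shape)  # torch.Size([1, 64, 16, 112, 112])
```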


📙 Lecture
