Convolutional Neural Networks

이세현 · December 3, 2024

Convolutional Network

Limitation of the Linear Classifier: it does not take the spatial structure of the image into account.
- We need operations that are suited to images.
- Components of a Fully-Connected Network: Fully-Connected Layers, Activation Function
Components of a Convolutional Network: Fully-Connected Layers, Activation Function, Convolution Layers, Pooling Layers, Normalization
- Fully-Connected Layer: flattens a $[32\times32\times3]$ image into a $[3072\times1]$ vector.
  - Weights: $[10\times3072]$, Output: $[10\times1]$
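To make the shapes concrete, here is a minimal pure-Python sketch of a fully-connected layer on a $[32\times32\times3]$ image. It assumes the image is a nested list indexed as `[H][W][C]` and uses random weights; `fully_connected`, `image`, and `weights` are hypothetical names for illustration only.

```python
import random

def fully_connected(image, weights, bias):
    """Flatten an [H][W][C] image and apply y = Wx + b (hypothetical helper)."""
    # Flatten [32 x 32 x 3] -> 3072 values; the spatial layout is thrown away here
    x = [v for row in image for pixel in row for v in pixel]
    assert len(x) == len(weights[0])
    # One dot product per output class: [10 x 3072] times [3072] -> [10]
    return [sum(w_i * x_i for w_i, x_i in zip(w_row, x)) + b
            for w_row, b in zip(weights, bias)]

random.seed(0)
H, W, C, NUM_CLASSES = 32, 32, 3, 10
image = [[[random.random() for _ in range(C)] for _ in range(W)] for _ in range(H)]
weights = [[random.gauss(0, 0.01) for _ in range(H * W * C)] for _ in range(NUM_CLASSES)]
bias = [0.0] * NUM_CLASSES
scores = fully_connected(image, weights, bias)
print(len(scores))  # 10
```

Note that neighbouring pixels end up at unrelated positions in the flattened vector, which is exactly the loss of spatial structure described above.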
Convolution Layer: preserves the spatial structure.

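Filter: a single application of a 3-channel filter covers every channel at once, so the result is the sum of the separate products over the R, G, and B channels. The filter's depth (number of channels) must therefore equal that of the input image.
Overlaying the filter on the image at one position produces a single scalar. Sliding one 3-channel filter across the image therefore outputs a 1-channel map.
- Applying a $[3\times5\times5]$ filter at one position is equivalent to a dot product between two 75-dimensional vectors (plus a bias).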
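The equivalence can be checked numerically. This sketch assumes random values for one $[3\times5\times5]$ filter, the $[3\times5\times5]$ image patch it currently covers, and an arbitrary bias of 0.1; all variable names are illustrative.

```python
import random

random.seed(1)
C, K = 3, 5
# One [3 x 5 x 5] filter and the [3 x 5 x 5] image patch it currently covers
filt = [[[random.uniform(-1, 1) for _ in range(K)] for _ in range(K)] for _ in range(C)]
patch = [[[random.uniform(0, 1) for _ in range(K)] for _ in range(K)] for _ in range(C)]
bias = 0.1

# Convolution at a single position: elementwise product over all channels, then sum
conv_value = sum(filt[c][i][j] * patch[c][i][j]
                 for c in range(C) for i in range(K) for j in range(K)) + bias

# The same computation as a dot product of two flattened 75-dimensional vectors
flat_w = [v for ch in filt for row in ch for v in row]
flat_x = [v for ch in patch for row in ch for v in row]
dot_value = sum(w * x for w, x in zip(flat_w, flat_x)) + bias

print(len(flat_w))                          # 75
print(abs(conv_value - dot_value) < 1e-9)   # True
```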
The number of filters equals the number of channels of the output map.
- $[C_{in}\times H\times W]$ image, $[C_{out}\times C_{in}\times K_h\times K_w]$ filters $\rightarrow [C_{out}\times H'\times W']$
- The learnable parameters of a convolution layer are the $[C_{out}\times C_{in}\times K_h\times K_w]$ filters and a $C_{out}$-dimensional bias.
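A naive sketch of a full convolution layer that maps a $[C_{in}\times H\times W]$ input and $[C_{out}\times C_{in}\times K_h\times K_w]$ filters to a $[C_{out}\times H'\times W']$ output. It assumes nested lists, zero padding, and the cross-correlation form that deep learning libraries call "convolution"; `conv2d` is a hypothetical helper, not a library function.

```python
def conv2d(x, filters, bias, stride=1, padding=0):
    """Naive convolution layer (cross-correlation form, hypothetical helper).

    x: [C_in][H][W], filters: [C_out][C_in][K_h][K_w], bias: [C_out]
    returns: [C_out][H'][W']
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    k_h, k_w = len(filters[0][0]), len(filters[0][0][0])
    # Zero padding on all four sides
    hp, wp = h + 2 * padding, w + 2 * padding
    xp = [[[0.0] * wp for _ in range(hp)] for _ in range(c_in)]
    for c in range(c_in):
        for i in range(h):
            for j in range(w):
                xp[c][i + padding][j + padding] = x[c][i][j]
    h_out = (hp - k_h) // stride + 1
    w_out = (wp - k_w) // stride + 1
    out = []
    for f, b in zip(filters, bias):  # each filter produces one output channel
        fmap = [[b + sum(f[c][u][v] * xp[c][i * stride + u][j * stride + v]
                         for c in range(c_in) for u in range(k_h) for v in range(k_w))
                 for j in range(w_out)]
                for i in range(h_out)]
        out.append(fmap)
    return out

x = [[[1.0] * 32 for _ in range(32)] for _ in range(3)]                          # [3 x 32 x 32]
filters = [[[[1.0] * 5 for _ in range(5)] for _ in range(3)] for _ in range(6)]  # [6 x 3 x 5 x 5]
y = conv2d(x, filters, bias=[0.0] * 6)
print(len(y), len(y[0]), len(y[0][0]))  # 6 28 28
print(y[0][0][0])                       # 75.0
```

Six filters give six output channels, and with all-ones inputs each output value is just the 75-term dot product.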
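Flattening a CNN filter turns the operation into a vector dot product, so convolution is a linear operation. Because the associative law holds, stacking convolution layers back to back still amounts to a single linear operation, which is why an activation function is needed between them.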
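A 1D sketch of why stacked convolutions collapse into one: two valid cross-correlations with kernels `a` and `b` (biases omitted for simplicity) equal one cross-correlation with the merged kernel $c[m]=\sum_{j+k=m} a[k]\,b[j]$. The helpers `corr1d` and `merge_kernels` are hypothetical; putting a nonlinearity such as ReLU between the two layers breaks this equivalence.

```python
def corr1d(x, w):
    """Valid 1D cross-correlation (what CNN libraries call 'convolution'), no bias."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]

def merge_kernels(a, b):
    """Kernel c such that corr1d(corr1d(x, a), b) == corr1d(x, c)."""
    c = [0] * (len(a) + len(b) - 1)
    for k, a_k in enumerate(a):
        for j, b_j in enumerate(b):
            c[j + k] += a_k * b_j
    return c

x = [1, 2, 3, 4, 5, 6]
a, b = [1, -1, 2], [3, 0, 1]
two_layers = corr1d(corr1d(x, a), b)   # two stacked linear "layers"
one_layer = corr1d(x, merge_kernels(a, b))
print(two_layers, one_layer)           # [24, 32] [24, 32]
```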
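If the filter is as large as the image, each filter yields a single value, which is the same as template matching in a linear classifier.
- A smaller filter only scans local information at each position.
- With a kernel size of 1, channel information is combined but no spatial information is captured.
  - $1\times1$ kernels are used to change the number of channels.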
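A sketch of a $1\times1$ convolution, assuming nested lists and no padding: it acts as a per-pixel matrix multiply over channels, here turning 3 channels into 2 while leaving $H\times W$ unchanged. `conv1x1` is a hypothetical name.

```python
def conv1x1(x, weights, bias):
    """1x1 convolution: x [C_in][H][W], weights [C_out][C_in], bias [C_out] -> [C_out][H][W].

    Each output pixel uses only the C_in values at the same (i, j) position,
    so channels are mixed but neighbouring pixels never interact.
    """
    height, width = len(x[0]), len(x[0][0])
    return [[[b + sum(w_row[c] * x[c][i][j] for c in range(len(x)))
              for j in range(width)]
             for i in range(height)]
            for w_row, b in zip(weights, bias)]

# C_in = 3 channels, 2 x 2 spatial size
x = [[[1, 2], [3, 4]],
     [[10, 20], [30, 40]],
     [[100, 200], [300, 400]]]
# C_out = 2 filters, each of shape [3 x 1 x 1] (written here as [3])
weights = [[1, 1, 1],
           [1, 0, -1]]
y = conv1x1(x, weights, bias=[0, 0])
print(len(y), len(y[0]), len(y[0][0]))  # 2 2 2
print(y[0])                             # [[111, 222], [333, 444]]
```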

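Output dimension: $W' = \frac{W-K+2P}{S}+1$, where $W$ is the input size, $K$ the kernel size, $P$ the padding, and $S$ the stride (rounded down when the division is not exact).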
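The formula as a small helper, assuming the result is rounded down (the default behaviour of common frameworks); `conv_output_size` is a hypothetical name. The same formula applies to pooling windows.

```python
def conv_output_size(w, k, p=0, s=1):
    """Output width (or height): floor((W - K + 2P) / S) + 1."""
    return (w - k + 2 * p) // s + 1

print(conv_output_size(32, 5))        # 28: 5x5 filter, no padding
print(conv_output_size(32, 3, p=1))   # 32: padding 1 keeps the size for a 3x3 filter
print(conv_output_size(7, 3, s=2))    # 3
print(conv_output_size(32, 2, s=2))   # 16: 2x2 window with stride 2
```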
Receptive Fields
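- With a $[3\times3]$ filter, each output pixel is determined by 9 input values, so its receptive field is $[3\times3]$.
- After $L$ layers of $K\times K$ filters with stride 1, the receptive field is $1+L\times(K-1)$.
- Shrinking the image (downsampling) makes the receptive field, measured on the original input, larger.
  - The image size is sometimes reduced on purpose so that the network can see a wider region.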
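A sketch of the receptive-field computation for a stack of layers given as `(kernel_size, stride)` pairs. It uses the standard recurrence $r \leftarrow r + (K-1)\cdot j$, $j \leftarrow j\cdot S$, where $j$ is the spacing in input pixels between adjacent features; `receptive_field` is a hypothetical helper. With stride 1 everywhere it reproduces $1+L\times(K-1)$, and a stride-2 layer early on makes later layers cover more of the input.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output pixel after a stack of layers.

    layers: list of (kernel_size, stride) pairs, first layer first.
    """
    rf, jump = 1, 1  # jump: distance in input pixels between adjacent features
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

L, K = 3, 3
print(receptive_field([(K, 1)] * L), 1 + L * (K - 1))  # 7 7
print(receptive_field([(3, 2), (3, 1), (3, 1)]))        # 11: downsampling in the first layer
```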
Pooling Layers: a way to downsample

Hyperparameters
- Kernel Size
- Stride
- Pooling function

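Max Pooling: a way to reduce the size of the information (the spatial size of the feature map)
Unlike convolution, it has no filter values, i.e. no learnable parameters.
For each region determined by the kernel size and stride, only the maximum value is selected and passed on to the next convolution.
cf. Average Pooling, which takes the mean of each region instead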
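A minimal sketch of max and average pooling on a single-channel map, assuming a $2\times2$ window, stride 2, and no padding; `pool2d` and `mean` are hypothetical helpers. Only the reduction function changes between the two, and there are no weights to learn.

```python
def pool2d(x, k=2, s=2, fn=max):
    """Pool a single-channel map x [H][W] with a k x k window and stride s (no padding).

    fn reduces each window: max for max pooling, mean for average pooling.
    """
    h_out = (len(x) - k) // s + 1
    w_out = (len(x[0]) - k) // s + 1
    return [[fn([x[i * s + u][j * s + v] for u in range(k) for v in range(k)])
             for j in range(w_out)] for i in range(h_out)]

def mean(values):
    return sum(values) / len(values)

fmap = [[1, 1, 2, 4],
        [5, 6, 7, 8],
        [3, 2, 1, 0],
        [1, 2, 3, 4]]
print(pool2d(fmap))           # [[6, 8], [3, 4]]
print(pool2d(fmap, fn=mean))  # [[3.25, 5.25], [2.0, 2.0]]
```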
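Classic architecture: [Conv, ReLU, Pool] × N, flatten, [FC, ReLU] × M, FC
- As convolutions are stacked, the spatial size shrinks while the number of channels grows, so the growth in channels compensates for much of the spatial reduction in the total volume.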
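A shape trace through a hypothetical instance of the classic architecture, assuming a $3\times32\times32$ input, $3\times3$ convolutions with padding 1, $2\times2$ max pooling with stride 2, and a channel count that starts at 32 and doubles every block. These layer choices are illustrative, not from the source.

```python
def conv_output_size(w, k, p=0, s=1):
    return (w - k + 2 * p) // s + 1

def trace_shapes(c, h, w, n_blocks, first_channels):
    """Shapes through hypothetical [Conv 3x3 (pad 1), ReLU, MaxPool 2x2 (stride 2)] blocks.

    The channel count starts at first_channels and doubles every block (an assumption).
    """
    shapes = [(c, h, w)]
    out_c = first_channels
    for _ in range(n_blocks):
        # 3x3 conv, padding 1, stride 1: H and W unchanged; ReLU keeps the shape
        c = out_c
        h, w = conv_output_size(h, 3, p=1), conv_output_size(w, 3, p=1)
        # 2x2 max pooling with stride 2: H and W are halved
        h, w = conv_output_size(h, 2, s=2), conv_output_size(w, 2, s=2)
        shapes.append((c, h, w))
        out_c *= 2
    return shapes

for c, h, w in trace_shapes(3, 32, 32, n_blocks=3, first_channels=32):
    print((c, h, w), "spatial:", h * w, "volume:", c * h * w)
```

Here each block divides $H\times W$ by 4 while doubling $C$, so the volume falls only by a factor of 2 per block; the final $(128, 4, 4)$ volume would be flattened into a 2048-dimensional vector before the FC layers.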

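Normalization: when the inputs differ widely in form (scale or distribution), the gradients also differ widely, so the range of the input data needs to be controlled.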

Batch Normalization

Fully-Connected	Convolutional
$x:N\times D$	$X:N\times C\times H\times W$
$\mu, \sigma: 1\times D$	$\mu, \sigma: 1\times C\times1\times1$
$\gamma, \beta: 1\times D$	$\gamma, \beta: 1\times C\times1\times1$
$y=\gamma(x-\mu)/\sigma+\beta$	$y=\gamma(x-\mu)/\sigma+\beta$
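A training-mode sketch of the convolutional column above: $\mu$ and $\sigma$ are computed per channel over $(N, H, W)$, giving shape $1\times C\times1\times1$. It assumes nested lists and, as an assumption not shown in the table, a small `eps` inside the square root for numerical stability; `batch_norm_conv` is a hypothetical name.

```python
import math
import random

def batch_norm_conv(x, gamma, beta, eps=1e-5):
    """Batch norm for a conv activation x [N][C][H][W] (training mode, hypothetical helper).

    Statistics are per channel over (N, H, W); gamma and beta are per channel.
    """
    n, c_dim, h, w = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    y = [[[[0.0] * w for _ in range(h)] for _ in range(c_dim)] for _ in range(n)]
    for c in range(c_dim):
        values = [x[b][c][i][j] for b in range(n) for i in range(h) for j in range(w)]
        mu = sum(values) / len(values)
        sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values) + eps)
        for b in range(n):
            for i in range(h):
                for j in range(w):
                    y[b][c][i][j] = gamma[c] * (x[b][c][i][j] - mu) / sigma + beta[c]
    return y

random.seed(0)
N, C, H, W = 4, 3, 2, 2
x = [[[[random.uniform(0, 10) for _ in range(W)] for _ in range(H)]
      for _ in range(C)] for _ in range(N)]
y = batch_norm_conv(x, gamma=[1.0] * C, beta=[0.0] * C)
```

With $\gamma=1$ and $\beta=0$, every channel of `y` has mean close to 0 and variance close to 1 across the whole batch.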

Batch Normalization (Convolutional)	Instance Normalization (Convolutional)
$X:N\times C\times H\times W$	$X:N\times C\times H\times W$
$\mu, \sigma: 1\times C\times1\times1$	$\mu, \sigma: N\times C\times1\times1$
$\gamma, \beta: 1\times C\times1\times1$	$\gamma, \beta: 1\times C\times1\times1$
$y=\gamma(x-\mu)/\sigma+\beta$	$y=\gamma(x-\mu)/\sigma+\beta$
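A sketch of the $N\times C\times1\times1$ case: statistics are computed separately for every (sample, channel) pair over $(H, W)$, while $\gamma$ and $\beta$ stay per channel. It assumes nested lists and a small `eps`; `instance_norm` is a hypothetical name.

```python
import math

def instance_norm(x, gamma, beta, eps=1e-5):
    """Normalize x [N][C][H][W] with per-(sample, channel) statistics over (H, W)."""
    y = []
    for sample in x:
        out_sample = []
        for c, fmap in enumerate(sample):
            values = [v for row in fmap for v in row]
            mu = sum(values) / len(values)
            sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values) + eps)
            out_sample.append([[gamma[c] * (v - mu) / sigma + beta[c] for v in row]
                               for row in fmap])
        y.append(out_sample)
    return y

# N = 2 samples, C = 1 channel, 2 x 2 maps; sample 1 is sample 0 scaled by 10
x = [[[[1.0, 2.0], [3.0, 4.0]]],
     [[[10.0, 20.0], [30.0, 40.0]]]]
y = instance_norm(x, gamma=[1.0], beta=[0.0])
```

Because each sample uses its own statistics, the two samples normalize to nearly identical maps, and a sample's output does not depend on the rest of the batch (unlike batch normalization).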