AlexNet

이슬비 · July 8, 2022


0. Abstract

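AlexNet has 60 million parameters and 650,000 neurons. It consists of 5 convolutional layers, some of which are followed by max-pooling layers, and 3 fully-connected layers ending in a 1000-way softmax.
Uses non-saturating neurons and a GPU implementation to speed up training.
Uses dropout to reduce overfitting.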

1. Introduction

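To improve model performance: collect larger datasets, learn more powerful models, and use better techniques.
Objects in realistic settings show considerable variability, so much larger training sets are needed.
A CNN's capacity can be controlled by varying its depth and breadth, and CNNs make strong and mostly correct assumptions about the nature of images.
CNNs are still expensive to apply at large scale to high-resolution images.
So the network is trained on GPUs paired with a highly optimized implementation of 2D convolution.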

Our final network contains five convolutional and three fully-connected layers, and this depth seems to be important.

2. Dataset

ImageNet: over 15 million labeled high-resolution images in roughly 22,000 classes.
ILSVRC: uses a subset of ImageNet with roughly 1,000 images in each of 1,000 classes.

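The preprocessing steps are as follows.
1) Rescale so that the shorter side has length $256$.
2) Crop the central $256 \times 256$ patch.
3) Subtract the mean RGB value (computed over the training set) from each pixel.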
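
The three steps above can be sketched roughly as follows. This is a minimal NumPy-only sketch, not the authors' pipeline: the function names (`resize_shorter_side`, `center_crop`, `preprocess`) are made up for illustration, nearest-neighbour resampling is an assumption (the resampling method is not stated here), and the mean is computed as a per-pixel mean image over whatever set is passed in. A single per-channel mean is another common reading.

```python
import numpy as np

def resize_shorter_side(img, target=256):
    """Nearest-neighbour rescale so the shorter side has length `target`."""
    h, w = img.shape[:2]
    scale = target / min(h, w)
    new_h = max(target, round(h * scale))
    new_w = max(target, round(w * scale))
    # Map each output row/column back to a source index (clamped to bounds).
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return img[rows][:, cols]

def center_crop(img, size=256):
    """Cut the central size x size patch out of an H x W x C image."""
    h, w = img.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return img[top:top + size, left:left + size]

def preprocess(images, size=256):
    """Steps 1-3: rescale, center crop, subtract the mean over the set."""
    patches = np.stack(
        [center_crop(resize_shorter_side(im, size), size) for im in images]
    ).astype(np.float32)
    mean_image = patches.mean(axis=0)  # assumption: per-pixel mean image
    return patches - mean_image, mean_image

# Usage with two random stand-in "images" of different shapes.
rng = np.random.default_rng(0)
images = [
    rng.integers(0, 256, size=(300, 400, 3), dtype=np.uint8),
    rng.integers(0, 256, size=(500, 375, 3), dtype=np.uint8),
]
centered, mean_image = preprocess(images)
print(centered.shape, mean_image.shape)
```

At test time the same `mean_image` from the training set would be reused rather than recomputed.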

3. The Architecture

8 learned layers
= 5 convolutional layers + 3 fully-connected layers.
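
As a sanity check on the "60 million params" figure from the Abstract, the parameter count of the 8 layers can be tallied by hand. The layer sizes below are not stated in this post; they are taken from the architecture section of the original paper, and the helper names are made up for illustration. conv2, conv4, and conv5 see only half of the previous layer's feature maps because of the two-GPU split, which is why their input channel counts are halved.

```python
# (name, kernel_h, kernel_w, input channels seen by each filter, filters)
conv_layers = [
    ("conv1", 11, 11, 3, 96),
    ("conv2", 5, 5, 48, 256),    # 96 maps split over 2 GPUs -> 48 each
    ("conv3", 3, 3, 256, 384),   # connected to all maps of conv2
    ("conv4", 3, 3, 192, 384),   # 384 maps split over 2 GPUs -> 192 each
    ("conv5", 3, 3, 192, 256),
]
# (name, inputs, outputs); fc6 takes the flattened 6 x 6 x 256 conv5 output
fc_layers = [
    ("fc6", 6 * 6 * 256, 4096),
    ("fc7", 4096, 4096),
    ("fc8", 4096, 1000),
]

def conv_params(kh, kw, c_in, c_out):
    """Weights plus one bias per filter."""
    return kh * kw * c_in * c_out + c_out

def fc_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out

counts = {name: conv_params(kh, kw, ci, co) for name, kh, kw, ci, co in conv_layers}
counts.update({name: fc_params(ni, no) for name, ni, no in fc_layers})
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name}: {n:,}")
print(f"total: {total:,}")
```

The total comes out at roughly 61 million, and almost all of it sits in the three fully-connected layers.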

3.1 ReLU Nonlinearity

The standard way : $f(x)=\tanh(x)$ or $f(x)=(1+e^{-x})^{-1}$
In terms of training time with gradient descent, saturating nonlinearities are much slower than non-saturating nonlinearities.

Saturating nonlinearities : $\tanh$ or sigmoid
Non-saturating nonlinearity : ReLU, $f(x)=\max(0,x)$
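
To make "saturating" concrete, here is a small NumPy sketch comparing the derivatives of the three functions for growing positive inputs. It is only an illustration under the definitions above, and the function names are made up for this example. For $\tanh$ and sigmoid the derivative shrinks toward 0 as $x$ grows, so gradient descent updates become tiny, while the ReLU derivative stays at 1 for every $x > 0$.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def grad_tanh(x):
    # d/dx tanh(x) = 1 - tanh(x)^2
    return 1.0 - np.tanh(x) ** 2

def grad_sigmoid(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1.0 - s)

def grad_relu(x):
    # 1 for positive inputs, 0 otherwise (value at x = 0 chosen as 0 here)
    return (x > 0).astype(float)

xs = np.array([0.5, 2.0, 5.0, 10.0])
for name, grad in [("tanh", grad_tanh), ("sigmoid", grad_sigmoid), ("relu", grad_relu)]:
    print(name, grad(xs))
```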

