AlexNet

dongjun · September 11, 2023

AlexNet (2012) Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton
https://papers.nips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf

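The architecture reflects the results of many experiments on convolution filters and pooling layers. The schematic diagram of the model is not enough to implement it directly; the details in the paper have to be read carefully.
Beyond that, a range of techniques is applied, including ReLU, Dropout, data augmentation, feature map fusion, and Local Response Normalization.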

Because of GPU memory limits, the network splits the n-channel feature maps into two halves and processes each half on its own GPU.
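To make the effect of the two-way split concrete, the sketch below counts the parameters of the second convolution (96 input channels, 256 filters of size 5x5) with and without the split. The helper `conv_params` and its `groups` argument are names chosen for this illustration, not something taken from the paper.

```python
def conv_params(in_ch, out_ch, kernel, groups=1):
    """Weights + biases of a 2D convolution whose channels are split
    into `groups` independent parts (hypothetical helper)."""
    if in_ch % groups or out_ch % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each filter only sees in_ch // groups input channels.
    weights = out_ch * (in_ch // groups) * kernel * kernel
    return weights + out_ch  # one bias per output channel


single = conv_params(96, 256, 5, groups=1)  # every filter sees all 96 channels
split = conv_params(96, 256, 5, groups=2)   # every filter sees 48 channels

print(single)  # 614656
print(split)   # 307456
```

With the split, each filter covers 5x5x48 instead of 5x5x96, so the layer needs roughly half the weights.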

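Input data: 224x224x3 (color image = 3 channels). The paper states 224x224, but the 55x55 output of layer 1 only works out with a 227x227 input, so reimplementations commonly use 227x227 or add padding of 2.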
layer 1: 2D Convolution layer (output feature maps: 55x55x96)
layer 2: 2D Pooling layer (output feature maps: 27x27x96)
layer 3: 2D Convolution layer (output feature maps: 27x27x256)
layer 4: 2D Pooling layer (output feature maps: 13x13x256)
layer 5: 2D Convolution layer (output feature maps: 13x13x384)
layer 6: 2D Convolution layer (output feature maps: 13x13x384)
layer 7: 2D Convolution layer (output feature maps: 13x13x256)
layer 8: 2D Pooling layer (output feature maps: 6x6x256)
layer 9: Fully connected layer (output units: 4096)
layer 10: Fully connected layer (output units: 4096)
layer 11: Fully connected layer (output units: 1000)

The first convolution uses 96 filters of size 11x11x3 with stride 4.
conv1 = Conv2d(3,96,11,11) stride=4
ReLU(conv1)
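The output size of this first layer can be checked with the usual formula floor((W + 2P - K) / S) + 1. The sketch below is only an illustration (the function name `conv_out_size` is made up here); it also shows why a 224x224 input without padding does not give the 55x55 maps listed above.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window:
    floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


print(conv_out_size(227, 11, stride=4))             # 55
print(conv_out_size(224, 11, stride=4, padding=2))  # 55
print(conv_out_size(224, 11, stride=4))             # 54
```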

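Max pooling uses a 3x3 window with stride 2. Earlier networks typically used non-overlapping pooling, in which neighboring pooling windows do not share any inputs. The AlexNet authors instead made the pooling window larger than the stride, which results in overlapping pooling.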

pool2 = MaxPooling2d(3) stride=2
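The difference between the two pooling schemes is easiest to see in one dimension. The following is a minimal sketch, not the paper's implementation; `max_pool_1d` and the sample row are invented for the example.

```python
def max_pool_1d(values, size, stride):
    """Max pooling over a 1-D list; returns (windows, maxima)."""
    windows = [values[i:i + size]
               for i in range(0, len(values) - size + 1, stride)]
    return windows, [max(w) for w in windows]


row = [1, 5, 2, 4, 7, 3, 0]

# Non-overlapping: size == stride, so each input belongs to at most one window.
print(max_pool_1d(row, size=2, stride=2))
# ([[1, 5], [2, 4], [7, 3]], [5, 4, 7])

# Overlapping (AlexNet style): size 3 > stride 2, neighboring windows share an input.
print(max_pool_1d(row, size=3, stride=2))
# ([[1, 5, 2], [2, 4, 7], [7, 3, 0]], [5, 7, 7])
```

In the overlapping case the value 7 sits on the border of two windows and contributes to both outputs.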

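From the second convolution onward, padding is used so that the convolution keeps the spatial size of the feature maps. In the two-GPU layout each filter of this layer sees only 48 of the 96 input channels (5x5x48); written as a single layer, the input channel count is 96.

conv3 = Conv2d(96,256,5,5) stride=1, padding=2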
ReLU(conv3)
pool4 = MaxPooling2d(3) stride=2
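For stride 1 and an odd kernel size K, the padding that preserves the size is (K - 1) / 2, which is where padding=2 for 5x5 and padding=1 for 3x3 come from. A small check, with helper names that are assumptions of this sketch:

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """floor((size + 2*padding - kernel) / stride) + 1"""
    return (size + 2 * padding - kernel) // stride + 1


def same_padding(kernel):
    """Padding that keeps the size unchanged for stride 1 and an odd kernel."""
    if kernel % 2 == 0:
        raise ValueError("odd kernel size expected")
    return (kernel - 1) // 2


for size, kernel in [(27, 5), (13, 3)]:
    pad = same_padding(kernel)
    print(kernel, pad, conv_out_size(size, kernel, padding=pad))
# 5 2 27
# 3 1 13
```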

conv5 = Conv2d(256,384,3,3) stride=1, padding=1
ReLU(conv5)
conv6 = Conv2d(384,384,3,3) stride=1, padding=1
ReLU(conv6)
conv7 = Conv2d(384,256,3,3) stride=1, padding=1
ReLU(conv7)
pool8 = MaxPooling2d(3) stride=2

flatten()

Dropout(0.5)
fc9 = Linear(6x6x256, 4096)
ReLU(fc9)

Dropout(0.5)
fc10 = Linear(4096, 4096)
ReLU(fc10)
output = Linear(4096, 1000)
softmax(output)
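The two building blocks of the classifier head, Dropout and softmax, can be sketched in plain Python as follows. This is an illustration under assumptions: it uses the "inverted" dropout variant that rescales during training, whereas the paper instead multiplies the outputs by 0.5 at test time, and the function names and sample values are made up here.

```python
import math
import random


def dropout(values, p=0.5, training=True, rng=random):
    """Inverted dropout: zero each value with probability p while training
    and rescale the survivors by 1/(1-p); identity at inference time."""
    if not training:
        return list(values)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)  # fixed seed so the run is repeatable
hidden = [0.5, 1.0, 2.0, 4.0]
dropped = dropout(hidden, p=0.5, rng=rng)   # each entry is either 0.0 or doubled
unchanged = dropout(hidden, training=False)  # same values as `hidden`
probs = softmax([2.0, 1.0, 0.1])             # non-negative, sums to 1

print(dropped)
print(unchanged)
print(probs)
```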

Putting the pieces together, the full layer sequence is:

conv1 = Conv2d(3,96,11,11) stride=4
ReLU(conv1)

pool2 = MaxPooling2d(3) stride=2

conv3 = Conv2d(96,256,5,5) stride=1, padding=2
ReLU(conv3)

pool4 = MaxPooling2d(3) stride=2

conv5 = Conv2d(256,384,3,3) stride=1, padding=1
ReLU(conv5)
conv6 = Conv2d(384,384,3,3) stride=1, padding=1
ReLU(conv6)
conv7 = Conv2d(384,256,3,3) stride=1, padding=1
ReLU(conv7)

pool8 = MaxPooling2d(3) stride=2

flatten()

Dropout(0.5)
fc9 = Linear(6x6x256, 4096)
ReLU(fc9)

Dropout(0.5)
fc10 = Linear(4096, 4096)
ReLU(fc10)

output = Linear(4096, 1000)
softmax(output)
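As a sanity check on the listing, the sketch below traces the feature map shape through the convolution and pooling layers and confirms that flattening yields 6x6x256 = 9216 inputs for the first fully connected layer. It assumes a 227x227 input (so that the first layer gives 55x55); the `LAYERS` table and function names are illustrative only.

```python
def out_size(size, kernel, stride=1, padding=0):
    """floor((size + 2*padding - kernel) / stride) + 1"""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kind, out_channels, kernel, stride, padding)
LAYERS = [
    ("conv1", "conv", 96, 11, 4, 0),
    ("pool2", "pool", None, 3, 2, 0),
    ("conv3", "conv", 256, 5, 1, 2),
    ("pool4", "pool", None, 3, 2, 0),
    ("conv5", "conv", 384, 3, 1, 1),
    ("conv6", "conv", 384, 3, 1, 1),
    ("conv7", "conv", 256, 3, 1, 1),
    ("pool8", "pool", None, 3, 2, 0),
]


def trace(size=227, channels=3):
    """Return (name, height, width, channels) after every layer."""
    shapes = []
    for name, kind, out_ch, kernel, stride, padding in LAYERS:
        size = out_size(size, kernel, stride, padding)
        if kind == "conv":  # pooling keeps the channel count
            channels = out_ch
        shapes.append((name, size, size, channels))
    return shapes


shapes = trace()
for name, h, w, c in shapes:
    print(f"{name}: {h}x{w}x{c}")

_, h, w, c = shapes[-1]
flat = h * w * c
print("flatten:", flat)  # flatten: 9216
```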
