[CV] ImageNet Classification with Deep Convolutional Neural Networks (AlexNet) review

๊ฐ•๋™์—ฐยท2022๋…„ 1์›” 3์ผ

[Paper review]

๋ชฉ๋ก ๋ณด๊ธฐ
1/17
post-thumbnail
post-custom-banner

๐Ÿ˜Ž ์˜ค๋Š˜์€ CNN์˜ ๊ฐ€์žฅ ๊ธฐ๋ณธ์ค‘์— ๊ธฐ๋ณธ์ธ AlexNet ๋…ผ๋ฌธ๋ฆฌ๋ทฐ๋ฅผ ์ง„ํ–‰ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ์ฒซ ๋ฆฌ๋ทฐ์ด๊ธฐ์— ๋งŽ์ด ๋ถ€์กฑํ•˜์ง€๋งŒ, AlexNet์€ ์–ด๋ ต์ง€ ์•Š์€ ๋…ผ๋ฌธ์ด์—ˆ๊ธฐ์— ์ฝ์„ ์ˆ˜ ์žˆ์—ˆ๋‹ค๊ณ  ์ƒ๊ฐํ•ฉ๋‹ˆ๋‹ค.

๋…ผ๋ฌธ ๋งํฌ: ImageNet Classification with Convolutional Neural Networks

AlexNet


  • At the 2012 ILSVRC (ImageNet Large-Scale Visual Recognition Challenge), AlexNet took first place with a top-5 test error of 15.3%.
  • The network consists of five convolutional layers, some followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
  • Contributions
    • ReLU
    • Local Response Normalization
    • Overlapping pooling
    • Data Augmentation / Dropout

Introduction


์ด ๊ธ€ ์ฒ˜์Œ์—๋Š” ๊ณผ๊ฑฐ์™€ ํ˜„์žฌ์˜ ์ฐจ์ด์— ๋Œ€ํ•ด์„œ ๋งํ•ด์ฃผ๊ณ  ์žˆ๋‹ค. ๊ณผ๊ฑฐ๋ณด๋‹ค ํ˜„์žฌ ๋งŽ๊ณ  ์งˆ ์ข‹์€ ๋ฐ์ดํ„ฐ ์…‹์„ ์ˆ˜์ง‘ํ•  ์ˆ˜ ์žˆ๋‹ค๊ณ  ๋งํ•˜๊ณ  ์žˆ๋‹ค. ์ด๋Ÿฐ ์ƒํ™ฉ์—์„œ CNN์€ ์ ์€ ์—ฐ๊ฒฐ๊ณผ ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ํ†ตํ•ด ์ด๋ก ์ ์œผ๋กœ ์ข‹์€ ์„ฑ๋Šฅ ๋‚˜ํƒ€๋‚ผ ์ˆ˜ ์žˆ๋‹ค. ํ•˜์ง€๋งŒ ๊ณผ๊ฑฐ์—๋Š” ๊ฐ’ ๋น„์‹ผ ๋น„์šฉ์œผ๋กœ ์ธํ•ด ์‚ฌ์šฉํ•˜์ง€ ๋ชปํ–ˆ๋‹ค๋ฉด, ํ˜„์žฌ๋Š” ์šด์ด ์ข‹๊ฒŒ๋„ GPU์˜ ๋ฐœ๋‹ฌ๋กœ ๊ฐ€๋Šฅํ•˜๊ฒŒ ๋˜์—ˆ๋‹ค. ์ด ํ›„์—๋Š” ๊ฐ„๋‹จํ•œ๊ฒŒ CNN ๊ตฌ์กฐ์™€ ์–ด๋–ค ๋ฐฉ๋ฒ•์œผ๋กœ ๊ณผ์ ํ•ฉ์„ ์ œ์–ดํ–ˆ๋Š”์ง€ ๊ฐ„๋žตํ•˜๊ฒŒ ๋‚˜์™€์žˆ๋‹ค.

The Dataset


ImageNet ๋ฐ์ดํ„ฐ ์…‹์€ ์•ฝ 1500๋งŒ๊ฐœ์˜ ๊ณ ํ•ด์ƒ๋„ ์ด๋ฏธ์ง€์™€ ์•ฝ 22,000๊ฐœ์˜ ๋ฒ”์ฃผ๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค. ILSVRC ์—์„œ๋Š” ImageNet์˜ subset์„ ์‚ฌ์šฉํ•˜๋ฉฐ ๋Œ€๋žต 120๋งŒ๊ฐœ์˜ training ์ด๋ฏธ์ง€์™€ 50,000๊ฐœ์˜ validation ์ด๋ฏธ์ง€, 150,000๊ฐœ์˜ testing ์ด๋ฏธ์ง€๋กœ ๊ตฌ์„ฑ๋˜์–ด์žˆ์Šต๋‹ˆ๋‹ค.
์ด๋ฏธ์ง€์˜ ํฌ๊ธฐ๋Š” 256 X 256 ๊ณ ์ •ํ•˜์˜€๊ณ , resize ๋ฐฉ๋ฒ•์€ ๋„“์ด์™€ ๋†’์ด ์ค‘ ๋” ์งง์€ ๋ถ€๋ถ„์„ 256์œผ๋กœ ๊ณ ์ •์‹œํ‚ค๊ณ , ์ค‘์•™์—์„œ crop ํ–ˆ๋‹ค. ๊ฐ ์ด๋ฏธ์ง€์˜ pixel์— traing set์˜ ํ‰๊ท ์„ ๋นผ์„œ normalize ํ•ด์ฃผ์—ˆ์Šต๋‹ˆ๋‹ค.

The Architecture


  • ReLU
    : ReLU is the activation function f(x) = max(0, x). The paper reports that networks with ReLUs train about six times faster than equivalent networks with tanh activations.
  • Training on Multiple GPUs
    : Training was parallelized across two GPUs. As an additional trick, the two GPUs communicate with each other only in certain layers, for example in the transition from layer 2 to layer 3.
  • Local Response Normalization
    : Because ReLU passes positive inputs through unchanged, one very large activation can be propagated as-is and drown out the smaller activations of neighboring neurons. LRN is a normalization scheme introduced to counteract this.
  • Overlapping Pooling
    : Pooling layers conventionally do not overlap, but AlexNet overlaps them, using a kernel size of 3 with a stride of 2.

โœ” ์œ„์˜ ๋ฐฉ๋ฒ•์˜ ๋Œ€ํ•œ ์„ค๋ช…์€ ์ƒ๋žตํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ๊ฐ„๋‹จํ•˜๊ฒŒ ์ด์•ผ๊ธฐํ•˜๋ฉด ์œ„์˜ ๋ฐฉ๋ฒ•๋ก ๋“ค์ด test error rate ๋ฐ ๋น„์šฉ์„ ์ค„์—ฌ์ฃผ๋Š” ๋ฐฉ๋ฒ•๋ก ๋“ค์ž…๋‹ˆ๋‹ค.

  • Architecture
  • The architecture consists of five convolutional layers, some followed by max-pooling layers, and three fully-connected layers.
  • The layers are ordered as [Input layer - Conv1 - Norm1 - MaxPool1 - Conv2 - Norm2 - MaxPool2 - Conv3 - Conv4 - Conv5 - MaxPool3 - FC1 - FC2 - Output layer] (in the paper, response normalization precedes the max-pooling that follows it).
  • Below is the parameter-count calculation for AlexNet. (see reference)
  • The table below is a model summary produced with TensorFlow.

Reducing Overfitting

  • ์œ„์˜ ๋„คํŠธ์›Œํฌ ์•„ํ‚คํ…์ณ๋Š” 6์ฒœ๋งŒ๊ฐœ์˜ ํŒŒ๋ผ๋ฏธํ„ฐ๊ฐ€ ์‚ฌ์šฉ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ์ด๋ฏธ์ง€๋ฅผ ILSVRC์˜ 1000๊ฐœ classes๋กœ ๋ถ„๋ฅ˜ํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ์ƒ๋‹นํ•œ overfitting ์—†์ด ์ˆ˜ ๋งŽ์€ parameters๋ฅผ ํ•™์Šต ์‹œํ‚ค๋Š” ๊ฒƒ์€ ์–ด๋ ต๋‹ค๊ณ  ๋งํ•ฉ๋‹ˆ๋‹ค.

    Data Augmentation

  • ๊ฐ„๋‹จํ•˜๊ฒŒ Data Augmentation์€ ๋ฐ์ดํ„ฐ๋ฅผ ๋Š˜๋ฆฌ๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. 2๊ฐ€์ง€ ๋ฐฉ๋ฒ•์œผ๋กœ Data Augmentation์œผ๋กœ ์ง„ํ–‰ํ–ˆ์œผ๋ฉฐ, 2๊ฐ€์ง€ ๋ฐฉ๋ฒ• ๋ชจ๋‘ little computation์œผ๋กœ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

  • ์ฒซ๋ฒˆ์งธ ๋ฐฉ๋ฒ•์œผ๋กœ๋Š” extracting five 224 X 224 patches(the four corner and one center patch) & horizontal reflections ๋ฐฉ๋ฒ•์œผ๋กœ, ์œ„์˜ ๋ฐฉ๋ฒ•์œผ๋กœ ๊ธฐ์กด์˜ ๋ฐ์ดํ„ฐ์˜ ์•ฝ 2048๋ฐฐ์˜ ๋ฐ์ดํ„ฐ๋ฅผ ์ˆ˜์ง‘ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

  • ๋‘๋ฒˆ์งธ ๋ฐฉ๋ฒ•์œผ๋กœ๋Š” PCA๋ฅผ ํ†ตํ•ด RGB pixel ๊ฐ’์˜ ๋ณ€ํ™”๋ฅผ ์ฃผ์—ˆ์Šต๋‹ˆ๋‹ค. PCA๋ฅผ ์ˆ˜ํ–‰ํ•˜์—ฌ RGB ๊ฐ ์ƒ‰์ƒ์— ๋Œ€ํ•œ eigenvalue๋ฅผ ์ฐพ์Šต๋‹ˆ๋‹ค. eigenvalue์™€ ํ‰๊ท  0, ๋ถ„์‚ฐ 0.1์ธ ๊ฐ€์šฐ์‹œ์•ˆ ๋ถ„ํฌ์—์„œ ์ถ”์ถœํ•œ ๋žœ๋ค ๋ณ€์ˆ˜๋ฅผ ๊ณฑํ•ด์„œ RGB ๊ฐ’์— ๋”ํ•ด์ค๋‹ˆ๋‹ค.

    * ์œ„์˜ ๋ฐฉ๋ฒ•๋“ค๋กœ top-1 ์—๋Ÿฌ์˜ 1%๋ฅผ ์ค„์ผ ์ˆ˜ ์žˆ์—ˆ๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค.

    Dropout

    Test์—์„œ ๋ชจ๋“  ๋‰ด๋Ÿฐ์„ ์‚ฌ์šฉํ–ˆ์ง€๋งŒ, ๊ฒฐ๊ณผ ๋„์ถœํ•  ๋•Œ 0.5๋ฅผ ๊ณฑํ•ด์ฃผ์—ˆ๋‹ค. ์ฒ˜์Œ ๋‘ ๊ฐœ์˜ Fc์—์„œ Dropout์„ ๋„์ถœํ–ˆ๊ณ , dropout์„ ํ†ตํ•ด overfitting์„ ํ”ผํ•  ์ˆ˜ ์žˆ์—ˆ๊ณ , ์ˆ˜๋ ดํ•˜๋Š”๋ฐ ํ•„์š”ํ•œ ๋ฐ˜๋ณต์ˆ˜๋Š” ๋‘ ๋ฐฐ ์ฆ๊ฐ€๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

Details of learning

Train ๋ชจ๋ธ์—๋Š” SGD(stochastic gradient descent) ์‚ฌ์šฉํ–ˆ์œผ๋ฉฐ, batch size = 128, momentum = 0.9 and weight decay = 0.0005 ๋ฅผ ์ ์šฉ์‹œ์ผฐ๋‹ค.

  • In the update rule, v is the momentum variable and ε the learning rate. Weights were initialized from a zero-mean Gaussian with standard deviation 0.01. Biases were initialized to 1 in the second, fourth, and fifth convolutional layers and in the fully-connected layers, and to 0 elsewhere.
  • The learning rate was initialized to 0.01 and was reduced three times over the course of training.
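The update rule and initialization described above can be written out directly. This is a sketch: `sgd_step` is my naming, and the gradient below is a dummy value:

```python
import numpy as np

def sgd_step(w, v, grad, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """One update following the paper's rule:
        v <- 0.9 * v - 0.0005 * lr * w - lr * grad
        w <- w + v
    """
    v = momentum * v - weight_decay * lr * w - lr * grad
    return w + v, v

rng = np.random.default_rng(0)
# weight init: zero-mean Gaussian with standard deviation 0.01
w = rng.normal(0.0, 0.01, size=5)
v = np.zeros_like(w)
w, v = sgd_step(w, v, grad=np.ones_like(w))  # dummy all-ones gradient
```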

Results

  • ๊ฒฐ๊ณผ์ ์œผ๋กœ top-5 ํ…Œ์ŠคํŠธ ์…‹์—์„œ 15.3% ๋กœ competition์—์„œ ๊ฐ€์žฅ ์šฐ์ˆ˜ํ•œ ์„ฑ๊ณผ๋ฅผ ๊ฑฐ๋‘˜ ์ˆ˜ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค.