๐Ÿ“ธ Image Classification(์ด๋ฏธ์ง€ ๋ถ„๋ฅ˜)(1)-LeNet,AlexNet,VGG๋ถ€ํ„ฐ Degradation๊นŒ์ง€ | ๋‚ด๊ฐ€๋ณด๋ ค๊ณ ์ •๋ฆฌํ•œAI๐Ÿง

HipJaengYiCatยท2023๋…„ 4์›” 2์ผ
DeepLearning

๋ชฉ๋ก ๋ณด๊ธฐ
10/16
post-thumbnail

preview

์šฐ๋ฆฐ ์•ž์„  CNN๊ณผ CV์„ ํ†ตํ•ด ์ธ๊ฐ„์˜ ์‹œ๊ฐ์  ์ธ์ง€์„ ๋ชจ๋ฐฉํ•˜๋Š” ๋ฐฉ๋ฒ•๋ก ์— ๋Œ€ํ•ด ์•Œ์•„๋ดค๋‹ค์ด๋ฒˆ ์žฅ๋ถ€ํ„ฐ๋Š” Computer Vision์˜ tasks ์ค‘ image classification์˜ ๋ฐœ์ „ ๊ณผ์ •์„ ์‚ดํŽด๋ณผ๊ฒƒ์ด๋‹ค. LeNet-5, AlexNet๋ถ€ํ„ฐ VGG๊นŒ์ง€ ๋ชจ๋ธ๋“ค์„ ์•Œ์•„๋ณด๋ฉฐ ๊นŠ์€ layer๋ฅผ ๊ฐ€์งˆ ๋•Œ ๋ฐœ์ƒํ•˜๋Š” Degradation problem์„ ์‚ดํŽด๋ณด๊ฒ ๋‹ค.

Image Classification๋ž€?

Image Classification๋ž€
์ „์ฒด ์ด๋ฏธ์ง€์— lable ๋˜๋Š” class๋ฅผ ํ• ๋‹น ํ•˜๋Š” ์ž‘์—…์ด๋‹ค. ์ด๋ฏธ์ง€๋Š” ๊ฐ ์ด๋ฏธ์ง€์— ๋Œ€ํ•ด ํ•˜๋‚˜์˜ class ๊ฐ€์งˆ ๊ฒƒ์œผ๋กœ ์˜ˆ์ƒ๋˜๊ณ , ์ด๋ฏธ์ง€ ๋ถ„๋ฅ˜ ๋ชจ๋ธ์€ ์ด๋ฏธ์ง€๋ฅผ ์ž…๋ ฅ์œผ๋กœ ์‚ฌ์šฉํ•ด ์ด๋ฏธ์ง€๊ฐ€ ์†ํ•œ ํด๋ž˜์Šค์— ๋Œ€ํ•œ ์˜ˆ์ธก์„ ๋ฐ˜ํ™˜๋‹จ๋‹ค.
์ฆ‰. ์ด๋ฏธ์ง€์™€ class๋ฅผ ๋งคํ•‘ํ•˜๋Š” ์ž‘์—…์œผ๋กœ, ํ•ด๋‹น ์ด๋ฏธ์ง€์˜ category level์„ ์—ฐ๊ฒฐํ•˜๋Š” ์ง€๋„๋ฅผ ๊ทธ๋ฆฌ๋Š” ๊ฒƒ์ด๋‹ค.

Classification์˜ ๋ฐœ์ „ ๊ณผ์ •

๐Ÿ†€ ์šฐ๋ฆฌ๊ฐ€ ์ด์„ธ์ƒ์˜ ๋ชจ๋“  ์ •๋ณด๋ฅผ ๊ธฐ์–ตํ•  ์ˆ˜ ์žˆ๋‹ค๋ฉด ๋ชจ๋“  ์ด๋ฏธ์ง€๋ฅผ ๋ถ„๋ฅ˜ ํ•  ์ˆ˜ ์žˆ์ง€ ์•Š์„๊นŒ? ๊ทธ๋ ‡๋‹ค๋ฉด ์–ด๋–ค ๋ฐฉ์‹์œผ๋กœ ๋ชจ๋“  ์ด๋ฏธ์ง€๋ฅผ ๋ถ„๋ฅ˜ํ•ด์•ผ ํ• ๊นŒ?
๐Ÿ…ฐ ๋ชจ๋“  ์ด๋ฏธ์ง€ ๋ฐ์ดํ„ฐ๋ฅผ K-NN(k nearest neighbors)๋ฐฉ์‹์„ ์ ์šฉํ•˜๋ฉด ๋ถ„๋ฅ˜ํ•  ์ด๋ฏธ์ง€์˜ ๊ทผ์ฒ˜ ์ด์›ƒ ๋ฐ์ดํ„ฐ๋ฅผ ์ฐพ๊ณ  ๊ทธ ์ด์›ƒ๋ฐ์ดํ„ฐ์˜ label data๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ๋ถ„๋ฅ˜ํ•  ์ˆ˜ ์žˆ์ง€ ์•Š์„๊นŒ?
์ฆ‰, ๊ฒ€์ƒ‰์œผ๋กœ ์ƒ๊ฐํ•  ์ˆ˜ ์žˆ์„ ๊ฒƒ์ด๋‹ค.
๐Ÿ’โ€โ™€๏ธ k nearest neighbors : ์ฟผ๋ฆฌ ๋ฐ์ดํ„ฐ์˜ ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด ์ด์›ƒ ๋ฐ์ดํ„ฐ๋ฅผ ์ฐพ๊ณ  ์ด์›ƒ ๋ฐ์ดํ„ฐ์˜ ๋ผ๋ฒจ ๋ฐ์ดํ„ฐ๋ฅผ ์ฐธ์กฐํ•˜์—ฌ ๋ถ„๋ฅ˜ํ•˜๋Š” ๊ฒƒ

๐Ÿ†€ ํ•˜์ง€๋งŒ ์•„๋ฌด๋ฆฌ ์ปดํ“จํ„ฐ ์„ฑ๋Šฅ์ด ์ข‹์•„์ง€๋”๋ผ๋„ ์„ธ์ƒ์˜ ๋ชจ๋“  ์ด๋ฏธ์ง€๋ฅผ ๊ฐ€์ง€๊ณ  ๋ชจ๋“  ์ด๋ฏธ์ง€ ๊ฐ„ ์œ ์‚ฌ๋„๋ฅผ ๊ณ„์‚ฐํ•˜๋ฉด Time complexity(๊ณ„์‚ฐ๋ณต์žก๋„)์™€ Memory complexity(๋ฉ”๋ชจ๋ฆฌ ๋ณต์žก๋„)๊ฐ€ ๋ฌดํ•œ๋Œ€์— ๊ฐ€๊น์ง€ ์•Š์„๊นŒ?
๐Ÿ…ฐ ๊ทธ๋ ‡๋‹ค๋ฉดsingle layer neural networks์ธ perceotron ๋ชจ๋ธ์„ ํ†ตํ•ด ์ด๋ฏธ์ง€๋ฅผ ์••์ถ•ํ•ด๋ณด์ž

๐Ÿคฆโ€โ™€๏ธ ํ•˜์ง€๋งŒ layer๊ฐ€ ํ•˜๋‚˜๋ฐ–์— ์—†๋Š” ๋‹จ์ˆœํ•œ ๋ชจ๋ธ์ด๋ผ ๋ณต์žกํ•œ ์ด๋ฏธ์ง€ ๋ถ„๋ฅ˜ ๋ฌธ์ œ๋ฅผ ํ’€๊ธฐ์—” ๋„ˆ๋ฌด ๋‹จ์ˆœํ•˜๋‹ค.
๐Ÿคฆโ€โ™€๏ธ ๋˜ํ•œ single fully connected layer network์ด๊ธฐ ๋•Œ๋ฌธ์— ํ•˜๋‚˜์˜ ํŠน์ง•(์ •๋‹ต์— ํ•ด๋‹นํ•˜๋Š” ํŠน์ง•)์„ ๋ฝ‘๊ธฐ์œ„ํ•ด ๋ชจ๋“  ํŠน์ง•์„ ๋ฝ‘๊ฒŒ ๋˜๋‹ˆ ๋ชจ๋“  ๋ฐ์ดํ„ฐ๋ฅผ ํ‰๊ท  ์‹œํ‚จ ๊ฒƒ๊ณผ ๊ฐ™์€ ๊ฒฐ๊ณผ๋งŒ ๋‚˜์˜ค๋ฉด์„œ ๊ณ„์‚ฐํ•ด์•ผ ๋˜๋Š” ํŒŒ๋ผ๋ฏธํ„ฐ๊ฐ€ ๋„ˆ๋ฌด ๋งŽ๋‹ค.
๐Ÿคฆโ€โ™€๏ธ ํ•™์Šต ๋ฐ์ดํ„ฐ์™€ ๋‹ฌ๋ฆฌ ๋Œ€์ƒ์ด ์ž˜๋ฆฐ ์‚ฌ์ง„์ด ๋„ฃ์–ด์ฃผ๋ฉด ์ข‹์ง€ ๋ชปํ•œ ๊ฒฐ๊ณผ๋ฅผ ๋‚ด๋ณด๋‚ด๋Š” ๋ฌธ์ œ์  ์žˆ๋‹ค.

๐Ÿ†€ ๊ทธ๋ ‡๋‹ค๋ฉด ์ด๋ฏธ์ง€์˜ ๊ณต๊ฐ„์  ์ •๋ณด๋ฅผ ์ถ”์ถœํ•˜๋ฉด์„œ ํŒŒ๋ผ๋ฏธํ„ฐ๋„ ์ ๊ฒŒ ๊ณ„์‚ฐํ•ด์•ผ๋˜๋Š” ๋ฐฉ๋ฒ•์ด ์žˆ์„๊นŒ?
๐Ÿ…ฐ convolution์„ ํ†ตํ•ด์„œ ๊ณต๊ฐ„์  ํŠน์„ฑ์„ ๋ฐ˜์˜ํ•ด ๊ตญ๋ถ€์ ์ธ ์˜์—ญ๋งŒ ์ถ”์ถœํ•˜๋ฉด ์ ์€ ํŒŒ๋ผ๋ฏธํ„ฐ๋กœ ํŠน์ง• ์ถ”์ถœ์ด ๊ฐ€๋Šฅํ•˜๋‹ค!

  • ํ•„ํ„ฐ๊ฐ€ ๋Œ์•„๋‹ค๋‹ˆ๋ฉด์„œ parameter sharing(ํŒŒ๋ผ๋ฏธํ„ฐ ๊ณต์œ )์„ ํ•˜๋ฉด์„œ local feature learning์„ ํ•˜๋ฏ€๋กœ ์ ์€ ํŒŒ๋ผ๋ฏธํ„ฐ๋กœ ๋Œ€๋น„ ํšจ๊ณผ์ ์œผ๋กœ ํŠน์ง•์„ ์ถ”์ถœํ•œ๋‹ค.
  • ๋”ฐ๋ผ์„œ convolution neural networks๋Š” locally connected neural network์ด๋‹ค.

๐Ÿ’โ€โ™€๏ธ CNN์€ classification ๋ฌธ์ œ ๋ฟ๋งŒ์•„๋‹ˆ๋ผ ๋‹ค์–‘ํ•œ cv๋ฌธ์ œ์—์„œ backbone์ด ๋œ๋‹ค. CNN์œผ๋กœ ์ถ”์ถœํ•œ ํŠน์ง•๋งต์„ ์ด์šฉํ•ด image-level classification, classification+regression, pixel level classification ๋“ฑ์œผ๋กœ ๋ฐœ์ „๋˜์—ˆ๋‹ค.

Image Classification์˜ models

  • LeNet-5 - 1998
  • AlexNet - 2012
  • VGGNet - 2014
  • GoogLeNet - 2015
  • ResNet - 2016
  • DenseNet - 2017
  • SENet - 2018
  • EfficientNet - 2019

LeNet-5

LeNet-5
is the CNN architecture presented in Yann LeCun's 1998 paper 'Gradient-Based Learning Applied to Document Recognition'.
This CNN was designed to recognize handwritten zip codes efficiently.

๐Ÿ’โ€โ™€๏ธ Conv(C1) - Subsampling(S2) - Conv(C3) - Subsampling(S4) - Conv(C5) - FC - FC , ์•ฝ 6๋งŒ๊ฐœ์˜ ํŒŒ๋ผ๋ฏธํ„ฐ๋กœ ๊ตฌ์„ฑ๋˜์—ˆ๋‹ค.

convolution์ด ์ œ์•ˆ๋œ ๋ฐฐ๊ฒฝ

  • ๊ธฐ์กด์˜ ํŒจํ„ด์ธ์‹์—์„œ๋Š” hand-designed feature extractor๋กœ ํŠน์ง•์„ ์ถ”์ถœํ•ด ์ œํ•œ๋œ ํ•™์Šต์ด ์ด๋ฃจ์–ด์ง€๋ฏ€๋กœ feature extractor ๊ทธ ์ž์ฒด์—์„œ ํ•™์Šต์ด ์ด๋ฃจ์ ธ์•ผํ•œ๋‹ค
  • ๊ธฐ์กด ํŒจํ„ด์ธ์‹์€ fully-connected multi-layer networks๋ฅผ ๋ถ„๋ฅ˜๊ธฐ๋กœ ์‚ฌ์šฉํ•ด ๋งŽ์€ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ๊ณ„์‚ฐํ•ด์•ผ๋˜๋ฏ€๋กœ ๊ทธ์— ๋”ฐ๋ผ ๋” ๋งŽ์€ ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ์…‹์ด ํ•„์š”ํ•˜๊ณ , ๋ฉ”๋ชจ๋ฆฌ ์ €์žฅ๊ณต๊ฐ„์ด ๋„ˆ๋ฌด ๋งŽ์ด ํ•„์š”ํ•˜๋‹ค
  • ์ด๋ฏธ์ง€๋Š” 2D๊ตฌ์กฐ์ด๋ฏ€๋กœ ์ธ์ ‘ํ•œ pixel๋“ค์€ ๊ณต๊ฐ„์ ์œผ๋กœ ๋งค์šฐ ํฐ ์ƒ๊ด€๊ด€๊ณ„๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ๋Š”๋ฐ fully-connected multi-layer๋Š” ์ด๋Ÿฐ ๋ณ€์ˆ˜๋“ค์„ ๋‹จ์ˆœ ๋ฐฐ์—ด ํ•˜๋ฏ€๋กœ ๊ณต๊ฐ„์  ์ •๋ณด๋ฅผ ์ด์šฉํ•˜์ง€ ๋ชปํ•œ๋‹ค

LeNet5์˜ ํŠน์ง•

  1. hidden layer์˜ receptive field๋ฅผ local๋กœ ์ œ์•ˆํ•˜๋ฉด์„œ local featrue๋ฅผ ์ถ”์ถœํ•จ
  2. shared weight
  3. sub-sampling(=pooling)

Receptive field

💁‍♀️ CNNs solve the problems above by restricting each layer's receptive field to a local region via filters (kernels), thereby extracting local features.
💁‍♀️ The receptive field is the region of the original image that a single value of the convolution's output tensor is responsible for.
💁‍♀️ Therefore, the deeper the layer, the wider the receptive field it covers.

receptive field ๊ณ„์‚ฐํ•˜๊ธฐ

K : kernel(filter) size
L : layers ์ˆ˜
receptive field size = L x (K-1) + 1

๐Ÿ†€ ouput size = (1,1), kernel size = (3,3), layers 2๊ฐœ ์ผ๋•Œ receptive field size?(stride 1, input image(5,5))
๐Ÿ…ฐ receptive field size = (2 x 2 + 1, 2 x 2 + 1) = (5,5)

Shared weights

💁‍♀️ Kernels sharing the same weights and bias extract the same feature at every position of the input. That is, in the forward pass the k×k kernel slides over the feature map (the input data) according to the stride, but in backpropagation only a single set of weights and one bias are learned.

Pooling

💁‍♀️ Once a convolution has produced a feature map, the exact position of each feature becomes less important.
The precise position of a feature is irrelevant to identifying the pattern, and since the position where a feature appears is likely to vary with the input, exact position information is potentially harmful.

Therefore, pooling (sub-sampling) reduces the resolution of the feature map, deliberately reducing the positional precision of the features in it.
LeNet-5 uses average pooling, which lowers the resolution and reduces sensitivity to distortions and shifts.

The loss of position information is compensated by using more filters as the feature map shrinks, so that a richer variety of features is extracted.

AlexNet - 2012

AlexNet
is the model introduced in Alex Krizhevsky's paper 'ImageNet Classification with Deep Convolutional Neural Networks'.
AlexNet is the convolutional neural network (CNN) architecture that won the ILSVRC (ImageNet Large Scale Visual Recognition Challenge) in 2012.
In other words, it is the architecture that played a huge role in the revival of CNNs.

- https://bskyvision.com/421

๐Ÿ†€ LeNet-5์™€ ์ฐจ์ด์ ์€ ๋ฌด์—‡์ผ๊นŒ?
๐Ÿ…ฐ ๊ธฐ๋ณธ ๊ตฌ์กฐ๋Š” ์œ ์‚ฌํ•˜๋‚˜ ์•„๋ž˜์™€ ๊ฐ™์€ ์ฐจ์ด์ ์ด ์žˆ๋‹ค.

  1. ReLU ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ gradient vanishing ๋ฌธ์ œ ์™„ํ™”
  2. 2๊ฐœ์˜ GPU๋กœ ๋ณ‘๋ ฌ์—ฐ์‚ฐ์„ ์ˆ˜ํ–‰ํ•˜๊ธฐ ์œ„ํ•œ ๋ณ‘๋ ฌ์ ์ธ ๊ตฌ์กฐ
  3. Local Response Normalization
  4. overlapping pooling
  5. Data argumentation(1.2 millions์˜ ImageNet ํ•™์Šต) & regularization(์ •๊ทœํ™”) ๊ธฐ์ˆ ์ธ Dropout ์‚ฌ์šฉ
  6. 11x11 convolution filter ์‚ฌ์šฉ
  7. 7๊ฐœ์˜ hidden layers, 605K neurons, 60million parameters๋กœ ๋” ์ปค์ง„ ๋ชจ๋ธ

AlexNet์˜ ์ „์ฒด ๊ตฌ์กฐ

ReLUํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•œ ์ด์œ ?

๐Ÿ†€ ReLUํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•œ ์ด์œ ?
๐Ÿ…ฐ sigmoid๋‚˜ ํ•˜์ดํผ๋ณผ๋ฆญํƒ„์  ํŠธ๋Š” ์‹ ๊ฒฝ๋ง์ด ๊นŠ์–ด์งˆ ์ˆ˜๋ก ๊ธฐ์šธ๊ธฐ๊ฐ€ ์†Œ๋ฉธ๋˜๋Š” ๋ฌธ์ œ ๊ฐ€ ๋ฐœ์ƒํ•จ
sigmoid ํ•จ์ˆ˜๋Š” x๊ฐ’์ด ์ž‘์•„์ง์— ๋”ฐ๋ผ ๊ธฐ์šธ๊ธฐ๊ฐ€ ๊ฑฐ์˜ 0์œผ๋กœ ์ˆ˜๋ ดํ•˜๊ณ , ํ•˜์ดํผ๋ณผ๋ฆญํƒ„์  ํŠธ ํ•จ์ˆ˜๋„ x๊ฐ’์ด ์ปค์ง€๊ฑฐ๋‚˜ ์ž‘์•„์ง์— ๋”ฐ๋ผ ๊ธฐ์šธ๊ธฐ๊ฐ€ ํฌ๊ฒŒ ์ž‘์•„์ง€๊ธฐ ๋•Œ๋ฌธ์— gradient vanishing์ด ๋ฐœ์ƒํ•œ๋‹ค

<

๋˜ํ•œ singmoid๋‚˜ ํ•˜์ดํผ๋ณผ๋ฆญํƒ„์  ํŠธ ํ•จ์ˆ˜๋Š” ๋ฏธ๋ถ„์„ ์œ„ํ•ด ์—ฐ์‚ฐ์ด ํ•„์š”ํ•œ ๋ฐ˜๋ฉด ReLUํ•จ์ˆ˜๋Š” ๋‹จ์ˆœ ์ธ๊ณ„๊ฐ’์ด๋ฏ€๋กœ ๋ฏธ๋ถ„์ด ์‰ฝ๋‹ค.

ReLUํ•จ์ˆ˜๋Š” ์ •ํ™•๋„๋ฅผ ์œ ์ง€ํ•˜๋ฉด์„œ Tanh๋ณด๋‹ค 6๋ฐฐ๊ฐ€ ๋” ๋น ๋ฅด๊ณ , gradient vanishing ๋ฌธ์ œ ์™„ํ™”๋œ๋‹ค
AlexNet ์ดํ›„๋กœ๋Š” ReLU๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ด ์„ ํ˜ธ๋˜์—ˆ๋‹ค.

Local Response Normalization ์‚ฌ์šฉํ•œ ์ด์œ ?

๐Ÿ†€ Local Response Normalization ์‚ฌ์šฉํ•œ ์ด์œ ?
๐Ÿ…ฐ ReLUํ•จ์ˆ˜๋Š” ์ผ๋ถ€ ๊ฐ€์ค‘์น˜์˜ ์ถœ๋ ฅ๊ฐ’์ด ์ฃผ๋ณ€ ๊ฐ€์ค‘์น˜์— ๋น„ํ•ด ๋งค์šฐ ํด ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ๊ทธ ๊ฐ’์„ ์ฃผ๋ณ€ ๊ฐ€์ค‘์น˜์™€ ๋น„์Šทํ•˜๊ฒŒ ๋งž์ถฐ์ฃผ๋Š” ์ •๊ทœํ™” ๋ฐฉ๋ฒ•์ด๋‹ค.

  • ์ฆ‰, Excited neuron์€ ์ฃผ๋ณ€์— ์žˆ๋Š” ๋‹ค๋ฅธ ๋‰ด๋Ÿฐ์— ๋น„ํ•ด ํ›จ์”ฌ ๋ฏผ๊ฐํ•˜๊ธฐ ๋•Œ๋ฌธ์— Excited neuron์˜ ์ฃผ๋ณ€ ๋‰ด๋Ÿฐ์œผ๋กœ ์ •๊ทœํ™”์‹œ์ผœ Excited neuron์„ subdue ํ•œ๋‹ค

  • LRN์„ ์‚ฌ์šฉํ•˜๋ฉด feature map์˜ ๋ช…์•”์„ ์ •๊ทœํ™” ์‹œ์ผœ์ค€๋‹ค๊ณ  ์ƒ๊ฐํ•˜๋ฉด ๋œ๋‹ค

  • LRN์„ ์ ์šฉํ–ˆ์„ ๋•Œ์˜ ๋ณ€ํ™”์™€ ์œ ์‚ฌํ•œ ์ด๋ฏธ์ง€์ด๋‹ค

๐Ÿ’โ€โ™€๏ธ ์ดํ›„ ๋ชจ๋ธ์—์„œ๋Š” LRN์€ ์‚ฌ์šฉ๋˜์ง€ ์•Š๊ณ  batch nomalization์„ ์‚ฌ์šฉํ•œ๋‹ค

Overlapping pooling

🆀 What is overlapping pooling?
🅰 Pooling in a CNN shrinks the feature map; overlapping pooling sets the stride (the step by which the pooling kernel moves) smaller than the kernel size.

💁‍♀️ LeNet-5 used non-overlapping average pooling, whereas AlexNet used overlapping max pooling.

🆀 Why use overlapping pooling?
🅰 With overlapping pooling the pooling windows overlap, which proved effective at reducing the top-1 and top-5 error rates.

💁‍♀️ Top-1 and top-5 error rates are used to evaluate image-classification performance.

  • top-1 error: the fraction of samples where the classifier's top-1 predicted class is not the true class
  • top-5 error: the fraction of samples where the true class is not among the classifier's top-5 predictions
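The two error rates can be computed like this (the `topk_error` helper and the toy predictions are ours; real predictions would be class lists sorted by descending score):

```python
def topk_error(predictions, targets, k):
    """Fraction of samples whose true class is NOT among the top-k predictions.
    `predictions` holds each sample's classes sorted by descending score."""
    wrong = sum(1 for preds, t in zip(predictions, targets) if t not in preds[:k])
    return wrong / len(targets)

preds = [["cat", "dog", "fox"], ["dog", "cat", "fox"], ["fox", "dog", "cat"]]
truth = ["cat", "cat", "cat"]
print(topk_error(preds, truth, 1))  # -> 2/3: only the first sample is right at top-1
print(topk_error(preds, truth, 2))  # -> 1/3: the second sample recovers at top-2
```

Top-5 error is always less than or equal to top-1 error, since the top-1 prediction is contained in the top-5 list.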

Dropout

💁‍♀️ Dropout is a regularization technique to prevent overfitting: during training, some of the neurons in a fully connected layer are set to 0.
The zeroed neurons contribute nothing to the forward pass or to backpropagation.

Data augmentation

💁‍♀️ Another way to prevent overfitting is to enlarge the dataset by applying transformations to the data.
💁‍♀️ AlexNet has 60 million parameters, a thousand times more than LeNet-5's roughly 60 thousand, so it needs correspondingly more training data.

11x11 convolution filters

🆀 Why use an 11x11 kernel?
🅰 The filter size was enlarged to grow the receptive field:
larger filters are used to cover a wider range of the input image.

🆀 Why do the activation maps (feature maps) cross between the two GPUs only at some layers?
🅰 Crossing at every layer would take too long, so communication happens only at certain layers.

🆀 What is the size of the vectorized tensor just before the fully connected layers (the 2/3D -> 1D flattening step)?
🅰 Because training ran in parallel on 2 GPUs, flattening yields 2048 x 2 = 4096 values rather than 2048.

VGGNet - 2014

VGGNet
is the model introduced in 'Very Deep Convolutional Networks for Large-Scale Image Recognition' by a research team at Oxford.
The VGG-16 model achieved 92.7% top-5 test accuracy on the ImageNet Challenge, establishing itself as one of the landmark deep-learning studies for computer vision in 2014.

💁‍♀️ The core of the paper is to examine how making the network deeper affects performance.

🆀 How does it differ from AlexNet?

  1. It builds a deep network using only 3x3 filters and 2x2 max pooling.
  2. It does not use local response normalization.

💁‍♀️ It successfully trained a network more than twice as deep as AlexNet and roughly halved AlexNet's error rate on the ImageNet Challenge (16.4% -> 7.3%).
💁‍♀️ VGG achieved strong results despite being a simple model.

VGGNet์˜ ํŠน์ง•

  1. 3x3 filter, 2x3 max pooling
  2. local response normalization

3x3 convolution

💁‍♀️ Using 3x3 filters, the network depth was pushed to 16, 19 layers and beyond, raising performance.

🆀 How did they manage to train deep models of 16-19 layers?
🅰 With large filters the spatial size of the image shrinks quickly, so the network cannot be made deep enough; VGG instead uses only 3x3 filters in every convolutional layer to build a deep network.

🆀 Why only 3x3 filters in every convolutional layer?
🅰 Models before VGG used 11x11 or 7x7 filters to obtain a large receptive field, but VGG stacks 3x3 filters repeatedly to get the effect of a 7x7 filter, and keeps the receptive field sufficiently large via 2x2 max pooling.

💁‍♀️ One 7x7 filter vs three stacked 3x3 filters:

  1. More non-linearity in the decision function
  • Each conv operation includes a ReLU, so a single 7x7 conv applies ReLU once while three 3x3 convs apply it three times. As layers increase, non-linearity increases and the model can discriminate features better.
  2. Fewer parameters
  • One 7x7 filter has 7x7x1 = 49 parameters (per channel), while three 3x3 filters have 3x3x3 = 27, a large reduction.

💁‍♀️ Note that as the network deepens, the feature map encodes increasingly abstract information for the same receptive field.

local response normalization ์‚ฌ์šฉํ•˜์ง€ ์•Š์€ ์ด์œ 

๐Ÿ’โ€ VGG์—ฐ๊ตฌํŒ€์€ A์™€ A-LRN ๊ตฌ์กฐ์˜ ์„ฑ๋Šฅ์„ ๋น„๊ตํ•จ์œผ๋กœ์„œ ์„ฑ๋Šฅํ–ฅ์ƒ์—๋Š” ๋ณ„๋กœ ํšจ๊ณผ๊ฐ€ ์—†๋‹ค๊ณ  ์‹คํ—˜์„ ํ†ตํ•ด ํ™•์ธํ–ˆ๋‹ค

VGG์˜ ํ•œ๊ณ„์ 

๐Ÿ’โ€โ™€๏ธ ๋” ๊นŠ์€ ๋„คํŠธ์›Œํฌ๋Š” ๋” ํฐ capacity(์ˆ˜์šฉ์„ฑ)์™€ non-linearity(๋น„์„ ํ˜•์„ฑ) ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•œ๋‹ค. ๋˜ํ•œ ๋” ํฐ receptive fields๋ฅผ ๊ฐ€์ง€๊ฒŒ ํ•ด neural network๋Š” ๋” ๊นŠ์–ด์ง€๊ณ  ๋” ๋„“์–ด์กŒ๋‹ค.

๐Ÿ†€ ํ•˜์ง€๋งŒ ๋” ๊นŠ์–ด์ง€๋Š”๊ฒŒ ๋” ๋„“์–ด์ง€๋Š”๊ฒŒ ๋ฐ˜๋“œ์‹œ ์ข‹์„๊นŒ?
๐Ÿ…ฐ ๋ ๋ฃจ ํ•จ์ˆ˜๋ฅผ ํ†ตํ•ด Gradient vanishing ๋ฌธ์ œ๋ฅผ ์–ด๋Š์ •๋„ ํ•ด๊ฒฐํ–ˆ์ง€๋งŒ ๋” ๊นŠ์–ด์ง„ network๋Š” gradient vanishing(๊ฒฝ์‚ฌ์†Œ์‹ค)๊ณผ exploding(๊ฒฝ์‚ฌํญ๋ฐœ)์„ ์œ ๋ฐœํ•ด ์ตœ์ ํ™”๋ฅผ ๋”์šฑ ํž˜๋“ค๊ฒŒ ํ•˜๋ฉฐ ๊ณ„์‚ฐ๋ณต์žก๋„(computationally complex)๊ฐ€ ๋”์šฑ ๋Š˜์–ด๋‚œ๋‹ค.

๐Ÿ†€gradient vanishing(๊ฒฝ์‚ฌ์†Œ์‹ค)๊ณผ exploding(๊ฒฝ์‚ฌํญ๋ฐœ)์ด ์ผ์–ด๋‚˜๋ฉด ์–ด๋–ค ๊ฒฐ๊ณผ ๋‚˜ํƒ€๋‚ ๊นŒ?
๐Ÿ…ฐ ๋” ๊นŠ์€ ๋„คํŠธ์›Œํฌ๋Š” over-fitting(์˜ค๋ฒ„ํ”ผํŒ…)์„ ๋ถ€๋ฅผ๊ฒƒ์ด๋ผ๋Š” ์˜ˆ์ธก๊ณผ ๋‹ค๋ฅด๊ฒŒ ์‹ค์ œ๋กœ๋Š” Degradation problem์„ ๋ฐœ์ƒ์‹œํ‚จ๋‹ค

Degradation problem

🆀 What is the degradation problem?

🅰 It is the phenomenon where accuracy saturates at some point and then gets worse as layers are added.

  • When a deep network has already been trained extensively, the weight distribution becomes uneven, and the gradients reaching the early layers during backpropagation are too small for training to proceed stably.
  • One might suspect overfitting, but overfitting means the model performs well on the training data and poorly on the test data.
  • By contrast, a 56-layer model performs worse than a 20-layer model on both the training data and the test data, which shows this is not overfitting: the optimization itself is failing.
  • In short, as layers are added, training error increases even though the model has more parameters.

Epilogue

🆀 So how did later models solve the degradation problem?
🅰 The next post will introduce GoogLeNet, ResNet, and other models that solved degradation.

-์ฐธ๊ณ -
