📌 These are my personal notes from the University of Michigan course 'Deep Learning for Computer Vision'. If you spot any errors or have feedback, please let me know and I will gladly incorporate it.
(The content is nearly identical to Stanford's cs231n, so that course is a helpful reference.) 📌


1. ImageNet Classification Challenge

1) Overview

  • A huge, large-scale dataset
  • The major benchmark for image classification
  • Its results carry many lessons for CNN design
  • 2010 and 2011 winners → not neural-network based; 2012 → the first year a CNN became the dominant approach (AlexNet won overwhelmingly)




2. AlexNet

๐Ÿ“ ๊ณ„์‚ฐ๊ตฌ์กฐ ์‹œ์‚ฌ์ : ์ดˆ๊ธฐ ๋ฉ”๋ชจ๋ฆฌๅคš,ํŒŒ๋ผ๋ฏธํ„ฐ์ˆ˜(fc layer)์—์„œ, ๊ณ„์‚ฐ๋น„์šฉ(conv์—์„œ ๅคš)
1) ์„ค๊ณ„

  • 227 * 227 inputs
  • 5 conv layers
  • Max pooling
  • 3 fully-connected layers
  • relu ๋น„์„ ํ˜• ํ•จ์ˆ˜

2) Drawbacks

  • Uses local response normalization (no longer used today; a precursor of batch norm)
  • Trained on 2 GTX 580 GPUs
    • Only 3 GB of memory each (vs. 12-18 GB today)
    • The model was split across the two physical GTX cards to fit in GPU memory
      (splitting a model across multiple GPUs is still occasionally done today, but it is not the norm)

3) Citations

  • Among the most-cited papers in all of science

4) Layer-by-layer computation

a. Conv Layer

  • C = 3
    : RGB channels
  • Input size: H/W = 227
  • Filters = 64
    : must equal the channel count of the output
  • Output size: H/W = 56
    : ((W - K + 2P) / S) + 1
    → ((227 - 11 + 2·2) / 4) + 1 = 56
  • Memory (KB) = 784
    : (number of output elements) × (bytes per element) / 1024
    → (64 × 56 × 56) × 4 / 1024 = 784
  • Params (k) = 23 (number of learnable parameters)
    : number of weights
    = (weight shape) + (bias shape)
    = (C_out × C_in × K × K) + C_out
    = (64 × 3 × 11 × 11) + 64 = 23,296
  • FLOPs (M) = 73 (total operation count = number of floating-point operations)
    : number of floating-point operations (one multiply-add counted as one op)
    = (number of output elements) × (ops per output element)
    = (C_out × H' × W') × (C_in × K × K)
    = (64 × 56 × 56) × (3 × 11 × 11)
    = 72,855,552
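
As a sanity check, here is a minimal Python sketch of the bookkeeping above (the helper name `conv2d_stats` and its signature are my own, not from the lecture):

```python
def conv2d_stats(c_in, h, w, c_out, k, stride, pad):
    """Output size, memory (float32), params, and FLOPs for one conv layer."""
    h_out = (h - k + 2 * pad) // stride + 1           # ((W - K + 2P) / S) + 1
    w_out = (w - k + 2 * pad) // stride + 1
    memory_kb = c_out * h_out * w_out * 4 / 1024      # 4 bytes per element
    params = c_out * c_in * k * k + c_out             # weights + biases
    flops = (c_out * h_out * w_out) * (c_in * k * k)  # multiply-adds
    return h_out, w_out, memory_kb, params, flops

# AlexNet conv1: 3x227x227 input, 64 filters of 11x11, stride 4, pad 2
print(conv2d_stats(3, 227, 227, 64, k=11, stride=4, pad=2))
# -> (56, 56, 784.0, 23296, 72855552)
```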

b. pooling layer

  • C_in = C_out = 64
  • Output size: H/W = 27
    : ((W - K + 2P) / S) + 1
    = 27.5 (AlexNet's sizes do not always divide evenly)
    = floor(27.5) = 27, i.e. just round down
  • Memory (KB) = 182
    : (number of output elements) × (bytes per element) / 1024
    = 182.25
  • Params (k) = 0
    : pooling layers have no learnable parameters
  • FLOPs (M) ≈ 0
    : number of floating-point operations (multiply-adds)
    = (number of output positions) × (ops per output position)
    = (C_out × H' × W') × (K × K)
    ≈ 0.4 MFLOP

c. flatten

  • Flatten output size = 9216 (destroys all spatial structure; flattens to a vector)
    : C_in × H × W
    = 256 × 6 × 6 = 9216

d. FC

  • FC params
    : C_in × C_out + C_out
    = 9216 × 4096 + 4096
    = 37,752,832
  • FC FLOPs
    : C_in × C_out
    = 9216 × 4096
    = 37,748,736
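
The same bookkeeping for the pooling, flatten, and FC layers, as a short sketch (plain arithmetic, nothing assumed beyond the formulas above):

```python
# pool1: 64x56x56 input, 3x3 max pool, stride 2
h_out = (56 - 3) // 2 + 1                  # integer division = floor(27.5)
print(h_out)                               # 27
print(64 * 27 * 27 * 4 / 1024)             # memory: 182.25 KB (params = 0)
print(64 * 27 * 27 * 3 * 3 / 1e6)          # ~0.42 MFLOP

# fc6: flatten 256x6x6 -> 9216 inputs, fully connected to 4096 units
print(256 * 6 * 6)                         # 9216
print(9216 * 4096 + 4096)                  # params: 37,752,832
print(9216 * 4096)                         # FLOPs:  37,748,736
```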

5) What these numbers tell us

  • The design was found largely by trial and error
  • These exact choices are rarely used today

a. Memory usage

  • Most memory is consumed in the early layers
    • Why: the outputs of the early conv layers have relatively high spatial resolution and a large number of filters

b. Parameter count

  • Nearly all parameters live in the FC layers
    • Why: the first FC layer takes a 6 × 6 × 256 tensor and connects it fully to a 4096-dimensional hidden layer
      • So almost all of AlexNet's learnable parameters come from the fully connected layers

c. Compute cost

  • Most of the floating-point operations are in the conv layers
    • Why: FC layers are comparatively cheap (just a single matrix multiply),
      whereas conv layers with many filters at high spatial resolution drive the compute cost up




3. ZFNet: Bigger AlexNet

๐Ÿ“ ๊ณ„์‚ฐ๊ตฌ์กฐ ์‹œ์‚ฌ์ : ๋” ํฐ ๋„คํŠธ์›Œํฌ๊ฐ€ ๋” ์„ฑ๋Šฅ good

1) ํŠน์ง•

  • more trial, less error

2) AlexNet๊ณผ ๋ฐ”๋€ ์ 

  • conv1
    • (11x11 stride 4) โ†’ (7x7 stride 2)๋กœ ๋ฐ”๋€œ
      • ๊ธฐ์กด 4๋งŒํผ down sample โ†’ 2๋งŒํผ down sample๋กœ ๋ฐ”๋€œ
      • ๋†’์€ ๊ณต๊ฐ„ ํ•ด์ƒ๋„ & ๋” ๋งŽ์€ receptive field & ๋” ๋งŽ์€ ์ปดํ“จํŒ… ๋น„์šฉ
  • conv3,4,5
    • (384,384,256 filters) โ†’ (512,1024,512)๋กœ ๋ฐ”๋€œ
      • filter ํฌ๊ฒŒ = ๋„คํŠธ์›Œํฌ ๋” ํฌ๊ฒŒ

=โ‡’ ๊ฒฐ๋ก ) ๋” ํฐ ๋„คํŠธ์›Œํฌ๊ฐ€ ๋” ์„ฑ๋Šฅ์ด ์ข‹๋‹ค




4. VGG: Deeper Networks, Regular Design

๐Ÿ“ ๊ณ„์‚ฐ๊ตฌ์กฐ ์‹œ์‚ฌ์ : ๊ตณ์ด ํฐ ํ•„ํ„ฐ ํ•„์š”X, conv layer๊ฐœ์ˆ˜ ๋” ์ค‘์š”, ์ฑ„๋„ ์ˆ˜ ๋งŽ์•„์ ธ๋„ ๊ณ„์‚ฐ๋น„์šฉ ๋™์ผ - Stage ์‚ฌ์šฉ
1) AlexNet, ZFNet ๊ณตํ†ต ๋ฌธ์ œ์ 

  • ad hoc way (๋„คํŠธ์›Œํฌ ํ™•์žฅ, ์ถ•์†Œ ์–ด๋ ค์›€)
  • hand design ๋งž์ถคํ˜• convolution architecture

โ‡’ VGG๋Š” ๋„คํŠธ์›Œํฌ์˜ ๋™์ผํ•œ ์กฐ๊ฑด์œผ๋กœ ์ „์ฒด ์ ์šฉ (๋‹จ์ˆœํ™”ํ•จ)

2) VGG ์„ค๊ณ„ ๊ทœ์น™ (์ •ํ™•ํ•œ ๊ตฌ์„ฑ์— ๋Œ€ํ•ด ์ƒ๊ฐX)

  • ๊ธฐ๋ณธ ์„ธํŒ…
    • All conv are 3x3 stride 1 pad 1
    • All max pool are 2x2 stride 2
    • After pool, double channels
  • stage
    • Alexnet์€ 5๊ฐœ์˜ conv layer์žˆ์—ˆ๊ณ , VGG๋Š” ๋” ๊นŠ๊ฒŒ ํ•œ๊ฒƒ

    • 1๊ฐœ์˜ stage = conv, pooling layer๋“ฑ ํฌํ•จ

    • VGG: 5๊ฐœ์˜ stage

3) Why these particular design rules

a. Conv layers

  • Before: kernel size was a free hyperparameter chosen per layer

    → VGG fixes every conv to 3×3

  • Justification

    • Option 1: one conv layer with a 5×5 kernel:

      conv(5×5, C → C) = (kernel, input → output channels)

      params = 25C², FLOPs = 25C²HW

      (there are C conv filters, each seeing C input channels)

    • Option 2: two conv layers with 3×3 kernels:

      params = 18C², FLOPs = 18C²HW

      (9C² each)

    • Conclusion

      ⇒ Option 2 - more conv layers with a smaller kernel - is more efficient in both parameter count and compute

    • Better than a single 5×5 conv (same receptive field, with an extra nonlinearity in between)

⇒ Big filters are unnecessary → no need to treat kernel size as a hyperparameter → only the number of conv layers matters (see the sketch below)

b. Pooling layers

  • Interpretation

    • Every time you pool, double the number of channels
  • Justification

    • Stage 1: 3×3 conv with C channels on an H × W map → memory ∝ CHW, params = 9C², FLOPs = 9C²HW
    • Stage 2: after pooling, 3×3 conv with 2C channels on an H/2 × W/2 map → memory ∝ CHW/2, params = 36C², FLOPs = 9C²HW
    • Conclusion ⇒ even though the channel count grows, memory is halved and the compute cost stays identical

4) AlexNet vs VGG-16

⇒ Conclusion: bigger network → better performance

5) Question

Q. Did VGG also use multiple GPUs?

A. Yes, but with data parallelism: the batch was split and each sub-batch was computed on a different GPU

→ no model splitting, only mini-batch splitting




5. GoogLeNet: Focus on Efficiency

๐Ÿ“ Stem, Inception Module, Global Average pooling, Auxiliary Classifier

  • ๊ธฐ์กด) network์ปค์ง€๋ฉด : ์„ฑ๋Šฅ ๋” ์ข‹์Œ

1) ๊ฐœ๋…

  • ํšจ์œจ์„ฑ์— ์ดˆ์  โ‡’ ์ „์ฒด์ ์ธ ๋ณต์žก์„ฑ ์ตœ์†Œํ™”

2) Stem network

  • Concept

    • Aggressively downsamples the input image (using a lightweight stem)
    • Downsampling happens very quickly, within just a few layers
    • Avoids performing expensive convolutions at high resolution
  • Structure

  • Comparison with VGG

    • VGG is about 18× more expensive than GoogLeNet in these early layers

3) Inception Module

  • Concept

    • A local structure repeated throughout the network
    • Just as earlier networks repeated a conv-conv-pool pattern, GoogLeNet designs a small inception module and repeats it across the whole network
  • Structure

    • Parallel branches, including a 3×3 max pooling with stride 1

  • Functions
    • Function 1
      • Instead of picking a kernel size, a stack of 3×3 convs can stand in for any larger kernel
      • Removes kernel size as a hyperparameter (the module runs all kernel sizes in parallel anyway)
    • Function 2
      • Uses a 1×1 conv before the expensive convs (3×3, …) to shrink the channel count (exploiting a bottleneck; see the sketch below)
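
A rough sense of why the 1×1 bottleneck conv pays off, as a sketch (the 28×28×256 feature-map size and the reduction to 64 channels are my own illustrative numbers, not from the lecture):

```python
# FLOPs (multiply-adds) of a 3x3 conv with and without a 1x1 bottleneck
H = W = 28
C = 256   # incoming channels
c = 64    # channels after the 1x1 reduction

direct = (C * H * W) * (C * 3 * 3)                      # 3x3 conv, C -> C
reduced = (c * H * W) * C + (c * H * W) * (c * 3 * 3)   # 1x1 C->c, then 3x3 c->c
print(f"{direct / 1e6:.0f} vs {reduced / 1e6:.0f} MFLOPs")  # ~462 vs ~42
```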

4) Global Average Pooling

  • Concept
    • Motivated by the need to cut parameters
    • Rather than flattening (which destroys the spatial structure), average-pool over the entire spatial extent to collapse the spatial dimensions, then apply a single FC layer
  • Structure
  • Comparison with VGG (see the sketch below)
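
A small PyTorch sketch of the difference (the 1024 × 7 × 7 final feature map matches GoogLeNet; pairing a flatten-style FC with that same map is my own illustration, and 1000 is ImageNet's class count):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1024, 7, 7)            # final feature map (N, C, H, W)

fc_flat = nn.Linear(1024 * 7 * 7, 1000)   # VGG-style: flatten, then FC
gap = nn.AdaptiveAvgPool2d(1)             # GoogLeNet-style: pool to 1x1, then FC
fc_gap = nn.Linear(1024, 1000)

print(sum(p.numel() for p in fc_flat.parameters()))  # 50,177,000 params
print(sum(p.numel() for p in fc_gap.parameters()))   #  1,025,000 params
print(fc_gap(gap(x).flatten(1)).shape)               # torch.Size([1, 1000])
```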

5) Auxiliary Classifiers

  • Concept
    • Invented before batch norm existed
      • Networks with more than ~10 layers were hard to train
      • Training them meant relying on ugly hacks
    • Designed to help the intermediate layers learn when the network is deep
    • Classification scores are produced in 3 places: at the very end and at 2 intermediate points
      • Gradients are computed at each and backpropagated (a trick used at the time to get deep networks to converge)
      • The extra auxiliary-classifier outputs inject gradient into the earlier layers; this helps the intermediate layers, which must be able to classify from only a partial set of features




6. Residual Networks

๐Ÿ“ batch norm ๋ฐœ๊ฒฌ์ดํ›„/์ง€๋ฆ„๊ธธ/VGG(Stage) + GoogLeNet(Stem, Inception Module, Global Average pooling) ์‚ฌ์šฉ
1) ๋ชจ๋ธ ์ƒ์„ฑ ๋ฐฐ๊ฒฝ

  • ๋ฌธ์ œ์ 
    • Batch Norm๋ฐœ๊ฒฌ ํ›„, ๊ธฐ์กด์—๋Š” bigger layer์ด ๋” ์„ฑ๋Šฅ ์ข‹์•˜๋Š”๋ฐ ์ด์ œ ๊นŠ์€ ๋ชจ๋ธ์ด ์„ฑ๋Šฅ ๋” ์•ˆ์ข‹์•„์ง !
      = layer๊ฐ€ ๊นŠ์–ด์งˆ์ˆ˜๋ก ํšจ์œจ์ ์ธ ์ตœ์ ํ™” ๋ถˆ๊ฐ€๋Šฅ !
  • ๋ฌธ์ œ์— ๋Œ€ํ•œ ์ด์œ  ์˜ˆ์ƒ
    • ๊นŠ์€ ๋ชจ๋ธ์ด overfitting ๋œ ๊ฑฐ๋‹ค.
  • ๊ธฐ๋ณธ ๊ฐ€์ •
    • deeper model์€ shallower model์„ ๋ชจ๋ฐฉํ•  ์ˆ˜ ์žˆ๋‹ค
      ex. 56 layer๊ฐ€ 20 layer๋ฅผ ๋ชจ๋ฐฉํ•œ๋‹ค(20 layer์˜ ๋ชจ๋“  layer๋ฅผ 56 layer์— copyํ•œ๋‹ค๊ณ  ์ƒ๊ฐ)
      โ†’ ๋”ฐ๋ผ์„œ deeper model์€ ์ตœ์†Œํ•œ shallow model๋ณด๋‹ค ๋” ์„ฑ๋Šฅ์ด ์ข‹๋‹ค
      โ‡’ ๋ฐœ์ƒํ•˜๋Š” ๋ฌธ์ œ์ ์ด ๊ธฐ๋ณธ ๊ฐ€์ •์—์„œ ๋ฒ—์–ด๋‚จ
  • ํ•ด๊ฒฐ์ฑ…
    • layer๊ฐ€ ๊นŠ์€ ๊ฒฝ์šฐ, identity function์„ ๋” ์‰ฝ๊ฒŒ ํ•™์Šตํ•  ์ˆ˜ ์žˆ๋„๋ก network ๋ณ€๊ฒฝํ•ด์•ผ๋จ
    • ๊ทธ๋ ‡๊ฒŒ ํ•ด์„œ ๋‚˜์˜ค๊ฒŒ ๋œ๊ฒŒ Residual Network

๊ฒฐ๋ก ) layer๊ฐ€ ๊นŠ์–ด์งˆ์ˆ˜๋ก ํšจ์œจ์ ์ธ ์ตœ์ ํ™” ๋ถˆ๊ฐ€๋Šฅ โ†’ layer๊ฐ€ ๊นŠ์„๋•Œ identity function์„ ๋” ์‰ฝ๊ฒŒ ํ•™์Šตํ•˜๋„๋ก

2) Shortcut

  • Concept

    • Create a shortcut (skip connection) around each block; see the sketch below
  • Advantages

    • The identity function becomes very easy to learn
      • Set the weights of the conv layers inside the shortcut to 0 and the block computes the identity
        = makes it easy for a deep network to emulate a shallower one
    • Helps improve gradient propagation
      • e.g. at an add node, backprop copies the gradient to both inputs, so a residual block copies the gradient along the shortcut path
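
A minimal PyTorch sketch of a basic residual block (the class name and tensor sizes are my own; the structure follows the description above):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """out = F(x) + x. With zero conv weights, F(x) = 0 and the block
    reduces to (roughly) the identity, modulo the final ReLU."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # "+" copies gradient to both paths

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```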

3) ๋ชจ๋ธ ๊ตฌ์กฐ

  • ๊ฐœ๋…

    • VGG(๋‹จ์ˆœํ•œ ์„ค๊ณ„ ์›์น™)์™€ GoogleNet(์ˆ˜ํ•™์  ๊ณ„์‚ฐ)์˜ ๊ฐ€์žฅ ์ข‹์€ ๋ถ€๋ถ„์—์„œ ์˜๊ฐ ๋ฐ›์Œ

    • ๋งŽ์€ residual block์˜ stack์ž„

      a. VGG์—์„œ ๋”ฐ์˜จ ๊ฒƒ

      • ๊ฐ residual block์€ 2๊ฐœ์˜ 3x3 conv ์žˆ์Œ
      • Stage ๊ตฌ์กฐ
        • ๊ฐ stage์˜ ์ฒซ๋ฒˆ์งธ block์€ stride 2 conv๋กœ ํ•ด์ƒ๋„ ๋ฐ˜์œผ๋กœ ์ค„์ž„
        • ์ฑ„๋„ 2๋ฐฐ๋กœ ๋Š˜๋ฆผ

      b. GoogleNet์—์„œ ๋”ฐ์˜จ ๊ฒƒ

      • Stem ๊ตฌ์กฐ

        • ์ฒ˜์Œ input์„ down samplingํ•จ
      • Global Average Pooling

        • ๊ทธ๋Œ€๋กœ fully connected layer๋กœ ์•ˆ๋„˜๊น€
        • ํŒŒ๋ผ๋ฏธํ„ฐ ์ค„์ด๊ธฐ ์œ„ํ•จ

      c. ์‚ฌ์šฉ์ž๊ฐ€ ์ •ํ•ด์•ผํ•  ๊ฒƒ

      • ์ดˆ๊ธฐ ๋„คํŠธ์›Œํฌ ๋„ˆ๋น„ ex. C=64
      • stage๋‹น block ์ˆ˜ ex. 3 residual blocks per stage

4) ๋ชจ๋ธ ์˜ˆ์‹œ

a. ResNet-18

b. ResNet-34

  • ํ•ด์„

    • ๋งค์šฐ ๋‚ฎ์€ error ๋‹ฌ์„ฑ
  • VGG-16๊ณผ ๋น„๊ต

    • ๋‘˜๋‹ค resnet์ด ๋” ์ข‹์Œ
    • GFLOP: ResNet์€ ์•ž์— downsamplingํ•˜๊ณ  ์‹œ์ž‘ํ•ด์„œ ์ฐจ์ด ๋งŽ์ด ๋‚จ

5) Bottleneck Block (cf. GoogLeNet's Inception module)

  • Concept

    • As networks grew deeper, the block design was revised; see the sketch below
  • Basic Block

    • All computation happens in the two 3×3 conv layers
  • Bottleneck Block

    • Accepts an input with 4× as many channels

      ⇒ Conclusion: enables a deeper network without increasing the compute cost
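
A sketch of the bottleneck branch and its cost (the helper name is my own; the 17 vs 18 C² figures follow from the FLOP formula used throughout these notes):

```python
import torch.nn as nn

def bottleneck_branch(c_in, c_mid):
    """1x1 reduce -> 3x3 -> 1x1 expand; c_in = 4 * c_mid in standard ResNets."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_mid, 1, bias=False), nn.BatchNorm2d(c_mid), nn.ReLU(),
        nn.Conv2d(c_mid, c_mid, 3, padding=1, bias=False), nn.BatchNorm2d(c_mid), nn.ReLU(),
        nn.Conv2d(c_mid, c_in, 1, bias=False), nn.BatchNorm2d(c_in),
    )

# FLOPs per block on an HxW map (multiply-adds), dropping the HW factor:
#   basic block, C channels:           9C^2 + 9C^2        = 18 C^2
#   bottleneck block, 4C channels in:  4C^2 + 9C^2 + 4C^2 = 17 C^2
# -> about the same cost, but 3 layers deep instead of 2, with 4x the channels.
```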

6) Full model

  • Concept
    • The deeper the stack, the lower the error!!
  • Results
    • Swept every competition track




7. Improving Residual Networks: Block Design

๐Ÿ“ Conv ์ „์— Batch norm๊ณผ Relu๋„ฃ๊ธฐ

  • ๊ฐœ๋…
    • Conv ์ „์— Batch norm๊ณผ Relu๋ฅผ ๋„ฃ์–ด์„œ ์„ฑ๋Šฅ ๊ฐœ์„ ๊ฐ€๋Šฅ
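
A minimal sketch of the pre-activation block (the class name is my own):

```python
import torch.nn as nn

class PreActBlock(nn.Module):
    """BN and ReLU come *before* each conv, so the shortcut path
    carries x through completely untouched (a true identity)."""
    def __init__(self, channels):
        super().__init__()
        self.branch = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
        )

    def forward(self, x):
        return x + self.branch(x)  # no ReLU after the add
```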




8. Compare Complexity

  • Overall comparison

  • Interpretation
    • Size of dot: number of learned parameters
    • G-Ops (giga-operations): the number of operations for one forward pass of the architecture
    • Inception-v4: ResNet + Inception
    • VGG: highest memory use, most compute (very inefficient)
    • GoogLeNet: very efficient in compute, but accuracy is underwhelming
    • AlexNet: very little compute, but an enormous number of parameters
    • ResNet: simple design, better efficiency, high accuracy (improving as it is made deeper)




9. Model Ensembles

  • The 2016 winner: an ensemble of several strong models




10. ResNeXt

๐Ÿ“ ResNet ๊ฐœ์„  ๋ฒ„์ „ - Group ์ถ”๊ฐ€

  • ๊ฐœ๋…
    • ํ•˜๋‚˜์˜ bottleneck์ด ์ข‹์œผ๋ฉด, ์ด๋ฅผ ๋ณ‘๋ ฌ์ ์œผ๋กœ ๊ตฌ์„ฑํ•˜๋ฉด ๋” ์ข‹์ง€ ์•Š๊ฒ ๋Š”๊ฐ€!

  • ๊ณ„์‚ฐ ๊ฒฐ๊ณผ
    • Total FLOPs: (8Cc+9c^2)HWG
    • ์ด๊ฑธ๋กœ ํŒจํ„ด ๋„์ถœ ๊ฐ€๋Šฅ
      • C=64,G=4,c=24 ; C=64,G=32,c=4 ์ผ๋•Œ ์œ„์™€ ๊ฐ™์€ ๊ฒฐ๊ณผ ๋„์ถœ ๊ฐ€๋Šฅ

        โ‡’ ๊ฒฐ๋ก ) Group ์œผ๋กœ ๋ณ‘๋ ฌ์ ์œผ๋กœ ํ• ๋•Œ ๋” ์ข‹์€ ์„ฑ๋Šฅ ๋ณด์ž„
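
A quick arithmetic check (per spatial position, dropping the HW factor; the 17C² baseline is the standard bottleneck cost from section 6-5):

```python
C = 64
baseline = 17 * C**2                      # single bottleneck block: 69,632
for G, c in [(4, 24), (32, 4)]:
    grouped = (8 * C * c + 9 * c**2) * G  # G parallel paths of width c
    print(G, c, grouped)                  # 69,888 and 70,144: ~equal cost
```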

1) Grouped Convolution

  • Structure

    • when groups = 1
    • when groups = 2
    • when groups = G (see the sketch below)
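
In PyTorch, grouped convolution is just the `groups` argument of `nn.Conv2d`; a small sketch:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 28, 28)

# groups=1 is an ordinary conv (every filter sees all 64 input channels);
# groups=G splits the channels into G groups convolved independently,
# cutting params and FLOPs by a factor of G.
for G in [1, 2, 32]:
    conv = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=G, bias=False)
    n = sum(p.numel() for p in conv.parameters())
    print(f"groups={G}: {n} params, output {conv(x).shape}")
# groups=1: 36864 params; groups=2: 18432; groups=32: 1152
```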

2) Adding Groups to the ResNet block

  • Structure

3) Performance by number of groups

  • Interpretation
    • Adding groups (at equal compute) improves performance!




11. SENet

(Squeeze-and-Excitation)

  • Concept
    • Inserts a Global pooling → FC → Sigmoid path into each residual block to build in global context; see the sketch below
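
A minimal sketch of the squeeze-and-excitation idea (the class name and the reduction=16 ratio are my own defaults):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Global-pool to one value per channel, squeeze through a small FC
    bottleneck, then sigmoid-gate each channel of the input."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        scale = self.fc(x.mean(dim=(2, 3)))  # squeeze: (N, C) global context
        return x * scale.view(n, c, 1, 1)    # excite: per-channel rescaling

x = torch.randn(2, 64, 28, 28)
print(SEBlock(64)(x).shape)  # torch.Size([2, 64, 28, 28])
```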
