😄 Lecture 04 | Introduction to Neural Networks

백건 · January 16, 2022

Stanford University CS231n.


๋ณธ ๊ธ€์€ Hierachical Structure์˜ ๊ธ€์“ฐ๊ธฐ ๋ฐฉ์‹์œผ๋กœ, ๊ธ€์˜ ์ „์ฒด์ ์ธ ๋งฅ๋ฝ์„ ํŒŒ์•…ํ•˜๊ธฐ ์‰ฝ๋„๋ก ์ž‘์„ฑ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.
๋˜ํ•œ ๋ณธ ๊ธ€์€ CSF(Curation Service for Facilitation)๋กœ ์ธ์šฉ๋œ(์ฐธ์กฐ๋œ) ๋ชจ๋“  ์ถœ์ฒ˜๋Š” ์ƒ๋žตํ•ฉ๋‹ˆ๋‹ค.

1. Introduction to Neural Networks


1.1 CONTENTS

| Velog | Lecture | Description | Video | Slide | Pages |
| --- | --- | --- | --- | --- | --- |
| In progress | Lecture 01 | Introduction to Convolutional Neural Networks for Visual Recognition | video | slide | subtitle |
| In progress | Lecture 02 | Image Classification | video | slide | subtitle |
| In progress | Lecture 03 | Loss Functions and Optimization | video | slide | subtitle |
| In progress | Lecture 04 | Introduction to Neural Networks | video | slide | subtitle |
| In progress | Lecture 05 | Convolutional Neural Networks | video | slide | subtitle |
| In progress | Lecture 06 | Training Neural Networks I | video | slide | subtitle |
| In progress | Lecture 07 | Training Neural Networks II | video | slide | subtitle |
| In progress | Lecture 08 | Deep Learning Software | video | slide | subtitle |
| In progress | Lecture 09 | CNN Architectures | video | slide | subtitle |
| In progress | Lecture 10 | Recurrent Neural Networks | video | slide | subtitle |
| In progress | Lecture 11 | Detection and Segmentation | video | slide | subtitle |
| In progress | Lecture 12 | Visualizing and Understanding | video | slide | subtitle |
| In progress | Lecture 13 | Generative Models | video | slide | subtitle |
| In progress | Lecture 14 | Deep Reinforcement Learning | video | slide | subtitle |
| In progress | Lecture 15 | Invited Talk (Song Han): Efficient Methods and Hardware for Deep Learning | video | slide | subtitle |
| In progress | Lecture 16 | Invited Talk (Ian Goodfellow): Adversarial Examples and Adversarial Training | video | slide | subtitle |

1.2 Reference Videos

1.3 Reference Documents

1.4 Keywords

1.4.1 Neural networks

Reference: Neural networks

  • Key concept
    • An artificial neural network is a statistical learning algorithm, used in machine learning and cognitive science, inspired by biological neural networks. It refers broadly to models in which artificial neurons, joined into a network by synapses, change the strength of those synaptic connections through learning and thereby acquire the ability to solve problems.

1.4.2 computational graphs

Reference

  • computational graphs
  • Key concept
    • A computational graph expresses a computation process as a graph, where the graph is made up of multiple nodes and edges
    • An edge is the line connecting two nodes

1.4.3 backpropagation

Reference: backpropagation

  • Key concept
    • Backpropagation is one of the standard algorithms used today to train an Artificial Neural Network. The name literally means "propagating backward": compute how far the model's actual output is from the target value you want, then propagate that error backward through the network, updating the parameters held at each node along the way

1.4.4 biological neurons

Reference: biological neurons

  • Key concept
    • A biological neuron is a specialized cell of the nervous system that generates a sharp electrical potential across its cell membrane, called an action potential or spike, lasting about one millisecond

2. Summary


2.1 Overall Summary

Backpropagation and Neural Networks
Overall summary:
A CNN is a kind of neural network that uses convolutional layers to preserve spatial structure. (To handle an image, an FC layer flattens the image matrix into a single row, so pixels that were adjacent in the image end up far apart in the flattened vector; the image's spatial structure is ignored. A CNN, by contrast, slides filters over the image and computes over neighboring pixels together, which preserves the spatial structure.) The values computed as a conv filter (its weights) slides across the input image are collected into an output activation map. A convolutional layer can use multiple filters per layer, and each filter produces a different activation map.

What we want to find are the values of all the weights (parameters), and in Lecture 5 we learn the network's parameters through optimization. As we update the parameters, we want to move downhill on the loss surface, in the direction where the loss decreases; to do that, we step in the direction opposite to the gradient. Mini-batch stochastic gradient descent first takes only a portion of the data (sample a batch of data) and performs a forward pass, then computes the loss, and then runs backprop to compute the gradients (see the sketch below).
After training the network, we work backward from the errors it makes, adjusting the parameter values to find a function that fits.

2.2 Step #01 : Computational Graph

2.2.1 Gradient Descent

  • ์ˆ˜์‹์˜ ์˜๋ฏธ : ๊ธฐ์šธ๊ธฐ ๊ตฌํ•˜๋Š” ์ผ๋ฐ˜์‹
  • Gradient Desent์—๋Š” Numerical Gradient์™€ Analytic Gradietn๊ฐ€ ์žˆ๋‹ค.
  • NG๋Š” ๋Š๋ฆฌ๊ณ  ๋Œ€๋žต์ ์ธ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์ง€๋งŒ ์“ฐ๊ธฐ๊ฐ€ ํŽธํ•˜๋‹ค
  • AG๋Š” ๋น ๋ฅด๊ณ  ์ •ํ™•ํ•˜์ง€๋งŒ ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ•˜๊ธฐ ์‰ฝ๋‹ค.
  • ์šฐ๋ฆฌ๋Š” AG๋ฅผ ์ฐพ์•„ AG๋ฅผ ๊ตฌํ˜„ํ•  ๊ฒƒ์ด๋‹ค.
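
A minimal sketch of that practice, a gradient check: implement the analytic gradient by hand, then compare it against the slow numerical estimate (the function f below is just a stand-in for a loss):

```python
import numpy as np

def f(x):
    # Stand-in scalar "loss": f(x) = sum(x^2).
    return np.sum(x ** 2)

def analytic_grad(x):
    # Derived on paper: d/dx sum(x^2) = 2x. Fast and exact,
    # but easy to get wrong, hence the check below.
    return 2 * x

def numerical_grad(f, x, h=1e-5):
    # Centered differences: slow and approximate, but easy to write.
    grad = np.zeros_like(x)
    for i in range(x.size):
        xp, xm = x.copy(), x.copy()
        xp[i] += h
        xm[i] -= h
        grad[i] = (f(xp) - f(xm)) / (2 * h)
    return grad

x = np.random.randn(5)
assert np.allclose(analytic_grad(x), numerical_grad(f, x), atol=1e-4)
```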

cf. Note

  • Differentiation and integration, in this context
    • Meaning of differentiation
      - Differentiation (the instantaneous rate of change) is the core tool for analyzing how a system (a function) is affected by each of its variables (factors)
    • Meaning of integration
      - Integration accumulates the results produced as the input varies, describing the overall outcome as a function of those factors
  • Mathematical meaning of the gradient
    • Slope = change in height = rate of change of a scalar value = the function's derivative given a direction (a unit vector); a mathematical way to express the overall tendency of change (see the symbols below)
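
In symbols (standard definitions, added here for reference, not from the original post):

```latex
% The gradient collects the partial derivatives into a vector that
% points in the direction of steepest increase of f:
\nabla f(x, y) = \left( \frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y} \right)

% Gradient descent therefore steps against it, with learning rate \alpha:
x_{t+1} = x_t - \alpha \, \nabla f(x_t)
```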

2.2.2 Computational Graph

  • ๊ณ„์‚ฐ์„ ๊ทธ๋ž˜ํ”„๋กœ ํ‘ผ๋‹ค.
  • ์‹œ์ž‘์ , ๊ฐ ์ฒดํฌ ํฌ์ธํŠธ์™€ ๋์ , ๊ทธ๋ฆฌ๊ณ  ๊ฐ ์ ์„ ์—ฐ๊ฒฐํ•˜๋Š” ์„ .
  • ๊ณ„์‚ฐ ๊ทธ๋ž˜ํ”„ ์ดํ•ด ๋‹จ๊ณ„
    • 1๋‹จ๊ณ„
      - ๊ณ„์‚ฐ ๊ณผ์ •์„ ๋…ธ๋“œ์™€ ํ™”์‚ดํ‘œ๋กœ๋งŒ ํ‘œ์‹œ
      - ๋…ธ๋“œ์˜ ๊ฒฐ๊ณผ๊ฐ’์„ ์˜ค๋ฅธ์ชฝ์œผ๋กœ ์ „๋‹ฌ
    • 2๋‹จ๊ณ„
      - ๋…ธ๋“œ๋ฅผ ์—ฐ์‚ฐ์œผ๋กœ๋งŒ ๊ณ ๋ ค
      - ๊ณ„์‚ฐ ๊ณผ์ •์˜ ์ˆซ์ž๋ฅผ ์™ธ๋ถ€ ๋ณ€์ˆ˜๋กœ ํ‘œ์‹œ
    • 3๋‹จ๊ณ„
      - x, + ๋“ฑ์„ ๋„ฃ์–ด ์‹ค์ œ ๊ทธ๋ž˜ํ”„๋กœ ๊ณ„์‚ฐ
  • ์ด ๋‹จ๊ณ„์—์„œ Foward, Backward Propagation์ด ์žˆ์Œ
    • Foward Propagation : ๊ณ„์‚ฐ์„ ์™ผ์ชฝ์—์„œ ์˜ค๋ฅธ์ชฝ์œผ๋กœ ์ง„ํ–‰
    • Backward Propagation : ๊ณ„์‚ฐ์„ ์˜ค๋ฅธ์ชฝ์—์„œ ์™ผ์ชฝ์œผ๋กœ ์ง„ํ–‰(cf ํŽธ๋ฏธ๋ถ„)
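
As a concrete sketch of these stages, here is the f(x, y, z) = (x + y)z example used in the lecture, with the forward pass running left to right and the backward pass right to left:

```python
# Forward propagation: left to right, one simple operation per node.
x, y, z = -2.0, 5.0, -4.0
q = x + y            # add node: q = 3
f = q * z            # mul node: f = -12

# Backward propagation: right to left, chain rule at each node.
df_df = 1.0                 # gradient at the output is 1
df_dq = z * df_df           # mul node: local gradient w.r.t. q is z
df_dz = q * df_df           # mul node: local gradient w.r.t. z is q
df_dx = 1.0 * df_dq         # add node passes the gradient through
df_dy = 1.0 * df_dq
print(df_dx, df_dy, df_dz)  # -4.0 -4.0 3.0
```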

cf. Note

  • Node
    • Meaning of node
      - A node is a basic unit used in computer science; in a large network, a node is a device or a data point
  • Partial derivative
    • Meaning of partial derivative
      - Used to express how much each individual variable contributes to dz (the limit of the increase in height)

2.2.3 Using a Computational Graph

  • Local computation
    - However complex the overall computation, each node concentrates on a simple local computation, which simplifies the problem
    - All intermediate results can be stored
    - Backpropagation can be computed efficiently

ํ•œ์ค„ ์ด์•ผ๊ธฐ

  • ๊ณ„์‚ฐ ๊ณผ์ •์„ ๊ทธ๋ž˜ํ”„๋กœ ๋‚˜ํƒ€๋‚ด๋Š” ๊ฒƒ

2.3 Step #02 : Backpropagation

2.3.1 Background for Backpropagation

  • Cost Function = Loss Function
  • The cost function: feed training data $x$ into the network,
  • then compute the MSE (Mean Squared Error) between the actual output and the desired output
  • The smaller the difference between $y(x)$ and $a$, the better the network has learned
  • Using the training data, repeatedly adjust the weights ($w$) and biases ($b$)
  • Driving the cost function to its minimum is the goal of neural network training.

How should we change the values of w and b to get the best result?

  • ์ฐธ๊ณ ์ž๋ฃŒ
  • w(Weight : ๊ฐ€์ค‘์น˜)
    • ๋‹ค์Œ ๋…ธ๋“œ๋กœ ๋„˜์–ด๊ฐˆ๋•Œ ๋น„์ค‘์„ ์กฐ์ ˆํ•˜๋Š” ๋ณ€์ˆ˜
  • b(Bias : ํŽธํ–ฅ)
    • ์ผ์ข…์˜ ์„ฑํ–ฅ ๋ฒ„ํ”„ : ๊ฐ€์ค‘ํ•ฉ์— ๋”ํ•ด์ฃผ๋Š” ์ƒ์ˆ˜๊ฐ’
    • ๋‰ด๋Ÿฐ์—์„œ ํ™œ์„ฑํ™” ํ•จ์ˆ˜๋ฅผ ๊ฑฐ์ณ ์ตœ์ข…์ ์œผ๋กœ ์ถœ๋ ฅ๋˜๋Š” ๊ฐ’์„ ์กฐ์ ˆํ•˜๋Š” ์—ญํ• 
  • w๋‚˜ b๋ฅผ ํŽธ๋ฏธ๋ถ„์‹œํ‚ค๋ฉด, ์ถœ๋ ฅ ์ชฝ์—์„œ ๋งค์ฃผ ์ž‘์€ ๋ณ€ํ™”๊ฐ€ ์ƒ๊ธฐ๋ฉฐ ์„ ํ˜•์ ์ธ ๊ด€๊ณ„๋ฅผ ํ™•์ธ
  • ์ด ๋•Œ ์ถœ๋ ฅ์—์„œ์˜ ์˜ค์ฐจ๋ฅผ ๋ฐ˜๋Œ€ ์ž…๋ ฅ ์ชฝ์œผ๋กœ ์ „ํŒŒ์‹œํ‚ค๋ฉด์„œ w, b๋ฅผ ๊ฐฑ์‹ ํ•˜๋ฉด ๋œ๋‹ค.
  • Cost Function์ด ๊ฒฐ๊ตญ w์™€ b๋กœ ์ด๋ฃจ์–ด์กŒ๊ธฐ ๋•Œ๋ฌธ์— ์ถœ๋ ฅ ๋ถ€๋ถ„๋ถ€ํ„ฐ ์‹œ์ž‘ํ•ด์„œ ์ž…๋ ฅ์ชฝ์œผ๋กœ, ์ˆœ์ฐจ์ ์œผ๋กœcost Function์— ๋Œ€ํ•œ ํŽธ๋ฏธ๋ถ„์„ ๊ตฌํ•˜๊ณ , ์–ป์€ ํŽธ๋ฏธ๋ถ„ ๊ฐ’์„ ์ด์šฉํ•ด w์™€ b๊ฐ’์„ ๊ฐฑ์‹ ์‹œํ‚ด
  • ๋ชจ๋“  ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด์„œ ์ด ์ž‘์—…์„ ๋ฐ˜๋ณต ์ˆ˜ํ–‰
  • ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ์— ์ตœ์ ํ™”๋œ w์™€ b๊ฐ’์„ ์–ป์„ ์ˆ˜ ์žˆ๋‹ค.
  • f(e)f(e)๋Š” sigmoidํ•จ์ˆ˜์— ํ•ด๋‹น, e๋Š” ๊ฐ ๋„ท์œผ๋กœ๋ถ€ํ„ฐ ์ž…๋ ฅ๊ณผ ๊ฐ€์ค‘์น˜์˜ ๊ณฑ์˜ ์ดํ•ฉ
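
A sketch of such a unit (the numbers and names are illustrative): e is the weighted sum of the unit's inputs plus the bias, and f(e) squashes it with a sigmoid.

```python
import numpy as np

def sigmoid(e):
    # f(e): squashes the weighted sum into (0, 1).
    return 1.0 / (1.0 + np.exp(-e))

x = np.array([0.5, -1.0, 2.0])   # inputs arriving at the unit
w = np.array([0.1, 0.4, -0.3])   # weights: how much each input counts
b = 0.2                          # bias: constant added to the weighted sum

e = np.dot(w, x) + b             # e: sum of input * weight, plus bias
out = sigmoid(e)                 # the unit's final output
```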

2.3.2 Understanding the Idea of Backpropagation

  • ์ฐธ๊ณ ์ž๋ฃŒ
  • ์‹ ๊ฒฝ๋ง ๋ณ€์ˆ˜๋ฅผ ์ฐพ๊ธฐ ์œ„ํ•œ ์ข‹์€ ๋ฐฉ๋ฒ•
  • ๊ฐ ๋…ธ๋“œ๊ฐ€ ์ตœ์ข… ๊ฒฐ๊ณผ์— ์–ด๋–ค ์˜ํ–ฅ์„ ๋ฏธ์น˜๋Š”์ง€ ์•Œ ์ˆ˜ ์žˆ์Œ
  • ๋…ธ๋“œ์— ์ž…๋ ฅ๋˜๋Š” ๊ฐ’์— ๋Œ€ํ•œ ์ตœ์ข… ๊ฒฐ๊ณผ์˜ ๋ฏธ๋ถ„
  • ๋…ธ๋“œ์˜ ๊ฐ’์ด ๋ณ€ํ–ˆ์„๋•Œ ์ตœ์ข…๊ฒฐ๊ณผ๊ฐ€ ์–ผ๋งˆ๋‚˜ ๋ณ€ํ™”ํ•˜๋Š”์ง€๋ฅผ Backpropagation์„ ํ†ตํ•ด ๊ตฌํ•จ.
  • ํŽธ๋ฏธ๋ถ„์„ ์ „๋‹ฌํ•˜๊ณ  ์˜ค๋ฅธ์ชฝ์—์„œ ์™ผ์ชฝ์œผ๋กœ ๊ฐ’์„ ์ „๋‹ฌ
  • ์ค‘๊ฐ„๊นŒ์ง€ ๊ตฌํ•œ ๋ฏธ๋ถ„ ๊ฒฐ๊ณผ๋ฅผ ๊ณต์œ ํ•  ์ˆ˜ ์žˆ์–ด์„œ ๋‹ค์ˆ˜์˜ ๋ฏธ๋ถ„์„ ํšจ์œจ์ ์œผ๋กœ ๊ฒŸ๋‚˜
  • ๊ฐ ๋ณ€์ˆ˜์˜ ๋ฏธ๋ถ„์„ ํšจ์œจ์ ์œผ๋กœ ๊ตฌํ•  ์ˆ˜ ์žˆ๋‹ค.
  • ๋’ค์—์„œ ์˜ค๋Š” gradients์— local gradient๋ฅผ ๊ณฑํ•˜๋Š” ๊ฒƒ์œผ๋กœ ํ•ด์„

cf. Note

  • Chain Rule
    - The chain rule
    - A property of differentiating composite functions: the derivative of a composite function is the product of the derivatives of the functions that compose it.
    - Example: for $z = (x+y)^2$, let $t = x + y$, so that $z = t^2$. Then
    $\frac{\partial z}{\partial x} = \frac{\partial z}{\partial t} \frac{\partial t}{\partial x}$.
    Using the chain rule to find $\frac{\partial z}{\partial x}$:
    $\frac{\partial z}{\partial t} = 2t$
    $\frac{\partial t}{\partial x} = 1$
    The quantity we want, $\frac{\partial z}{\partial x}$, is the product of the two derivatives:
    $\frac{\partial z}{\partial x} = \frac{\partial z}{\partial t} \frac{\partial t}{\partial x} = 2t \cdot 1 = 2(x+y)$
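
The same factorization, staged the way backpropagation would compute it (a quick sanity check, not from the original post):

```python
# z = t^2 with t = x + y: multiply the two local derivatives.
x, y = 3.0, -1.0
t = x + y
dz_dt = 2 * t                  # outer derivative
dt_dx = 1.0                    # inner derivative
dz_dx = dz_dt * dt_dx
assert dz_dx == 2 * (x + y)    # matches the closed form above
```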

2.3.3 The Backpropagation Computation Procedure

- Multiply the signal entering a node by that node's local partial derivative, then pass the product on to the next node
- The $\partial t$ terms cancel entirely, leaving the derivative of $z$ with respect to $x$.
- Backpropagation and the chain rule work by the same principle

2.3.4 The Core of Backpropagation, Explained via the Chain Rule

- The backpropagation procedure multiplies the incoming signal $E$ by the node's partial derivative $\frac{\partial y}{\partial x}$, then passes the result to the next node.
- This lets derivative values be computed efficiently.

2.3.5 Types of Gates

  • Viewed from the gradient's perspective, gates act as distributors, routers, or switchers (a sketch follows this list)
  • The add gate, $x + y$
    • Differentiating gives 1 in both the x direction and the y direction
    • Written as a composite function, it passes the upstream gradient through to the front unchanged
    • Interpreted as a distributor
  • The max gate, $\max(x, y)$, is a router: only one of the two values goes through
    • If x was larger, so x was the value forwarded through the computational graph, then only x itself affected everything downstream, and the upstream gradient flows backward along the x direction only
    • From backpropagation's point of view, the gradient flows back along just one of several paths
    • A router
  • The mul gate
    • For $xy$, the gradient flowing toward x is the upstream gradient multiplied by y
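
A sketch of the three behaviors during the backward pass, where upstream is the gradient arriving from the right:

```python
def add_backward(upstream):
    # Distributor: local gradient is 1 toward both inputs, so the
    # upstream gradient passes through unchanged to each.
    return upstream, upstream

def max_backward(x, y, upstream):
    # Router: only the input that won the forward max influenced the
    # output, so the entire gradient flows back along that one path.
    return (upstream, 0.0) if x > y else (0.0, upstream)

def mul_backward(x, y, upstream):
    # Switcher/scaler: each input's gradient is the upstream gradient
    # scaled by the *other* input's forward value.
    return upstream * y, upstream * x

dx, dy = mul_backward(x=3.0, y=-4.0, upstream=2.0)  # dx = -8.0, dy = 6.0
```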

ํ•œ์ค„ ์ด์•ผ๊ธฐ

  • ํ•™์Šต์‹œํ‚จ ํ›„ ๊ฒฐ๊ณผ์— ๋”ฐ๋ฅธ ์˜ค์ฐจ๊ฐ’์„ ๋‹ค์‹œ ๋’ค๋กœ ์ „ํŒŒํ•ด๊ฐ€๋ฉด์„œ ๊ฐ ๋…ธ๋“œ๊ฐ€ ๊ฐ€์ง€๊ณ  ์žˆ๋Š” ๋ณ€์ˆ˜๋“ค์„ ๊ฐฑ์‹ ํ•˜๋Š” ๊ฒƒ
  • ๋’ค์—์„œ ์˜ค๋Š” gradients์— local gradient๋ฅผ ๊ณฑํ•˜๋Š” ๊ฒƒ

2.4 Step #03 : Vectorized Operations & Jacobian Matrix

What does backpropagation look like in a vector space, where there are many gradients rather than one?

2.4.1 The gradient becomes a Jacobian matrix

  • The rest of the computation proceeds as before

ํ•œ์ค„ ์ด์•ผ๊ธฐ

  • ๋ฏธ์†Œ ์˜์—ญ์—์„œ โ€˜๋น„์„ ํ˜• ๋ณ€ํ™˜โ€™์„ โ€˜์„ ํ˜• ๋ณ€ํ™˜์œผ๋กœ ๊ทผ์‚ฌโ€™ ์‹œํ‚จ ๊ฒƒ
  • Vector์— ๋Œ€ํ•œ backpropagation์€ gradient๊ฐ€ Jacobian Matrix๊ฐ€ ๋จ
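
For an elementwise op such as max(0, x), each output depends only on its matching input, so the Jacobian is diagonal; in practice nobody materializes it, because an elementwise mask has the same effect. A sketch (the 4096 size echoes the lecture's example):

```python
import numpy as np

x = np.random.randn(4096)          # input vector at this node
upstream = np.random.randn(4096)   # dL/d(output), arriving from the right

# out = np.maximum(0, x) has a diagonal Jacobian with 1 where x > 0,
# so multiplying by it reduces to an elementwise mask:
dx = upstream * (x > 0)            # dL/dx without a 4096 x 4096 matrix
```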

2.5 Step #04 : Summary

2.6 Recap So Far

  • ์‹ ๊ฒฝ๋ง์˜ ๋ชจ๋“  ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ํ•ธ๋“ค๋งํ•˜๋Š” ๊ฒƒ์€...ใ…œใ…œ
  • ๋…ธ๋“œ์˜ ๋ฐฉํ–ฅ์„ ์•ž๋’ค๋กœ ๊ฒ€์ฆํ•  ๋•Œ ๊ทธ๋ž˜ํ”„๋ฅผ ์œ ์ง€์‹œ์ผœ์•ผ..

2.7 Step #05 : Neural Network

  • So far we have studied linear score functions.
  • Now we use a 2-layer Neural Network, such as
    $f = W_2 \max(0, W_1 x)$
    Extended to 3 layers: $f = W_3 \max(0, W_2 \max(0, W_1 x))$
    By stacking a variety of nonlinear functions, we build models that can begin to follow the human nervous system (a sketch follows).
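
A minimal sketch of that 2-layer network's forward pass; the sizes (3072 → 100 → 10) follow the lecture's CIFAR-10 example, and the initialization here is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(3072)                  # a flattened 32x32x3 image
W1 = 0.01 * rng.standard_normal((100, 3072))   # first-layer weights
W2 = 0.01 * rng.standard_normal((10, 100))     # second-layer weights

h = np.maximum(0, W1 @ x)   # hidden layer: the nonlinearity max(0, .)
f = W2 @ h                  # class scores: f = W2 max(0, W1 x)
```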

ํ•œ์ค„ ์ด์•ผ๊ธฐ

  • ์ƒ๋ฌผํ•™์  ๋‰ด๋Ÿฐ์€ ์ธ๊ณต ์‹ ๊ฒฝ๋ง๋ณด๋‹ค ๋ณต์žกํ•œ ๊ตฌ์กฐ๋กœ ์ธ๊ณต์‹ ๊ฒฝ๋ง๊ณผ ์ผ๋Œ€์ผ ๋Œ€์‘์€ ์•„๋‹ˆ๋‹ค.