Optimization

๊น€์„ ์žฌ · December 26, 2021

๐Ÿ’ก ์ตœ์ ํ™”์— ๋Œ€ํ•œ ๋งŽ์€ ์šฉ์–ด๊ฐ€ ์ƒ๊ธฐ๋Š”๋ฐ ์šฉ์–ด์— ๋ช…ํ™•ํ•œ ์ดํ•ด๊ฐ€ ์—†๋‹ค๋ฉด ๋’ค๋กœ ๊ฐˆ์ˆ˜๋ก ํฐ ์˜คํ•ด๊ฐ€ ์Œ“์ผ ์ˆ˜ ์žˆ๋‹ค

Gradient Descent

  • A first-order optimization algorithm for finding a local minimum of a differentiable function (sketched below)
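
To make the update rule concrete, here is a minimal sketch of vanilla gradient descent on a toy quadratic; the objective, starting point, and learning rate are illustrative choices, not from the post.

```python
# f(w) = (w - 3)^2, an illustrative differentiable objective
def grad(w):
    return 2.0 * (w - 3.0)  # analytic derivative of f

w = 0.0    # initial parameter
lr = 0.1   # learning rate (step size)
for _ in range(100):
    w -= lr * grad(w)  # step against the gradient

print(w)   # approaches the minimizer w = 3
```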

Important Concepts in Optimization

Generalization( ์ผ๋ฐ˜ํ™” )

  • ํ•™์Šต๋œ ๋ชจ๋ธ์€ ๋ณด์ด์ง€ ์•Š๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ์–ผ๋งˆ๋‚˜ ์ž˜ ์ˆ˜ํ–‰ํ•  ๊ฒƒ์ธ๊ฐ€?
  • ๋งŽ์€ ๊ฒฝ์šฐ์— ์šฐ๋ฆฌ๋Š” ์ผ๋ฐ˜ํ™” ์„ฑ๋Šฅ์„ ๋†’์ด๋Š” ๊ฒƒ์ด ๋ชฉํ‘œ์ด๋‹ค
  • Training error๊ฐ€ 0์ด ๋˜์—ˆ๋‹คํ•ด์„œ ์šฐ๋ฆฌ๊ฐ€ ์›ํ•˜๋Š” ์ตœ์ ํ™” ๊ฐ’์— ๋„๋‹ฌํ–ˆ๋‹ค๊ณ  ๋ณด์žฅํ•  ์ˆ˜ ์—†๋‹ค
  • ์ผ๋ฐ˜์ ์œผ๋กœ Training error๊ฐ€ ์ค„์–ด๋“ค์ง€๋งŒ ์‹œ๊ฐ„์ด ์ง€๋‚ ์ˆ˜๋ก Test error( ํ•™์Šตํ•˜์ง€ ์•Š๋Š” ๋ฐ์ดํ„ฐ )๊ฐ€ ์ฆ๊ฐ€ํ•œ๋‹ค
  • ๋”ฐ๋ผ์„œ Generalization gap์ด ์ฆ๊ฐ€ํ•˜๊ฒŒ ๋œ๋‹ค
    • Test error์™€ Training error ์™€์˜ ์ฐจ
  • ํ•™์Šต๋ฐ์ดํ„ฐ์˜ ์„ฑ๋Šฅ ์ž์ฒด๊ฐ€ ์•ˆ ์ข‹์„ ๋•Œ๋Š” Generalization performance๊ฐ€ ์ข‹๋‹ค๊ณ  ํ•ด์„œ Test performance๊ฐ€ ์ข‹๋‹ค๊ณ  ํ•  ์ˆ˜ ์—†๋‹ค

Underfitting vs Overfitting

Underfitting

  • ํ•™์Šต๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด์„œ ์ž˜ ๋™์ž‘ํ•˜์ง€ ๋ชปํ•˜๋Š” ๊ฒƒ

Overfitting

  • ํ•™์Šต๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด์„  ์ž˜ ๋™์ž‘ํ•˜์ง€๋งŒ Test data์— ๋Œ€ํ•ด์„  ์ž˜ ๋™์ž‘ํ•˜์ง€ ์•Š๋Š” ๊ฒƒ

Cross-Validation

  • Also called k-fold cross-validation
  • Split the training data into k folds, train on k-1 of them, and validate on the remaining fold, rotating so each fold serves as the validation set once
  • Cross-validation is used to find good hyperparameters
  • The test data must never be used for cross-validation or hyperparameter search; it is reserved for the final evaluation only (see the sketch below)
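
A minimal k-fold sketch using scikit-learn's KFold; the ridge model, synthetic data, and candidate alpha values are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import Ridge

# Synthetic regression data (illustrative, not from the post)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for alpha in [0.01, 0.1, 1.0]:               # candidate hyperparameter values
    scores = []
    for train_idx, val_idx in kf.split(X):   # k-1 folds train, 1 fold validates
        model = Ridge(alpha=alpha).fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[val_idx], y[val_idx]))
    print(alpha, np.mean(scores))            # pick the alpha with the best mean score
```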

Bias and Variance

  • ๋น„์šฉ์„ ์ตœ์†Œํ™” ํ•˜๋ ค ํ•  ๋•Œ bias2bias^2, variance, noise** ์„ ํ†ตํ•ด ์‹œ๋„ํ•ด ๋ณผ ์ˆ˜์žˆ๋‹ค
  • ํ•™์Šต๋ฐ์ดํ„ฐ์— ๋…ธ์ด์ฆˆ๊ฐ€ ๊ปด์žˆ๋‹ค๊ณ  ๊ฐ€์ •ํ–ˆ์„ ๋•Œ
    • tradeoff : ๊ฐ’ ํ•˜๋‚˜๊ฐ€ ์ž‘์•„์ง€๋ฉด ๋‹ค๋ฅธ ๊ฐ’ ํ•˜๋‚˜๊ฐ€ ์ปค์ง€๊ฒŒ ๋œ๋‹ค
    • t : target
    • f^\widehat{f} : neural network ์ถœ๋ ฅ ๊ฐ’
  • cost๋ฅผ minimize ํ•˜๋ คํ•˜๋ฉด bias, variance, noise๋ฅผ ์ค„์—ฌ์•ผํ•˜๋Š”๋ฐ ๋™์‹œ์— ์ค„์ผ ์ˆ˜ ์—†๋‹ค
  • bias๋ฅผ ์ค„์ด๋ฉด variance๊ฐ€ ๋†’์•„์ง€๊ณ , variance๋ฅผ ์ค„์ด๋ฉด bias๊ฐ€ ๋†’์•„์งˆ ํ™•๋ฅ ์ด ํฌ๋‹ค

Bootstrapping

  • Any test or metric that relies on random sampling with replacement
  • With the training data fixed, create several training sets by sub-sampling from it
  • Use these to build multiple models, or to compute a metric several times and gauge its variability (see the sketch below)
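
A minimal bootstrap sketch; the data and the choice of metric (the mean) are illustrative assumptions.

```python
import numpy as np

# Resample the fixed dataset with replacement to estimate
# how much a metric would vary across samples.
rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=200)

boot_means = []
for _ in range(1000):
    sample = rng.choice(data, size=len(data), replace=True)  # bootstrap sample
    boot_means.append(sample.mean())

print(np.percentile(boot_means, [2.5, 97.5]))  # rough 95% interval for the mean
```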

Bagging vs Boosting

Bagging (Bootstrap aggregating)

  • Several models are trained, each on its own bootstrapped sample of the training data
  • Create multiple training sets, train a model on each, and average (or vote on) the models' outputs (an ensemble)

Boosting

  • Focuses on the specific training samples that are hard to classify
  • A strong model is built by combining weak learners in sequence, where each weak learner learns from the mistakes of the previous one
  • In other words, many weak learners are combined into one strong model (see the sketch below)

Gradient Descent Methods

Stochastic gradient descent

  • ๋‹จ์ผ ์ƒ˜ํ”Œ์—์„œ ๊ณ„์‚ฐ๋œ ๊ฐ€์ค‘์น˜๋กœ ์—…๋ฐ์ดํŠธ

Mini-batch gradient descent

  • Update the weights with the gradient computed from a subset of the data (a mini-batch)

Batch gradient descent

  • ์ „์ฒด ๋ฐ์ดํ„ฐ์—์„œ ๊ณ„์‚ฐ๋œ ๊ฐ€์ค‘์น˜๋กœ ์—…๋ฐ์ดํŠธ

Widely used optimizers that build on these update rules (instantiated in the sketch below):

  • Stochastic gradient descent
  • Momentum
  • Nesterov accelerated gradient
  • Adagrad
  • Adadelta
  • RMSprop
  • Adam
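
For reference, a sketch of how each optimizer in this list is instantiated in PyTorch (the framework choice is an assumption; the post names none):

```python
import torch

params = [torch.nn.Parameter(torch.zeros(3))]  # dummy parameters

torch.optim.SGD(params, lr=0.1)                               # stochastic gradient descent
torch.optim.SGD(params, lr=0.1, momentum=0.9)                 # Momentum
torch.optim.SGD(params, lr=0.1, momentum=0.9, nesterov=True)  # Nesterov accelerated gradient
torch.optim.Adagrad(params, lr=0.01)
torch.optim.Adadelta(params)
torch.optim.RMSprop(params, lr=0.01)
torch.optim.Adam(params, lr=1e-3)
```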

Batch-size Matters

  • Empirically, large-batch training tends to converge to sharp minimizers, while small-batch training tends to find flat minimizers, which generalize better (Keskar et al., 2017)

Regularization

  • Early stopping
  • Parameter norm penalty
  • Data augmentation
  • Noise robustness
  • Label smoothing
  • Dropout
  • Batch normalization
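
A sketch combining a few of these techniques in PyTorch (an assumed framework; the layer sizes are illustrative): dropout and batch normalization as layers, and a parameter norm penalty via the optimizer's weight_decay.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.BatchNorm1d(64),  # batch normalization
    nn.ReLU(),
    nn.Dropout(p=0.5),   # dropout
    nn.Linear(64, 1),
)
# weight_decay adds an L2 parameter norm penalty to every update
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)
# Early stopping is a training-loop check: stop once the validation
# loss has not improved for some number of epochs ("patience").
```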