Optimization

Roy · October 6, 2022

DL


๐Ÿ’ก์ด ๊ธ€์˜ ๋ชฉ์ 

๊ณต๋ถ€ํ•˜๋ฉด์„œ ๋งˆ์ฃผํ•œ ๋‚ด์šฉ๋“ค์„ ์š”์•ฝ ๋ฐ ์ •๋ฆฌ๋ฅผ ํ†ตํ•ด ๋‹ค์–‘ํ•œ ์‚ฌ๋žŒ๋“ค์—๊ฒŒ ๊ณต์œ ํ•˜๊ณ  ๊ณต์œ ๋ฅผ ํ•˜๋ฉด์„œ ์˜ค๊ฐœ๋… ํ˜น์€ ๊ฐœ๋…์ •๋ฆฌ๋ฅผ ํ•˜๊ธฐ ์œ„ํ•จ์ž…๋‹ˆ๋‹ค. ๋งŽ์ด ํ”ผ๋“œ๋ฐฑํ•ด ์ฃผ์‹œ๊ณ  ๋น„ํŒํ•ด ์ฃผ์‹ญ์‹œ์˜ค.

💡 Overview

🔫 Gradient Descent
Gradient descent finds a local minimum by repeatedly moving in the direction in which the loss decreases. What we differentiate with respect to is W, i.e., the parameters of the neural network: we take partial derivatives of the loss (the gap between actual and predicted values) with respect to each parameter.

์ˆ˜์‹๊ณผ ๊ฐ™์ด t+1์‹œ์ ์œผ๋กœ ์ง„ํ–‰ํ•˜๊ธฐ ์œ„ํ•ด -๋ฅผ ๋ถ™์—ฌ ์—ฐ์‚ฐ์„ ํ•˜๊ฒŒ ๋˜๋Š”๋ฐ ์ด๋Š” ๊ทน์†Œ๊ฐ’์— ๋„๋‹ฌํ•˜๊ธฐ ์œ„ํ•ด์„œ ์ž…๋‹ˆ๋‹ค. (๋„ตํƒ€๋Š” Learning_Rate์„ ๋œปํ•ฉ๋‹ˆ๋‹ค.)
๐Ÿ“Œ Optimization : ๊ธฐ์ดˆ์›๋ฆฌ๋กœ Gradient Descent๋ฅผ ๋”ฐ๋ผ๊ฐ‘๋‹ˆ๋‹ค. ํ•ด๋‹น ๋ถ€๋ถ„์ด ๋ฐœ์ „๋˜๋ฉด์„œ ๋‹ค์–‘ํ•œ ๋ฐฉ๋ฒ•๋ก ์ด ์ถ”๊ฐ€๋œ๋‹ค๋Š” ๊ฒƒ์œผ๋กœ ๋งฅ๋ฝ์„ ์žก์•„๊ฐ€๋ฉด ์ข‹์Šต๋‹ˆ๋‹ค.
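The update rule above can be sketched in a few lines. The toy loss L(w) = (w − 3)², the learning rate, and the step count below are illustrative choices, not from the post:

```python
import numpy as np

def gradient_descent(grad_fn, w0, lr=0.1, steps=100):
    """Repeatedly apply w(t+1) = w(t) - lr * grad, stepping against the gradient."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w = w - lr * grad_fn(w)  # the minus sign moves w toward the minimum
    return w

# Toy loss L(w) = (w - 3)^2, so dL/dw = 2 * (w - 3); the minimum is at w = 3
w_star = gradient_descent(lambda w: 2 * (w - 3.0), w0=0.0)
```

Starting from w = 0, the iterate converges to the minimizer w = 3.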

Key Concepts

💡 If we want to extend optimization methods, what should we take into account? Let's build understanding by walking through the key concepts of optimization.

1. Generalization Gap

📌 The generalization gap is the difference between performance on unseen (test) data and performance on the training data.

The larger this gap, the more we should worry about over-fitting, which is why early stopping is used. (Early stopping seems to matter more in implementation than in concept, so I'll cover it in a separate post. — Oct 5, 2022)
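As a sketch of the early-stopping idea (the patience-based rule below is one common variant, assumed here for illustration):

```python
def early_stopping(val_losses, patience=3):
    """Return the epoch (index) at which to stop: halt once the validation
    loss has failed to improve for `patience` consecutive epochs."""
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch  # roll back to the best checkpoint
    return best_epoch

# Validation loss falls, then rises again: stop at its minimum (epoch 3)
stop = early_stopping([1.0, 0.8, 0.7, 0.65, 0.7, 0.75, 0.9])
```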

2. Under-fitting, Over-fitting

📌 Fitting means configuring the model based on the training data.
However, the goal of training is not to do well only on the training data; we want good performance in general.
To reach that goal, two concepts need to be well understood.

  1. Under-fitting: the loss is large and converges to a large value even as the number of epochs increases
    💊 Increase the number of parameters or change the activation function
  2. Over-fitting: the training loss is small but the validation loss is large
    💊 Use cross-validation or early stopping

3. Bias-variance trade off

📌 Bias: how far the predictions are off on average; Variance: how spread out the predictions are.

💡 The expected cost decomposes as cost = bias² + variance + noise, so at a fixed cost, decreasing bias tends to increase variance and, conversely, decreasing variance tends to increase bias: the bias-variance trade-off.
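The decomposition can be checked numerically. Below is a Monte-Carlo sketch with a deliberately biased (shrunken-mean) estimator; the true value, noise level, and shrinkage factor are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true, sigma, n, c = 2.0, 1.0, 10, 0.8  # c < 1 makes the estimator biased
trials = 200_000

# Each trial: draw a training set, fit the estimator, predict a fresh noisy target
train = rng.normal(theta_true, sigma, size=(trials, n))
theta_hat = c * train.mean(axis=1)            # the estimator varies per training set
y_new = rng.normal(theta_true, sigma, size=trials)
mse = np.mean((y_new - theta_hat) ** 2)

bias_sq = (c * theta_true - theta_true) ** 2  # (E[theta_hat] - theta_true)^2
variance = np.var(theta_hat)                  # spread of the estimator
noise = sigma ** 2                            # irreducible error
# mse closely matches bias_sq + variance + noise
```

Shrinking c toward 1 removes the bias term but inflates the variance term, which is the trade-off in miniature.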

4. Bagging and boosting

๐Ÿ’ก์•™์ƒ๋ธ”(Ensemble) ๊ธฐ๋ฒ•์˜ ์ผ์ข…, ์•™์ƒ๋ธ”์ด๋ž€? : ์—ฌ๋Ÿฌ๊ฐœ์˜ ์˜ˆ์ธก ๋ชจํ˜•์„ ๊ฒฐํ•ฉํ•˜์—ฌ '์ง‘๋‹จ์ง€์„ฑ'์˜ ํšจ๊ณผ๋ฅผ ๋‚˜ํƒ€๋‚ธ๋‹ค.
(cf.์ด์‚ฐํ˜• ๋ฐ์ดํ„ฐ : Voting, ์—ฐ์†ํ˜• ๋ฐ์ดํ„ฐ : AVG์„ ๋Œ€ํ‘œ์ ์ธ ๊ฒฐํ•ฉ๋ฐฉ์‹์œผ๋กœ ์„ ํƒํ•œ๋‹ค.)
๋‚ด์šฉ์ด ๋งŽ๊ณ  ์ง๊ด€์ ์ธ ์ด์œ ๋ฅผ ์œ„ํ•ด ์ด ๋˜ํ•œ ๋”ฐ๋กœ ํฌ์ŠคํŒ…์„ ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. (10์›” 5์ผ ๊ธฐ์ค€)
📌 Bagging: draw random subsamples of the data, train a model on each subset in parallel, then combine the results.
📌 Boosting: train weak learners sequentially, where each learner builds on the prediction results of the previous one.

Source: https://ybeaning.tistory.com/17
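A minimal bagging sketch, assuming 1-D linear models as the base learners and averaging as the combination rule (the continuous-target choice mentioned above); the model count and data are illustrative:

```python
import numpy as np

def bagging_predict(X_train, y_train, x_query, n_models=25, seed=0):
    """Bagging: fit many base models on bootstrap resamples (random
    subsampling with replacement) and average their predictions."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)              # bootstrap sample
        a, b = np.polyfit(X_train[idx], y_train[idx], deg=1)
        preds.append(a * x_query + b)                 # each model predicts independently
    return np.mean(preds)                             # continuous target -> average

x = np.linspace(0, 1, 50)
y = 2 * x + 1 + np.random.default_rng(1).normal(0, 0.1, 50)
y_hat = bagging_predict(x, y, x_query=0.5)            # true value is 2.0
```

Because each base model sees a different resample, their individual errors partially cancel when averaged, which lowers the ensemble's variance.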

5. Batch-size Matters

📌 Flat minimizer & sharp minimizer

The paper On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima raised a problem with large batch sizes.
Deep learning computes gradients on a bundle of data (a batch) and uses them to update the weights. The paper observes that training with large batches tends to converge to sharp minima of the loss surface, whereas small-batch training, whose noisier gradient estimates act as a kind of exploration, tends to land in flat minima. At a sharp minimum, even a small shift between the training and test loss surfaces changes the loss a lot, which hurts generalization; this is what is meant by a sharp minimizer.

SB: small batch size, LB: large batch size

(The paper's figure, omitted here, plots the loss landscape around the minima reached by SB and LB at equal training error.) The LB minimum sits in a much sharper region, from which we can infer that its predictions vary much more under small parameter shifts.
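One way to see why batch size matters is to measure the noise in the gradient estimate itself; the linear-regression setup below is an illustrative assumption, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 1))
y = 3.0 * X[:, 0] + rng.normal(0, 0.5, 10_000)  # true weight is 3

def minibatch_grad(w, batch_size):
    """MSE gradient on one random batch: a noisy estimate of the full gradient."""
    idx = rng.integers(0, len(X), size=batch_size)
    xb, yb = X[idx, 0], y[idx]
    return np.mean(2 * (w * xb - yb) * xb)

# Spread of the gradient estimate shrinks as the batch grows
w = 1.0
small = np.std([minibatch_grad(w, 8) for _ in range(500)])
large = np.std([minibatch_grad(w, 512) for _ in range(500)])
# small > large: small batches inject more gradient noise per step
```

That extra noise is the "exploration" credited with steering small-batch training away from sharp minima.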

Optimization์˜ ๋ฐœ์ „์€ ๋‹ค์Œ ํŽ˜์ด์ง€์—์„œ ํ•˜๊ฒ ์Šต๋‹ˆ๋‹นใ…Žใ…Ž
