[cs231n] Lecture 3: Loss Functions and Optimization

๊ฐ•๋™์—ฐยท2022๋…„ 1์›” 16์ผ

[CS231n-2017]

๋ชฉ๋ก ๋ณด๊ธฐ
2/7

๐Ÿ‘จโ€๐Ÿซ ๋ณธ ๋ฆฌ๋ทฐ๋Š” cs231n-2017 ๊ฐ•์˜๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ์ง„ํ–‰ํ–ˆ์Šต๋‹ˆ๋‹ค.

Syllabus
Youtube Link

๐Ÿ“Œ ์ด๋ฒˆ ์‹œ๊ฐ„์—๋Š” ๊ฐ•์˜ ์ œ๋ชฉ๊ณผ ๊ฐ™์ด Loss Functions, Optimization ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด ์†Œ๊ฐœํ•ฉ๋‹ˆ๋‹ค.

What are Loss Functions and Optimization?

📌 A loss function quantifies our unhappiness with the scores on the training data. Put simply, it turns the difference between the true answer and the predicted answer into a number.

📌 Optimization is how we minimize the loss function efficiently; in other words, think of it as minimizing the error.

Loss Function

📌 The loss function tells us how good the current classifier is. As the name suggests, our goal is to minimize the loss, but what we ultimately want to minimize is the test loss, not the training loss.

** ์œ„์˜ ์‚ฌ์ง„์€ loss fuction์˜ ๊ฐ„๋‹จํ•œ ์˜ˆ์‹œ์ž…๋‹ˆ๋‹ค.(xix_i=image, yiy_i=label, LiL_i=ii์— ๋Œ€ํ•œ loss, NN=๋ฐ์ดํ„ฐ์˜ ์ˆ˜)

Multiclass SVM loss

๐Ÿ“Œ ์œ„์˜ ๊ณ ์–‘์ด, ์ž๋™์ฐจ, ๊ฐœ๊ตฌ๋ฆฌ ์•„๋ž˜์˜ ์ˆซ์ž๋Š” "WxWx" score ์ด๊ณ , ์˜ค๋ฅธ์ชฝ์˜ red box๊ฐ€ SVM loss์˜ ์ˆ˜์‹์ž…๋‹ˆ๋‹ค. ์ €ํฌ ์ด์ œ๋ถ€ํ„ฐ ์ด ์ˆ˜์‹์— ๋Œ€ํ•ด ๋œฏ์–ด๋ณด๊ณ  ์ดํ•ด ํ•  ์˜ˆ์ •์ž…๋‹ˆ๋‹ค.

📌 Let's take it one piece at a time. $s = f(x_i, W)$, i.e. $s = Wx_i$. $s_j$ denotes the scores of the incorrect classes: looking at the cat column, these are the scores for car and frog. $s_{y_i}$ is the score of the correct class: in the cat column, the score predicted for cat. With these definitions, the formula $L_i = \sum_{j \neq y_i}\max(0, s_j - s_{y_i} + 1)$ reads simply: if the correct class score exceeds an incorrect class score by more than 1, that term contributes 0; otherwise it contributes the difference $s_j - s_{y_i} + 1$ to the loss.
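As a sanity check, the per-example formula can be translated directly into NumPy (a minimal sketch; the scores are the cat column from the slide):

```python
import numpy as np

def svm_loss_single(scores, y, margin=1.0):
    """Multiclass SVM (hinge) loss L_i for one example.

    scores: 1-D array of class scores s = W x_i
    y:      index of the correct class y_i
    """
    # max(0, s_j - s_{y_i} + margin) for every class j
    margins = np.maximum(0.0, scores - scores[y] + margin)
    margins[y] = 0.0          # the sum skips j = y_i
    return margins.sum()

# Cat column of the lecture example: (cat, car, frog) = (3.2, 5.1, -1.7)
print(svm_loss_single(np.array([3.2, 5.1, -1.7]), y=0))  # ≈ 2.9
```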

๐Ÿ“Œ ์œ„์˜ ๊ทธ๋ž˜ํ”„๊ฐ€ SVM loss์˜ ๊ทธ๋ž˜ํ”„์ž…๋‹ˆ๋‹ค. ๊ฒฝ์ฒฉ๊ณผ ๋น„์Šทํ•˜๋‹ค๊ณ  ํ•ด "Hinge loss"๋ผ๊ณ ๋„ ๋งํ•ฉ๋‹ˆ๋‹ค. SVM loss๋Š” ๊ฐ’์ด ์ปค์ง€๋ฉด ์ปค์งˆ์ˆ˜๋ก ์ข‹์ง€ ์•Š์€ ๋ชจ๋ธ์ด๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋ฐ˜๋Œ€๋กœ loss ๊ฐ’ 0์— ๊ฐ€๊นŒ์›Œ ์ง„๋‹ค๋ฉด ์ข‹์€ ๋ชจ๋ธ์ด๋ผ๊ณ ๋„ ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. (๋‹จ, test loss์— ๋Œ€ํ•ด์„œ)

๐Ÿ“Œ ์ดํ•ด๋ฅผ ๋•๊ธฐ ์œ„ํ•ด ์œ„์˜ ์˜ˆ์ œ์˜ SVM loss๋ฅผ ๊ตฌํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. ๊ณ ์–‘์ด๊ฐ€ ์ •๋‹ต์ธ ์—ด์˜ loss๋ฅผ ๊ตฌํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. Li=max(0,5.1โˆ’3.2+1)+max(0,โˆ’1.7โˆ’3.2+1)=max(0,2.9)+max(0,โˆ’3.9)=2.9+0=2.9L_i = max(0, 5.1-3.2+1) + max(0, -1.7-3.2+1) = max(0, 2.9) + max(0, -3.9) = 2.9 +0 = 2.9์˜ ๊ฒฐ๊ณผ๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด์™€ ๊ฐ™์€ ํ˜•์‹์œผ๋กœ ๊ฐ๊ฐ์˜ loss๋ฅผ ๊ตฌํ•˜๊ณ  ๋”ํ•ด์„œ ํ‰๊ท ์„ ๊ตฌํ•˜๋ฉด 5.275.27์˜ ๊ฒฐ๊ณผ๊ฐ€ ๋‚˜์˜ต๋‹ˆ๋‹ค. ์˜ˆ์ œ๋ฅผ ํ•˜๋‚˜์”ฉ ๋”ฐ๋ผํ•˜์‹œ๋ฉด ๊ธˆ๋ฐฉ ์ดํ•ดํ•˜์‹ค ์ˆ˜ ์žˆ์„๊ฒ๋‹ˆ๋‹ค. ์ด์ œ ๋ช‡ ๊ฐ€์ง€ ์งˆ๋ฌธ์„ ํ†ตํ•ด SVM loss์˜ ํŠน์ง•์— ๋Œ€ํ•ด ์•Œ์•„๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

Q: What happens to loss if car scores change a bit?
📌 A: Even if the car score changes a bit, the loss stays the same: the gap between the car score and the other scores is large, so the margins are unaffected.
Q: What is the min/max possible loss?
📌 A: min = 0, max = $\infty$

Q: At initialization W is small so all $s \approx 0$. What is the loss?
📌 A: (number of classes) − 1, since every margin term becomes $\max(0, 0 - 0 + 1) = 1$ and there are $C - 1$ of them.

Q: What if the sum were over all classes (including $j = y_i$)?
📌 A: The loss increases by 1 (the extra term is $\max(0, s_{y_i} - s_{y_i} + 1) = 1$).

Q: What if we used the mean instead of the sum?
📌 A: The loss is only rescaled by a constant; its meaning does not change.

Q: What if we used $L_i = \sum_{j \neq y_i}\max(0, s_j - s_{y_i} + 1)^2$?
📌 A: This is commonly called the "squared hinge loss". Squaring means that a very bad result becomes much worse, so the classifier cares more about the badly violated margins.

Regularization

📌 A model is trained on the training data, and we work to reduce its loss. But if we minimize that loss, do we get a good model? Not necessarily. Our real goal is to minimize the loss on test data, i.e. on data the model has never seen. Minimizing the training loss alone yields something like the blue curve above, which passes exactly through every training point. That curve predicts the training data perfectly but fails on the test data. This situation is commonly called "overfitting" (the green squares are the test data).

๐Ÿ“Œ ์šฐ๋ฆฌ์˜ ๋ชจ๋ธ์€ Overfitting ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ์šฐ๋ฆฌ๋Š” ์šฐ๋ฆฌ์˜ ๋ชจ๋ธ์„ ์กฐ๊ธˆ ๋” ์ผ๋ฐ˜์ ์ธ ๋ชจ๋ธ๋กœ ๋งŒ๋“ค์–ด์•ผํ•ฉ๋‹ˆ๋‹ค. ์œ„์˜ ๋…น์ƒ‰์˜ ์„  ๊ทธ๋ž˜ํ”„์ฒ˜๋Ÿผ ๋งŒ๋“ค์–ด์•ผํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋Ÿด๋•Œ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ด "Regularization" ์ž…๋‹ˆ๋‹ค. "Regularization"์˜ ์ˆ˜์‹์€ ์œ„์˜ ฮปR(W)\lambda R(W) ์ž…๋‹ˆ๋‹ค. "Regularization"์˜ ์—ญํ™œ์€ ๋ชจ๋ธ์˜ ๋ณต์žกํ•จ์„ ์ œํ•œํ•ฉ๋‹ˆ๋‹ค. ์กฐ๊ธˆ ๋” ๊ตฌ์ฒด์ ์œผ๋กœ ๋ชจ๋ธ์ด training dataset์— ์™„๋ฒฝํ•˜๊ฒŒ fit ํ•˜์ง€ ๋ชปํ•˜๋„๋ก ์ œํ•œํ•˜๋Š” ๊ฒƒ ์ž…๋‹ˆ๋‹ค.

๐Ÿ“Œ ์œ„์˜ ์‚ฌ์ง„์—์„œ ๋ณผ ์ˆ˜ ์žˆ๋“ฏ์ด "Regularization"์—๋Š” ๋‹ค์–‘ํ•œ ๋ฐฉ๋ฒ•๋“ค์ด ์žˆ์Šต๋‹ˆ๋‹ค. ์˜ค๋Š˜์€ L2, L1 "Regularization"์„ ์‚ดํŽด๋ณผ ์˜ˆ์ •์ž…๋‹ˆ๋‹ค. L2๋Š” R(W)=โˆ‘kโˆ‘lWk,l2R(W) = \sum_k\sum_l W^2_{k,l} ์™€ ๊ฐ™์ด ์ œ๊ณฑ์˜ ํ˜•ํƒœ๋กœ ๋‚˜ํƒ€๋‚ด๋ฉฐ, L1์€ R(W)=โˆ‘kโˆ‘lโˆฃWk,lโˆฃR(W) = \sum_k\sum_l |W_{k,l}| ์ ˆ๋Œ€ ๊ฐ’์˜ ํ˜•ํƒœ๋ฅผ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค. L2์˜ ๊ฒฝ์šฐ ๋ชจ๋“  w์˜ ์š”์†Œ๊ฐ€ ๊ณจ๊ณ ๋ฃจ ์˜ํ–ฅ์„ ๋ฏธ์น˜๊ฒŒ ํ•˜๊ณ  ์‹ถ์„ ๋•Œ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. L1์˜ ๊ฒฝ์šฐ sparseํ•œ solution์„ ๋‹ค๋ฃน๋‹ˆ๋‹ค. L1์ด "์ข‹์ง€ ์•Š๋‹ค"๋ผ๊ณ  ๋Š๋ผ๊ณ  ์ธก์ •ํ•˜๋Š” ๊ฒƒ์€ "0"์ด ์•„๋‹Œ ์š”์†Œ๋“ค์˜ ์ˆซ์ž์ž…๋‹ˆ๋‹ค. L2์˜ ๊ฒฝ์šฐ์—๋Š” w์˜ ์š”์†Œ๊ฐ€ ์ „์ฒด์ ์œผ๋กœ ํผ์ €์žˆ์„ ๋•Œ "๋œ ๋ณต์žกํ•˜๋‹ค"๋ผ๊ณ  ๋Š๋‚๋‹ˆ๋‹ค.

Softmax Classifier(Multinomial Logistic Regression)

📌 The SVM loss attaches no meaning to the scores themselves: as long as the correct class score is higher than every incorrect class score by a fixed margin, the loss is 0. The advantage of the softmax function is that it turns the scores into a probability distribution.

📌 Because we want probabilities, we first exponentiate the scores. We then take the $\log$, since it is a monotonically increasing function and easier to maximize, and because we are after a loss we negate it: $L_i = -\log P(y_i \mid x_i)$. The picture below shows an example.

๐Ÿ“Œ ์œ„์˜ ์‚ฌ์ง„์„ ๋ณด๋ฉด ์ฒ˜์Œ์—๋Š” ์ง€์ˆ˜ ํ•จ์ˆ˜์˜ ํ˜•ํƒœ๋กœ ๋‹ค์Œ์—๋Š” ํ‘œ์ค€ํ™”๋ฅผ ์ง„ํ–‰ํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ๊ณ ์–‘์ด์— ํ•ด๋‹น๋˜๋Š” ๊ฐ’์˜ loss๊ฐ’์„ ์ฐพ์Šต๋‹ˆ๋‹ค.

Q: What is the min/max possible $L_i$?

📌 A: min = 0, max = $\infty$

Q: At initialization W is small so all $s \approx 0$. What is the loss?

📌 A: $\log C$, since each class gets probability $1/C$ and $L_i = -\log(1/C) = \log C$.
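This gives a handy debugging check: with $W \approx 0$, the first loss should be about $\log C$. A sketch for a hypothetical 10-class problem:

```python
import numpy as np

C = 10                                          # e.g. 10 classes, as in CIFAR-10
scores = np.zeros(C)                            # W ≈ 0 → all scores ≈ 0
probs = np.exp(scores) / np.exp(scores).sum()   # uniform: each class gets 1/C
loss = -np.log(probs[0])
print(loss, np.log(C))                          # both ≈ 2.30 → sane initialization
```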

Optimization


📌 How can we find the best $W$? This is exactly what optimization gives us.

Gradient Descent


📌 We can optimize the weights $W$ using gradient descent. "Gradient" refers to the slope, and "descent" means going downhill.

https://blog.clairvoyantsoft.com/the-ascent-of-gradient-descent-23356390836f?gi=1718f0930e5b

📌 Gradient descent works by computing derivatives. Since the weights form a vector, we take the partial derivative with respect to each element. Using the gradient together with the learning rate, we keep updating the weights until we reach the minimum-cost point in the graph above. The learning rate is a hyperparameter, i.e. a value the user must set by hand; it strongly affects training, so it must be chosen carefully.
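The update rule can be sketched on a toy loss whose gradient we know in closed form (all values below are hypothetical, not from the lecture):

```python
import numpy as np

# Toy loss L(w) = ||w - w_star||^2, with gradient 2 (w - w_star)
w_star = np.array([3.0, -2.0])   # the (hypothetical) optimum
w = np.zeros(2)                  # initial weights
learning_rate = 0.1              # hyperparameter chosen by hand

for step in range(100):
    grad = 2.0 * (w - w_star)    # evaluate the gradient at the current w
    w -= learning_rate * grad    # step in the direction of steepest descent

print(w)                         # ≈ [3.0, -2.0]: converged to the minimum
```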

Stochastic Gradient Descent

📌 Stochastic gradient descent (SGD) is, as the name says, a stochastic version of gradient descent. Using the entire dataset for every update is too inefficient, so instead we learn from a randomly sampled mini-batch. Each update uses one mini-batch, typically of size 32/64/128.
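A minimal SGD loop over random mini-batches might look like this (synthetic data; every size and name here is a hypothetical choice, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, C = 1000, 16, 3                    # toy dataset: examples, features, classes
X = rng.normal(size=(N, D))
y = rng.integers(0, C, size=N)
W = np.zeros((D, C))
batch_size = 64                          # a typical power-of-two mini-batch
learning_rate = 1e-2

for step in range(200):
    idx = rng.integers(0, N, size=batch_size)    # sample a mini-batch
    Xb, yb = X[idx], y[idx]
    scores = Xb @ W
    scores -= scores.max(axis=1, keepdims=True)  # stable softmax
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    probs[np.arange(batch_size), yb] -= 1.0      # dL/dscores for cross-entropy
    grad = Xb.T @ probs / batch_size             # gradient from this batch only
    W -= learning_rate * grad                    # SGD update
```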