[Paper Review] Transferring Inductive Bias Through Knowledge Distillation - (1/3)

서쿠 · August 15, 2021

Series: Inductive-Bias-Series (post 2 of 4)

Hello :) Today's post covers a paper I recently read with great interest: "Transferring Inductive Bias Through Knowledge Distillation". The paper asks whether Knowledge Distillation can really be used to transfer inductive bias. Unfortunately it was not accepted at ICLR 2021, but it runs a wide range of experiments and is the first paper to try to pass inductive bias on to a Student model through Knowledge Distillation, which is why I found it such an interesting read.

Before getting into the review proper, this part covers the two key concepts, Knowledge Distillation and Inductive Bias. The next part will go through the experiments carried out in the paper.

What is Knowledge Distillation

KD Diagram

Image from "Knowledge Distillation: A Survey (2020)"

Knowledge Distillation (KD) is the process of transferring knowledge from a Teacher model to a Student model, where the Teacher model's outputs (its logits) are used to train the Student model. KD is best known as an effective method for model compression.

Knowledge Distillation broadly consists of the following two steps.
(1) Pre-train Teacher Model : Train the model that will act as the teacher.
Step 1
(2) Train Student Model : Train the model that will act as the student. The student is trained so that its outputs match the softmax outputs that the teacher computes from its logits. These teacher outputs are called soft labels. The plain softmax output can be used as is, but a temperature (T) is introduced to make the distribution smoother.
Step 2
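
To make the effect of the temperature concrete, here is a minimal Python sketch. The function name `softmax_with_temperature` and the example logits are made up for illustration and do not come from the paper; the sketch only shows that dividing the logits by T > 1 before the softmax flattens the distribution.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature T > 0."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical teacher logits for a 3-class problem.
teacher_logits = [6.0, 2.0, 1.0]

hard = softmax_with_temperature(teacher_logits, temperature=1.0)
soft = softmax_with_temperature(teacher_logits, temperature=4.0)

print([round(p, 3) for p in hard])
print([round(p, 3) for p in soft])
```

With T = 1 almost all of the probability mass sits on the first class. With T = 4 the other classes receive visibly more mass, and that relative ranking of the "wrong" classes is the extra signal the student gets from soft labels.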

The overall process can be pictured as in the figure below, which also gives the Knowledge Distillation (KD) loss function. The first term of the loss measures how well the student model predicts the ground-truth labels, and the second term measures how closely the student's outputs match the teacher's.
Overall steps
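
Since the loss is only available here as a figure, the following is a sketch of one commonly used form, L = (1 - alpha) * CE(y, p_student) + alpha * T^2 * KL(p_teacher^T || p_student^T). The weight `alpha`, the T^2 scaling, and the choice of KL divergence for the second term are assumptions taken from the widely used formulation, not necessarily the exact expression in the figure.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=4.0):
    """Hard-label cross-entropy plus temperature-softened KL to the teacher."""
    # First term: does the student predict the ground-truth label?
    p_student = softmax(student_logits)
    hard_loss = -math.log(p_student[label])

    # Second term: how close is the student to the teacher's soft labels?
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    soft_loss = sum(t * math.log(t / s) for t, s in zip(q_teacher, q_student))

    # T^2 keeps the soft term's gradient scale comparable across temperatures.
    return (1.0 - alpha) * hard_loss + alpha * temperature ** 2 * soft_loss

# Hypothetical logits for a 3-class problem whose true class is 0.
teacher = [6.0, 2.0, 1.0]
loss_far = kd_loss([1.0, 3.0, 0.5], teacher, label=0)   # student disagrees
loss_near = kd_loss([5.5, 2.0, 1.0], teacher, label=0)  # student nearly agrees
loss_same = kd_loss(teacher, teacher, label=0, alpha=1.0)  # soft term only
```

A student whose logits are close to the teacher's gets a smaller loss than one that disagrees, and the soft term vanishes when the two sets of logits are identical.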

Knowledge Distillation (KD) methods can be categorized as shown below. The categories are not mutually exclusive; each one classifies KD methods along a different axis.
Taxonomy table

Image from "Knowledge Distillation: A Survey (2020)"

In the figure above, the categories under Knowledge and Distillation are the most fundamental concepts in Knowledge Distillation (KD), so those are the ones I will explain here. :)

Knowledge1

First, classifying by the kind of Knowledge being transferred gives the following three types.

Knowledge2

(1) Relation-Based Knowledge aims to have the student model learn the relationships among the teacher model's inputs, layers, and outputs. For example, learning such relationships matters for graph models.
(2) Response-Based Knowledge aims to have the student model learn the teacher model's output (response). For example, the student learns the logits of a classification model.
(3) Feature-Based Knowledge aims to have the student model learn from the network's intermediate layers (hints). For example, for models where learning image features matters, the student learns the intermediate feature maps.
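
The three kinds of knowledge differ mainly in which quantities the distillation loss compares. The sketch below is one simple way to express each of them; the function names, the use of mean squared error, and the choice of pairwise sample distances as the "relation" are all illustrative assumptions rather than definitions from the survey.

```python
import math

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def pairwise_distances(batch):
    """Euclidean distance between every pair of samples in a batch."""
    return [math.dist(a, b) for i, a in enumerate(batch) for b in batch[i + 1:]]

def relation_loss(student_batch, teacher_batch):
    """Relation-based: match how samples relate to each other, not their values."""
    return mse(pairwise_distances(student_batch), pairwise_distances(teacher_batch))

def response_loss(student_logits, teacher_logits):
    """Response-based: match the final outputs (here, raw logits)."""
    return mse(student_logits, teacher_logits)

def feature_loss(student_feat, teacher_feat):
    """Feature-based: match an intermediate (hint) layer's activations.
    Assumes both feature vectors have the same size."""
    return mse(student_feat, teacher_feat)

# Hypothetical 2-D embeddings of three samples; the student's are the
# teacher's shifted by (1, 1).
teacher_batch = [[0, 0], [3, 4], [6, 8]]
student_batch = [[1, 1], [4, 5], [7, 9]]

rel = relation_loss(student_batch, teacher_batch)
feat = feature_loss(student_batch[0], teacher_batch[0])
resp = response_loss([1.0, 2.0], [1.0, 2.0])
```

In this toy case the relation loss is zero even though the feature loss is not: the student's embeddings sit in different places, but the distances between samples are preserved.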

Distillation1

Next, classifying by Distillation, that is, by how the Knowledge is transferred, also gives three types.

Distillation2

(1) Offline Distillation : A pretrained Teacher is built first, and its Knowledge is then transferred to the Student.
(2) Online Distillation : The Teacher and Student are trained simultaneously and transfer Knowledge to each other.
(3) Self-Distillation : Knowledge is transferred within a single model.
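
The difference between the offline and online schemes can be sketched with a deliberately tiny setup. As a simplifying assumption, each "model" below is just a vector of logits for one input, updated by gradient descent on cross-entropy plus a KL term toward its peer (temperature 1). All names are hypothetical, and self-distillation is not covered by this sketch.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_step(logits, label, peer_probs, lr=0.5):
    """One gradient step on CE(label, p) + KL(peer || p), taken on the logits."""
    p = softmax(logits)
    updated = []
    for i, (z, p_i, q_i) in enumerate(zip(logits, p, peer_probs)):
        y_i = 1.0 if i == label else 0.0
        grad = (p_i - y_i) + (p_i - q_i)  # d(CE)/dz + d(KL)/dz
        updated.append(z - lr * grad)
    return updated

def offline_distillation(teacher_logits, student_logits, label, steps=50):
    """Offline: the teacher is already trained and stays frozen."""
    teacher_probs = softmax(teacher_logits)
    for _ in range(steps):
        student_logits = distill_step(student_logits, label, teacher_probs)
    return student_logits

def online_distillation(model_a, model_b, label, steps=50):
    """Online: both models train together, each using the other's current output."""
    for _ in range(steps):
        probs_a, probs_b = softmax(model_a), softmax(model_b)
        model_a = distill_step(model_a, label, probs_b)
        model_b = distill_step(model_b, label, probs_a)
    return model_a, model_b

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

student = offline_distillation([4.0, 1.0, 0.0], [0.0, 0.0, 0.0], label=0)
peer_a, peer_b = online_distillation([0.0, 0.0, 0.0], [-0.5, 0.5, 0.0], label=0)
```

In the offline case the teacher's probabilities are computed once and never change. In the online case each model's target moves at every step because its peer is learning too.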

What is Inductive Bias

Inductive Bias refers to the characteristics of a learning algorithm that influence its generalization behavior independently of the data, and it helps the learning algorithm converge to a particular solution. The right inductive bias matters most when we train a model with limited data or computing power, or when the training data is not perfectly representative of the test data. Without a strong inductive bias, a model can be drawn to any of several local minima, and the solution it converges to can change depending on the model's initial state and the order of the training data.
Convergence of solutions

Image from "Transferring Inductive Biases through Knowledge Distillation (2020)"

In general, there are four ways to inject inductive bias into a model:
(1) Choose Appropriate Architecture : through the design of a suitable model architecture
(2) Choose Appropriate Objective Function : through a suitable objective function
(3) Choose Appropriate Curriculum Method : through a suitable training curriculum
(4) Choose Appropriate Optimization Method : through a suitable optimization method

The paper adds one more to this list, arguing that inductive bias can also be injected into a model through Knowledge Distillation (KD).
New method
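
As a concrete illustration of the first route, inductive bias through architecture, the sketch below compares a convolution with a fully connected layer. This example is not taken from the paper; the helper names and the toy weights are assumptions. It shows that weight sharing makes a (circular) convolution translation equivariant by construction, while a dense layer has no such built-in constraint.

```python
def circular_conv1d(signal, kernel):
    """1-D convolution with wrap-around padding: the same weights at every position."""
    n = len(signal)
    return [sum(w * signal[(i + j) % n] for j, w in enumerate(kernel))
            for i in range(n)]

def dense(signal, weights):
    """Fully connected layer: a separate weight row for every output position."""
    return [sum(w * x for w, x in zip(row, signal)) for row in weights]

def shift(signal, k=1):
    """Circularly shift a signal k steps to the right."""
    k %= len(signal)
    return signal[-k:] + signal[:-k]

x = [0, 1, 0, 0, 0]
kernel = [1, 2, 1]
# Arbitrary dense weights (a diagonal matrix, chosen only for readability).
weights = [[(i + 1) if i == j else 0 for j in range(5)] for i in range(5)]

# Shifting the input and then convolving equals convolving and then shifting.
conv_equivariant = circular_conv1d(shift(x), kernel) == shift(circular_conv1d(x, kernel))

# The same check fails for the dense layer with these weights.
dense_equivariant = dense(shift(x), weights) == shift(dense(x, weights))

print(conv_equivariant, dense_equivariant)
```

A dense layer could learn equivariant weights from enough data, but nothing in its structure pushes it there. That preference built into the convolution is the kind of bias the paper hopes a student can pick up from a teacher.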

If you would like to know more about Inductive Bias, please see my previous post.

Paper Overview

Idea

The paper points out that, beyond its usual benefit of model compression, Knowledge Distillation (KD) lets the teacher and student be different kinds of models, and asks whether this can be used to pass the teacher model's inductive bias on to the student. The figure below shows the goal of the study as stated in the paper. The paper that first introduced KD calls the knowledge the teacher passes to the student dark knowledge, and the authors ask whether this dark knowledge includes inductive bias.
Purpose of the research

To test this, the authors run experiments in the two scenarios below. The first scenario uses RNNs (teacher) and Transformers (student), and the second uses CNNs (teacher) and MLPs (student). In both scenarios the experiments aim to show (1) how meaningful the inductive biases of the teacher models really are, and (2) whether a student model that receives knowledge from the teacher really ends up with a solution similar to the teacher's.
Scenarios

I will be back soon with posts that explain each scenario in detail.
Thank you for reading this long post! :)
