[DL] Generative vs Discriminate Model

๋ฏธ๋‚จ๋กœ๊ทธยท2022๋…„ 3์›” 9์ผ
0

Reference

๐Ÿ’ป ๋”ฅ๋Ÿฌ๋‹์˜ ๊นŠ์ด ์žˆ๋Š” ์ดํ•ด๋ฅผ ์œ„ํ•œ ๋จธ์‹ ๋Ÿฌ๋‹ ๊ฐ•์˜ 1-2
๐Ÿ”— Discriminative vs. Generative model
๐Ÿ”— Generative model๊ณผ Discriminate model ์ฐจ์ด์ ๊ณผ ๋น„๊ต
๐Ÿ”— discriminative vs generative

์ด๋ฒˆ ํฌ์ŠคํŒ…์—์„œ ์ •๋ฆฌํ•  ๊ฐœ๋…์€ ์ฒซ ๋ฒˆ์งธ ๊ฐ•์˜์˜ 1-2 part์™€ ์•„๋ž˜ ์„ธ ๊ฐ€์ง€ ๋ธ”๋กœ๊ทธ๋ฅผ ์ฐธ๊ณ ํ•˜์—ฌ ์ •๋ฆฌํ–ˆ์Œ์„ ๋ฐํž™๋‹ˆ๋‹ค.

๋˜ํ•œ, ํ•ด๋‹น ํฌ์ŠคํŒ…์— ์‚ฌ์šฉ๋œ ์ด๋ฏธ์ง€๋Š” ์ฒซ ๋ฒˆ์งธ reference์˜ ๊ฐ•์˜ ์ž๋ฃŒ(pdf)์ž„์„ ๋ฐํž™๋‹ˆ๋‹ค.


Pattern Recognition์—์„œ classification์— ์‚ฌ์šฉํ•˜๋Š” ๋ชจ๋ธ์€ 2๊ฐ€์ง€๋กœ ๋‚˜๋‰ฉ๋‹ˆ๋‹ค.

Generative model๊ณผ Discriminate model์ธ๋ฐ์š”.

์ด ๋‘ ๊ฐ€์ง€๋ฅผ ๊ฐ๊ฐ ์‚ดํŽด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

๊ตฌ๊ธ€์˜ Background: What is a Generative Model? ์˜ ๊ธ€์„ ์‚ดํŽด๋ณด๋ฉด, ๋‘ ๊ฐ€์ง€ ๋ชจ๋ธ์— ๋Œ€ํ•œ ์ •์˜๊ฐ€ ๋‚˜์™€ ์žˆ์Šต๋‹ˆ๋‹ค.

  • Generative models can generate new data instances.
  • Discriminative models discriminate between different kinds of data instances.

์ƒˆ๋กœ์šด ๋ฐ์ดํ„ฐ ์ธ์Šคํ„ด์Šค๋ฅผ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ์ ๊ณผ ์„œ๋กœ ๋‹ค๋ฅธ ์ข…๋ฅ˜์˜ ๋ฐ์ดํ„ฐ ์ธ์Šคํ„ด์Šค๋ฅผ ๊ตฌ๋ณ„ ํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ์ ์ด ์ฐจ์ด์ ์ž…๋‹ˆ๋‹ค.


generative model

generative model์ด๋ž€ sample์„ ํ™•๋ฅ ์ ์œผ๋กœ ํ‘œํ˜„ํ•˜์—ฌ ์ผ๋ฐ˜ํ™”ํ•œ ๋ชจ๋ธ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค.

๋ฐ์ดํ„ฐ XX๊ฐ€ ์ƒ์„ฑ๋˜๋Š” ๊ณผ์ •์„ ๋‘ ๊ฐœ์˜ ํ™•๋ฅ ๋ชจํ˜•

p(Y)p(Y)
p(XโˆฃY)p(X|Y)

์œผ๋กœ ์ •์˜ํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ

bayes rule์„ ์‚ฌ์šฉํ•ด p(XโˆฃY)p(X|Y)๋ฅผ ๊ฐ„์ ‘์ ์œผ๋กœ ๋„์ถœํ•˜๋Š” ๋ชจ๋ธ์„ ๊ฐ€๋ฆฌํ‚ต๋‹ˆ๋‹ค.

๋” ์‰ฝ๊ฒŒ ๋ง๋กœ ์„ค๋ช…ํ•˜๋ฉด, ์ฃผ์–ด์ง„ training data๋ฅผ ํ•™์Šตํ•˜์—ฌ ํ•™์Šต ๋ฐ์ดํ„ฐ์˜ ๋ถ„ํฌ๋ฅผ ๋”ฐ๋ฅด๋Š” ์œ ์‚ฌํ•œ ๋ฐ์ดํ„ฐ๋ฅผ ์ƒ์„ฑํ•˜๋Š” ๋ชจ๋ธ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค.

ํ•™์Šต ๋ฐ์ดํ„ฐ์™€ ์œ ์‚ฌํ•œ sample์„ ์ฐพ๋Š” ๊ฒƒ์ด ๋ชฉํ‘œ์ด๊ธฐ ๋•Œ๋ฌธ์— ํ•™์Šต ๋ฐ์ดํ„ฐ์˜ ๋ถ„ํฌ๋ฅผ ํŒŒ์•…ํ•˜์—ฌ ํ•™์Šตํ•˜๋Š” ๊ฒƒ์ด ๊ฐ€์žฅ ์ค‘์š”ํ•œ ๋ชฉ์ ์ด ๋ฉ๋‹ˆ๋‹ค.

generative model์˜ ๋ฌธ์ œ์ ์€ ๊ธฐ์กด์— ์‚ฌ์šฉ๋œ sample๋“ค๊ณผ์˜ ๊ฑฐ๋ฆฌ ๊ฐ’์„ ๊ธฐ์ค€์œผ๋กœ ๋ถ„๋ฅ˜ํ•˜๋ฏ€๋กœ ์ƒˆ๋กœ์šด feature๊ฐ€ ์ž…๋ ฅ๋˜๋ฉด ๊ฒฐ๊ณผ๊ฐ’์ด ๋‚˜์˜ค์ง€ ์•Š๋Š”๋‹ค๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

๊ทธ๋ž˜์„œ 0๊ณผ 1์ฒ˜๋Ÿผ ๋ช…ํ™”ํ•œ ๊ฒฐ๋ก ์„ ๋„์ถœํ•˜๊ธฐ์— ์ ํ•ฉํ•˜์ง€ ์•Š์€ ๋ชจ๋ธ์ด๋ผ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

generative model์„ parametric model๊ณผ non-parametric model๋กœ ๋ทด๋ฅ˜ํ•  ์ˆ˜ ์žˆ๋Š”๋ฐ์š”.

parametric model์€ ์–ด๋–ค ๊ฑฐ๋ฆฌ ๊ฐ’์„ ๊ธฐ๋ฐ˜์œผ๋กœ ๊ฐ’์„ ๋„์ถœํ•˜๋ฉฐ

non-parametric model์€ ๊ฑฐ์˜ ๋ชจ๋“  ์ƒ˜ํ”Œ๋“ค์„ ํ™œ์šฉํ•˜๋Š” ๋ชจ๋ธ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค.

generative model์€ class์˜ distribution์— ์ฃผ๋ชฉํ•˜์—ฌ ํ•™์Šตํ•˜๋Š” ๊ฒƒ์ด ๋ชฉํ‘œ์ด๊ธฐ ๋•Œ๋ฌธ์—

์ผ๋ฐ˜์ ์ธ bayesian inference๊ณผ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ training data๊ฐ€ ๋งŽ์„ ์ˆ˜๋ก discriminative model๊ณผ ๋น„์Šทํ•œ ์„ฑ๋Šฅ์œผ๋กœ ์ˆ˜๋ ดํ•˜๋Š” ๊ฒฝํ–ฅ์ด ์žˆ์Šต๋‹ˆ๋‹ค.


discriminative model

discriminative model์ด๋ž€ ์ผ๋ฐ˜์ ์ด๊ณ  ์ง๊ด€์ ์ธ ๋จธ์‹ ๋Ÿฌ๋‹ ๋ชจ๋ธ์ด๋ผ ์ƒ๊ฐํ•˜์‹œ๋ฉด ๋ฉ๋‹ˆ๋‹ค.

๋ฐ์ดํ„ฐ XX๊ฐ€ ์ฃผ์–ด์กŒ์„ ๋•Œ ๋ ˆ์ด๋ธ” YY๊ฐ€ ๋‚˜ํƒ€๋‚  ์กฐ๊ฑด๋ถ€ํ™•๋ฅ  p(YโˆฃX)p(Y|X)๋ฅผ ์ง์ ‘์ ์œผ๋กœ ๋ฐ˜ํ™˜ํ•˜๋Š” ๋ชจ๋ธ์„ ๊ฐ€๋ฆฌํ‚ค๋Š”๋ฐ์š”.

lable ์ •๋ณด๊ฐ€ ์žˆ์–ด์•ผ ํ•ด์„œ ์ง€๋„ํ•™์Šต์— ์†ํ•˜๋ฉฐ, X(sample)X(sample)์˜ lable์„ ์ž˜ ๊ตฌ๋ถ„ํ•˜๋Š” decision boundary๋ฅผ ํ•™์Šตํ•˜๋Š” ๊ฒƒ์ด ๋ชฉํ‘œ๊ฐ€ ๋ฉ๋‹ˆ๋‹ค.

๋Œ€ํ‘œ์ ์ธ ์˜ˆ์‹œ๋กœ linear regression๊ณผ logistic regression์ด ์žˆ์Šต๋‹ˆ๋‹ค.

์ด๋ฏธ์ง€๋ฅผ ์ฐธ๊ณ ํ•˜์—ฌ ๋‹ค์‹œ ์„ค๋ช…ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

discriminative model๋Š” ํ•™์Šต์— ์กด์žฌํ•˜๋Š” data๋ฅผ ํ™œ์šฉํ•ด decision boundary๋ฅผ ๋„์ถœํ•ฉ๋‹ˆ๋‹ค.

๋‘ ๊ฐ€์ง€ ์ข…๋ฅ˜์˜ sample์ด ์กด์žฌํ•  ๋•Œ ๋‘ ๊ฐ€์ง€ sample์„ ๋ถ„๋ฅ˜ํ•  ์ˆ˜ ์žˆ๋Š” ์„ ์„ ์ฐพ๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

๋‹จ์ˆœํžˆ ๋‘ ๊ฐœ์˜ ์ •ํ™•ํ•œ ์„ ์œผ๋กœ ๊ตฌ๋ถ„ํ•  ์ˆ˜ ์žˆ๋Š” ์„ ์„ ์ฐพ์•„๋‚˜๊ฐ€๋Š” ๊ณผ์ •

discriminative model์˜ ์žฅ์ ๊ณผ ๋‹จ์ ์€

  • ์žฅ์ : ์™„์ „ํžˆ ๋‹ค๋ฅธ sample์ด ๋“ค์–ด์˜ค๋”๋ผ๋„ 0๊ณผ 1 ํ˜•ํƒœ์˜ ๊ฒฐ๋ก ์„ ์ œ์‹œํ•ฉ๋‹ˆ๋‹ค.
  • ๋‹จ์ : ํ•ด๋‹น sample์ด ์ฒ˜์Œ ๋ณด๋Š” sample์ธ์ง€ ์•Œ ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค.

discriminative model์€ ๋ช‡ ๊ฐ€์ง€ ํŠน์ง•์ ์ธ pattern์„ ์ฐพ์•„ ์ฐจ์ด๋ฅผ ํ•™์Šตํ•ฉ๋‹ˆ๋‹ค.

๊ทธ ์˜ˆ์‹œ๋กœ least square์™€ support vector machine์ด ์žˆ๋Š”๋ฐ์š”.

least square์˜ ๊ฒฝ์šฐ ๋ชจ๋“  sample๋“ค ๊ฐ„์˜ ๊ฑฐ๋ฆฌ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ๊ฐ€์šด๋ฐ ์„ ์„ ์ฐพ๋Š” ๊ณผ์ •์ž…๋‹ˆ๋‹ค.

sample๋“ค ๊ฐ„์˜ ๊ฑฐ๋ฆฌ๋ฅผ ์ตœ์†Œํ™”ํ•˜๋Š” ์„ ์„ ์ฐพ์ง€๋งŒ, ์ž˜๋ชป๋œ sample์ด ์กด์žฌํ•  ๊ฒฝ์šฐ ๊ฒฐ๊ณผ์— ์ž˜๋ชป๋œ ์˜ํ–ฅ์„ ๋ฏธ์น˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

์ด๋Ÿฐ ํŠน์ดํ•œ sample์„ ์ œ๊ฑฐํ•˜๊ฑฐ๋‚˜ ๋ฌด์‹œํ•˜์—ฌ decision boundary๋ฅผ ์ฐพ๋Š” ๋ฐฉ๋ฒ•๋„ ์žˆ์Šต๋‹ˆ๋‹ค.

support vector machine์€ ๊ฐ€์žฅ ๊ฐ€๊นŒ์ด ์žˆ๋Š” sample 2๊ฐœ๋ฅผ ์ง‘์–ด์„œ ๊ทธ ๋‘ ๊ฐœ์˜ ๊ฐ€์šด๋ฐ๋ฅผ ๊ฐ€๋กœ์ง€๋ฅด๋Š” decision boundary๋ฅผ ๋„์ถœํ•ฉ๋‹ˆ๋‹ค.

์ด๋ฅผ maximum margin์ด๋ผ๊ณ  ํ•œ๋‹ค๋„ค์š”.

์ƒ๋Œ€์ ์œผ๋กœ ์ •ํ™•ํ•œ ๊ฒฐ๊ณผ ๊ฐ’์„ ์–ป์„ ์ˆ˜ ์žˆ๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค.


์œ„ ๊ทธ๋ฆผ์€ ์†์œผ๋กœ ์“ด ์ˆซ์ž์˜ generative and discriminative model์ž…๋‹ˆ๋‹ค.

์™ผ์ชฝ์˜ discriminative model์€ ์†๊ธ€์”จ๊ฐ€ 0์ธ์ง€ 1์ธ์ง€ ์ฐจ์ด๋ฅผ ๋งํ•˜๋ ค๊ณ  ํ•˜๋ฉฐ,

line์„ ํ†ตํ•ด ๋ฐ์ดํ„ฐ๊ฐ€ ์–ด๋””์— ๋ฐฐ์น˜๋˜๋Š”์ง€ ๋ชจ๋ธ๋ง ํ•  ํ•„์š”์—†์ด 0๊ณผ 1์„ ๊ตฌ๋ณ„ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

generative model์˜ ๊ฒฝ์šฐ, ๋ฐ์ดํ„ฐ ๊ณต๊ฐ„์—์„œ ์‹ค์ œ์™€ ๊ฐ€๊นŒ์šด ์ˆซ์ž๋ฅผ ์ƒ์„ฑํ•ด์„œ 1๊ณผ 0์„ ์ƒ์„ฑํ•˜๋ ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค.

๋”ฐ๋ผ์„œ ๋ฐ์ดํ„ฐ ๊ณต๊ฐ„ ์ „์ฒด์˜ ๋ถ„ํฌ๋ฅผ ๋ชจ๋ธ๋งํ•˜๋Š” ๊ณผ์ •์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.

๋‘ model์„ ๋น„๊ตํ•˜๋Š” ์ด๋ฏธ์ง€๋Š” ๋” ๋งŽ์Šต๋‹ˆ๋‹ค.

generative model์€ posterior๋ฅผ ๊ฐ„์ ‘์ ์œผ๋กœ, discriminative model์€ ์ง์ ‘์ ์œผ๋กœ ๋„์ถœํ•ฉ๋‹ˆ๋‹ค.

generative model์€ ๋ฐ์ดํ„ฐ ๋ฒ”์ฃผ์˜ ๋ถ„ํฌ๋ฅผ ํ•™์Šตํ•˜๋ฉฐ, discriminative model์€ decision boundary๋ฅผ ํ•™์Šตํ•ฉ๋‹ˆ๋‹ค.

profile
๋ฏธ๋‚จ์ด ๊ท€์—ฝ์ฃ 

0๊ฐœ์˜ ๋Œ“๊ธ€