[Deep Learning👽] Loss Summary 2️⃣ : VGG loss

Hyebin · February 28, 2021

✅ VGG Loss (SRGAN Content Loss)

The VGG loss was introduced in SRGAN ([CVPR 2017] Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network).

To explain the VGG loss, let's first look at the whole loss function used in SRGAN.

The paper calls the loss function used in SRGAN the "perceptual loss function", and it consists of the following two terms.

  1. content loss
  2. adversarial loss
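
In the notation of the SRGAN paper (reproduced here only as a reference, so check the paper for the exact form), the content loss $l^{SR}_{X}$ and the adversarial loss $l^{SR}_{Gen}$ are combined like this:

```latex
l^{SR} = \underbrace{l^{SR}_{X}}_{\text{content loss}} + \underbrace{10^{-3}\, l^{SR}_{Gen}}_{\text{adversarial loss}}
```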

The loss is a weighted sum of these two terms. The notable point of the paper is the content loss: instead of the MSE loss used by earlier super-resolution models, it proposes a new 'VGG loss'. The adversarial loss has the same form as the GAN loss I covered in the previous post, so I'll skip it and look only at the content loss.

Comparison with MSE loss

I said the content loss uses the VGG loss rather than the MSE loss.
First, the pixel-wise MSE loss is defined by the following formula.
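
As a reference, this is the pixel-wise MSE loss as I understand it from the SRGAN paper. Here $I^{HR}$ is the high-resolution image, $I^{LR}$ is the low-resolution input of size $W \times H$, $r$ is the upscaling factor, and $G_{\theta_G}$ is the generator:

```latex
l^{SR}_{MSE} = \frac{1}{r^{2} W H} \sum_{x=1}^{rW} \sum_{y=1}^{rH} \left( I^{HR}_{x,y} - G_{\theta_G}(I^{LR})_{x,y} \right)^{2}
```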

By definition, the MSE loss squares the difference between the high-resolution image and the image produced by super-resolving the low-resolution image, and then takes the 'average'. Because it is ultimately a loss that averages over pixels, it has the drawback of over-smoothing the result.
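
One way to see where the smoothing comes from is a toy example of my own (it is not from the paper). Assume the same low-resolution input could correspond to two equally plausible sharp signals, with the edge at neighbouring positions. The signals and the helper names below are made up for illustration only. The blurry average of the two gets a lower expected MSE than either sharp guess:

```python
def mse(a, b):
    """Mean squared error between two equally sized 1-D signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Two equally plausible sharp "HR" signals: the same edge, one pixel apart.
edge_left = [0.0, 0.0, 1.0, 1.0]
edge_right = [0.0, 0.0, 0.0, 1.0]
targets = [edge_left, edge_right]

# Candidate outputs: commit to one sharp edge, or output the blurry average.
sharp_guess = edge_left
blurry_guess = [(l + r) / 2 for l, r in zip(edge_left, edge_right)]


def expected_mse(pred):
    """Average MSE of a prediction over all plausible targets."""
    return sum(mse(pred, t) for t in targets) / len(targets)


sharp_score = expected_mse(sharp_guess)
blurry_score = expected_mse(blurry_guess)

# The blurry guess wins under MSE even though it contains no sharp edge.
print(sharp_score, blurry_score)
```

So a model trained only on MSE has an incentive to output the average of all plausible answers, which is exactly the washed-out look.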

The denominator of the PSNR formula contains the MSE, so the two are inversely related: the lower the MSE, the higher the PSNR. A model tuned for high PSNR therefore keeps pushing the MSE down,
and the result ends up so smoothed that high-frequency content (e.g. edges) is lost and textures are poorly rendered.
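
The inverse relation is easy to check numerically. This is a minimal sketch using the standard definition PSNR = 10 * log10(MAX^2 / MSE); the function name `psnr` and the default `max_value=255.0` (8-bit images) are my own choices for the example:

```python
import math


def psnr(mse_value, max_value=255.0):
    """PSNR in dB for a given MSE; max_value is the largest possible pixel value."""
    if mse_value == 0:
        # Identical images: the ratio is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse_value)


psnr_high_error = psnr(100.0)  # larger MSE
psnr_low_error = psnr(25.0)    # MSE reduced to a quarter

# Lower MSE gives higher PSNR; quartering the MSE adds 10 * log10(4) dB.
print(psnr_high_error, psnr_low_error)
```

Since PSNR depends on the images only through the MSE, optimizing for one is the same as optimizing for the other.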

For reference, the PSNR formula is as follows.
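
This is the standard definition of PSNR, where $\mathrm{MAX}_I$ is the maximum possible pixel value (for example 255 for 8-bit images):

```latex
\mathrm{PSNR} = 10 \log_{10}\!\left( \frac{\mathrm{MAX}_I^{2}}{\mathrm{MSE}} \right)
```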

VGG loss

To fix this problem with the MSE loss, the paper defines a new 'VGG loss', given by the following formula.

It is a loss closer to perceptual similarity, defined on the ReLU activation layers of a pre-trained 19-layer VGG network.
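
As a reference, this is the VGG loss as I understand it from the SRGAN paper. $\phi_{i,j}$ is the VGG19 feature map, and $W_{i,j}$ and $H_{i,j}$ are the dimensions of that feature map:

```latex
l^{SR}_{VGG/i,j} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left( \phi_{i,j}(I^{HR})_{x,y} - \phi_{i,j}\!\left( G_{\theta_G}(I^{LR}) \right)_{x,y} \right)^{2}
```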

It uses a pre-trained VGG net and computes the Euclidean distance between feature maps. The feature map (written φ_{i,j} in the paper) is the one obtained by the j-th convolution, after activation, before the i-th max-pooling layer of the VGG19 network. In other words, it measures the distance between the feature maps of the reconstructed image and the reference image (the high-resolution image).
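
The structure of this computation can be sketched in a few lines. To keep it self-contained, the sketch below does NOT load VGG19: `toy_features` is a hypothetical stand-in for φ_{i,j} (a ReLU over horizontal differences) that I made up for illustration, and `feature_loss` is my own helper name. In a real setup the stand-in would be replaced by the activations of a pre-trained VGG19.

```python
def relu(v):
    return v if v > 0 else 0.0


def toy_features(img):
    """Hypothetical stand-in for phi_{i,j}: ReLU of horizontal differences.

    A real VGG loss would return activations of a pre-trained VGG19 here.
    """
    return [[relu(row[x + 1] - row[x]) for x in range(len(row) - 1)]
            for row in img]


def feature_loss(phi, hr, sr):
    """Mean squared distance between the feature maps phi(hr) and phi(sr)."""
    f_hr, f_sr = phi(hr), phi(sr)
    total = sum((a - b) ** 2
                for row_hr, row_sr in zip(f_hr, f_sr)
                for a, b in zip(row_hr, row_sr))
    count = sum(len(row) for row in f_hr)  # plays the role of W_ij * H_ij
    return total / count


# Reference image with a sharp edge, and a blurry reconstruction of it.
hr = [[0.0, 0.0, 1.0, 1.0],
      [0.0, 0.0, 1.0, 1.0]]
blurry = [[0.0, 0.25, 0.75, 1.0],
          [0.0, 0.25, 0.75, 1.0]]

loss_same = feature_loss(toy_features, hr, hr)
loss_blurry = feature_loss(toy_features, hr, blurry)
print(loss_same, loss_blurry)
```

The only difference from the pixel-wise MSE is that both images go through the feature extractor before they are compared.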

Because it focuses on 'perceptual similarity' rather than on individual pixel values, it captures fine details better.

(Whether the VGG loss really captures perceptual similarity well, and related questions, are addressed by the paper I reviewed in an earlier post.)
=> [CVPR 2018] The Unreasonable Effectiveness of Deep Features as a Perceptual Metric


The VGG loss seems to show up in whatever I work on.. While doing an image generation task recently, I felt once again that edges just don't come out at all without it..!

Next, I should write up the WGAN loss used in Wasserstein GAN,,,

The end `(>﹏<)′
