📓 StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery Paper Review

정또치 · September 7, 2022

0. Main Idea


Adding a CLIP loss to the StyleGAN encoding (latent inversion) process makes it possible to change the style of an image with a text prompt.

The latent vector w is updated in the direction that makes the semantic embedding of the image generated from w more similar to that of a chosen target text.

1. Methods


1.1 Latent Optimization

$$ \arg\min_{w} D_{CLIP}(G(w), t) + \lambda_{L2} \lVert w - w_s \rVert_2 + \lambda_{ID} L_{ID}(w) $$
  • G : the StyleGAN generator
  • D_{CLIP} : the cosine distance between the CLIP embeddings of the generated image G(w) and the text t
  • The L2 distance and the identity loss keep w close to the source latent w_s, so the edited image stays similar to the input image
$$ L_{ID}(w) = 1 - \langle R(G(w_s)), R(G(w)) \rangle $$
  • L_{ID} : the identity loss
  • R : a pretrained ArcFace network; the cosine similarity between the face embeddings of the images generated from w_s and from w is computed and optimized

When adjusting an attribute of a given person, \lambda_{ID} is set to a nonzero value so that identity is preserved; when the edit is meant to change the identity itself, \lambda_{ID} is set to a low value.
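As a rough illustration of how the three terms of the latent optimization objective combine, here is a minimal Python sketch. It assumes the CLIP and ArcFace embeddings have already been computed and are passed in as plain lists; the function names and the toy numbers are hypothetical and do not come from the paper or its code.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine_distance(a, b):
    """1 - cosine similarity, the form used for D_CLIP."""
    return 1.0 - dot(a, b) / (norm(a) * norm(b))

def identity_loss(id_src, id_cur):
    """L_ID = 1 - <R(G(w_s)), R(G(w))>, computed on normalized embeddings."""
    return cosine_distance(id_src, id_cur)

def l2_distance(w, w_s):
    """||w - w_s||_2."""
    return norm([x - y for x, y in zip(w, w_s)])

def latent_objective(img_emb, txt_emb, w, w_s, id_src, id_cur, lambda_l2, lambda_id):
    """D_CLIP + lambda_L2 * ||w - w_s||_2 + lambda_ID * L_ID."""
    return (cosine_distance(img_emb, txt_emb)
            + lambda_l2 * l2_distance(w, w_s)
            + lambda_id * identity_loss(id_src, id_cur))

# Toy stand-ins (assumptions): real values would come from CLIP, StyleGAN and ArcFace.
img_emb, txt_emb = [1.0, 0.0], [0.0, 1.0]   # CLIP embeddings of G(w) and of the text t
w, w_s = [3.0, 4.0], [0.0, 0.0]             # current and source latent codes
id_src, id_cur = [1.0, 0.0], [1.0, 0.0]     # ArcFace embeddings of G(w_s) and G(w)
loss = latent_objective(img_emb, txt_emb, w, w_s, id_src, id_cur,
                        lambda_l2=0.1, lambda_id=0.5)
print(loss)
```

In an actual optimization loop this scalar would be minimized with respect to w by gradient descent. Raising `lambda_id` penalizes identity drift more strongly, while lowering it leaves room for edits that change who the person is.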

1.2 Latent Mapper

A mapping network is trained for one specific text prompt.

Depending on the type of manipulation and the level of detail to be changed, the mapper is split into three mapping networks: coarse, medium, and fine styles.


LOSS

$$ L_{CLIP}(w) = D_{CLIP}(G(w + M_t(w)), t) $$

The CLIP loss drives the mapper to minimize the cosine distance in the CLIP latent space.

$$ L(w) = L_{CLIP}(w) + \lambda_{L2} \lVert M_t(w) \rVert_2 + \lambda_{ID} L_{ID}(w) $$

With a single text prompt, more than one attribute can be changed at once while the identity is preserved.
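The coarse / medium / fine split and the mapper loss can be sketched as follows. This is only an illustration under stated assumptions: the 18-layer latent with a 4 / 4 / 10 layer split, the toy mapper functions (simple scalings instead of trained networks), and all names are hypothetical choices for the example, not details confirmed by this review.

```python
import math

# Assumption: an 18-layer latent code split into 4 coarse, 4 medium and 10 fine layers.
GROUPS = {"coarse": (0, 4), "medium": (4, 8), "fine": (8, 18)}

def apply_mapper(w_plus, mappers):
    """Return the residual M_t(w): each layer group goes through its own mapper."""
    residual = []
    for name, (start, end) in GROUPS.items():
        mapper = mappers.get(name)
        for layer in w_plus[start:end]:
            if mapper is None:
                residual.append([0.0] * len(layer))  # group left unchanged
            else:
                residual.append(mapper(layer))
    return residual

def l2_norm(layers):
    """||M_t(w)||_2 over all layers and dimensions."""
    return math.sqrt(sum(x * x for layer in layers for x in layer))

def mapper_loss(clip_loss, residual, id_loss, lambda_l2, lambda_id):
    """L(w) = L_CLIP + lambda_L2 * ||M_t(w)||_2 + lambda_ID * L_ID."""
    return clip_loss + lambda_l2 * l2_norm(residual) + lambda_id * id_loss

# Toy latent: 18 layers of dimension 2 (real latents are much larger).
w_plus = [[1.0, 2.0] for _ in range(18)]
mappers = {
    "coarse": lambda layer: [0.5 * x for x in layer],  # hypothetical stand-in mapper
    "medium": None,                                    # this group is not edited
    "fine": lambda layer: [0.0 for _ in layer],
}
residual = apply_mapper(w_plus, mappers)
edited = [[a + b for a, b in zip(layer, res)] for layer, res in zip(w_plus, residual)]
# clip_loss and id_loss are placeholder scalars for this example.
total = mapper_loss(clip_loss=0.2, residual=residual, id_loss=0.1,
                    lambda_l2=1.0, lambda_id=0.5)
print(len(edited), total)
```

The edited latent is `w + M_t(w)`, so the L2 term penalizes the size of the residual itself rather than a distance to a separate source code.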

1.3 Global Directions

This method maps a text prompt to a global direction in the latent space, so the same direction can be applied to any input.

Given a text prompt that describes the desired attribute, the manipulation direction Δs produces the desired image without disturbing other attributes.

α controls how much change is applied (the manipulation strength), and β is a per-channel threshold: with a high β only a few specific style channels are changed, while with a low β additional features change as well.
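The roles of α and β can be illustrated with a small Python sketch. It assumes a per-channel relevance score for the text prompt has already been computed (how it is obtained is not covered here); the function names and the numbers are hypothetical.

```python
def threshold_direction(relevance, beta):
    """Keep only channels whose absolute relevance reaches beta; zero out the rest."""
    return [r if abs(r) >= beta else 0.0 for r in relevance]

def apply_direction(s, delta_s, alpha):
    """Move the style code s along delta_s with manipulation strength alpha."""
    return [si + alpha * di for si, di in zip(s, delta_s)]

# Hypothetical per-channel relevance scores for one text prompt.
relevance = [0.5, -0.05, 0.12, -0.2]
s = [1.0, 1.0, 1.0, 1.0]

loose = threshold_direction(relevance, beta=0.1)    # more channels survive
strict = threshold_direction(relevance, beta=0.25)  # only the strongest channel survives
edited = apply_direction(s, strict, alpha=2.0)
print(loose, strict, edited)
```

With the higher threshold only one channel is edited, which matches the description that a high β yields a more targeted change, while α simply scales how far the style code moves.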

2. Comparisons and Evaluation


2.1. Comparisons and Evaluation

The latent mapper and global direction methods are compared with TediGAN.

The experiments use "Trump" as a complex attribute, "Mohawk hairstyle" as a less complex and less specific attribute, and "without wrinkles" as a simpler and more common attribute.

The results show that the latent mapper is better suited to complex attributes, while global directions are better suited to simpler or more common attributes.

2.2. Limitation


The method depends on the StyleGAN generator and on the pretrained models → images cannot be controlled beyond the domain of the pretrained generator.
