[generation loss] A collection of losses used in image generation models for characters

kiteday · February 7, 2024


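This post reorganizes the losses used for character generation, as summarized in the paper Cartoon Image Processing: A Survey.

These are all losses used with GANs. Many more exist in practice, but the ones listed here are those relevant to character generation.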

1. Typical Loss Function

Pixel-level Loss

\mathcal L_1 = \sum_{i,j}^n |y^{(i,j)}-G(x)^{(i,j)}| \qquad (1)

\mathcal L_2 = \sum_{i,j}^n (y^{(i,j)}-G(x)^{(i,j)})^2 \qquad (2)

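These losses measure the difference between two images element-wise. Compared with L1, L2 is more sensitive to large errors and more tolerant of small ones, so it produces smoother results. They are the most widely used loss functions.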

$y$ : real sample image
$G(x)$ : generated sample
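
As a minimal sketch of Eq. (1) and Eq. (2), assuming images are plain NumPy arrays of the same shape (the function names `l1_loss` and `l2_loss` are illustrative, not from the survey), the toy example below shows how L2 amplifies the single large error while shrinking the small one:

```python
import numpy as np

def l1_loss(y, g):
    """Eq. (1): sum of absolute pixel differences."""
    return np.abs(y - g).sum()

def l2_loss(y, g):
    """Eq. (2): sum of squared pixel differences."""
    return ((y - g) ** 2).sum()

# Toy 2x2 "images" with one small error (0.5) and one large error (2.0).
y = np.array([[1.0, 2.0], [3.0, 4.0]])
g = np.array([[1.0, 2.5], [3.0, 6.0]])

print(l1_loss(y, g))  # 0.5 + 2.0 = 2.5
print(l2_loss(y, g))  # 0.25 + 4.0 = 4.25
```

The large error contributes 80% of the L1 value but about 94% of the L2 value, which is the sensitivity difference described above.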

Total Variation Loss

\mathcal L_{tv} = \sum_{i,j} \sqrt{(G(x)^{(i, j+1)}-G(x)^{(i, j)})^2 + (G(x)^{(i+1, j)}-G(x)^{(i, j)})^2} \qquad (3)

Imposes spatial smoothness on the generated image and reduces high-frequency noise such as salt-and-pepper noise. It is defined by comparing each pixel with its neighbors and summing the differences, so it measures how much noise the image contains.
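
A rough sketch of the total variation term for a single-channel image, assuming the per-pixel gradient magnitudes are summed over all pixels that have both a right and a lower neighbor (the border handling and the name `tv_loss` are assumptions):

```python
import numpy as np

def tv_loss(img):
    """Sum of gradient magnitudes over a 2-D image, in the spirit of Eq. (3)."""
    dh = img[:-1, 1:] - img[:-1, :-1]   # G(x)^(i, j+1) - G(x)^(i, j)
    dv = img[1:, :-1] - img[:-1, :-1]   # G(x)^(i+1, j) - G(x)^(i, j)
    return np.sqrt(dh ** 2 + dv ** 2).sum()

flat = np.zeros((4, 4))      # perfectly smooth image
noisy = flat.copy()
noisy[1, 1] = 1.0            # a single "salt" pixel

print(tv_loss(flat))   # 0.0
print(tv_loss(noisy))  # 2 + sqrt(2), about 3.414
```

A single noisy pixel raises the loss at three positions (itself and its upper and left neighbors), which is why minimizing this term suppresses isolated high-frequency noise.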

Content / Perceptual Loss

\mathcal L_{cont} = \sqrt{\sum_{i=1}^n (\phi^{l}(y)-\phi^{l}(G(x)))^2} \qquad (4)

The mean squared error between the semantic content of the two images, as extracted by $\phi$

→ i.e., the MSE between the feature maps of the two images

$\phi$ : pretrained image classification network
$l$ : layer
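
The idea can be sketched as follows. Note the loud assumption: `toy_features` is a hypothetical stand-in for $\phi^l$ (a 2x2 average pooling), used only so the example runs without a pretrained network; in practice $\phi^l$ would be a layer of a pretrained classifier such as VGG.

```python
import numpy as np

def toy_features(img):
    """Hypothetical stand-in for phi^l: 2x2 average pooling, NOT a pretrained network."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def content_loss(y, g, phi=toy_features):
    """Euclidean distance between the feature maps of y and G(x)."""
    diff = phi(y) - phi(g)
    return np.sqrt((diff ** 2).sum())

y = np.arange(16.0).reshape(4, 4)   # "real" image
g = y + 1.0                          # "generated" image, uniformly brighter

print(content_loss(y, y))  # 0.0
print(content_loss(y, g))  # 2.0 (four feature cells, each off by 1)
```

The loss is computed in feature space rather than pixel space, so swapping in a deeper feature extractor changes what "similar content" means without changing the loss code.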

Adversarial Loss

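\mathcal L_{adv} = \mathbb E_{y \sim P_{data}(B)}[\log D(y)] + \mathbb E_{x \sim P_{data}(A)}[\log (1- D(G(x)))] \qquad (5)

With $L_{cont}$ alone there is no explicit constraint in image space, so generated images tend to be inconsistent across different regions and usually contain small line segments.

$y \sim P_{data}(B)$ and $x \sim P_{data}(A)$ : samples drawn from the data distributions of domains $B$ and $A$
$y$ : real sample
$G(x)$ : generated sample; $G$ tries to produce images indistinguishable from those in domain $B$
$D$ : discriminator that tries to tell synthesized samples $G(x)$ apart from real samples $y$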
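
A small sketch of how the adversarial objective in Eq. (5) behaves, assuming the discriminator outputs probabilities in (0, 1) that are already available as arrays (the function name and the clipping constant `eps` are illustrative choices):

```python
import numpy as np

def adversarial_loss(d_real, d_fake, eps=1e-12):
    """mean log D(y) + mean log(1 - D(G(x))), given discriminator probabilities."""
    d_real = np.clip(d_real, eps, 1 - eps)   # avoid log(0)
    d_fake = np.clip(d_fake, eps, 1 - eps)
    return np.log(d_real).mean() + np.log(1 - d_fake).mean()

# Discriminator that separates real from fake well.
confident = adversarial_loss(np.array([0.9, 0.9]), np.array([0.1, 0.1]))
# Discriminator that has been fooled: it outputs 0.5 everywhere.
fooled = adversarial_loss(np.array([0.5, 0.5]), np.array([0.5, 0.5]))

print(confident > fooled)  # True
```

$D$ is trained to push this value up, while $G$ is trained to push it down, so a fooled discriminator corresponds to a lower value.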

Cycle consistency Loss

\mathcal L_{cyc} = \mathbb E_{x \sim P_{data}(A)}[ \lVert F(G(x))-x \rVert_1] + \mathbb E_{y \sim P_{data}(B)}[ \lVert G(F(y))-y \rVert_1] \qquad (6)

Because $L_{adv}$ alone cannot guarantee that the learned function maps each input $x$ to a desired output $y$, $L_{cyc}$ constrains the space of possible mappings between inputs and outputs. Each image $x$ comes from domain $A$, and after one full translation cycle the image should return to the original.

$x \to G(x) \to F(G(x)) \approx x$
$G:A \to B$ and $F:B \to A$
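
The cycle $x \to G(x) \to F(G(x)) \approx x$ can be sketched with toy mappings. Here `G`, `F_exact`, and `F_bad` are hypothetical closed-form functions standing in for trained generators:

```python
import numpy as np

def cycle_loss(xs, ys, G, F):
    """Mean L1 reconstruction error over both cycle directions."""
    forward = np.mean([np.abs(F(G(x)) - x).sum() for x in xs])    # A -> B -> A
    backward = np.mean([np.abs(G(F(y)) - y).sum() for y in ys])   # B -> A -> B
    return forward + backward

G = lambda x: 2.0 * x + 1.0          # toy A -> B mapping
F_exact = lambda y: (y - 1.0) / 2.0  # exact inverse of G
F_bad = lambda y: y / 2.0            # forgets to undo the offset

xs = [np.zeros((2, 2)), np.ones((2, 2))]   # samples from domain A
ys = [np.full((2, 2), 3.0)]                # samples from domain B

print(cycle_loss(xs, ys, G, F_exact))  # 0.0
print(cycle_loss(xs, ys, G, F_bad))    # 6.0
```

With `F_bad`, each pixel is off by 0.5 in the forward cycle and by 1.0 in the backward cycle, giving 2.0 + 4.0 on these 2x2 images.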

Style Loss

\mathcal L_{style} = \sum_{\lambda} \lVert \sigma(f_{\lambda}(G(x)))- \sigma (f_{\lambda}(x))\rVert_2 + \sum_{\lambda} \lVert \mu(f_{\lambda}(G(x)))- \mu(f_{\lambda}(x))\rVert_2 \qquad (7)

This formulation appeared after AdaIN showed that style can be transferred using only the mean and standard deviation of the style features.

$\sigma(x), \mu(x)$ : channel-wise standard deviation and mean of the input $x$
$f_\lambda(x)$ : the feature of $x$ at the $\lambda$-th layer
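
A sketch of the statistics matching in Eq. (7), assuming each layer's feature map is a `(C, H, W)` array and that $\sigma$ is the standard deviation, as in AdaIN. The feature maps here are random arrays rather than outputs of a real network:

```python
import numpy as np

def channel_stats(feat):
    """Channel-wise mean and standard deviation of a (C, H, W) feature map."""
    flat = feat.reshape(feat.shape[0], -1)
    return flat.mean(axis=1), flat.std(axis=1)

def style_loss(feats_out, feats_ref):
    """Sum over layers of the L2 distances between channel statistics."""
    total = 0.0
    for f_out, f_ref in zip(feats_out, feats_ref):
        mu_o, sd_o = channel_stats(f_out)
        mu_r, sd_r = channel_stats(f_ref)
        total += np.linalg.norm(sd_o - sd_r) + np.linalg.norm(mu_o - mu_r)
    return total

rng = np.random.default_rng(0)
f = rng.normal(size=(3, 4, 4))       # one layer, 3 channels

same = style_loss([f], [f])
flipped = style_loss([f[:, ::-1, :]], [f])   # spatially flipped, same statistics
shifted = style_loss([f + 1.0], [f])         # every channel mean moved by 1
```

Flipping the feature map spatially leaves the loss at (numerically) zero, which illustrates that this loss only sees per-channel statistics and ignores spatial layout.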

2. Loss Functions Specially Designed for Cartoon

Surface Loss

Learning to Cartoonize Using White-box Cartoon Representations(2020)

\mathcal L_{surface} = \log D_s(\mathcal F_{dgf}(I_c, I_c)) + \log (1-D_s(\mathcal F_{dgf}(G(I_p), G(I_p)))) \qquad (8)

Imitates the cartoon painting style in which artists sketch rough drafts with coarse brushes, producing smooth surfaces similar to those of cartoon images.

$\mathcal F_{dgf}$, a differentiable guided filter, is applied to smooth the image while preserving its global semantic structure.

Given an input $I$, it uses the image itself as the guide map and returns the extracted surface representation $\mathcal F_{dgf}(I,I)$.

$I$ : input
- $I_c$ : input cartoon image
- $I_p$ : input photo image
$\mathcal F_{dgf}(I,I)$ : extracted surface representation, with texture and detail removed
$D_s$ judges whether the model output and cartoon images have similar surfaces, and guides $G$ to generate such images.

Structure Loss

Learning to Cartoonize Using White-box Cartoon Representations(2020)

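\mathcal L_{structure} = \lVert VGG_n(G(I_p))-VGG_n(\mathcal F_{st}(G(I_p))) \rVert \qquad (9)

High-level features are extracted with a pretrained VGG16, and a spatial constraint is imposed between the result and its extracted structure representation.

$\mathcal F_{st}$ : structure representation extraction
- extracts a structure that imitates the global content, sparse color blocks, and clear boundaries characteristic of stylized (non-photorealistic) cartoons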

Texture Loss

Learning to Cartoonize Using White-box Cartoon Representations(2020)

\mathcal F_{rcs}(I_{rgb})=(1-\alpha)(\beta_1*I_r+\beta_2*I_g+\beta_3*I_b)+\alpha*Y \qquad (10)

Keeps only high-quality texture while reducing the influence of color and luminance (the amount of reflected light).

$\mathcal F_{rcs}$ : extracts a single-channel texture representation from a color image
- random color shift algorithm
$I_{rgb}$ : the three color channels $I_r$ , $I_g$ , $I_b$
$Y$ : grayscale image converted from the RGB image
The paper sets $\alpha = 0.8$ and $\beta_1, \beta_2, \beta_3 \sim U(-1,1)$
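
The random color shift of Eq. (10) is simple enough to sketch directly. The grayscale conversion below uses the BT.601 luma weights, which is an assumption (the post does not say how $Y$ is computed), and the function name is illustrative:

```python
import numpy as np

def random_color_shift(img_rgb, alpha=0.8, rng=None):
    """Single-channel texture representation of an (H, W, 3) image, after Eq. (10)."""
    rng = np.random.default_rng() if rng is None else rng
    beta = rng.uniform(-1.0, 1.0, size=3)             # beta_1, beta_2, beta_3 ~ U(-1, 1)
    # Grayscale Y from BT.601 luma weights (assumed, not specified in the post).
    y = img_rgb @ np.array([0.299, 0.587, 0.114])
    shifted = img_rgb @ beta                           # beta_1*I_r + beta_2*I_g + beta_3*I_b
    return (1.0 - alpha) * shifted + alpha * y

img = np.random.default_rng(0).random((4, 4, 3))
tex = random_color_shift(img, rng=np.random.default_rng(1))
print(tex.shape)  # (4, 4)
```

Because the $\beta$ weights are resampled on every call, the discriminator cannot rely on any fixed color cue, while the dominant $\alpha Y$ term keeps the texture pattern stable.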

\mathcal L_{texture} = \log D_t(\mathcal F_{rcs}(I_c)) + \log (1-D_t(\mathcal F_{rcs}(G(I_p)))) \qquad (11)

$D_t$ distinguishes the texture representations extracted from the model output from those extracted from reference cartoon images
- it learns clear contours and fine textures, and uses them to guide the generator

Domain-Adversarial Loss

XGAN(2020)

\mathcal L_{dann} = \mathbb E_{x \sim P_{data}(A)}\, l(A, c_{dann}(e_A(x))) +\mathbb E_{x \sim P_{data}(B)}\, l(B, c_{dann}(e_B(x))) \qquad (12)

Pushes the embeddings from domains A and B into the same subspace, bridging the domain gap at the semantic level

→ used to train $c_{dann}$

$c_{dann}$ : a binary classifier
- $c_{dann}$ is trained to maximize classification accuracy, while the encoders $e_A, e_B$ simultaneously try to minimize it, i.e., to confuse the domain-adversarial classifier
$l$ : classification loss function

Semantic Consistency Loss

XGAN(2020)

\mathcal L_{sem} = \mathbb E_{x \sim P_{data}(A)}\lVert e_A(x)-e_B(G(x)) \rVert + \mathbb E_{y \sim P_{data}(B)} \lVert e_B(y)-e_A(F(y)) \rVert \qquad (13)

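Preserves the semantics of the input across domain translation.
⇒ e.g., the semantics of an input $x \in \mathcal D_A$ should be preserved when it is translated to the other domain, $G(x) \in \mathcal D_B$ (and vice versa).

This consistency property is hard to enforce with a pixel-level loss, because there is no paired data and pixel-level metrics are sub-optimal for comparing images. A feature-level semantic consistency loss is used instead.
→ The network is encouraged to preserve the learned embedding during domain translation.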

∥ · ∥ denotes a distance between vectors
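
A toy sketch of Eq. (13), taking the Euclidean norm as the vector distance. Every mapping below (`G`, `F`, `e_A`, `e_B`, `e_B_bad`) is a hypothetical closed-form function, chosen only so that a consistent and an inconsistent encoder can be compared:

```python
import numpy as np

def semantic_consistency_loss(xs, ys, e_A, e_B, G, F):
    """Embedding distance before vs. after translation, in both directions."""
    a_to_b = np.mean([np.linalg.norm(e_A(x) - e_B(G(x))) for x in xs])
    b_to_a = np.mean([np.linalg.norm(e_B(y) - e_A(F(y))) for y in ys])
    return a_to_b + b_to_a

# Toy setup: domain B is domain A with inverted intensities.
G = lambda x: 1.0 - x                                   # A -> B
F = lambda y: 1.0 - y                                   # B -> A
e_A = lambda x: np.array([x.mean(), x.std()])
e_B = lambda y: np.array([1.0 - y.mean(), y.std()])     # agrees with e_A under G
e_B_bad = lambda y: np.array([y.mean(), y.std()])       # ignores the inversion

xs = [np.full((2, 2), 0.2)]
ys = [np.full((2, 2), 0.8)]

good = semantic_consistency_loss(xs, ys, e_A, e_B, G, F)
bad = semantic_consistency_loss(xs, ys, e_A, e_B_bad, G, F)
print(good < bad)  # True
```

The pixels of $x$ and $G(x)$ differ everywhere in this toy setup, yet the consistent encoder pair gives a loss of about zero, which is the point of comparing at the feature level.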

Landmark Consistency Loss

CycleGAN(2019)

\mathcal L_{land} = \lVert R_B (G_{(A,L) \to B}(x,l)) - l \rVert_2 \qquad (14)

$L$ : input landmark heatmap ( $l \in L$ )
$R$ : pretrained U-Net, landmark regressor with 5-channel output for respective domain
- $R_B$ : the regressor for domain $B$

Identity Loss

U-GAT-IT(2019)

\mathcal L_{ide} = \mathbb E_{x \sim P_{data}(A)}\lVert x-F(x) \rVert_1 + \mathbb E_{y \sim P_{data}(B)} \lVert y-G(y) \rVert_1 \qquad (15)

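Ensures that the color distribution of the input image is preserved: an image that already belongs to a generator's target domain should not change when passed through that generator.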

CAM Loss (Class Activation Map)

CAM(2016), U-GAT-IT(2019)

L_{CAM}^{A \to B} = -(\mathbb E_{x \sim P_{data}(A)}[\log (\eta_A(x))] + \mathbb E_{y \sim P_{data}(B)}[\log (1- \eta_A(y))] ) \qquad (16)

L_{CAM}^D = -(\mathbb E_{y \sim P_{data}(B)}[(\eta_D(y))^2] + \mathbb E_{x \sim P_{data}(A)}[(1-\eta_D(G(x)))^2] ) \qquad (18)

Uses the global average pooling of a CNN

$\eta_A, \eta_D$ : auxiliary classifiers
$y \sim P_{data}(B)$ or $x \sim P_{data}(A)$

Attribute Matching Loss

StyleCariGAN(2019)

L_{attr}^{p \to c} = -\mathbb E_{w \sim \mathcal W}[\phi_p(G_p(w)) \log \phi_c (G_{p \to c}(w)) + (1-\phi_p(G_p(w))) \log (1-\phi_c(G_{p \to c}(w)))] \qquad (19)

L_{attr}^{c \to p} = -\mathbb E_{w \sim \mathcal W}[\phi_c(G_c(w)) \log \phi_p (G_{c \to p}(w)) + (1-\phi_c(G_c(w))) \log (1-\phi_p(G_{c \to p}(w)))] \qquad (20)

L_{attr} = L_{attr}^{p \to c} + L_{attr}^{c \to p} \qquad (21)

Uses facial attribute classifiers for photos and caricatures

Defined as the binary cross-entropy loss between the photo attributes and the caricature attributes

$\phi$ : attribute classifier
$G$ : StyleGAN
- $G_{p \to c}$ : p2c-StyleCariGAN
- $G_{c \to p}$ : c2p-StyleCariGAN
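
One direction of this loss can be sketched as a binary cross-entropy between two vectors of attribute probabilities. The sketch assumes the classifier outputs are already available as arrays; the function name and the clipping constant `eps` are illustrative, and no StyleGAN or real classifier is involved:

```python
import numpy as np

def attribute_matching_loss(attr_src, attr_out, eps=1e-12):
    """BCE between the source image's attributes and the translated image's attributes.

    attr_src: classifier output on the source-domain image, e.g. phi_p(G_p(w))
    attr_out: classifier output on the translated image, e.g. phi_c(G_{p->c}(w))
    Both have shape (N, K): N samples, K attribute probabilities.
    """
    p = np.clip(attr_out, eps, 1 - eps)   # avoid log(0)
    bce = attr_src * np.log(p) + (1 - attr_src) * np.log(1 - p)
    return -np.mean(bce.sum(axis=-1))

src = np.array([[1.0, 0.0]])                  # e.g. "glasses: yes, beard: no"
kept = attribute_matching_loss(src, np.array([[0.99, 0.01]]))
lost = attribute_matching_loss(src, np.array([[0.5, 0.5]]))
print(kept < lost)  # True
```

The full loss of Eq. (21) would add the same term computed in the caricature-to-photo direction.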

Characteristic Loss

CariGAN(2018)

\mathcal L_{cha}^{B}(G) = \mathbb E_{x \sim P_{data}(A)}[1-\cos(x-\overline{P_{data}(A)}, G(x) - \overline{P_{data}(B)})]

Proposed from the underlying idea that the difference between a face and the mean face represents the distinctive features of a caricature, so those facial features should be maintained even after exaggeration.

$\overline{P_{data}(A)}$ : the mean of $P_{data}(A)$ (likewise for B)
The reverse direction $\mathcal L _{cha}^A(F)$ is defined similarly
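
A sketch of the cosine term, assuming each face is represented as a flat vector (for example landmark coordinates) and that the domain means are given. The toy data and names are hypothetical:

```python
import numpy as np

def characteristic_loss(xs, gxs, mean_a, mean_b, eps=1e-12):
    """Mean of 1 - cos(x - mean_A, G(x) - mean_B) over a batch of vectors."""
    da = (xs - mean_a).reshape(len(xs), -1)     # deviation from the mean face in A
    db = (gxs - mean_b).reshape(len(gxs), -1)   # deviation from the mean face in B
    dot = (da * db).sum(axis=1)
    norms = np.linalg.norm(da, axis=1) * np.linalg.norm(db, axis=1) + eps
    return np.mean(1.0 - dot / norms)

xs = np.array([[1.0, 0.0], [0.0, 2.0]])
mean_a = np.array([0.5, 1.0])
mean_b = np.array([3.0, 3.0])

exaggerated = mean_b + 2.0 * (xs - mean_a)   # same direction, twice as far from the mean
reversed_ = mean_b - 2.0 * (xs - mean_a)     # opposite direction

loss_exag = characteristic_loss(xs, exaggerated, mean_a, mean_b)
loss_rev = characteristic_loss(xs, reversed_, mean_a, mean_b)
```

Doubling the deviation from the mean costs nothing (`loss_exag` is about 0), while reversing it gives the maximum value of about 2, so the loss permits exaggeration but penalizes changing which features are distinctive.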

Smoothness Regularization Loss

AutoToon (2020)

\mathcal L_{reg}=\sum_{i,j\in \hat F}\left(2- \frac{\langle \hat F_{i, j-1}, \hat F_{i,j}\rangle}{\lVert \hat F_{i,j-1} \rVert \lVert \hat F_{i,j} \rVert} - \frac{\langle \hat F_{i-1, j}, \hat F_{i,j}\rangle}{\lVert \hat F_{i-1,j} \rVert \lVert \hat F_{i,j} \rVert}\right)

A cosine-similarity term that keeps the warping field smooth

$\langle \cdot, \cdot \rangle$ : dot product
$\hat F$ : warping field
- $i, j$ : pixel index
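
A sketch of this regularizer for an `(H, W, 2)` warping field. It assumes the sum runs only over pixels that have both a left and an upper neighbor (the border handling is not stated in the post), and it adds a small `eps` to the denominator to avoid division by zero:

```python
import numpy as np

def smoothness_reg(field, eps=1e-12):
    """Sum of (1 - cos) between each warp vector and its left and upper neighbors."""
    def cos(a, b):
        dot = (a * b).sum(axis=-1)
        return dot / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps)

    center = field[1:, 1:]    # F_{i, j}
    left = field[1:, :-1]     # F_{i, j-1}
    up = field[:-1, 1:]       # F_{i-1, j}
    return (2.0 - cos(left, center) - cos(up, center)).sum()

uniform = np.zeros((3, 3, 2))
uniform[..., 0] = 1.0             # every vector points in the same direction

kinked = uniform.copy()
kinked[1, 1] = [-1.0, 0.0]        # one vector flipped

print(smoothness_reg(uniform))    # about 0
print(smoothness_reg(kinked))     # about 8
```

Only the direction of neighboring vectors matters here, not their length, since each term is a cosine similarity.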

Distance Transform(DT) Loss

APDrawingGAN(2019)

d_{CM}(x_1, x_2) = \sum_{(j,k) \in \Theta_b(x_1)} I_{DT}(x_2)(j,k) + \sum_{(j,k) \in \Theta_w(x_1)} I'_{DT}(x_2)(j,k)

References

[1] Zhao, Y., Ren, D., Chen, Y., Jia, W., Wang, R., & Liu, X. (2022). Cartoon Image Processing: A Survey. International Journal of Computer Vision, 130(11), 2733-2769.
[2] XGAN (2020) : Royer, A., Bousmalis, K., Gouws, S., Bertsch, F., Mosseri, I., Cole, F., & Murphy, K. (2020). Xgan: Unsupervised image-to-image translation for many-to-many mappings. In Domain Adaptation for Visual Understanding (pp. 33-49). Cham: Springer International Publishing.
[3] CycleGAN (2019) : Zhu, J. Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision (pp. 2223-2232).
[4] U-GAT-IT(2019) : Kim, J., Kim, M., Kang, H., & Lee, K. H. (2019, September). U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation. In International Conference on Learning Representations.
[5] CAM(2016) : Wang, C., Xiao, J., Han, Y., Yang, Q., Song, S., & Huang, G. (2021). CAM-loss: Towards Learning Spatially Discriminative Feature Representations. arXiv preprint arXiv:2109.01359.
[6] StyleCariGAN (2019) : Jang, W., Ju, G., Jung, Y., Yang, J., Tong, X., & Lee, S. (2021). StyleCariGAN: caricature generation via StyleGAN feature map modulation. ACM Transactions on Graphics (TOG), 40(4), 1-16.
[7] CariGAN (2018) : Li, W., Xiong, W., Liao, H., Huo, J., Gao, Y., & Luo, J. (2020). CariGAN: Caricature generation through weakly paired adversarial learning. Neural Networks, 132, 66-74.
[8] AutoToon (2020) : Gong, J., Hold-Geoffroy, Y., & Lu, J. (2020). Autotoon: Automatic geometric warping for face cartoon generation. In Proceedings of the IEEE/CVF winter conference on applications of computer vision (pp. 360-369).
[9] APDrawingGAN (2019) : Yi, R., Liu, Y. J., Lai, Y. K., & Rosin, P. L. (2019). Apdrawinggan: Generating artistic portrait drawings from face photos with hierarchical gans. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10743-10752).

For convenience, each reference is labeled with the model name used above.
