ArcFace: Additive Angular Margin Loss for Deep Face Recognition

ukkikkiai · March 26, 2024

Euron Paper Review


+) Notes taken while listening to the paper review session

  • To make up for the weaknesses of softmax: triplet loss works with 'distance', arc loss (ArcFace) works with 'angle'

Triplet Loss

  • Pull the Anchor and the Positive close together, push the Negative as far away as possible
    After extracting embeddings with a CNN, apply the triplet loss
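A minimal sketch of the triplet loss on plain Python lists, assuming squared Euclidean distance between L2-normalized embeddings and an illustrative margin of 0.2. The function names and the 2-D vectors are hypothetical stand-ins for real CNN embeddings.

```python
import math

def l2_normalize(v):
    """Scale an embedding to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def squared_distance(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss: zero once the negative is at least `margin`
    farther from the anchor than the positive is."""
    d_ap = squared_distance(anchor, positive)
    d_an = squared_distance(anchor, negative)
    return max(0.0, d_ap - d_an + margin)

# Hypothetical 2-D embeddings, as if produced by a CNN and then L2-normalized.
anchor = l2_normalize([1.0, 0.0])
positive = l2_normalize([0.9, 0.1])
negative = l2_normalize([0.0, 1.0])

print(triplet_loss(anchor, positive, negative))  # 0.0: constraint already satisfied
print(triplet_loss(anchor, negative, positive))  # positive: roles swapped, constraint violated
```

The loss only looks at distances, so nothing constrains where each class sits on the hypersphere as a whole; that is the "distance" view contrasted with ArcFace's "angle" view above.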

ABSTRACT

For face recognition, the paper proposes the Additive Angular Margin Loss (ArcFace) in place of the softmax loss; it has a clean geometric interpretation and improves the model's discriminative power. It also proposes sub-center ArcFace, in which each class has K sub-centers and a training sample only needs to be close to any one of them. This encourages one dominant sub-class made up mostly of clean (easy-to-recognize) faces and non-dominant sub-classes containing noisy faces. Experiments show that the approach yields discriminative feature embeddings and also strengthens face generation, without training any generator or discriminator.

1. INTRODUCTION

๊ธฐ์กด์˜ ๋ฐฉ๋ฒ•: DCNN๊ณผ softmax-loss ๋˜๋Š” triplet loss based ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜์—ฌ Face Recognition์„ ์ง„ํ–‰ํ•จ

However, these approaches have limitations:

1) They perform well on closed-set classification, but their discriminative power drops in the open-set setting.

2) Matrix and iteration size: for N identities, the final FC matrix of the softmax loss grows linearly with N, while for the triplet loss the number of triplets, and therefore of iteration steps, explodes.
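To make limitation 2) concrete, the sketch below counts the entries of the final FC matrix W (d × N) and the number of possible triplets, under the simplifying assumption (made here for illustration) that every identity has the same number of images k. The embedding size and dataset sizes in the loop are hypothetical.

```python
def fc_parameters(embedding_dim, num_identities):
    """Entries in the final FC matrix W (embedding_dim x num_identities) of a softmax classifier."""
    return embedding_dim * num_identities

def num_triplets(num_identities, images_per_identity):
    """Number of (anchor, positive, negative) triplets when every identity has the same
    number of images: ordered anchor-positive pairs inside a class, times every image
    of the other identities used as the negative."""
    n, k = num_identities, images_per_identity
    return n * k * (k - 1) * (n - 1) * k

# Hypothetical sizes: 512-d embeddings, 10 images per identity.
for n in (1_000, 10_000, 100_000):
    print(n, fc_parameters(512, n), num_triplets(n, 10))
```

With k fixed, ten times more identities means ten times more entries in W, but roughly a hundred times more triplets.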

⇒ After normalization, treat the dot product between the DCNN feature and the last fully connected layer as a cosine distance: take the arc cosine to get the angle between the current feature and the target center, then add an angular margin to that angle.

: This directly optimizes the margin in terms of geodesic distance.
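The steps above (normalize, take the arc cosine, add the margin to the target angle, take the cosine again, then apply softmax cross-entropy) can be sketched in dependency-free Python. This is a minimal illustration, not the authors' implementation: the defaults s = 64 and m = 0.5 follow values commonly used with ArcFace, and the helper names are hypothetical.

```python
import math

def normalize(v):
    """L2-normalize so that a dot product becomes a cosine."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def arcface_logits(feature, class_weights, label, s=64.0, m=0.5):
    """Scaled logits s*cos(theta_j); the target class uses s*cos(theta_y + m)."""
    f = normalize(feature)
    logits = []
    for j, w in enumerate(class_weights):
        cos_t = sum(a * b for a, b in zip(f, normalize(w)))
        cos_t = max(-1.0, min(1.0, cos_t))  # keep acos inside its domain
        if j == label:
            theta = math.acos(cos_t)     # angle = geodesic distance on the unit hypersphere
            cos_t = math.cos(theta + m)  # additive angular margin on the target angle
            # (practical implementations also handle theta + m > pi; omitted here)
        logits.append(s * cos_t)
    return logits

def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for one sample."""
    mx = max(logits)
    log_sum_exp = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return log_sum_exp - logits[label]

def arcface_loss(feature, class_weights, label, s=64.0, m=0.5):
    return cross_entropy(arcface_logits(feature, class_weights, label, s, m), label)

# Hypothetical 2-D example: two class centers, a feature belonging to class 0.
centers = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, 0.2]
print(arcface_loss(x, centers, 0, s=4.0, m=0.0))  # Norm-Softmax (no margin)
print(arcface_loss(x, centers, 0, s=4.0, m=0.5))  # larger: the margin makes the task harder
```

With m = 0 this reduces to the normalized softmax; a positive m shrinks the target logit, so the loss stays high until the feature moves closer to its class center. Because only directions matter, rescaling the feature leaves the loss unchanged.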

3) Effect of noise: low-quality, heavily noisy images degrade the model's performance during training, and cleaning up the noise is very expensive.

⇒ Introducing sub-classes into ArcFace relaxes the intra-class constraint that forces every sample to be close to the positive center

: A noisy training face may not really belong to the positive class. The original ArcFace is strongly hurt by such noisy samples, whereas in sub-center ArcFace these samples are assigned to a different (non-dominant) sub-class, so they no longer dominate training.

3. PROPOSED APPROACH

3-1. ArcFace

  • The original softmax loss function
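For reference, the standard softmax loss over a batch can be written as below. Here x_i ∈ ℝ^d is the embedding of the i-th sample with label y_i, W_j is the j-th column of the final FC weight W ∈ ℝ^{d×N}, b_j its bias, B the batch size and N the number of identities (symbol names chosen here to match the N identities used earlier).

```latex
L_{\text{softmax}} = -\frac{1}{B}\sum_{i=1}^{B}\log
  \frac{e^{W_{y_i}^{T}x_i + b_{y_i}}}{\sum_{j=1}^{N} e^{W_j^{T}x_i + b_j}}
```

Nothing in this objective asks same-class embeddings to be tight or different classes to be far apart; it only asks the correct logit to win.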

⇒ Limitation: it does not explicitly optimize the embedding for high intra-class similarity and low inter-class similarity.

  • (a) is the conventional Norm-Softmax: the features of each class are spread out
  • (b) is ArcFace: the features of each class are clustered much more tightly, i.e., it promotes both intra-class compactness and inter-class separation.

⇒ Better feature embeddings


  • ArcFace: introduces the angle θ between the feature and the class weight
  • Final loss function

3-2. Sub-center ArcFace

Limitation of ArcFace: very effective for learning feature embeddings, but it assumes that all training data are of high quality.

⇒ However, a very large dataset cannot be entirely high quality.

  • Introducing sub-classes: as shown in (a), with K = 10, clean samples and noisy samples end up assigned to different sub-classes. (b) shows that the noisier a sample is, the larger the angle between the sample and its corresponding center. ⇒ In other words, sub-classes add information on which samples are clean (dominant) and which are noisy (non-dominant), compensating for ArcFace's limitation.
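A dependency-free sketch of the sub-center idea, assuming the max-pooling formulation (a class's score is the best cosine over its K sub-centers) with the ArcFace margin applied afterwards. The toy uses K = 2 instead of the K = 10 in the figure, and all names and vectors are hypothetical.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def subcenter_cosines(feature, subcenters_per_class):
    """For each class, the max cosine over its K sub-centers and the winning index."""
    f = normalize(feature)
    best = []
    for subcenters in subcenters_per_class:
        cosines = [dot(f, normalize(c)) for c in subcenters]
        k = max(range(len(cosines)), key=cosines.__getitem__)
        best.append((cosines[k], k))
    return best

def subcenter_arcface_loss(feature, label, subcenters_per_class, s=64.0, m=0.5):
    """ArcFace cross-entropy computed on the max-pooled sub-center cosines."""
    pooled = [c for c, _ in subcenter_cosines(feature, subcenters_per_class)]
    logits = []
    for j, cos_t in enumerate(pooled):
        if j == label:
            theta = math.acos(max(-1.0, min(1.0, cos_t)))
            cos_t = math.cos(theta + m)  # margin only on the target class
        logits.append(s * cos_t)
    mx = max(logits)
    return mx + math.log(sum(math.exp(z - mx) for z in logits)) - logits[label]

# Class 0 has a dominant "clean" sub-center and a second one; class 1 is a distractor.
with_subcenters = [[[1.0, 0.0], [0.0, 1.0]], [[-1.0, 0.0], [0.0, -1.0]]]
single_center = [[[1.0, 0.0]], [[-1.0, 0.0]]]  # K = 1, i.e. plain ArcFace
noisy_sample = [0.1, 1.0]                      # far from class 0's clean center

print(subcenter_cosines(noisy_sample, with_subcenters)[0])      # matched to sub-center index 1
print(subcenter_arcface_loss(noisy_sample, 0, single_center))   # large loss
print(subcenter_arcface_loss(noisy_sample, 0, with_subcenters)) # near zero
```

With a single center, the noisy sample is pulled hard toward the clean center; with a second sub-center it attaches there instead, so it stops distorting the dominant sub-class.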

3-3. Inversion of ArcFace = ArcFace Generative model

  • Generates new face images from a pre-trained ArcFace model by combining the gradient of the ArcFace loss with the face statistics prior stored in the BN layers ⇒ high confidence

  • ArcFace was a face recognition model (i.e., a discriminator); using it in reverse (inversion) makes it possible to generate new images of a person.
    • Input: the pre-trained model with its BN layers, class label y

    • Output: a batch of generated images (I_r)

      ⇒ Sample random data from a Gaussian distribution → take the running mean & variance from the BN layers → compute the ArcFace loss by forward propagation → update I_r by backpropagation
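The loop above can be sketched as follows. This toy is heavily simplified and is not the authors' algorithm: a single hypothetical linear layer `A` stands in for the frozen pre-trained backbone, one set of BN running statistics stands in for all BN layers, the weighting `lam` and scale `s=8.0` are assumptions, and central finite differences replace backpropagation so that only the standard library is needed.

```python
import math
import random

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid dividing by zero
    return [x / norm for x in v]

def arcface_loss(feature, class_weights, label, s=8.0, m=0.5):
    """ArcFace cross-entropy for one feature (small s keeps the toy well-conditioned)."""
    f = normalize(feature)
    logits = []
    for j, w in enumerate(class_weights):
        cos_t = max(-1.0, min(1.0, sum(a * b for a, b in zip(f, normalize(w)))))
        if j == label:
            cos_t = math.cos(math.acos(cos_t) + m)
        logits.append(s * cos_t)
    mx = max(logits)
    return mx + math.log(sum(math.exp(z - mx) for z in logits)) - logits[label]

def backbone(A, x):
    """Hypothetical stand-in for the frozen pre-trained network: one linear layer."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def inversion_loss(batch, A, class_weights, label, bn_mean, bn_var, lam=1.0):
    feats = [backbone(A, x) for x in batch]
    # Identity term: every generated image should be recognized as `label`.
    id_loss = sum(arcface_loss(h, class_weights, label) for h in feats) / len(feats)
    # BN prior term: batch statistics should match the stored running mean/variance.
    bn_loss = 0.0
    for d in range(len(bn_mean)):
        col = [h[d] for h in feats]
        mu = sum(col) / len(col)
        var = sum((c - mu) ** 2 for c in col) / len(col)
        bn_loss += (mu - bn_mean[d]) ** 2 + (var - bn_var[d]) ** 2
    return id_loss + lam * bn_loss

def invert(A, class_weights, label, bn_mean, bn_var, batch_size=4, steps=100, lr=0.1, seed=0):
    rng = random.Random(seed)
    dim = len(A[0])
    # I_r starts as Gaussian noise.
    batch = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(batch_size)]

    def loss_of(b):
        return inversion_loss(b, A, class_weights, label, bn_mean, bn_var)

    start = loss = loss_of(batch)
    eps = 1e-5
    for _ in range(steps):
        # Central finite differences replace backpropagation in this sketch.
        grad = []
        for i in range(batch_size):
            row = []
            for d in range(dim):
                plus = [r[:] for r in batch]
                minus = [r[:] for r in batch]
                plus[i][d] += eps
                minus[i][d] -= eps
                row.append((loss_of(plus) - loss_of(minus)) / (2 * eps))
            grad.append(row)
        # Backtracking line search: only accept an update that lowers the loss.
        step = lr
        while step > 1e-8:
            cand = [[x - step * g for x, g in zip(r, gr)] for r, gr in zip(batch, grad)]
            cand_loss = loss_of(cand)
            if cand_loss < loss:
                batch, loss = cand, cand_loss
                break
            step /= 2
        else:
            break  # no descent step found: treat as converged
    return batch, start, loss

eye = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
images, start, final = invert(A=eye, class_weights=eye, label=0,
                              bn_mean=[0.0, 0.0, 0.0], bn_var=[1.0, 1.0, 1.0])
print(f"loss: {start:.3f} -> {final:.3f}")
```

Only the input batch is updated; the "network" and the class centers stay frozen, which mirrors the point that no extra generator or discriminator is trained.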

4. EXPERIMENTS

4-1. ArcFace

  • Training Dataset: CASIA, VGG2, MS1MV0, Celeb500K
  • Testing Dataset: LFW, CFP-FP, AgeDB

⇒ Results: ArcFace achieves the best performance on all datasets.

4-2. Inversion of ArcFace

1) Close-set Face Generation

  • Compared with the CosFace baseline, it reaches a higher FID score of 70.39.

⇒ Unlike a GAN, ArcFace needs no additional training of a discriminator or generator.

CosFace (columns 5-7)

ArcFace (columns 2-4)

  • Images that preserve the identity's characteristics can be produced using only the existing model parameters.

2) Open-set Face Generation

  • Likewise, ArcFace generates images that best preserve the characteristics of the original photo.

5. CONCLUSION

1) This paper proposes ArcFace, an additive angular margin loss function, which effectively improves the discriminative power of deep feature embeddings for face recognition.

2) It also introduces sub-classes into ArcFace to relax the intra-class constraint under massive real-world noise. This sub-center ArcFace encourages one dominant sub-class containing most of the clean faces and non-dominant sub-classes containing hard or noisy faces, and separates them automatically.

3) Finally, ArcFace strengthens the model's generative capability by mapping feature vectors back to face images. Using only the network gradient and the BN statistics, a pre-trained ArcFace model can generate images in both the open-set and closed-set settings relative to its training data.


** Discussion point: In generative AI, GANs seem more familiar than inverted ArcFace and still appear to be far more widely used. I am curious which fields are actually using inverted ArcFace.

유정민
