[Paper Review] Visualizing Data using t-SNE

euisuk-chung · September 7, 2021

Why I Chose This Paper

The paper I am reviewing/translating/implementing today is Visualizing Data using t-SNE, published in 2008 by Laurens van der Maaten and Geoffrey Hinton. t-SNE (t-distributed Stochastic Neighbor Embedding) is still one of the most widely used visualization algorithms. It learns a 2-dimensional embedding that preserves the neighbor structure between data points represented as high-dimensional vectors, so that high-dimensional data can be displayed in two dimensions.


Paper Summary

Abstract

  • t-SNE is a technique that visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map.
  • t-SNE is an improvement of SNE (Stochastic Neighbor Embedding; Hinton and Roweis, 2002) and was designed to alleviate the crowding problem of the original method.
  • To visualize very large data sets, t-SNE uses random walks on neighborhood graphs so that the implicit structure of all of the data influences the way in which a subset of the data is displayed.
  • The paper shows the performance of t-SNE on a wide variety of data sets and compares it with Sammon mapping, Isomap, and locally linear embedding.

1. Introduction

  • Visualization of high-dimensional data is an important problem in many different domains.

    • (Examples) Cell nuclei that are relevant to breast cancer are described by roughly 30 variables, while the vectors used to represent images or documents typically have hundreds or thousands of variables.
  • Visualization techniques that existed at the time, such as iconographic displays like Chernoff faces (Chernoff, 1973) and pixel-based techniques (Keim, 2000), mostly just provide tools to display more than two data dimensions and leave the interpretation of the data to the human observer.

    Chernoff-face : https://en.wikipedia.org/wiki/Chernoff_face

pixel-based techniques : https://www.semanticscholar.org/paper/Pixel-Oriented-Visualization-Techniques-for-Very-Keim/ce1eb9ed41232690a1ab0b6b7322cfdb10a385cc

  • In contrast, dimensionality reduction methods convert the high-dimensional data into a low-dimensional (two- or three-dimensional) representation that can be displayed as a scatterplot.

  • The aim of dimensionality reduction is to preserve as much of the significant structure of the high-dimensional data as possible in its low-dimensional counterpart.

    X = [x_1, x_2, \dots, x_n] \;\rightarrow\; Y = [y_1, y_2, \dots, y_n]

  • PCA (principal component analysis) and MDS (multidimensional scaling) are representative dimensionality reduction methods; both are linear techniques that focus on keeping the low-dimensional representations of dissimilar datapoints far apart.

  • For high-dimensional data, a linear mapping cannot preserve this kind of structure, so a large number of nonlinear dimensionality reduction techniques that aim to preserve the local structure of the data have been proposed, for example:

    • Sammon Mapping (Sammon, 1969)
    • Curvilinear Components Analysis (CCA; Demartines and Herault, 1997)
    • Stochastic Neighbor Embedding (SNE; Hinton and Roweis, 2002)
    • Isomap (Tenenbaum et al., 2000)
    • Maximum Variance Unfolding (MVU; Weinberger et al., 2004)
    • Locally Linear Embedding (LLE; Roweis and Saul, 2000)
    • Laplacian Eigenmaps (Belkin and Niyogi, 2002)
  • However, none of these techniques is capable of retaining both the local and the global structure of the high-dimensional data in a single map. t-SNE aims to do exactly that.

2. Stochastic Neighbor Embedding

  • SNE (Stochastic Neighbor Embedding) starts by converting the high-dimensional Euclidean distances between datapoints into conditional probabilities that represent similarities.

  • SNE aims to find a low-dimensional data representation that minimizes the mismatch between p_{j|i} and q_{j|i}.

  • p_{j|i} is the conditional probability defined over the high-dimensional datapoints, and q_{j|i} is the corresponding conditional probability defined over the low-dimensional map points.

    • The similarity of datapoint x_j to datapoint x_i is the conditional probability p_{j|i} that x_i would pick x_j as its neighbor if neighbors were picked in proportion to their probability density under a Gaussian centered at x_i.

      p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}

    • Similarly, the similarity of map point y_j to map point y_i is the conditional probability q_{j|i} that y_i would pick y_j as its neighbor if neighbors were picked in proportion to their probability density under a Gaussian centered at y_i.

      q_{j|i} = \frac{\exp(-\|y_i - y_j\|^2)}{\sum_{k \neq i} \exp(-\|y_i - y_k\|^2)}

    • A high conditional probability p_{j|i} means that the two datapoints are close together.

    • A low conditional probability p_{j|i} means that the two datapoints are far apart.

    • p_{i|i} = 0 and q_{i|i} = 0.

    • Because SNE uses a Gaussian, the conditional probability is high for nearby datapoints, while for widely separated datapoints it is almost infinitesimal (close to zero).

  • If the dimensionality reduction is done well, the probability of being picked as a neighbor in the high-dimensional space and the probability of being picked as a neighbor in the low-dimensional space will be similar. The objective of SNE is therefore to make the distributions p and q as close to each other as possible.

  • The Kullback-Leibler divergence is used as the measure of how similar two probability distributions are.

    • The KL divergence measures exactly how much information (entropy) is lost when one probability distribution is used to approximate another.

    • It is 0 when the two distributions are identical and grows without bound (it is not capped at 1) as the distributions become more different.

    • SNE is trained by minimizing the following cost function:

      C = \sum_i KL(P_i \,\|\, Q_i) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}

  • Because the KL divergence is asymmetric, the SNE cost function focuses on retaining the local structure of the data in the map.

  • The remaining parameter to be selected is the variance \sigma_i of the Gaussian centered on each high-dimensional datapoint x_i. A single value of \sigma_i for every datapoint is not appropriate, because the density of the data varies: in denser regions a smaller value of \sigma_i is usually better.

  • Any particular value of \sigma_i induces a probability distribution P_i over all of the other datapoints, and the entropy of this distribution increases as \sigma_i increases.

  • SNE performs a binary search for the value of \sigma_i that produces a P_i with a fixed perplexity specified by the user. The perplexity is defined as

    Perp(P_i) = 2^{H(P_i)}

  • where H(P_i) is the Shannon entropy of P_i measured in bits:

    H(P_i) = -\sum_j p_{j|i} \log_2 p_{j|i}

  • The perplexity can be interpreted as a smooth measure of the effective number of neighbors; for example, if P_i were uniform over k neighbors, then H(P_i) = \log_2 k and the perplexity would be exactly k.

  • SNE์˜ ๊ฐ’์€ perplextity์— ํฐ ์˜ํ–ฅ์„ ๋ฐ›์ง€ ์•Š์Šต๋‹ˆ๋‹ค. (fairly robust)

  • ์ตœ์ข…์ ์œผ๋กœ ๊ตฌํ•˜๊ณ ์ž ํ•˜๋Š” ๋ฏธ์ง€์ˆ˜๋Š” ์ €์ฐจ์›์— ์ž„๋ฒ ๋”ฉ๋œ ์ขŒํ‘œ๊ฐ’ yi, SNE๋Š” ๊ทธ๋ž˜๋””์–ธํŠธ ๋””์„ผํŠธ(gradient descent) ๋ฐฉ์‹์œผ๋กœ yi๋“ค์„ ์—…๋ฐ์ดํŠธํ•ฉ๋‹ˆ๋‹ค.

  • ์ตœ์ ํ™” ์ดˆ๊ธฐ์—๋Š” ๊ฐ€์šฐ์‹œ์•ˆ ๋…ธ์ด์ฆˆ๋ฅผ ์ถ”๊ฐ€ํ•˜์—ฌ local minimum ์— ๋น ์ง€์ง€ ์•Š๋„๋ก ์œ ์˜ํ•œ๋‹ค. ํ•˜์ง€๋งŒ ์ด๋Ÿฌํ•œ ๊ณผ์ •์ด ํ•œ๋ฒˆ์— ์ด๋ค„์งˆ ์ˆ˜๋Š” ์—†์Šต๋‹ˆ๋‹ค. ์ ๋‹นํ•œ amount of momentum๊ณผ step size๋ฅผ ์ฐพ๊ธฐ ์œ„ํ•ด ์—ฌ๋Ÿฌ๋ฒˆ ๋ฐ˜๋ณต ์ˆ˜ํ–‰์„ ํ•ด์ฃผ์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
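To make the quantities in this section concrete, here is a minimal NumPy sketch (my own illustration, not the paper's code) that computes the conditional probabilities p_{j|i} and q_{j|i}, the per-row perplexity, and the KL-divergence cost for a toy data set. The fixed value of sigma is an arbitrary choice for illustration; in real SNE it is found per point by the binary search described above.

import numpy as np

def conditional_p(X, sigma=1.0):
    # p(j|i): row-wise softmax of -||xi - xj||^2 / (2 * sigma^2)   (Equation 1)
    D = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    logits = -D / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)              # enforce p(i|i) = 0
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

def conditional_q(Y):
    # q(j|i): same form in the map, with a fixed variance of 1/2   (Equation 2)
    D = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    logits = -D
    np.fill_diagonal(logits, -np.inf)              # enforce q(i|i) = 0
    Q = np.exp(logits)
    return Q / Q.sum(axis=1, keepdims=True)

def kl_cost(P, Q, eps=1e-12):
    # C = sum_i KL(P_i || Q_i)
    return np.sum(P * np.log((P + eps) / (Q + eps)))

def row_perplexity(P, eps=1e-12):
    # Perp(P_i) = 2^H(P_i), with H measured in bits
    H = -np.sum(P * np.log2(P + eps), axis=1)
    return 2.0 ** H

rng = np.random.RandomState(0)
X = rng.randn(5, 10)          # 5 points in a 10-dimensional space
Y = rng.randn(5, 2) * 1e-4    # a random initial 2-dimensional map
P, Q = conditional_p(X), conditional_q(Y)
print(row_perplexity(P), kl_cost(P, Q))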

3. t-Distributed Stochastic Neighbor Embedding

3.1. Symmetric SNE

  • The similarity measure defined above is a conditional probability and therefore not symmetric, so symmetric SNE instead uses a joint probability distribution over pairs of datapoints.

  • A conditional probability is the probability of event A given that event B has occurred, so in general p_{j|i} \neq p_{i|j}.

  • A joint probability is the probability that two events A and B occur together, so p_{ij} = p_{ji} by construction.

  • Using joint probabilities leads to a simpler gradient that is faster to compute.

  • The gradient of symmetric SNE is quite similar to that of asymmetric SNE, and in experiments symmetric SNE produces maps that are just as good, and occasionally slightly better.

  • \sigma_i exists to prevent the neighbor-selection probabilities from being distorted by the fact that the local data density differs around each point. The author notes that, in repeated experiments, using a fixed \sigma_i when computing p did not make a large difference in performance, so the per-point computation of \sigma_i is omitted in this section.

  • To make the pairwise similarities symmetric, the similarity of two points is defined as the average of the two conditional probabilities, p_{ij} = (p_{j|i} + p_{i|j}) / 2n.

  • The resulting cost function and its gradient are given below.
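Since the corresponding figure is not reproduced in this post, here are the formulas the bullet above refers to, written out from Section 3.1 of the paper: the symmetrized joint probabilities, the (Gaussian-based) low-dimensional joint probabilities, the cost function, and its gradient.

      p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}, \qquad
      q_{ij} = \frac{\exp(-\|y_i - y_j\|^2)}{\sum_{k \neq l} \exp(-\|y_k - y_l\|^2)}

      C = KL(P \,\|\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}, \qquad
      \frac{\delta C}{\delta y_i} = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)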

3.2. The Crowding Problem

  • When points are projected from a high-dimensional space into a low-dimensional one, the notions of "near" and "far" can break down. In three dimensions it is possible to place four different points so that they are all equidistant from one another, but in two dimensions four points cannot all keep the same pairwise distances. (This is the crowding problem.)
  • UNI-SNE (Cook et al., 2007) was proposed before t-SNE; it tries to address the crowding problem by adding a slight repulsion to all springs.
  • What is needed is an artificial device that places points that were far apart in the high-dimensional space even farther apart in the low-dimensional map, and points that were close even closer. The device chosen for this is the Student t-distribution.
  • Only in the low-dimensional domain is the Gaussian replaced by this modified (Student t) distribution; in the high-dimensional space, a Gaussian is used as before.

source : https://andyjconnelly.wordpress.com/2017/05/16/uncertainty-and-repeats/

3.3. Mismatched Tails can Compensate for Mismatched Dimensionalities

  • In the low-dimensional map, t-SNE uses a Student t-distribution with one degree of freedom (which is the same as a Cauchy distribution) as the heavy-tailed distribution. Using this distribution, the joint probabilities q_{ij} are defined as shown below (see the formulas after the figure note).

  • A computationally convenient property is that evaluating the density of a point under a Student t-distribution is much faster than under a Gaussian, because it does not involve an exponential, even though the Student t-distribution is equivalent to an infinite mixture of Gaussians with different variances.

  • The gradient of the Kullback-Leibler divergence between P and the Student-t based joint probability distribution Q is also reproduced below.

Source: original paper
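Written out from the paper, the Student-t based joint probabilities (Equation 4) and the gradient of the KL divergence between P and Q (Equation 5) are:

      q_{ij} = \frac{(1 + \|y_i - y_j\|^2)^{-1}}{\sum_{k \neq l} (1 + \|y_k - y_l\|^2)^{-1}} \qquad \text{(Equation 4)}

      \frac{\delta C}{\delta y_i} = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)\,(1 + \|y_i - y_j\|^2)^{-1} \qquad \text{(Equation 5)}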

  • The authors give two reasons why the t-SNE gradient is better than those of SNE and UNI-SNE:
  1. The t-SNE gradient strongly repels dissimilar datapoints that are modeled by small pairwise distances in the low-dimensional representation.

  2. Although this repulsion between dissimilar datapoints modeled by small pairwise distances is strong, it does not go to infinity.

  • In summary, t-SNE puts emphasis on (1) modeling dissimilar datapoints by means of large pairwise distances and (2) modeling similar datapoints by means of small pairwise distances.

3.4. Optimization Methods for t-SNE

  • ๋‹ค์Œ์€ ํ•ด๋‹น ๋…ผ๋ฌธ์ด ์ œ์‹œํ•œ t-SNE ์•Œ๊ณ ๋ฆฌ์ฆ˜์ž…๋‹ˆ๋‹ค.

source : ๋…ผ๋ฌธ ์›๋ฌธ

  • ์œ„ ์•Œ๊ณ ๋ฆฌ์ฆ˜์— ๋‚˜์˜จ ๊ณต์‹๋“ค์„ ์ •๋ฆฌํ•ด๋ณด์ž๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.
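Since the algorithm figure is not reproduced here, the quantities it uses can be summarized as: compute the conditional probabilities p_{j|i} with Equation 1 (choosing each \sigma_i by binary search on the target perplexity), symmetrize them to p_{ij} = (p_{j|i} + p_{i|j}) / 2n, compute q_{ij} with Equation 4 and the gradient with Equation 5, and iterate the following update for T iterations (written here as a descent step, which is also how the implementation at the end of this post updates Y):

      \mathcal{Y}^{(t)} = \mathcal{Y}^{(t-1)} - \eta \, \frac{\delta C}{\delta \mathcal{Y}} + \alpha(t)\left(\mathcal{Y}^{(t-1)} - \mathcal{Y}^{(t-2)}\right)

where \eta is the learning rate and \alpha(t) is the momentum at iteration t.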

  • Because plain gradient descent is unstable here, the parameters are updated with a momentum term. The momentum reduces the number of iterations required; a relatively small momentum is used until the map points have become moderately well organized, and it is increased afterwards.

  • Learning can be accelerated further with the adaptive learning rate scheme described by Jacobs (1988), which gradually increases the learning rate in directions where the gradient is stable.

  • Although the algorithm already produces much better visualizations than other non-parametric dimensionality reduction techniques, the optimization can be made more effective with the two tricks below.

  1. Early compression

    • Force the map points to stay close together at the start of the optimization.

    • When the distances between map points are small, it is easy for clusters to move through one another, so it is much easier to explore the space of possible global organizations of the data.

    • This is implemented by adding an L2 penalty to the cost function that is proportional to the sum of squared distances of the map points from the origin; the magnitude of the penalty and the iteration at which it is removed are set by hand.

  2. Early exaggeration

    • Multiply all of the p_{ij} by a constant (4 in the paper's example) during the initial stages of the optimization. Because the q_{ij} are then comparatively too small, the map points move a lot, and far apart, in order to match the exaggerated p_{ij}.

    • The effect is that the natural clusters in the data tend to form tight, widely separated clusters in the map, which creates a lot of empty space and makes it much easier for the clusters to move around relative to one another. (A toy sketch of early exaggeration follows this list.)
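As a toy sketch of early exaggeration (my own illustration, not the paper's code): multiply all p_{ij} by a constant during the first iterations and then switch back to the original P. The factor 4 is the example used in the paper; the 50-iteration cutoff, learning_rate, num_iters, and the helper names compute_q / compute_gradient (standing in for Equations 4 and 5, implemented concretely at the end of this post) are assumptions made for illustration.

# sketch: early exaggeration inside the optimization loop
EXAGGERATION = 4.0       # the factor used as an example in the paper
STOP_EXAGGERATION = 50   # assumed iteration at which the exaggeration is removed

for t in range(num_iters):
    P_t = P * EXAGGERATION if t < STOP_EXAGGERATION else P
    Q = compute_q(Y)                      # Equation 4 (hypothetical helper)
    grad = compute_gradient(P_t, Q, Y)    # Equation 5 (hypothetical helper)
    Y = Y - learning_rate * grad          # descent step (momentum omitted for brevity)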

4. Experiments

  • To evaluate t-SNE, the paper presents experiments comparing it with seven other non-parametric techniques for dimensionality reduction.
  • In the main text, t-SNE is compared only with (1) Sammon mapping, (2) Isomap, and (3) LLE.
  • The Appendix compares t-SNE with (4) CCA, (5) SNE, (6) MVU, and (7) Laplacian Eigenmaps. (Those comparisons are omitted from this post.)

4.1. Data Sets

The experiments use the following five data sets:

  1. the MNIST data set
  2. the Olivetti faces data set
  3. the COIL-20 data set
  4. the word-features data set
  5. the Netflix data set

4.2. Experimental Setup

  • In all of the experiments, PCA is first used to reduce the dimensionality of the data to 30. This speeds up the computation of pairwise distances between datapoints and suppresses some noise without severely distorting the interpoint distances.
  • Each dimensionality reduction technique is then used to convert the 30-dimensional representation into a 2-dimensional map, and the resulting map is displayed as a scatterplot.
  • For all data sets the class of each datapoint is known, but the class information is only used to select the color (or symbol) of the map points; it is not used to determine the spatial coordinates of the map points.
  • In Table 1, Perp (perplexity) denotes the perplexity of the conditional probability distribution induced by the Gaussian kernel, and k denotes the number of nearest neighbors used in the neighborhood graph.

Source: original paper (Table 1)

  • In the experiments with Isomap and LLE, only the corresponding datapoints (those in the connected part of the neighborhood graph) are visualized.
  • Sammon mapping is optimized with Newton's method (500 iterations).

4.3. Results

Source: original paper (Figures 2 and 3)

  • Figures 2 and 3 show the results of applying t-SNE, Sammon mapping, Isomap, and LLE to the MNIST data set.
  • The results demonstrate the strong performance of t-SNE compared to the other techniques: t-SNE constructs a map in which the separation between the digit classes is almost perfect.
  • Sammon mapping produces a "ball" in which only three classes (the digits 0, 1, and 7) are somewhat separated from the others.
  • Isomap and LLE produce solutions in which there is a large overlap between the digit classes.
  • Close inspection of the t-SNE map also shows that much of the local structure of the data (such as the orientation of the digits) is captured as well.

Source: original paper (Figure 4)

  • Figure 4 shows the results of applying t-SNE, Sammon mapping, Isomap, and LLE to the Olivetti faces data set.
  • t-SNE reveals the natural classes in the data well. For some individuals, the 10 images are even split into two clusters, usually because a subset of the images shows the head facing in a significantly different direction, or because the expression or the glasses are very different.
  • Isomap and LLE produce solutions that provide little insight into the class structure of the data.
  • The map produced by Sammon mapping is better than those of Isomap and LLE, since it models many of the members of each class fairly close together, but none of the classes is clearly separated in the map.
  • In fact, it is not obvious that the 10 images of each person form a natural class when Euclidean distance in pixel space is used.

Source: original paper (Figure 5)

  • Figure 5 shows the results of applying t-SNE, Sammon mapping, Isomap, and LLE to the COIL-20 data set.
  • For many of the 20 objects, t-SNE accurately represents the one-dimensional manifold of viewpoints as a closed loop. For objects that look similar from the front and the back, t-SNE distorts the loop so that front and back images are mapped to nearby points. t-SNE also clearly separates the four manifolds present in the COIL-20 data set.
  • The other three methods do not separate these manifolds clearly; Figure 5 shows that they are nowhere near as good at cleanly separating the manifolds that correspond to very different objects.
  • In addition, Isomap and LLE visualize only a small number of classes from the COIL-20 data set.

5. Applying t-SNE to Large Data Sets

  • Like many other visualization techniques, t-SNE has a computational and memory complexity that is quadratic in the number of datapoints, so applying the standard version of t-SNE to data sets with more than about 10,000 points is not feasible.

Source: original paper (Figure 6)

  • This section shows how t-SNE can be modified to display a random subset of the datapoints (the landmark points) in a way that uses information from the entire, potentially very large, data set. (See Figure 6.)

  • The idea is to use the information that the non-displayed datapoints provide about the underlying manifolds. A small example:

    • Suppose A, B, and C are all equidistant in the high-dimensional space.
    • If there are many undisplayed datapoints between A and B and none between A and C, then A and B are much more likely to be part of the same cluster than A and C.
  • The method proceeds as follows (a code sketch of this procedure is given a few bullets below):

    • Choose the desired number of neighbors and build a neighborhood graph over all of the datapoints. This is computationally intensive, but it only has to be done once.
    • For each landmark point, start a number of random walks at that landmark and terminate each walk as soon as it lands on another landmark point.
    • During a random walk, the probability of choosing the edge from node x_i to node x_j is proportional to e^{-\|x_i - x_j\|^2}.
    • p_{j|i} is then defined as the fraction of random walks starting at landmark point x_i that terminate at landmark point x_j.
  • This is similar in spirit to how Isomap measures pairwise distances between points. However, as in diffusion maps (Lafon and Lee, 2006; Nadler et al., 2006), the random-walk-based measure integrates over all paths through the neighborhood graph rather than looking only at the shortest path, which makes it much less sensitive to short-circuits (Lee and Verleysen, 2005).

  • The most obvious way to compute the random-walk-based similarities p_{j|i} is simply to perform the random walks explicitly, which is feasible given that one million random walks can be carried out quickly.

  • Alternatively, Grady (2006) presents an analytical solution that computes the pairwise similarities by solving a sparse linear system. (In preliminary experiments, no significant difference was found between explicitly performing the random walks and the analytical solution.)
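A simplified sketch of the random-walk procedure described above (my own illustration under stated assumptions, not the paper's implementation): build a k-nearest-neighbor graph over all datapoints, start walks at each landmark, step to a neighbor with probability proportional to exp(-||x_i - x_j||^2), stop as soon as the walk reaches another landmark, and estimate p_{j|i} as the fraction of walks from landmark i that end at landmark j. The value k = 20 matches the MNIST experiment below, while walks_per_landmark and the use of scikit-learn's NearestNeighbors are choices made only for this sketch.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def random_walk_affinities(X, landmark_idx, k=20, walks_per_landmark=1000, seed=0):
    # simplified random-walk-based p(j|i) between landmark points on a k-NN graph
    rng = np.random.RandomState(seed)
    n = X.shape[0]
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, nbr = nn.kneighbors(X)                     # column 0 is the point itself
    dist, nbr = dist[:, 1:], nbr[:, 1:]
    step_prob = np.exp(-dist ** 2)
    step_prob /= step_prob.sum(axis=1, keepdims=True)

    is_landmark = np.zeros(n, dtype=bool)
    is_landmark[landmark_idx] = True
    position = {j: a for a, j in enumerate(landmark_idx)}

    P = np.zeros((len(landmark_idx), len(landmark_idx)))
    for a, i in enumerate(landmark_idx):
        for _ in range(walks_per_landmark):
            node = i
            while True:                              # walk until another landmark is hit
                node = rng.choice(nbr[node], p=step_prob[node])
                if is_landmark[node] and node != i:
                    break
            P[a, position[node]] += 1
    return P / P.sum(axis=1, keepdims=True)          # row-normalize -> p(j|i)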

Source: original paper (Figure 7)

  • Figure 7 shows the result of an experiment with the random-walk version of t-SNE: 6,000 digits were randomly selected as landmarks from the MNIST data set (60,000 digits in total), and the pairwise similarities p_{j|i} were computed using all of the data. The experiment used a neighborhood graph built with k = 20 nearest neighbors.

  • The inset shows the same visualization as a scatterplot in which the colors represent the digit labels.

  • In the t-SNE map, all classes are clearly separated. Moreover, t-SNE reveals the main dimensions of variation within each class, such as the orientation of the 1s, 4s, 7s, and 9s, or the "loopiness" of the 2s.

  • The strong performance of t-SNE is also reflected in the generalization error of nearest-neighbor classifiers trained on the low-dimensional representation.

  • The generalization error (measured with 10-fold cross-validation) of a 1-nearest-neighbor classifier trained on the original 784-dimensional datapoints is 5.75%, whereas on the 2-dimensional representation produced by t-SNE it is only 5.13%. (An illustrative sketch of this kind of evaluation follows below.)
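The kind of comparison quoted above can be reproduced in spirit with scikit-learn (an illustrative sketch, not the paper's exact protocol): estimate the generalization error of a 1-nearest-neighbor classifier with 10-fold cross-validation, once on the original 784-dimensional pixel vectors and once on the 2-dimensional t-SNE map. The names X_784, Y_2d, and y are placeholders for arrays produced elsewhere (for example, in the code section at the end of this post).

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def one_nn_error(features, labels, folds=10):
    # generalization error of a 1-NN classifier, estimated by k-fold cross-validation
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=1),
                             features, labels, cv=folds)
    return 1.0 - scores.mean()

# X_784: original MNIST vectors, Y_2d: 2-D t-SNE map, y: digit labels (placeholders)
# print(one_nn_error(X_784, y), one_nn_error(Y_2d, y))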

6. Discussion

์•ž์˜ ๋‘ ์„น์…˜์—์„œ ๋‹ค์–‘ํ•œ ๋ฐ์ดํ„ฐ ์„ธํŠธ์—์„œ t-SNE์˜ ์„ฑ๋Šฅ์„ ๋ณด์˜€์Šต๋‹ˆ๋‹ค. ์ด ์„น์…˜์—์„œ๋Š” t-SNE์™€ ๋‹ค๋ฅธ ๋น„๋ชจ์ˆ˜ ๊ธฐ๋ฒ•์˜ ์ฐจ์ด์ ์— ๋Œ€ํ•ด ๋…ผ์˜ํ•˜๊ณ , ์•ฝ์  ๋ฐ ๊ฐ€๋Šฅํ•œ ๊ฐœ์„ ์ ์„ ๋…ผ์˜ํ•ฉ๋‹ˆ๋‹ค.

  • Classical scaling (Torgerson, 1952), which is closely related to PCA, finds a linear transformation of the data that minimizes the sum of squared errors between the high-dimensional pairwise distances and their low-dimensional counterparts.
    - A linear method such as classical scaling focuses on preserving the distances between widely separated datapoints rather than the distances between nearby datapoints, and is therefore not good at modeling curved manifolds.

  • Sammon mapping (Sammon, 1969) is an important approach proposed to address this problem of classical scaling.
    - It changes the cost function of classical scaling by dividing the squared error in the representation of each pairwise Euclidean distance by the original high-dimensional distance.

    ![Cost Func](https://velog.velcdn.com/images%2Feuisuk-chung%2Fpost%2F2c2e78ec-c91d-4c0d-b6c5-b7ddfa3b1873%2Fimage.png)

    - The constant outside of the sum in the resulting cost function is added only to simplify the derivation of the gradient.
    - The main weakness of the Sammon cost function is that the importance of retaining a pairwise distance in the map depends heavily on small differences in that pairwise distance.
    - For the local structure of the data, it would be more appropriate to assign roughly equal importance to all small pairwise distances. In particular, a small error in the model of two high-dimensional points that are extremely close together makes a very large contribution to the cost function.
  • In contrast to Sammon mapping, the Gaussian kernel employed by t-SNE in the high-dimensional space defines a soft border between the local and global structure of the data, relative to the standard deviation of the Gaussian.
    - For pairs of datapoints that are close together, the importance of modeling their separation is almost independent of the magnitude of that separation.
    - t-SNE determines the local neighborhood size for each datapoint separately, based on the local density of the data.

  • Why t-SNE is stronger than Isomap: Isomap is susceptible to "short-circuiting", and it mainly focuses on modeling large geodesic distances rather than small ones.

  • Why t-SNE is stronger than LLE: the only thing that prevents all datapoints from collapsing onto a single point in LLE is a constraint on the covariance of the low-dimensional representation. In practice, this constraint is often satisfied by placing most of the map points near the center of the map and using a few widely scattered points to create the large covariance.

  • In addition, neighborhood-graph-based techniques such as Isomap and LLE are not capable of visualizing data that consists of two or more widely separated submanifolds, because such data does not give rise to a connected neighborhood graph. It is possible to produce a separate map for each connected component, but this loses information about the relative similarities of the separate components.

6.2. Weaknesses

  • Although t-SNE compares favorably with other techniques for data visualization, it has three potential weaknesses:
    1. It is unclear how t-SNE performs on general dimensionality reduction tasks.
    2. t-SNE is sensitive to the curse of intrinsic dimensionality.
    3. The t-SNE cost function is not guaranteed to converge to a global optimum.

1. Weakness 1: t-SNE is not well suited to dimensionality reduction for other purposes.

  • t-SNE works well when reducing data to two or three dimensions, but it is not clear how it behaves when the data must be reduced to more than three dimensions.
  • The heavy tails of the Student t-distribution are what make t-SNE work in two or three dimensions, so the same recipe cannot simply be carried over to projections with d > 3.
  • In high-dimensional spaces, the heavy tails of the Student t-distribution comprise a relatively large share of the probability mass.
  • For tasks in which the data must be reduced to more than three dimensions, a Student t-distribution with more than one degree of freedom is likely to be more appropriate.

2. Weakness 2: t-SNE is sensitive to the curse of dimensionality.

  • t-SNE reduces the dimensionality of the data mainly based on local properties of the data, which makes it sensitive to the curse of the intrinsic dimensionality of the data.
  • In data sets whose underlying manifold has a high intrinsic dimensionality and is highly varying, the local linearity assumption that t-SNE implicitly makes by using Euclidean distances between near neighbors may be violated.
  • As a result, t-SNE may be less successful on data sets with a very high intrinsic dimensionality; for example, a study by Meytlis and Sirovich (2007) suggests that the space of face images has roughly 100 dimensions.
  • Manifold learners such as Isomap and LLE suffer from exactly the same problem. A way to (partially) address the curse of dimensionality is the following (a rough sketch follows this list):
    - Run t-SNE on a data representation obtained from a model that represents the highly varying data manifold efficiently in a number of nonlinear layers, such as an autoencoder.
    - Deep-layer architectures can represent complex nonlinear functions in a much simpler way.
    - An autoencoder can identify highly varying manifolds better than a local method such as t-SNE, so running t-SNE on a data representation produced by an autoencoder is likely to improve the quality of the visualizations.
    - However, it is by definition impossible to fully represent an intrinsically high-dimensional structure in two or three dimensions.
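A rough sketch of this idea (my own illustration, not the paper's setup, which uses deep autoencoders trained layer by layer): train a simple autoencoder-style MLP with scikit-learn, extract the 30-dimensional bottleneck codes with a manual forward pass, and run t-SNE on those codes. The layer sizes and training settings are arbitrary choices made for illustration.

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.manifold import TSNE

def autoencoder_codes(X, code_dim=30, hidden=256, max_iter=200, seed=0):
    # train an MLP to reconstruct its own input and return the bottleneck activations
    ae = MLPRegressor(hidden_layer_sizes=(hidden, code_dim, hidden),
                      activation='relu', max_iter=max_iter, random_state=seed)
    ae.fit(X, X)
    # manual forward pass through the first two (ReLU) layers, up to the bottleneck
    H = np.maximum(0, X @ ae.coefs_[0] + ae.intercepts_[0])
    return np.maximum(0, H @ ae.coefs_[1] + ae.intercepts_[1])

# Y_2d = TSNE(n_components=2).fit_transform(autoencoder_codes(X))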

3. Weakness 3: non-convexity of the t-SNE cost function.

  • A nice property of most state-of-the-art dimensionality reduction techniques (such as classical scaling, Isomap, and LLE) is the convexity of their cost functions.
  • A major weakness of t-SNE is that its cost function is not convex, so several optimization parameters have to be chosen, and the constructed solutions depend on these choices.
  • The solution can also differ each time t-SNE is run from a different random initialization of the map points. However, the authors demonstrate that the same optimization parameters can be used for a variety of visualization tasks, and that the quality of the optima does not vary much from run to run.

Implementation

For this paper-reproduction exercise, I implemented the part of the t-SNE experiments that uses the MNIST data set. I set up the same experimental environment as the paper; I implemented the individual functions myself, but when actually running the models I used the scikit-learn built-in estimators. The parameter values follow those reported in the paper. As in the paper, Isomap and LLE are also applied for comparison. The baseline t-SNE code was implemented with reference to the following link: (https://nlml.github.io/in-raw-numpy/in-raw-numpy-t-sne/)

Basic metric helper functions

import numpy as np
import scipy.special

# negative squared Euclidean distance matrix: -|xi - xj|^2
def neg_squared_euc_dists(X):
    sq_sum = np.square(np.linalg.norm(X, ord = 2, axis = 1))
    D = np.add(np.add(-2 * np.dot(X, X.T), sq_sum).T, sq_sum)
    return -D

# convert (scaled) distances into row-wise probabilities with a softmax
def calc_prob_matrix(distances, sigmas=None):
    if sigmas is not None:
        two_sig_sq = 2.0 * np.square(sigmas.reshape((-1, 1)))
        return scipy.special.softmax((distances / two_sig_sq), axis = 1)
    else:
        return scipy.special.softmax(distances, axis = 1)

# perplexity of each probability row: Perp(P_i) = 2^H(P_i)
def perplexity(distances, sigma):
    probability_matrix = calc_prob_matrix(distances, sigma)
    Hp = -np.sum(probability_matrix * np.log2(probability_matrix), 1)
    return 2**Hp

# binary search that repeats until the predicted value matches the target
def binary_search(function, target, tol=1e-10, max_iter=10000, 
                  lower=1e-20, upper=1000.):
    
    for i in range(max_iter):
        guess = (lower + upper) / 2.
        value = function(guess)
        
        if value > target: upper = guess
        else:  lower = guess
            
        # break if target = prediction
        if np.abs(value - target) <= tol:
            break
            
    return guess

# for each row of the distance matrix, find the sigma that matches the target perplexity
def find_optimal_sigmas(distances, target_perplexity):
    sigmas = [] 
    for i in range(distances.shape[0]):
        # Perplexity calculation for one row
        function = lambda sigma:  perplexity(distances[i:i+1, :], np.array(sigma))
        # return correct sigmas
        correct_sigma = binary_search(function, target_perplexity)
        sigmas.append(correct_sigma)
    return np.array(sigmas)

Equations defined in the paper

# Equation 1 - p(j|i), symmetrized to p(ij) = (p(j|i) + p(i|j)) / 2n
def p_ji_prob(X, target_perplexity):
    distances = neg_squared_euc_dists(X)
    sigmas = find_optimal_sigmas(distances, target_perplexity)
    probability_matrix = calc_prob_matrix(distances, sigmas)
    np.fill_diagonal(probability_matrix, 0.)  # enforce p(i|i) = 0
    p_ji_output = (probability_matrix + probability_matrix.T) / (2.0 * probability_matrix.shape[0])
    return p_ji_output

# Equation 4 - q(ij)
def q_tsne(Y):
    distances = neg_squared_euc_dists(Y)
    inv_distances = np.power(1.0 - distances, -1)
    np.fill_diagonal(inv_distances, 0.)
    q_tsne = inv_distances / np.sum(inv_distances)
    return q_tsne, inv_distances

# Equation 5 - Gradient of Cost
def tsne_grad(P, Q, Y, inv_distances):
    pq_diff = P - Q  
    pq_expanded = np.expand_dims(pq_diff, 2)                # p(ij) - q(ij)
    y_diffs = np.expand_dims(Y, 1) - np.expand_dims(Y, 0)   # y(i) - y(j)
    distances_expanded = np.expand_dims(inv_distances, 2)   # (1 + ||yi - yj||^2)^(-1)

    return 4. * (pq_expanded * y_diffs * distances_expanded).sum(1)

# Y update function (gradient descent with momentum)
def estimate_sne(X, y, P, num_iters, q_function, gradient_function, learning_rate, momentum):
    
    Y = np.random.RandomState(1234).normal(0., 0.0001, [X.shape[0], 2])
    Y_m2 = Y.copy(); Y_m1 = Y.copy()
    
    # Gradient descent loop
    for i in range(num_iters):
        Q, distances = q_function(Y)
        gradient = gradient_function(P, Q, Y, distances)

        # Y update: gradient step plus momentum
        Y = Y - learning_rate * gradient
        Y += momentum * (Y_m1 - Y_m2)
        Y_m2 = Y_m1.copy()
        Y_m1 = Y.copy()
        
    return Y

Experiment code

# import libraries
from __future__ import print_function
import time
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, LocallyLinearEmbedding, Isomap
%matplotlib inline
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import seaborn as sns
# fetch data 
mnist = fetch_openml('mnist_784')
X = mnist.data / 255.0
y = mnist.target
print(X.shape, y.shape)

# make dataframe
feat_cols = [ 'pixel'+str(i) for i in range(X.shape[1]) ]
df = pd.DataFrame(X,columns=feat_cols)
df['y'] = y
df['label'] = df['y'].apply(lambda i: str(i))
X, y = None, None
print('Size of the dataframe: {}'.format(df.shape))
# set the random seed
np.random.seed(42)
rndperm = np.random.permutation(df.shape[0])

# sample 6,000 rows
N = 6000
df_subset = df.loc[rndperm[:N],:].copy()
data_subset = df_subset[feat_cols].values

# reduce the data to 30 dimensions with PCA
pca = PCA(n_components=30)
pca_result = pca.fit_transform(data_subset)

# instantiate the models; the perplexity / n_neighbors values below are reasonable
# choices for MNIST (see Table 1 of the paper for the settings used in the experiments)
tsne = TSNE(n_components=2, perplexity=40, random_state=42)
lle = LocallyLinearEmbedding(n_components=2, n_neighbors=12)
isomap = Isomap(n_components=2, n_neighbors=12)

# fit the models
tsne_result = tsne.fit_transform(pca_result)
lle_result = lle.fit_transform(pca_result)
isomap_result = isomap.fit_transform(pca_result)
# visualize t-sne
df_subset['tsne-2d-one'] = tsne_result[:,0]
df_subset['tsne-2d-two'] = tsne_result[:,1]
plt.figure(figsize=(16,10))
sns.scatterplot(
    x="tsne-2d-one", y="tsne-2d-two",
    hue="y",
    palette=sns.color_palette("hls", 10),
    data=df_subset,
    legend="full",
    alpha=0.3
)

# visualize isomap
df_subset['isomap-2d-one'] = isomap_result[:,0]
df_subset['isomap-2d-two'] = isomap_result[:,1]
plt.figure(figsize=(16,10))
sns.scatterplot(
    x="isomap-2d-one", y="isomap-2d-two",
    hue="y",
    palette=sns.color_palette("hls", 10),
    data=df_subset,
    legend="full",
    alpha=0.3
)

# visualize lle
df_subset['lle-2d-one'] = lle_result[:,0]
df_subset['lle-2d-two'] = lle_result[:,1]
plt.figure(figsize=(16,10))
sns.scatterplot(
    x="lle-2d-one", y="lle-2d-two",
    hue="y",
    palette=sns.color_palette("hls", 10),
    data=df_subset,
    legend="full",
    alpha=0.3
)

๊ธ€์„ ๋งˆ๋ฌด๋ฆฌํ•˜๋ฉฐ

  • ์ด๋ฒˆ ํฌ์ŠคํŠธ์—์„œ๋Š” ๋…ผ๋ฌธ์„ ์ฝ๊ณ , ๋…ผ๋ฌธ์˜ ์‹คํ—˜ ํ™˜๊ฒฝ๊ณผ ๋™์ผํ•˜๊ฒŒ 6000๊ฐœ์˜ MNIST ๋ฐ์ดํ„ฐ๋ฅผ ์ด์šฉํ•ด ๋ถ„์„์„ ์ˆ˜ํ–‰ํ•ด ๋ณด์•˜์Šต๋‹ˆ๋‹ค.
  • ๋žœ๋ค ์‹œ๋“œ๋กœ ์ถ”์ถœํ•œ 6000๊ฐœ์˜ ๋ฐ์ดํ„ฐ๋ผ ์™„์ „ํžˆ ๋ฐ์ดํ„ฐ์˜ ํ˜•ํƒœ๊ฐ€ ๊ฐ™์ง€๋Š” ์•Š์ง€๋งŒ, ํ™•์‹คํžˆ ๋น„๊ต ๋ชจ๋ธ๋ณด๋‹ค t-SNE๊ฐ€ ์ข€ ๋” 2์ฐจ์› ์ƒ์— ๋ฐ์ดํ„ฐ๋ฅผ ์ •ํ™•ํžˆ ๊ตฌ๋ถ„ํ•ด์ฃผ๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค.
  • ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” MNIST ๋ฐ์ดํ„ฐ ์™ธ์—๋„ Olivetti faces, COIL-20 ๋ฐ์ดํ„ฐ ๋“ฑ์„ ์‚ฌ์šฉํ•˜์—ฌ ์‹คํ—˜์„ ์ง„ํ–‰ํ•˜์˜€๋Š”๋ฐ ๊ตฌํ˜„ ์ƒ์— ์ฐจ์ด๊ฐ€ ์—†๋‹ค๊ณ  ํŒ๋‹จํ•˜์—ฌ ์ด๋ฅผ ์ƒ๋žตํ•˜๊ณ  MNIST ๋ฐ์ดํ„ฐ(6000๊ฐœ ์ถ”์ถœ) ์‹คํ—˜๋งŒ์„ ๊ตฌํ˜„ํ•˜๊ณ , ์ข€ ๋” ๋…ผ๋ฌธ์„ ๊นŠ๊ฒŒ ํŒŒ๊ณ  ๋“ค๋ ค๊ณ  ๋…ธ๋ ฅํ•˜์˜€์Šต๋‹ˆ๋‹ค.
  • t-SNE๋Š” ํ˜„์žฌ๋„ ๊ณ ์ฐจ์› ๋ฐ์ดํ„ฐ๋ฅผ ์ €์ฐจ์›์— ํ‘œํ˜„ํ• ๋•Œ ๋งŽ์ด ์“ฐ๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์œผ๋กœ, ์ด๋ฒˆ ๋…ผ๋ฌธ ์ •๋ฆฌ ๋ฐ ๊ตฌํ˜„์„ ํ†ตํ•ด ์ถ”ํ›„ ์žˆ์„ ์—ฐ๊ตฌ์— ๋„์›€์ด ๋  ๊ฒƒ์ด๋ผ๊ณ  ๋ฏฟ์Šต๋‹ˆ๋‹ค.

๊ธด ๊ธ€ ์ฝ์–ด์ฃผ์…”์„œ ๊ฐ์‚ฌํ•ฉ๋‹ˆ๋‹ค ^~^
