Medical Image Segmentation - Principle Component Analysis (PCA)

Gyuha Park·2021년 8월 25일

Medical Image Analysis

목록 보기

12/21

1. Dimensionality Reduction

PCA는 데이터의 차원을 줄일 때 사용한다. 위 그림은 2D data를 x1 축을 기준으로 projection한 모습이다. 이 경우, 데이터가 골고루 분산 되지 않는 문제가 있다. PCA의 목적은 projection된 data의 분산 값을 최대화 하는 축을 찾는 것이 목적이다.

2. Principal Component Analysis(PCA)

Data $X$ 를 unit vector $\vec{e}$ 에 projection 했을 때, $X\vec{e}$ 는 projection된 data이며 $Var[X\vec{e}]$ 는 projection된 data의 분산 값이다. 그리고 식으로 표현하면 $\vec{e}^TC\vec{e}$ 이다. 아래는 증명 과정이다.

$Var[X\vec{e}]=\cfrac{1}{m-1}\sum\limits_{i=1}^m[X\vec{e}-E(X\vec{e})]^2=\cfrac{1}{m-1}\sum\limits_{i=1}^m[X\vec{e}-E(X)\vec{e}]^2,\ (E(X)=0)$

$\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ =\cfrac{1}{m-1}\sum\limits_{i=1}^m(X\vec{e})^2\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ =\cfrac{1}{m-1}(X\vec{e})^T(X\vec{e})$

$\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ =\cfrac{1}{m-1}\vec{e}^TX^TX\vec{e}\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ =\vec{e}^T(\cfrac{X^TX}{m-1})\vec{e},\ (\cfrac{X^TX}{m-1}=C)$

$\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ =\vec{e}^TC\vec{e}$

그러므로 PCA는 $\vec{e}^TC\vec{e}$ 를 최대로 하는 $\vec{e},\ (||\vec{e}||^2=1)$ 를 찾는 것이 목적이다.

최적의 $\vec{e}$ 를 찾기 위해 lagrange multiplier method를 이용한다. 식은 다음과 같다.

$L(\vec{e},\lambda)=\vec{e}^TC\vec{e}-\lambda(\vec{e}^T\vec{e}-1)$

$\frac{\partial L}{\partial\vec{e}}=(C+C^T)\vec{e}-2\lambda\vec{e}$

$\ \ \ \ \ \ =2C\vec{e}-2\lambda\vec{e}=0$

$\ \ \ \ \ \ \therefore\ C=\vec{e}\lambda\vec{e}^T$

$\ \ \ \ \ \ \therefore\ C\vec{e}=\lambda\vec{e}$

정리하면 covariance matrix $C$ 의 eigen vector와 eigen value를 구하는 문제가 된다.

위 그림에서 빨간색 eigen vector의 eigen value는 $\lambda_1$ 이며 파란색 eigen vector의 eigen value는 $\lambda_2$ 이다. Eigen value $\lambda_1$ 이 더 큰 분산 값을 갖고 있기 때문에 빨간색 eigen vector로 data를 projection 하면 된다.

3. PCA Implementation

Training set

$x^{(1)},\ x^{(2)},\ \ldots,\ x^{(i)},\ \ldots,\ x^{(m)},\ (x^{(i)}=[x_1^{(i)},\ x_2^{(i)},\ \ldots,\ x_j^{(i)},\ \ldots,\ x_n^{(i)}])$
Preprocessing(feature scaling / mean normalization)

$\mu_j=\cfrac{1}{m}\sum\limits_{i=1}^mx_j^{(i)}$

$dx_j^{(i)}=x_j^{(i)}-\mu_j$
Compute covariance matrix

$C=\cfrac{1}{m}\sum\limits_{i=1}^m(dx^{(i)})(dx^{(i)})^T$
Compute eigenvectors of matrix ( $[U,S,V]=SVD(C)$ )

$U=\begin{bmatrix} |&|&&|\\ u^{(1)}&u^{(2)}&\ldots&u^{(n)}\\|&|&&|\end{bmatrix}\in\mathop{\mathbb{R}}^{n\times n}$
Compute principal components

$z=U_{reduce}^Tx$

4. PCA Reconstruction

Data x를 구한 eigen vector $U_{recude}^T$ 로 projection할 수도 있지만 다시 $U_{reduce}$ 를 곱해 복원할 수도 있다. 단, 복원 시 다른 eigen vector를 고려하지 않기 때문에 근사한 값이다.

Gyuha Park

Medical Imaging & AI

이전 포스트

Atlas based method / Label fusion

다음 포스트