[인사이드 머신러닝] EVD 없이 Eigenvalue 찾기: Power Iteration (Power Method)

cleansky·2021년 9월 27일

인사이드 머신러닝

목록 보기

12/12

Preliminaries

크기가 $n \times n$ 인 임의의 행렬 $\mathbf{A}$ 에 고유값 분해(Eigenvalue Decomposition, EVD)를 적용하면 다음과 같은 관계식을 얻을 수 있다.

\tag{1} \mathbf{A} = \mathbf{Q}\Lambda \mathbf{Q}^T

$\mathbf{Q}$ 의 $i$ 번째 column은 $i$ 번째 right eigenvector1 $\mathbf{q}_i$ 이고, 대각행렬 $\mathbf{\Lambda}$ 의 $i$ 번째 대각성분은 $i$ 번째 eigenvalue $\lambda_i$ 이다.

\tag{2} \mathbf{A}\mathbf{q}_i = \lambda_i \mathbf{q}_i

$\mathbf{A}$ 의 eigenvector들은 basis vector이므로 임의의 벡터 $\mathbf{v}$ 를 다음과 같이 $\mathbf{A}$ 의 eigenvector들의 linear combination으로 나타낼 수 있다.

\tag{3} \mathbf{v} = c_1 \mathbf{q}_1 + c_2 \mathbf{q}_2 + \cdots + c_n \mathbf{q}_n

식 (3)의 양변에 $\mathbf{A}$ 를 곱하면 식 (2)의 관계에 의해 아래와 같이 정리가 가능하다.

\tag{4} \begin{aligned} \mathbf{A}\mathbf{v} &= c_1 \mathbf{A}\mathbf{q}_1 + c_2 \mathbf{A}\mathbf{q}_2 + \cdots + c_n \mathbf{A}\mathbf{q}_n \\ &= c_1 \lambda_1\mathbf{q}_1 + c_2 \lambda_2\mathbf{q}_2 + \cdots + c_n \lambda_n \mathbf{q}_n \end{aligned}

만약, 식 (4)와 같이 양변에 $\mathbf{A}$ 를 곱하는 과정을 $k$ 번 반복한다면,

\tag{5} \begin{aligned} \mathbf{A}^k \mathbf{q}_i&= \mathbf{Q}\Lambda \mathbf{Q}^T \mathbf{Q}\Lambda \mathbf{Q}^T \ \cdots \mathbf{Q}\Lambda \mathbf{Q}^T \mathbf{q}_i\\ &= \mathbf{Q}\Lambda^k\mathbf{Q}^T\mathbf{q}_i \\ &= \lambda_i^k \mathbf{q}_i \end{aligned}

가 되므로 다음의 관계를 만족한다.

\tag{6} \mathbf{A}^k\mathbf{v} = c_1 \lambda_1^k\mathbf{q}_1 + c_2 \lambda_2^k\mathbf{q}_2 + \cdots + c_n \lambda_n^k\mathbf{q}_n

\tag{7} \mathbf{A}^k\mathbf{v} \approx c_1\lambda_1^k\mathbf{q}_1

eigenvector에 스칼라 곱을 적용하더라도 eigenvector가 span하는 공간은 그대로이므로 $\lambda_1$ 에 대응하는 eigenvector $\mathbf{v}_1$ 을 식 (7)로부터 찾을 수 있다. 이때, $k$ 가 충분히 커서 식 (7)의 관계를 만족한다면

\mathbf{A}^{k+1}\mathbf{v} \approx \lambda_1\mathbf{A}^{k}\mathbf{v}

이므로, $k$ 를 증가시켜가며 연산 결과로 주어지는 벡터의 크기(즉, norm)의 변화량을 관찰하고, 그 변화량이 일정하다면 (즉, 크기의 증가율이 특정 임계치보다 작다면), 해당 벡터의 크기가 곧 $|\lambda_1|$ 이 된다. (왜냐하면 eigenvector의 크기는 그 정의에 의해 1이므로) $\lambda_1$ 의 부호는 $|\lambda_1|$ 과 $\mathbf{v}$ 가 주어졌으므로 쉽게 찾을 수 있다. 따라서 다음 장에 요약된 과정을 통해 eigenvalue decomposition 없이 행렬 $\mathbf{A}$ 의 largest eigenvalue를 찾을 수 있다. 아래의 알고리즘을 보면 largest eigenvalue를 찾은 후, 그 영향을 제거한 후 smallest eigenvalue를 찾을 수도 있다. 이러한 과정을 반복하다보면 모든 eigenvalue와 eigenvector를 찾는 것도 가능하다 [8].

Power Iteartion

앞에서 살펴본 원리를 이용하여 iterative하게 eigenvalue를 찾는 과정을 power iteration이라고 한다. Power method라고도 불리며 이따금 Von Mises iteration이라고도 한다.
본 절에서는 power iteration을 이용하여 semi-positive definite matrix의 eigenvalue를 찾는 경우를 경우를 고려한다. 즉, 모든 $i$ 에 대하여 $\lambda_i \ge 0$ 를 만족한다. Eigenvalue를 다루는 대부분의 문제들은 covariance matrix에 기반하고, covariance matrix는 항상 semi-positive definite 성질을 만족한다. 그렇지만 smallest eigenvalue를 찾는 과정에서도 언급이 되겠지만 negative-definite matrix에도 동일한 원리를 그대로 적용할 수 있다. Eigenvalue가 양수와 음수 모두 존재한다면 부호와 상관없이 절대값이 큰 eigenvalue부터 찾는다.

아래의 내용은 이전에 따로 영어로 정리해두었던 내용을 그대로 복사한 것임. 일부는 [5]의 내용을 그대로 카피한 것임.

Suppose that we have a positive-semidefinite matrix $\mathbf{R}$ which is established by sample covariance matrix estimation.
Define a random unit vector as $\mathbf{v}_0$ . For example, $\mathbf{v}_0^T=[1, 0, 0, ..., 0]^T$ .

Then, to find the largest eigenvalue of $\mathbf{R}$ , do the following:

\begin{aligned} &\textbf{while (1)}\\ & \ \ \ \ \ \ \mathbf{w}_i \leftarrow \mathbf{R}\mathbf{v}_i \\ & \ \ \ \ \ \ p_i \leftarrow ||\mathbf{w}_i|| = \sqrt{\mathbf{w}_i^T\mathbf{w}_i} \\ & \ \ \ \ \ \ \mathbf{v}_i \leftarrow \mathbf{w}_i/p_i \\ & \ \ \ \ \ \ \textbf{if} \ |1 - p_{i-1}/p_i| < T \\ & \ \ \ \ \ \ \ \ \ \ \ \ \textbf{break;} \\ & \ \ \ \ \ \ \textbf{end if} \\ &\textbf{end while} \end{aligned}

The largest eigenvalue estimate $\hat{\lambda}_{max}(\mathbf{R}) = p_i$ .

The smallest eigenvalue of $\mathbf{R}$ can be also estimated using the same procedures shown above using spectral shift $\mathbf{C} = \mathbf{R} - \hat{\lambda}_{max}\mathbf{I}$ . The reason this shift works is that a positive-definite matrix has all positive eigenvalues. Therefore, $\mathbf{C}$ has all non-positive eigenvalues, with the smallest eigenvalue of $\mathbf{R}$ now the largest-magnitude (most negative) eigenvalue of $\mathbf{C}$ .

Let us denote the largest eigenvalue estimate using $\mathbf{C}$ by $\hat{\lambda}_{max}(\mathbf{C})$ . Since we have used vector norm operation, $\hat{\lambda}_{max}(\mathbf{C})$ is positive. However, $\mathbf{C}$ is always negative-definite, and the negative sign should be multiplied to $\hat{\lambda}_{max}(\mathbf{C})$ again. Furthermore, by the relation of $\mathbf{C} = \mathbf{R} - \hat{\lambda}_{max}\mathbf{I}$ , it is obvious that $-\hat{\lambda}_{max}(\mathbf{C}) = \lambda_{min}(\mathbf{R}) - \lambda_{max}(\mathbf{R})$ . As a result, we can reach the following result:

\hat{\lambda}_{min}(\mathbf{R}) = \hat{\lambda}_{max}(\mathbf{R}) - \hat{\lambda}_{max}(\mathbf{C})

Applications

LS estimation에 필수적인 matrix inversion의 안정성을 높일 때 주로 사용된다. 당연한 얘기이지만 안정성을 높이는 대신 역행렬의 계산 정확도 면에서 손해를 보게 되며, unbiasedness 특성은 사라진다. 그럼에도 불구하고 전체적인 성능 관점에서 유리하기 때문에 빈번히 사용된다. 참고로 머신러닝에서 유명한 Ridge Regression이나 minimum variance distortionless response (MVDR 또는 CAPON) beamforming에도 널리 활용된다.

Footnotes

1: 거의 대부분의 EVD의 수식과 응용은 right eigenvector를 기준으로 하며 right eigenvector는 식 (2)를 만족한다. 반면, left eigenvector는 행벡터로 $\mathbf{q}_i\mathbf{A} = \lambda_i \mathbf{q}_i$ 를 만족한다. right/left가 명시되어 있지 않은 경우 보통은 right eigenvector를 의미하는 것이 관례이다 [1-2].