Least Squares Solution and Normal Equations

반디 · January 21, 2023

Linear Algebra

Least Squares Solution

A linear regression model can be viewed as the problem of finding a solution of ${\bf b} = A{\bf x}$ . But what if this equation has no solution (that is, it is inconsistent)? Then we have to settle for an approximation, namely the best approximate solution.

But how is the best approximate solution defined?

Our best approximate solution is called the least-squares solution, and the following definition shows where that name comes from.

Let $A$ : $m \times n$ matrix and ${\bf b}$ : vector in $\mathbb{R}^m$ .
A least-squares solution of the matrix equation ${\bf b} = A{\bf x}$ is a vector $\hat{\bf x}$ in $\mathbb{R}^n$ s.t.
$dist({\bf b}, A{\hat {\bf x}}) \le dist({\bf b}, A{\bf x})$ for all ${\bf x} \in \mathbb{R}^n$ .

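Here $dist({\bf b}, A{\hat {\bf x}}) = ||{\bf b} - A{\hat {\bf x}}||$ , which is the square root of the sum of the squared entries of the vector ${\bf b} - A{\hat {\bf x}}$ . Because the square root is increasing, minimizing this distance is the same as minimizing the sum of squares of the entries of ${\bf b} - A{\hat {\bf x}}$ , and that is why $\hat{\bf x}$ is called a least-squares solution.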
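The definition can be checked numerically. The sketch below is only an illustration: the matrix `A`, the vector `b`, and the helper `dist` are made-up examples (not from the text above), and NumPy's `np.linalg.lstsq` is used as a stand-in for "a minimizer of $||{\bf b} - A{\bf x}||$".

```python
import numpy as np

# Small inconsistent system chosen for illustration (an assumption, not from the post).
A = np.array([[0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])
b = np.array([6.0, 0.0, 0.0])

def dist(u, v):
    """Euclidean distance: square root of the sum of squared entries of u - v."""
    return np.sqrt(np.sum((u - v) ** 2))

# np.linalg.lstsq returns a vector that minimizes ||b - A x||.
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
best = dist(b, A @ x_hat)

# Compare against many other candidate vectors x.
rng = np.random.default_rng(0)
candidates = 10.0 * rng.normal(size=(1000, 2))
others = np.array([dist(b, A @ x) for x in candidates])

print(x_hat)                    # approximately [-3.  5.]
print(best ** 2)                # sum of squared residuals, approximately 6
print(np.all(others >= best))   # no candidate gets closer to b
```

Random sampling does not prove the inequality for every ${\bf x}$ , of course; it only makes the definition concrete.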

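Let's also look at the least-squares solution geometrically.

Assume that the equation ${\bf b} = A{\bf x}$ has no solution.
The right-hand side $A{\bf x}$ is always a vector in the column space of $A$ , $col(A)$ , so saying that ${\bf b} = A{\bf x}$ has no solution is the same as saying that ${\bf b}$ does not belong to $col(A)$ .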

https://textbooks.math.gatech.edu/ila/least-squares.html

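So, as in the figure on the page linked above, the vector ${\bf b}$ does not belong to $col(A)$ and sits away from it.

Then what does a least-squares solution ${\hat {\bf x}}$ , one satisfying $dist({\bf b}, A{\hat {\bf x}}) \le dist({\bf b}, A{\bf x})$ for all ${\bf x} \in \mathbb{R}^n$ , look like?

It is a vector for which $A{\hat {\bf x}}$ is the point of $col(A)$ closest to ${\bf b}$ . In other words, $A{\hat {\bf x}}$ is the orthogonal projection of ${\bf b}$ onto $col(A)$ !
This is in fact a consequence of the best approximation theorem.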

The Best Approximation Theorem
Let $W$ be a subspace of $\mathbb{R}^n$ , let $y$ be any vector in $\mathbb{R}^n$ , and let $\hat{y}$ be the orthogonal projection of $y$ onto $W$ . Then $\hat{y}$ is the closest point in $W$ to $y$ , in the sense that
$||y - {\hat y}|| < ||y - v||$ for all $v$ in $W$ distinct from ${\hat y}$ .
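As a rough numerical illustration of the theorem, the sketch below takes $W = col(A)$ for a made-up matrix `A` and a made-up vector `y` (both assumptions for the example), builds the orthogonal projection from an orthonormal basis obtained with `np.linalg.qr`, and compares distances against randomly chosen points of $W$ .

```python
import numpy as np

# Illustrative data (assumed, not from the post): W = col(A) is a plane in R^3.
A = np.array([[0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])
y = np.array([6.0, 0.0, 0.0])

# Reduced QR: the columns of Q form an orthonormal basis of col(A)
# (valid here because the columns of A are linearly independent).
Q, _ = np.linalg.qr(A)
y_hat = Q @ (Q.T @ y)          # orthogonal projection of y onto W

# Random points v = A c of W, one per row of V.
rng = np.random.default_rng(1)
coeffs = rng.normal(size=(1000, 2))
V = coeffs @ A.T

d_hat = np.linalg.norm(y - y_hat)
d_others = np.linalg.norm(y - V, axis=1)

print(y_hat)                      # approximately [ 5.  2. -1.]
print(np.all(d_others > d_hat))   # every sampled v is strictly farther from y
```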

Normal Equations

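We have seen that when ${\bf b} = A{\bf x}$ has no solution, a least-squares solution $\hat{\bf x}$ of this equation is one for which $A\hat{\bf x}$ is the orthogonal projection of ${\bf b}$ onto $col(A)$ .

Now let's see how to actually compute a least-squares solution.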

Orthogonal Decomposition Theorem
Let $W$ : subspace of $\mathbb{R}^n$ . Then each $y \in \mathbb{R}^n$ can be written uniquely in the form $y = \hat{y} + z$ , where $\hat{y} \in W$ and $z \in W^\perp$ .

By the orthogonal decomposition theorem, ${\bf b}$ can be decomposed as follows.

$${\bf b} = \hat{\bf b}+({\bf b}- \hat{\bf b}), \text{ where } \hat{\bf b} \in col(A), \; ({\bf b}- \hat{\bf b}) \in col(A)^\perp$$
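The decomposition can be reproduced numerically. This is a minimal sketch with made-up data `A` and `b` (assumptions for the example); the projection is computed from an orthonormal basis of $col(A)$ obtained with `np.linalg.qr`, which is one of several possible ways to do it.

```python
import numpy as np

# Illustrative data (assumed, not from the post).
A = np.array([[0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])
b = np.array([6.0, 0.0, 0.0])

Q, _ = np.linalg.qr(A)     # orthonormal basis of col(A)
b_hat = Q @ (Q.T @ b)      # component of b in col(A)
z = b - b_hat              # remaining component

# z is orthogonal to the basis of col(A), so it lies in col(A)^perp.
in_perp = np.allclose(Q.T @ z, 0.0)

# Orthogonality of the two pieces gives ||b||^2 = ||b_hat||^2 + ||z||^2.
pythagoras = np.isclose(b @ b, b_hat @ b_hat + z @ z)

print(b_hat, z)              # approximately [ 5.  2. -1.] and [ 1. -2.  1.]
print(in_perp, pythagoras)   # True True
```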

Since $\hat{\bf b} \in col(A)$ , we can write $\hat{\bf b} = A\hat{\bf x}$ for some $\hat{\bf x}$ . Writing $a_j$ for the $j$ -th column of the matrix $A$ , we get the following chain of equivalences.

$$({\bf b}- \hat{\bf b}) \in col(A)^\perp \iff ({\bf b}- \hat{\bf b}) \perp col(A) \iff ({\bf b}- A\hat{\bf x}) \perp col(A)$$

$$\iff ({\bf b}- A\hat{\bf x}) \cdot a_j = 0 \text{ for } j=1, \ldots, n \iff a_j^T({\bf b}- A\hat{\bf x}) = 0 \text{ for } j=1, \ldots, n$$

$$\iff A^T({\bf b}- A\hat{\bf x}) = 0 \iff A^T{\bf b} = A^TA\hat{\bf x}$$

The last equation, $A^T{\bf b} = A^TA\hat{\bf x}$ , is called the normal equation of $A{\bf x} = {\bf b}$ .
That is, (the set of least-squares solutions of $A{\bf x} = {\bf b}$ ) = (the solution set of $A^TA\hat{\bf x} = A^T{\bf b}$ ).
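A minimal sketch of solving the normal equation numerically, assuming a made-up `A` with linearly independent columns so that `np.linalg.solve` applies. The example data and variable names are illustrative, and `np.linalg.lstsq` is used only as an independent cross-check.

```python
import numpy as np

# Illustrative data (assumed, not from the post).
A = np.array([[0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])
b = np.array([6.0, 0.0, 0.0])

# Normal equation: (A^T A) x_hat = A^T b
AtA = A.T @ A
Atb = A.T @ b
x_hat = np.linalg.solve(AtA, Atb)   # needs A^T A to be invertible

# The residual b - A x_hat should be orthogonal to every column a_j of A.
residual = b - A @ x_hat
dots = np.array([A[:, j] @ residual for j in range(A.shape[1])])

# Cross-check against NumPy's least-squares routine.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_hat)                       # approximately [-3.  5.]
print(np.allclose(dots, 0.0))      # True
print(np.allclose(x_hat, x_ref))   # True
```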

But is the least-squares solution unique? In general, it is not.
When the columns of $A$ are linearly dependent, there are infinitely many least-squares solutions.
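To make the non-uniqueness concrete, here is a small sketch with a made-up matrix whose second column is twice the first (all data and names are assumptions for the example). Starting from one least-squares solution `x0`, adding any multiple of a null-space vector of `A` gives another vector that satisfies the normal equation and has the same residual norm.

```python
import numpy as np

# Illustrative rank-deficient matrix (assumed): second column = 2 * first column.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
b = np.array([1.0, 1.0, 1.0])

# lstsq still returns one particular least-squares solution and reports the rank.
x0, _, rank, _ = np.linalg.lstsq(A, b, rcond=1e-10)

n_vec = np.array([2.0, -1.0])   # A @ n_vec = 0, so n_vec is in the null space of A

satisfies_normal_eq = []
residual_norms = []
for t in (-5.0, 0.0, 1.0, 3.5):
    x = x0 + t * n_vec
    satisfies_normal_eq.append(np.allclose(A.T @ A @ x, A.T @ b))
    residual_norms.append(np.linalg.norm(b - A @ x))

print(rank)                  # 1
print(satisfies_normal_eq)   # [True, True, True, True]
print(np.allclose(residual_norms, residual_norms[0]))   # True
```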

The case in which it is unique is summarized by the following theorem.

Theorem
Let $A$ : $m \times n$ matrix and ${\bf b} \in \mathbb{R}^m$ .
The following are equivalent:
1. $A{\bf x} = {\bf b}$ has a unique least-squares solution.
2. The columns of $A$ are linearly independent.
3. $A^TA$ is invertible.
In this case, the least-squares solution is $\hat {\bf x} = (A^TA)^{-1}A^T{\bf b}$ .
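The sketch below checks conditions 2 and 3 on two made-up matrices and evaluates the closed-form formula on the one where they hold. The helper name `check_conditions`, the tolerance, and the data are all assumptions for illustration; rank is estimated numerically with `np.linalg.matrix_rank`.

```python
import numpy as np

def check_conditions(A, tol=1e-10):
    """Return (columns independent?, A^T A invertible?) using numerical rank."""
    n = A.shape[1]
    cols_independent = np.linalg.matrix_rank(A, tol=tol) == n
    gram_invertible = np.linalg.matrix_rank(A.T @ A, tol=tol) == n
    return bool(cols_independent), bool(gram_invertible)

# Illustrative matrices (assumed, not from the post).
A_indep = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
A_dep = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
b = np.array([6.0, 0.0, 0.0])

print(check_conditions(A_indep))   # (True, True)
print(check_conditions(A_dep))     # (False, False)

# Closed-form solution from the theorem, valid for A_indep only.
x_formula = np.linalg.inv(A_indep.T @ A_indep) @ A_indep.T @ b
x_ref, *_ = np.linalg.lstsq(A_indep, b, rcond=None)
print(np.allclose(x_formula, x_ref))   # True
```

The explicit inverse is used here only to mirror the formula. In numerical work, `np.linalg.solve` or `np.linalg.lstsq` is usually the safer choice, since forming $A^TA$ and inverting it can amplify rounding errors.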

(proof)
(1. $\iff$ 3.)
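The least-squares solutions of $A{\bf x} = {\bf b}$ are exactly the solutions of $A^TA\hat{\bf x} = A^T{\bf b}$ .
Now consider two vectors ${\bf v, u}$ satisfying the following:

$$A^TA{\bf v} = A^T{\bf b} \text{ and } A^TA{\bf u} = 0.$$

Then ${\bf v'} = {\bf v}+{\bf u}$ also satisfies $A^TA{\bf v'} = A^T{\bf b}$ , and conversely the difference of any two solutions of $A^TA\hat{\bf x} = A^T{\bf b}$ is a solution of $A^TA{\bf x} = 0$ .
That is, the solution set of $A^TA{\bf \hat x} = A^T{\bf b}$ is a translate of the solution set of $A^TA{\bf x} = 0$ .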

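The normal equation is always consistent, because the projection $\hat{\bf b}$ always exists. From this we conclude: 'the least-squares solution of $A{\bf x} = {\bf b}$ is unique $\iff$ $A^TA{\bf \hat x} = A^T{\bf b}$ has a unique solution $\iff$ $A^TA{\bf x} = 0$ has only the trivial solution'.

And $A^TA{\bf x} = 0$ has only the trivial solution exactly when the square matrix $A^TA$ is invertible.
Therefore, statements 1 and 3 of the Theorem are equivalent.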

(1. $\iff$ 2.)
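Now we show that 1 and 2 are equivalent.
Let $\hat {\bf b} = proj_{col(A)}{\bf b}$ . Then the least-squares solutions of $A{\bf x} = {\bf b}$ are exactly the solutions of $A{\bf x} = {\hat {\bf b}}$ .
Hence the least-squares solution of $A{\bf x} = {\bf b}$ is unique if and only if $A{\bf x} = {\hat {\bf b}}$ has a unique solution.
Since $A{\bf x} = {\hat {\bf b}}$ is consistent ( ${\hat {\bf b}} \in col(A)$ ), it has a unique solution $\iff$ the columns of $A$ are linearly independent, so 1 and 2 are equivalent.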
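
Conditions 2 and 3 can also be linked directly. The following is a short sketch of the standard argument that $A^TA$ and $A$ have the same null space; it is added here as a supplement and is not part of the proof above.

$$A^TA{\bf x} = 0 \implies {\bf x}^TA^TA{\bf x} = 0 \implies (A{\bf x})^T(A{\bf x}) = ||A{\bf x}||^2 = 0 \implies A{\bf x} = 0$$

Conversely, $A{\bf x} = 0$ immediately gives $A^TA{\bf x} = 0$ . So $A^TA{\bf x} = 0$ has only the trivial solution exactly when $A{\bf x} = 0$ has only the trivial solution, that is, exactly when the columns of $A$ are linearly independent.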

References

The least square problems
https://textbooks.math.gatech.edu/ila/least-squares.html
The best approximation theorem
https://math.berkeley.edu/~limath/Su14Math54/0717.pdf
The orthogonal decomposition theorem
https://math.dartmouth.edu/~m22f19/math22_lecture22_f19.pdf
