[optimization] Dual Problem

YJCho · November 4, 2024


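Primal problem: a constrained optimization problem
→ It can be rewritten as a different but closely related problem, called the dual problem

*In general, the solution to the dual problem only gives a lower bound on the solution to the primal problem.
But!! Under certain conditions, the dual problem has the same solution as the primal problem.
→ The SVM problem meets these conditions
→ So you can solve either the primal or the dual problem (take your pick~)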

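Once a QP solver has found the vector $\hat{\alpha}$ that minimizes the dual objective, you can compute the $\hat{\mathbf{w}}$ and $\hat{b}$ that minimize the primal objective.
From the dual solution to the primal solution:
$\hat{\mathbf{w}} = \sum_{i=1}^{m} \hat{\alpha}^{(i)} t^{(i)} \mathbf{x}^{(i)}$

$\hat{b} = \frac{1}{n_s} \sum_{\substack{i=1 \\ \hat{\alpha}^{(i)} > 0}}^{m} \left( t^{(i)} - \hat{\mathbf{w}}^T \mathbf{x}^{(i)} \right)$

Here $n_s$ is the number of support vectors, i.e. the instances with $\hat{\alpha}^{(i)} > 0$.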
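
The two formulas above can be sketched in a few lines of plain Python. This is only a minimal illustration: the two training points, their labels, and the values in `alpha` are made-up toy values (worked out by hand for a hard-margin problem, standing in for what a QP solver would return), and `primal_from_dual` is a hypothetical helper name, not something from a library.

```python
# Toy hard-margin problem (illustrative values, not from the original post).
X = [(2.0, 2.0), (0.0, 0.0)]   # training instances x^(i)
t = [1.0, -1.0]                # targets t^(i) in {-1, +1}
alpha = [0.25, 0.25]           # dual solution, assumed to come from a QP solver


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def primal_from_dual(X, t, alpha, eps=1e-12):
    """Recover (w_hat, b_hat) from the dual solution alpha."""
    n_features = len(X[0])
    # w_hat = sum_i alpha_i * t_i * x_i, computed one coordinate at a time
    w = [sum(a * ti * x[k] for a, ti, x in zip(alpha, t, X))
         for k in range(n_features)]
    # b_hat = average of (t_i - w^T x_i) over the support vectors (alpha_i > 0)
    residuals = [ti - dot(w, x) for a, ti, x in zip(alpha, t, X) if a > eps]
    b = sum(residuals) / len(residuals)
    return w, b


w_hat, b_hat = primal_from_dual(X, t, alpha)
print(w_hat, b_hat)
```

With these toy values both points end up exactly on the margin, i.e. $\hat{\mathbf{w}}^T \mathbf{x}^{(i)} + \hat{b} = t^{(i)}$ for both instances.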

2nd-degree polynomial mapping:

$\phi(\mathbf{x}) = \phi\left( \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \right) = \begin{pmatrix} x_1^2 \\ \sqrt{2} x_1 x_2 \\ x_2^2 \end{pmatrix}$

Equation 5-9. Kernel trick for a 2nd-degree polynomial mapping

$\phi(\mathbf{a})^T \phi(\mathbf{b}) = \begin{pmatrix} a_1^2 \\ \sqrt{2} a_1 a_2 \\ a_2^2 \end{pmatrix}^T \begin{pmatrix} b_1^2 \\ \sqrt{2} b_1 b_2 \\ b_2^2 \end{pmatrix} = a_1^2 b_1^2 + 2 a_1 b_1 a_2 b_2 + a_2^2 b_2^2$

$= (a_1 b_1 + a_2 b_2)^2 = \left( \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}^T \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \right)^2 = (\mathbf{a}^T \mathbf{b})^2$

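Conclusion: when $\phi$ is the 2nd-degree polynomial mapping, the dot product of the transformed vectors can be replaced by the much simpler $(\mathbf{a}^T \mathbf{b})^2$
→ No need to actually transform the training instances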
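
A quick numeric check makes this concrete. The sketch below uses two arbitrary 2D vectors (chosen only for illustration) and compares the explicit route through $\phi$ with the shortcut $(\mathbf{a}^T \mathbf{b})^2$. The helper names `phi`, `dot`, and `poly2_kernel` are made up for this example.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def phi(x):
    """Explicit 2nd-degree polynomial mapping from 2D to 3D."""
    x1, x2 = x
    return (x1 ** 2, math.sqrt(2) * x1 * x2, x2 ** 2)


def poly2_kernel(a, b):
    """Kernel shortcut: squared dot product of the original vectors."""
    return dot(a, b) ** 2


a = (1.0, 2.0)   # arbitrary example vectors
b = (3.0, 4.0)

explicit = dot(phi(a), phi(b))   # transform first, then take the dot product
via_kernel = poly2_kernel(a, b)  # never leaves the original 2D space

print(explicit, via_kernel)
print(math.isclose(explicit, via_kernel))
```

The two values agree up to floating-point rounding (the $\sqrt{2}$ factor is not exactly representable), which is why the comparison uses `math.isclose` rather than `==`.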

Advantage: much more efficient computationally

2nd-degree polynomial kernel: $K(\mathbf{a}, \mathbf{b}) = (\mathbf{a}^T \mathbf{b})^2$

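In machine learning, a kernel is a function that can compute the dot product $\phi(\mathbf{a})^T \phi(\mathbf{b})$ based only on the original vectors $\mathbf{a}$ and $\mathbf{b}$, without computing the transformation $\phi$ (or even knowing what it is).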

Equation 5-10. Common kernels

Linear:
$K(\mathbf{a}, \mathbf{b}) = \mathbf{a}^T \mathbf{b}$
Polynomial:
$K(\mathbf{a}, \mathbf{b}) = \left( \gamma \mathbf{a}^T \mathbf{b} + r \right)^d$
Gaussian RBF:
$K(\mathbf{a}, \mathbf{b}) = \exp \left( -\gamma \| \mathbf{a} - \mathbf{b} \|^2 \right)$
Sigmoid:
$K(\mathbf{a}, \mathbf{b}) = \tanh \left( \gamma \mathbf{a}^T \mathbf{b} + r \right)$
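
For reference, the four kernels above can be written out directly in plain Python. This is only a sketch: the function names and the default values for `gamma`, `r`, and `d` are arbitrary choices made for this example, not defaults from any particular library.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def linear_kernel(a, b):
    """K(a, b) = a^T b"""
    return dot(a, b)


def polynomial_kernel(a, b, gamma=1.0, r=0.0, d=2):
    """K(a, b) = (gamma * a^T b + r)^d"""
    return (gamma * dot(a, b) + r) ** d


def rbf_kernel(a, b, gamma=1.0):
    """K(a, b) = exp(-gamma * ||a - b||^2)"""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(a, b, gamma=1.0, r=0.0):
    """K(a, b) = tanh(gamma * a^T b + r)"""
    return math.tanh(gamma * dot(a, b) + r)


a = (1.0, 2.0)   # arbitrary example vectors
b = (3.0, 4.0)

print(linear_kernel(a, b))
print(polynomial_kernel(a, b))
print(rbf_kernel(a, b))
print(sigmoid_kernel(a, b))
```

With `gamma=1`, `r=0`, and `d=2`, the polynomial kernel reduces to the 2nd-degree kernel $(\mathbf{a}^T \mathbf{b})^2$ from earlier in these notes.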

Equation 5-11. Making predictions with a kernelized SVM

$h_{\hat{\mathbf{w}}, \hat{b}} \left( \phi(\mathbf{x}^{(n)}) \right) = \hat{\mathbf{w}}^T \phi(\mathbf{x}^{(n)}) + \hat{b} = \left( \sum_{i=1}^{m} \hat{\alpha}^{(i)} t^{(i)} \phi(\mathbf{x}^{(i)}) \right)^T \phi(\mathbf{x}^{(n)}) + \hat{b}$

$= \sum_{i=1}^{m} \hat{\alpha}^{(i)} t^{(i)} \left( \phi(\mathbf{x}^{(i)})^T \phi(\mathbf{x}^{(n)}) \right) + \hat{b}$

$= \sum_{\substack{i=1 \\ \hat{\alpha}^{(i)} > 0}}^{m} \hat{\alpha}^{(i)} t^{(i)} K(\mathbf{x}^{(i)}, \mathbf{x}^{(n)}) + \hat{b}$
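
The last line of the derivation translates almost directly into code. Below is a minimal sketch of such a decision function. The training points, `alpha`, and `b_hat` are hand-picked toy values for a linear kernel (assumed to come from an already solved dual problem), and `decision_function` is a hypothetical helper written for this example.

```python
def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def linear_kernel(a, b):
    """K(a, b) = a^T b; any other kernel function could be passed instead."""
    return dot(a, b)


# Toy values (illustrative only): two training points and their dual solution.
X = [(2.0, 2.0), (0.0, 0.0)]
t = [1.0, -1.0]
alpha = [0.25, 0.25]
b_hat = -1.0


def decision_function(x_new, X, t, alpha, b, kernel, eps=1e-12):
    """Sum of alpha_i * t_i * K(x_i, x_new) over support vectors, plus the bias."""
    score = sum(a * ti * kernel(xi, x_new)
                for a, ti, xi in zip(alpha, t, X)
                if a > eps)          # only support vectors contribute
    return score + b


for x_new in [(3.0, 1.0), (0.0, 1.0)]:
    score = decision_function(x_new, X, t, alpha, b_hat, linear_kernel)
    label = 1 if score >= 0 else -1
    print(x_new, score, label)
```

Note that $\hat{\mathbf{w}}$ never appears: only kernel evaluations between the new instance and the support vectors are needed.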

Equation 5-12. Computing the bias term using the kernel trick

$\hat{b} = \frac{1}{n_s} \sum_{\substack{i=1 \\ \hat{\alpha}^{(i)} > 0}}^{m} \left( t^{(i)} - \hat{\mathbf{w}}^T \phi(\mathbf{x}^{(i)}) \right) = \frac{1}{n_s} \sum_{\substack{i=1 \\ \hat{\alpha}^{(i)} > 0}}^{m} \left( t^{(i)} - \left( \sum_{j=1}^{m} \hat{\alpha}^{(j)} t^{(j)} \phi(\mathbf{x}^{(j)}) \right)^T \phi(\mathbf{x}^{(i)}) \right)$

$= \frac{1}{n_s} \sum_{\substack{i=1 \\ \hat{\alpha}^{(i)} > 0}}^{m} \left( t^{(i)} - \sum_{j=1}^{m} \hat{\alpha}^{(j)} t^{(j)} K(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}) \right)$
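
The bias formula can be sketched the same way, using only kernel evaluations between training instances. As before, this is an illustration under assumptions: the data and `alpha` are hand-picked toy values that are valid for a linear kernel, and `bias_from_kernel` is a made-up helper name.

```python
def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def linear_kernel(a, b):
    """K(a, b) = a^T b; the toy alpha values below assume this kernel."""
    return dot(a, b)


# Toy values (illustrative only): two training points and their dual solution.
X = [(2.0, 2.0), (0.0, 0.0)]
t = [1.0, -1.0]
alpha = [0.25, 0.25]


def bias_from_kernel(X, t, alpha, kernel, eps=1e-12):
    """Average of t_i - sum_j alpha_j t_j K(x_i, x_j) over the support vectors."""
    support = [i for i, a in enumerate(alpha) if a > eps]
    total = 0.0
    for i in support:
        # Inner sum replaces w_hat^T phi(x_i) with kernel evaluations
        inner = sum(alpha[j] * t[j] * kernel(X[i], X[j])
                    for j in range(len(X)))
        total += t[i] - inner
    return total / len(support)   # len(support) plays the role of n_s


b_hat = bias_from_kernel(X, t, alpha, linear_kernel)
print(b_hat)
```

For a different kernel, the dual problem would have to be solved again with that kernel to obtain matching `alpha` values; the ones above are not reusable.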
