Theory of Unconstrained Optimization
Optimality Conditions
Lemma:
Suppose that $f:\mathbb{R}^n \to \mathbb{R}$ is differentiable at $\bar{x}$. If there is a vector $d \in \mathbb{R}^n$ such that $\nabla f(\bar{x})^T d < 0$, then $d$ is a descent direction of $f$ at $\bar{x}$.
- Sketch of proof: $f(\bar{x} + \lambda d) \approx f(\bar{x}) + \lambda \nabla f(\bar{x})^T d < f(\bar{x})$ for sufficiently small $\lambda > 0$.
This is the basic principle behind Gradient Descent and SGD, which appear later.
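As a quick numerical check of the lemma, here is a minimal sketch; the quadratic $f$, the point $\bar{x}$, and the direction $d$ below are illustrative choices, not part of the notes.

```python
import numpy as np

# If grad_f(x)^T d < 0, then f decreases along d for sufficiently small step lengths.
def f(x):
    return x[0] ** 2 + 3 * x[1] ** 2

def grad_f(x):
    return np.array([2 * x[0], 6 * x[1]])

x_bar = np.array([1.0, 1.0])
d = -grad_f(x_bar)                    # the negative gradient always satisfies grad^T d < 0
assert grad_f(x_bar) @ d < 0          # so d is a descent direction at x_bar

for lam in [1e-1, 1e-2, 1e-3]:
    print(lam, f(x_bar + lam * d) < f(x_bar))   # True for small enough lambda
```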
Theorem: First-Order Necessary Optimality Conditions (FONC)
Suppose $f:\mathbb{R}^n \to \mathbb{R}$ is differentiable at $x^*$.
If $x^*$ is a local minimum, then $\nabla f(x^*) = 0$.
Theorem: Second-Order Optimality Conditions
Suppose $f:\mathbb{R}^n \to \mathbb{R}$ is twice differentiable at $x^*$.\
[Necessary] If $x^*$ is a local minimum, then $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*)$ is positive semidefinite.\
[Sufficient] If $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*)$ is positive definite, then $x^*$ is a strict local minimum.
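The sufficient condition can be checked numerically by testing whether the gradient vanishes and all eigenvalues of the Hessian are positive at a candidate point. A minimal sketch, with an illustrative function chosen here (not from the notes):

```python
import numpy as np

# Illustrative f(x) = x1^2 + x1*x2 + 2*x2^2, with candidate point x* = (0, 0).
def grad(x):
    return np.array([2 * x[0] + x[1], x[0] + 4 * x[1]])

def hess(x):
    return np.array([[2.0, 1.0], [1.0, 4.0]])

x_star = np.array([0.0, 0.0])
stationary = np.allclose(grad(x_star), 0.0)              # first-order condition
pos_def = np.all(np.linalg.eigvalsh(hess(x_star)) > 0)   # all Hessian eigenvalues > 0
print(stationary and pos_def)                            # True -> strict local minimum
```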
Determining Local Optima
- Find the critical points of $f(x, y)$ by solving the system of simultaneous equations $f_x = 0,\ f_y = 0$.
- Let $D(x, y) = f_{xx} f_{yy} - f_{xy}^2$.
- Then (a worked sketch follows this list):
  - $D(a, b) > 0$ and $f_{xx}(a, b) < 0$ implies that $f(x, y)$ has a local maximum at the point $(a, b)$.
  - $D(a, b) > 0$ and $f_{xx}(a, b) > 0$ implies that $f(x, y)$ has a local minimum at the point $(a, b)$.
  - $D(a, b) < 0$ implies that $f(x, y)$ has neither a local maximum nor a local minimum at the point $(a, b)$; it has a saddle point instead.
  - $D(a, b) = 0$ implies that the test is inconclusive, so some other technique must be used to solve the problem.
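A minimal SymPy sketch of this second-derivative test; the function $f(x, y) = x^3 - 3x + y^2$ is an illustrative choice, not from the notes.

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**3 - 3*x + y**2

fx, fy = sp.diff(f, x), sp.diff(f, y)
fxx, fyy, fxy = sp.diff(f, x, 2), sp.diff(f, y, 2), sp.diff(f, x, y)
D = fxx * fyy - fxy**2

# Critical points: solve f_x = 0, f_y = 0 simultaneously.
critical_points = sp.solve([fx, fy], [x, y], dict=True)

for pt in critical_points:
    D_val, fxx_val = D.subs(pt), fxx.subs(pt)
    if D_val > 0 and fxx_val > 0:
        label = "local minimum"
    elif D_val > 0 and fxx_val < 0:
        label = "local maximum"
    elif D_val < 0:
        label = "saddle point"
    else:
        label = "inconclusive"
    print(pt, label)   # (1, 0): local minimum, (-1, 0): saddle point
```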
Line Search Strategy
Line Search
Line search is a technique in numerical analysis for finding (approximate) solutions, and it can be applied effectively to complicated differentiable functions.
The basic idea is to compute a search direction $p_k$ at the current point $x_k$ and then move along that direction by a positive scalar step length $\alpha_k$ to obtain the new point $x_{k+1}$:
$x_{k+1} = x_k + \alpha_k p_k$
Therefore, the key in a line search method is to choose an appropriate search direction and step length.
The step length plays the same role as the learning rate in a typical learning algorithm.
It is reasonable to choose $p_k$ to be a descent direction, i.e. $p_k^T \nabla f_k < 0$, and in many line search methods $p_k$ has the form
$p_k = -B_k^{-1} \nabla f_k$
where $B_k$ is a symmetric and nonsingular matrix.
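One generic step of this form might look as follows; the quadratic objective, the matrix $B_k$, and the fixed $\alpha_k$ are illustrative choices only.

```python
import numpy as np

# One step x_{k+1} = x_k + alpha_k * p_k with p_k = -B_k^{-1} grad_f(x_k).
A = np.array([[3.0, 0.5], [0.5, 1.0]])        # illustrative f(x) = 0.5 * x^T A x

def grad_f(x):
    return A @ x

x_k = np.array([1.0, -2.0])
B_k = np.diag([2.0, 1.0])                     # any symmetric positive definite B_k works
p_k = -np.linalg.solve(B_k, grad_f(x_k))      # p_k = -B_k^{-1} grad_f_k

assert grad_f(x_k) @ p_k < 0                  # descent direction since B_k is positive definite
alpha_k = 0.1                                 # a fixed step length, just for illustration
x_next = x_k + alpha_k * p_k
```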
The Wolfe Conditions
The Wolfe conditions provide criteria for choosing the step length in an inexact line search, using constants $0 < c_1 < c_2 < 1$ (a simple step-length search based on them is sketched after this list).
- [Armijo condition: $f(x_k + \alpha_k d_k) \le f(x_k) + c_1 \alpha_k \nabla f(x_k)^T d_k$]
The function value at the new point must decrease sufficiently compared to the value at the current point.
- [Curvature condition: $\nabla f(x_k + \alpha_k d_k)^T d_k \ge c_2 \nabla f(x_k)^T d_k$]
The slope at the new point must be at least $c_2$ times the original slope. (Since the original slope is negative, this rules out unacceptably short steps: if the slope at the new point were still strongly negative, $f$ could be decreased further by moving farther along $d_k$.)
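A minimal sketch of choosing a step length with these two conditions by scanning a shrinking grid of trial values; a production inexact line search (e.g. the bracketing/zoom procedure in Nocedal & Wright) is more careful, and the constants and test function here are illustrative assumptions.

```python
import numpy as np

def wolfe_step(f, grad, x, d, c1=1e-4, c2=0.9, alpha0=1.0, shrink=0.5, max_iter=50):
    """Return the first trial step length satisfying both Wolfe conditions, or None."""
    fx, gTd = f(x), grad(x) @ d
    alpha = alpha0
    for _ in range(max_iter):
        armijo = f(x + alpha * d) <= fx + c1 * alpha * gTd      # sufficient decrease
        curvature = grad(x + alpha * d) @ d >= c2 * gTd         # slope has flattened enough
        if armijo and curvature:
            return alpha
        alpha *= shrink
    return None

# Illustrative quadratic f(x) = 0.5 * x^T A x along the steepest-descent direction.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
x0 = np.array([1.0, -2.0])
print(wolfe_step(f, grad, x0, -grad(x0)))
```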
Line Search Methods
- Steepest Descent
  - $x_{k+1} = x_k - \alpha_k \nabla f_k$
  - The rate of convergence is linear.
  - Global convergence if $\alpha_k$ satisfies the Wolfe conditions.
  - This is what we usually call the Gradient Descent method.
- Quasi-Newton Method
  - $x_{k+1} = x_k - \alpha_k B_k^{-1} \nabla f_k$
  - The rate of convergence is superlinear.
  - Global convergence if $\alpha_k$ satisfies the Wolfe conditions.
  - The BFGS method is the most popular.
- Newton's Method
  - $x_{k+1} = x_k - (\nabla^2 f_k)^{-1} \nabla f_k$
  - The rate of convergence is quadratic.
  - Local convergence. (A comparison sketch of the three updates follows below.)
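A minimal sketch comparing the three updates on an illustrative strictly convex quadratic (chosen here, not from the notes); the quasi-Newton step is delegated to SciPy's BFGS implementation.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative f(x) = 0.5 * x^T A x - b^T x, whose unique minimizer solves A x = b.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
x_star = np.linalg.solve(A, b)              # exact minimizer, for comparison

x0 = np.array([5.0, 5.0])

# Steepest descent with a small fixed step length (linear convergence).
x_sd = x0.copy()
for _ in range(200):
    x_sd = x_sd - 0.1 * grad(x_sd)

# Newton's method: on a quadratic the Hessian is A, so one full step is exact.
x_nt = x0 - np.linalg.solve(A, grad(x0))

# Quasi-Newton via SciPy's BFGS implementation (superlinear convergence).
x_qn = minimize(f, x0, jac=grad, method='BFGS').x

for name, x in [("steepest descent", x_sd), ("Newton", x_nt), ("BFGS", x_qn)]:
    print(f"{name:16s} error = {np.linalg.norm(x - x_star):.2e}")
```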
