๐ŸงƒGradient, Jacobian, Laplacian, Hessian

MURRAIYA · December 3, 2022

The Jacobian and Hessian keep showing up while studying the autonomous driving field (optimization, deep learning, computer vision, and so on), so I want to summarize them briefly here.

Math Formulas

The Jacobian and Hessian are easier to understand if you recall the already-familiar Gradient and Laplacian.
Each takes the following form.

Gradient

$$\nabla f=\begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix}$$

Jacobian Matrix

$$J=\begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \dots & \frac{\partial f_1}{\partial x_n} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \dots & \frac{\partial f_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \frac{\partial f_m}{\partial x_2} & \dots & \frac{\partial f_m}{\partial x_n} \end{bmatrix}$$

Laplacian

$$\nabla^2 f = \frac{\partial^2 f}{\partial x_1^2} + \frac{\partial^2 f}{\partial x_2^2} + \dots + \frac{\partial^2 f}{\partial x_n^2}$$

Hessian Matrix

$$H=\begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1\,\partial x_2} & \dots & \frac{\partial^2 f}{\partial x_1\,\partial x_n} \\ \frac{\partial^2 f}{\partial x_2\,\partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \dots & \frac{\partial^2 f}{\partial x_2\,\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n\,\partial x_1} & \frac{\partial^2 f}{\partial x_n\,\partial x_2} & \dots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$$

Gradient

$$\nabla f=\begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix}$$

$\nabla f$ is the vector of first-order partial derivatives of a multivariable scalar-valued function $f(x_1, x_2, \dots, x_n)$.
By itself, the gradient is a multivariable vector-valued function.

  • Note: multi/uni variable refers to the input; vector/scalar valued refers to the output

๊ฐ ๋ณ€์ˆ˜๋กœ์˜ ์ผ์ฐจ ํŽธ๋ฏธ๋ถ„์œผ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ๋‹ค.
์ด ๋ฒกํ„ฐ๋Š” f๊ฐ€ ๊ฐ€์žฅ ๊ฐ€ํŒŒ๋ฅด๊ฒŒ ์ฆ๊ฐ€ํ•˜๋Š” ๋ฐฉํ–ฅ๊ณผ ๊ทธ ์ •๋„๋ฅผ ๋‚˜ํƒ€๋‚ธ๋‹ค.
์•ž์— -๋ฅผ ๋ถ™์ธ โˆ’โˆ‡f-\nabla f ๋Š” f๊ฐ’์ด ๊ฐ€์žฅ ๊ฐ€ํŒŒ๋ฅด๊ฒŒ ๊ฐ์†Œํ•˜๋Š” ์ชฝ์„ ๋‚˜ํƒ€๋‚ด๊ฒŒ ๋œ๋‹ค.

์ด๋ฅผ ์ด์šฉํ•˜์—ฌ ์•„๋ž˜ ๋ฐฉ๋ฒ•์œผ๋กœ ๋งŽ์ด ์ด์šฉ๋œ๋‹ค.

  • Gradient Descent for Optimization
    Update in the direction of $-\nabla f(x)$ to search for a minimum

  • First Order Taylor Expansion
    $f(x)\approx f(p)+\nabla f(p)^T(x-p)$

  • Edge Detection in Images
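
For the first item, here is a minimal gradient-descent sketch in Python with NumPy. The objective $f$, the step size lr, and the iteration count are arbitrary choices for illustration, not anything prescribed.

```python
import numpy as np

def f(x):
    # Example objective: a simple convex bowl with minimum at the origin
    return x[0]**2 + 2.0 * x[1]**2

def grad_f(x):
    # Analytic gradient of f: (2*x1, 4*x2)
    return np.array([2.0 * x[0], 4.0 * x[1]])

x = np.array([3.0, -2.0])   # arbitrary starting point
lr = 0.1                    # step size (learning rate)

for _ in range(100):
    x = x - lr * grad_f(x)  # step along -grad f, the steepest-descent direction

print(x, f(x))  # x ends up near the minimizer (0, 0)
```

The same loop structure underlies the optimizers used in deep learning; only how the gradient is obtained (backpropagation) and the update rule get more elaborate.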

Jacobian Matrix๐Ÿšฉ

$$J=\begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \dots & \frac{\partial f_1}{\partial x_n} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \dots & \frac{\partial f_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \frac{\partial f_m}{\partial x_2} & \dots & \frac{\partial f_m}{\partial x_n} \end{bmatrix}$$

$J$ is defined by the first-order partial derivatives of a multivariable vector-valued function $F:\mathbb{R}^n\rightarrow\mathbb{R}^m$, $F(x_1,x_2,\dots,x_n)$.
The difference from the gradient is that $F$ is vector-valued.
You can view the Jacobian as the gradient extended to vector-valued functions.

$$F(x_1,x_2,\dots,x_n)= \begin{bmatrix} f_1(x_1,x_2,\dots,x_n) \\ f_2(x_1,x_2,\dots,x_n) \\ \vdots \\ f_m(x_1,x_2,\dots,x_n) \end{bmatrix}$$

entry ๊ฐ๊ฐ์€ ํ•จ์ˆ˜์˜ ํ˜•ํƒœ๋ฅผ ๋„์ง€๋งŒ ์›๋ž˜ ํ•จ์ˆ˜๊ฐ€ quadratic form์ธ ๊ฒฝ์šฐ์— ํ•œํ•ด ์ƒ์ˆ˜๋กœ ๋‚˜์˜ฌ ์ˆ˜๋„ ์žˆ๋‹ค.

Jacobian๋„ ์ผ์ฐจ๋ฏธ๋ถ„์˜ ์ •์˜์— ๋”ฐ๋ฅด๊ธฐ ๋•Œ๋ฌธ์— ์„ ํ˜• ๊ทผ์‚ฌ๋‚˜ ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•์— ๊ทธ๋Œ€๋กœ ์ ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค.

  • First Order Taylor Expansion
    $F(x)\approx F(p)+J(p)(x-p)$
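
Here is a minimal sketch of that expansion in Python, using a forward-difference Jacobian; the test function $F$ and the evaluation points are arbitrary choices for illustration.

```python
import numpy as np

def F(x):
    # Example vector-valued function F: R^2 -> R^2
    return np.array([x[0] * x[1],
                     np.sin(x[0]) + x[1]**2])

def jacobian(F, p, eps=1e-6):
    # Forward-difference approximation of the m x n Jacobian at p
    Fp = F(p)
    J = np.zeros((Fp.size, p.size))
    for j in range(p.size):
        dp = np.zeros_like(p)
        dp[j] = eps
        J[:, j] = (F(p + dp) - Fp) / eps
    return J

p = np.array([1.0, 2.0])    # expansion point
x = np.array([1.1, 2.05])   # nearby query point
J = jacobian(F, p)

# First-order Taylor expansion: F(x) ~ F(p) + J(p) (x - p)
approx = F(p) + J @ (x - p)
print(F(x))    # exact value
print(approx)  # close to the exact value, since x is near p
```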

Laplacian

$$\nabla^2 f = \frac{\partial^2 f}{\partial x_1^2} + \frac{\partial^2 f}{\partial x_2^2} + \dots + \frac{\partial^2 f}{\partial x_n^2}$$

$\nabla^2 f$, like the gradient, is defined for a multivariable scalar-valued $f(x_1,x_2,\dots,x_n)$: it is the sum of the second-order partial derivatives.
The Laplacian itself is multivariable scalar-valued; its output is a scalar, the sum of the second partials.

์ด๋Š” 2์ฐจ๋ฏธ๋ถ„์ด๋‹ˆ ๋ณ€ํ™”๋Ÿ‰์˜ ๊ธ‰๊ฒฉํ•œ ์ •๋„๋ฅผ ๋‚˜ํƒ€๋‚ธ๋‹ค.

์˜์ƒ์ฒ˜๋ฆฌ์˜ edge detection์—์„œ ๋ฐฐ์›Œ๋ณธ ์‚ฌ๋žŒ๋„ ์žˆ์„ํ…๋ฐ ์ด๋ฏธ์ง€์˜ pixel๊ฐ’ ํ•จ์ˆ˜ I=f(x,y)I=f(x,y)์— ๋Œ€ํ•œ Laplacian์€ ๋ณ€ํ™”๊ฐ€ ์žˆ๋”๋ผ๋„ ๊ท ์ผํ•œ ๋ถ€๋ถ„์€ ์ž‘์€ ๊ฐ’์„, ๋ณ€ํ™”๊ฐ€ ๊ธ‰๊ฒฉํ•œ ๋ถ€๋ถ„์€ ํฐ ๊ฐ’์„ ๊ฐ–์œผ๋ฏ€๋กœ ๊ทธ๋ƒฅ ๋ฏธ๋ถ„๊ฐ’์— ๋น„ํ•ด ๋” edge์Šค๋Ÿฌ์šด ๋ถ€๋ถ„์„ ์ฐพ๊ฒŒ ํ•ด ์ค€๋‹ค.
๋‹ค๋งŒ, ์„ฑ๋Šฅ์ฐจ์ด์— ๋น„ํ•ด์„œ 2์ฐจ๋ฏธ๋ถ„์— ๋“œ๋Š” computation ๋ถ€๋‹ด์ด ์ปค์„œ ์‹ค์ œ๋กœ laplacian์„ ์ž˜ ์ด์šฉํ•˜์ง€๋Š” ์•Š๋Š”๋‹ค๊ณ  ํ•œ๋‹ค.
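
Still, the idea is easy to see in a small sketch: below, a discrete Laplacian kernel is convolved with a toy image via scipy.signal.convolve2d. The toy image and the threshold are made up for this example.

```python
import numpy as np
from scipy.signal import convolve2d

# Toy image: dark left half, bright right half (one vertical edge)
img = np.zeros((8, 8))
img[:, 4:] = 255.0

# 4-neighbor discrete Laplacian kernel
kernel = np.array([[0,  1, 0],
                   [1, -4, 1],
                   [0,  1, 0]])

response = convolve2d(img, kernel, mode="same", boundary="symm")

# Flat regions give ~0; the response spikes only at the intensity jump
edges = np.abs(response) > 100  # arbitrary threshold for the toy image
print(edges.astype(int))
```

(In practice the Laplacian is usually paired with Gaussian smoothing, as in the LoG filter, because second derivatives amplify noise.)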

Hessian Matrix๐Ÿšฉ

$$H=\begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1\,\partial x_2} & \dots & \frac{\partial^2 f}{\partial x_1\,\partial x_n} \\ \frac{\partial^2 f}{\partial x_2\,\partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \dots & \frac{\partial^2 f}{\partial x_2\,\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n\,\partial x_1} & \frac{\partial^2 f}{\partial x_n\,\partial x_2} & \dots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$$

$H$ (the Hessian matrix) is defined by the second-order partial derivatives of a multivariable scalar-valued $f(x_1, x_2, \dots, x_n)$.

ํ•„์ž์˜ ๊ด€์‹ฌ๋ถ„์•ผ์—์„œ Hessian์€ optimization ๋ถ„์•ผ์— ๋“ฑ์žฅํ•˜๋Š” ๋†ˆ์ด๋‹ค.

  • Hessian์€ ์ด์ฐจ๋ฏธ๋ถ„, ์ฆ‰ ํ•จ์ˆ˜์˜ curvature(๊ณก๋ฅ )์„ ๋‚˜ํƒ€๋‚ธ๋‹ค.
    ์ด ์„ฑ์งˆ์€ ์–ด๋–ค ํ•จ์ˆ˜์˜ ์ตœ์†Œ๊ฐ’์— ๋„๋‹ฌํ•˜๋Š” ๊ฒƒ์ด ์ฃผ ๋ชฉ์ ์ธ optimizatioin ๋ฌธ์ œ๋ฅผ ํ‘ธ๋Š” ๋ฐ ์ž์ฃผ ์ด์šฉ๋œ๋‹ค.

  • Hij=HjiH_{ij}=H_{ji}๋กœ symmetric matrix์ด๋‹ค. Symmetic matrix๊ฐ€ ๊ฐ€์ง€๋Š” ๋งค์šฐ ํŠน๋ณ„ํ•œ ์„ฑ์งˆ๋„ optimization์—์„œ ์ด์šฉ๋œ๋‹ค.

  • Second Order Taylor Expansion
    $f(x)\approx f(p)+\nabla f(p)^T(x-p)+\frac{1}{2}(x-p)^T H(p)(x-p)$


โ—๏ธ ํ–‰๋ ฌ๊ณผ ๋ฒกํ„ฐ์˜ ์ œ๊ณฑ์„ ํ‘œํ˜„ํ•  ๋•Œ ํ–‰๋ ฌ ์•ž์— transpose, ๋’ค์— ๊ทธ๊ฒƒ์„ ๋‹ค์‹œ ๊ณฑํ•˜์—ฌ xTHxx^THx ํ˜•ํƒœ๋กœ ์“ด๋‹ค โ•

์ฐธ๊ณ ์ž๋ฃŒ:
์ธํ•˜๋Œ€ ๊น€๊ด‘๊ธฐ ๊ต์ˆ˜๋‹˜์˜ ์ˆ˜์น˜ํ•ด์„ ๊ฐ•์˜
https://darkpgmr.tistory.com/132
https://angeloyeo.github.io/2020/06/17/Hessian.html

Additional information, corrections, and other comments are welcome.
