Neural Tangent Kernel

stat._.jun · October 25, 2025

First, we need to understand gradient flow. It is a slightly different way of looking at gradient descent: the discrete update is treated as a continuous-time process.

\theta_{t+1} = \theta_{t} - \eta \nabla_{\theta} \mathcal{L}(\theta_{t})

A simple rearrangement turns this into the following.

\frac{\theta_{t+1} - \theta_{t}}{\eta} = -\nabla_{\theta}\mathcal{L}(\theta_{t})

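If we let $\eta \to 0$, treating each step as advancing time by $\eta$, the difference quotient on the left becomes a time derivative, so we can think of it as follows.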

\frac{d\theta(t)}{dt} = - \nabla_{\theta}\mathcal{L}(\theta(t))

Viewing the update this way is called gradient flow.
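To see gradient descent approach gradient flow numerically, here is a minimal sketch on a hypothetical one-dimensional loss $\mathcal{L}(\theta) = \frac{a}{2}\theta^2$, chosen only because its gradient flow has the closed form $\theta(t) = \theta_0 e^{-at}$. Running $T/\eta$ gradient-descent steps covers time $T$, and the gap to the exact flow should shrink roughly in proportion to $\eta$.

```python
import math

# Hypothetical toy loss (assumption): L(theta) = 0.5 * a * theta**2, so grad L = a * theta.
A = 2.0
THETA0 = 1.0
T = 1.0  # total continuous time to simulate

def grad_loss(theta: float) -> float:
    return A * theta

def gradient_descent(theta0: float, eta: float, total_time: float) -> float:
    """Take round(total_time / eta) steps of theta <- theta - eta * grad L(theta)."""
    theta = theta0
    for _ in range(round(total_time / eta)):
        theta -= eta * grad_loss(theta)
    return theta

def gradient_flow(theta0: float, t: float) -> float:
    """Closed-form solution of d theta / dt = -a * theta."""
    return theta0 * math.exp(-A * t)

exact = gradient_flow(THETA0, T)
errors = {}
for eta in (0.1, 0.01, 0.001):
    gd = gradient_descent(THETA0, eta, T)
    errors[eta] = abs(gd - exact)
    print(f"eta={eta:<6} GD={gd:.6f} flow={exact:.6f} |gap|={errors[eta]:.2e}")
```

Each GD step is one explicit Euler step of the gradient-flow ODE, which is why the gap behaves like $O(\eta)$.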

Suppose the empirical risk over $N$ training pairs $(x_i, y_i)$ is given as follows.

\mathcal{L}(\theta) := \frac{1}{N} \sum_{i=1}^{N} \ell( f(x_i ; \theta), y_i)
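As a concrete instance of this definition, the sketch below evaluates $\mathcal{L}(\theta)$ for a hypothetical width-3 two-layer tanh network $f(x;\theta) = \frac{1}{\sqrt{m}}\sum_j a_j \tanh(w_j x)$ with $\theta = (w, a)$ and an assumed squared loss $\ell(f, y) = \frac{1}{2}(f - y)^2$. The model, loss, and data are illustrative choices, not part of the original text.

```python
import math

# Hypothetical toy model (assumption): f(x; theta) = (1/sqrt(m)) * sum_j a_j * tanh(w_j * x)
def f(x: float, w: list[float], a: list[float]) -> float:
    return sum(aj * math.tanh(wj * x) for wj, aj in zip(w, a)) / math.sqrt(len(w))

# Assumed per-example loss: l(f, y) = 0.5 * (f - y)^2
def squared_loss(pred: float, y: float) -> float:
    return 0.5 * (pred - y) ** 2

def empirical_risk(w, a, xs, ys) -> float:
    """L(theta) = (1/N) * sum_i l(f(x_i; theta), y_i)."""
    return sum(squared_loss(f(x, w, a), y) for x, y in zip(xs, ys)) / len(xs)

w, a = [0.5, -1.0, 2.0], [1.0, 0.3, -0.7]
xs, ys = [-1.0, 0.5, 1.5], [0.2, 0.0, -0.4]
print(empirical_risk(w, a, xs, ys))
```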

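By the chain rule we have the following. (Here $f$ is scalar-valued, so $\nabla_{f} \ell$ is simply $\partial \ell / \partial f$.)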

\frac{df(x ; \theta)}{dt} = \frac{\partial f(x; \theta)}{\partial \theta} \frac{d\theta}{dt} = \nabla_{\theta} f(x ; \theta)^{\top} \frac{d\theta}{dt}

\nabla_{\theta} \mathcal{L}(\theta) = \frac{1}{N} \sum_{i = 1}^{N} \nabla_{\theta} \ell(f(x_i; \theta), y_i) = \frac{1}{N} \sum_{i=1}^{N} \nabla_{\theta} f(x_i; \theta) \nabla_{f} \ell(f(x_i ;\theta),y_i)
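A quick way to check this decomposition is to compare it against finite differences. The sketch below assumes a hypothetical width-3 two-layer tanh network $f(x;\theta) = \frac{1}{\sqrt{m}}\sum_j a_j \tanh(w_j x)$ with parameters flattened into $\theta = (w_1,\dots,w_m, a_1,\dots,a_m)$, and the squared loss, for which $\nabla_f \ell = f - y$. The helper `grad_f` is a hand-derived $\nabla_\theta f(x;\theta)$ for that toy model only.

```python
import math

M = 3  # width of the hypothetical toy network (assumption)

def f(x, theta):
    """Toy model: f(x; theta) = (1/sqrt(M)) * sum_j a_j * tanh(w_j * x), theta = [w..., a...]."""
    w, a = theta[:M], theta[M:]
    return sum(aj * math.tanh(wj * x) for wj, aj in zip(w, a)) / math.sqrt(M)

def grad_f(x, theta):
    """Hand-derived grad_theta f(x; theta), flattened as [df/dw_1..M, df/da_1..M]."""
    w, a = theta[:M], theta[M:]
    dw = [aj * (1 - math.tanh(wj * x) ** 2) * x / math.sqrt(M) for wj, aj in zip(w, a)]
    da = [math.tanh(wj * x) / math.sqrt(M) for wj in w]
    return dw + da

def risk(theta, xs, ys):
    """Empirical risk with squared loss l = 0.5 * (f - y)^2."""
    return sum(0.5 * (f(x, theta) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad_risk_chain_rule(theta, xs, ys):
    """(1/N) * sum_i grad_theta f(x_i) * dl/df, where dl/df = f(x_i) - y_i."""
    g = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        dl_df = f(x, theta) - y
        for k, dfk in enumerate(grad_f(x, theta)):
            g[k] += dfk * dl_df / len(xs)
    return g

def grad_risk_fd(theta, xs, ys, eps=1e-6):
    """Central finite differences of the risk, used only as an independent check."""
    g = []
    for k in range(len(theta)):
        up, down = list(theta), list(theta)
        up[k] += eps
        down[k] -= eps
        g.append((risk(up, xs, ys) - risk(down, xs, ys)) / (2 * eps))
    return g

theta = [0.5, -1.0, 2.0, 1.0, 0.3, -0.7]
xs, ys = [-1.0, 0.5, 1.5], [0.2, 0.0, -0.4]
analytic = grad_risk_chain_rule(theta, xs, ys)
numeric = grad_risk_fd(theta, xs, ys)
print("max abs difference:", max(abs(p - q) for p, q in zip(analytic, numeric)))
```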

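Substituting gradient flow, $\frac{d\theta}{dt} = -\nabla_{\theta}\mathcal{L}(\theta)$, together with the expression for $\nabla_{\theta}\mathcal{L}(\theta)$ into the first identity gives

\frac{df(x ; \theta)}{dt} = - \frac{1}{N} \sum_{i=1}^{N} \nabla_{\theta} f(x ; \theta)^{\top} \nabla_{\theta} f(x_i; \theta) \nabla_{f} \ell(f(x_i ;\theta),y_i)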
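The sketch below checks this formula numerically under toy assumptions: a hypothetical width-3 two-layer tanh network $f(x;\theta) = \frac{1}{\sqrt{m}}\sum_j a_j \tanh(w_j x)$ and the squared loss. One very small gradient-descent step of size $\eta$ advances gradient flow by time $\eta$, so the resulting change in $f$ at an input outside the training set, divided by $\eta$, should match the kernel sum on the right-hand side up to an $O(\eta)$ error.

```python
import math

M = 3  # width of the hypothetical toy network (assumption)

def f(x, theta):
    w, a = theta[:M], theta[M:]
    return sum(aj * math.tanh(wj * x) for wj, aj in zip(w, a)) / math.sqrt(M)

def grad_f(x, theta):
    """Hand-derived grad_theta f(x; theta) for the toy model, flattened as [w..., a...]."""
    w, a = theta[:M], theta[M:]
    dw = [aj * (1 - math.tanh(wj * x) ** 2) * x / math.sqrt(M) for wj, aj in zip(w, a)]
    da = [math.tanh(wj * x) / math.sqrt(M) for wj in w]
    return dw + da

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def ntk(x, xp, theta):
    """grad_theta f(x)^T grad_theta f(x')."""
    return dot(grad_f(x, theta), grad_f(xp, theta))

def df_dt_kernel(x, theta, xs, ys):
    """-(1/N) * sum_i ntk(x, x_i) * dl/df(x_i), with squared loss so dl/df = f - y."""
    return -sum(ntk(x, xi, theta) * (f(xi, theta) - yi) for xi, yi in zip(xs, ys)) / len(xs)

def grad_risk(theta, xs, ys):
    """(1/N) * sum_i grad_theta f(x_i) * (f(x_i) - y_i)."""
    g = [0.0] * len(theta)
    for xi, yi in zip(xs, ys):
        r = f(xi, theta) - yi
        for k, d in enumerate(grad_f(xi, theta)):
            g[k] += d * r / len(xs)
    return g

theta = [0.5, -1.0, 2.0, 1.0, 0.3, -0.7]
xs, ys = [-1.0, 0.5, 1.5], [0.2, 0.0, -0.4]
x_test, eta = 0.8, 1e-5  # x_test is not one of the training inputs

# A single tiny GD step approximates gradient flow over a time interval of length eta.
theta_next = [t - eta * g for t, g in zip(theta, grad_risk(theta, xs, ys))]
df_dt_step = (f(x_test, theta_next) - f(x_test, theta)) / eta
print(df_dt_step, df_dt_kernel(x_test, theta, xs, ys))
```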

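Here the term $\nabla_{\theta} f(x ; \theta)^{\top} \nabla_{\theta} f(x_i; \theta)$ is called the Neural Tangent Kernel (NTK), and many references denote it by $\Theta(x, x_i; \theta)$. With this notation the dynamics above become

\frac{df(x ; \theta)}{dt} = - \frac{1}{N} \sum_{i=1}^{N} \Theta(x, x_i; \theta) \nabla_{f} \ell(f(x_i ;\theta),y_i)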
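Evaluated on a set of inputs, $\Theta$ forms a Gram matrix of parameter gradients, so it is symmetric and positive semi-definite. A minimal sketch for a hypothetical width-3 two-layer tanh network $f(x;\theta) = \frac{1}{\sqrt{m}}\sum_j a_j \tanh(w_j x)$ (model, parameters, and inputs are illustrative assumptions) computes this empirical NTK matrix and also shows that it depends on $\theta$, so in general it changes as training proceeds.

```python
import math

M = 3  # width of the hypothetical toy network (assumption)

def grad_f(x, theta):
    """Hand-derived grad_theta f(x; theta) for f(x) = (1/sqrt(M)) sum_j a_j tanh(w_j x)."""
    w, a = theta[:M], theta[M:]
    dw = [aj * (1 - math.tanh(wj * x) ** 2) * x / math.sqrt(M) for wj, aj in zip(w, a)]
    da = [math.tanh(wj * x) / math.sqrt(M) for wj in w]
    return dw + da

def ntk_matrix(xs, theta):
    """Empirical NTK Gram matrix: K[i][j] = grad f(x_i)^T grad f(x_j)."""
    grads = [grad_f(x, theta) for x in xs]
    return [[sum(p * q for p, q in zip(gi, gj)) for gj in grads] for gi in grads]

theta = [0.5, -1.0, 2.0, 1.0, 0.3, -0.7]
xs = [-1.0, 0.5, 1.5]
K = ntk_matrix(xs, theta)
for row in K:
    print([round(v, 4) for v in row])

# The kernel depends on theta: moving the parameters changes the matrix.
K_moved = ntk_matrix(xs, [1.1 * t for t in theta])
```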

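I find this perspective interesting: it lets us understand a gradient-based learning model in the form of kernel regression, with $\Theta$ playing the role of the kernel. Strictly speaking, the correspondence requires $\Theta$ to stay fixed during training; in general it depends on $\theta$ and changes, but for very wide networks it stays approximately constant (exactly so in the infinite-width limit).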
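The kernel-regression reading is exact when the tangent kernel never changes, which is the case for a model that is linear in its parameters, $f(x;\theta) = \theta^\top \phi(x)$, whose NTK is simply $\phi(x)^\top \phi(x')$. The sketch below swaps in such a model instead of a neural network (a hypothetical cubic feature map with 4 parameters and 3 training points). Gradient descent on the squared loss starting from $\theta_0 = 0$ should end at the same test prediction as kernel regression, $\Theta(x, X)\,\Theta(X, X)^{-1} y$.

```python
# Hypothetical feature map (assumption): phi(x) = [1, x, x^2, x^3]
def phi(x: float) -> list[float]:
    return [1.0, x, x * x, x ** 3]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= factor * aug[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (aug[r][n] - sum(aug[r][k] * z[k] for k in range(r + 1, n))) / aug[r][r]
    return z

xs, ys = [-1.0, 0.0, 1.0], [0.2, 0.0, -0.4]
x_test = 0.5
N = len(xs)

# 1) Gradient descent on L(theta) = (1/N) sum_i 0.5 * (theta . phi(x_i) - y_i)^2 from theta = 0.
theta = [0.0] * 4
eta = 0.5
for _ in range(2000):
    residuals = [dot(theta, phi(x)) - y for x, y in zip(xs, ys)]
    grad = [sum(r * phi(x)[k] for r, x in zip(residuals, xs)) / N for k in range(4)]
    theta = [t - eta * g for t, g in zip(theta, grad)]
pred_gd = dot(theta, phi(x_test))

# 2) Kernel regression with the (constant) NTK phi(x) . phi(x').
K = [[dot(phi(xi), phi(xj)) for xj in xs] for xi in xs]
k_test = [dot(phi(x_test), phi(xi)) for xi in xs]
pred_kernel = dot(k_test, solve(K, ys))

print(pred_gd, pred_kernel)
```

Starting from $\theta_0 = 0$ matters here: with more parameters than data points, gradient descent then converges to the minimum-norm interpolant, which is exactly what the kernel-regression formula produces.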