[REVIEW] Meta-Learning with a Geometry-Adaptive Preconditioner

SHIN · June 12, 2023

1. Introduction

  • MAML is one of the most popular optimization-based meta-learning algorithms.
  • Prior preconditioned variants of MAML
    - Improve the inner-loop optimization with Preconditioned Gradient Descent (PGD).
    - Meta-learn not only the initialization parameters of the model but also the meta-parameters of a preconditioner $P$.
    - $P$ was adapted either to the inner-step $k$ or to each individual task separately, not both, so the Riemannian-metric condition could not be satisfied.
    • Riemannian metric: a condition guaranteeing that steepest gradient descent can be achieved on the given parameter space.
  • This paper proposes Geometry-Adaptive Preconditioned gradient descent (GAP), whose preconditioner has both of the previously unconsidered properties:
    - $P_{GAP}$ is adapted to the individual task and to the optimization path (path-dependent, i.e., inner-step dependent).
    - $P_{GAP}$ is a Riemannian metric.

2. Background

2.1. MAML

Inner loop
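A sketch of the inner-loop update from the MAML paper, written in the multi-step notation used below, with $\alpha$ the inner-loop learning rate and $\theta_{\tau,0} = \theta$:

$$\theta_{\tau,k+1} = \theta_{\tau,k} - \alpha \nabla_{\theta} L_{\tau}^{in}(\theta_{\tau,k}; D^{tr})$$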

Outer loop
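And a sketch of the outer-loop (meta) update, with $\beta$ the meta learning rate and $\theta_{\tau,K}$ the task-adapted parameters after $K$ inner steps (the query-set notation $D^{val}$ is an assumption here):

$$\theta \leftarrow \theta - \beta \nabla_{\theta} \Sigma_{\tau\sim P(\tau)} L_{\tau}^{out}(\theta_{\tau,K}; D^{val})$$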

        Note that $\Sigma_{\tau\sim P(\tau)} L_\tau$ is used instead of $E_{\tau}[L_{\tau}^{out}]$ in the MAML paper.

2.2. PGD

Preconditioned gradient update
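A standard PGD sketch, where $P$ is the preconditioning matrix applied to the gradient:

$$\theta_{k+1} = \theta_{k} - \alpha P \nabla_{\theta} L(\theta_{k})$$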


       • PGD often reduces the effect of pathological curvature and speeds up optimization.

2.3. Unfolding: reshaping a tensor into a matrix

       Tensor and Tensor decompositions link
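To make mode-1 unfolding concrete, a minimal NumPy sketch (the helper name `unfold` is hypothetical; column ordering follows C-order reshape, which matches the standard definition up to a permutation of columns):

```python
import numpy as np

def unfold(tensor: np.ndarray, mode: int) -> np.ndarray:
    """Unfold a tensor into a matrix: rows index the chosen mode,
    columns enumerate all remaining modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

# A conv-style gradient tensor of shape (out_ch, in_ch, kh, kw)
G = np.random.randn(8, 4, 3, 3)
G1 = unfold(G, 0)    # mode-1 unfolding
print(G1.shape)      # (8, 36)
```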

2.4. Riemannian manifold

       Riemannian manifold and metric
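For reference, a Riemannian metric $g(\theta)$ is a smoothly varying symmetric positive-definite form, and steepest descent under it preconditions the gradient with $g^{-1}$ (a standard result, sketched here); this is why a symmetric positive-definite preconditioner corresponds to a Riemannian metric:

$$g(\theta) = g(\theta)^{T} \succ 0, \qquad \theta_{k+1} = \theta_{k} - \alpha\, g(\theta_{k})^{-1} \nabla_{\theta} L(\theta_{k})$$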

3. Methodology

3.1. GAP: Geometry-Adaptive Preconditioner

3.1.1 Inner-Loop Optimization

  • $L$-layer neural network $f_\theta(\cdot)$ with parameters $\theta = \{W^1, \cdots, W^l, \cdots, W^L\}$

  • Let the gradient $\mathcal{G}_{\tau,k}^l = \nabla_{W_{\tau,k}^l} L_{\tau}^{in}(\theta_{\tau,k}; D^{tr})$

  • Reshape $\mathcal{G}_{\tau,k}^l$ with mode-1 unfolding into the matrix $G_{\tau,k}^l$ (mode-1 performs the best).

  • Define additional meta-parameters $\phi = \{M^l\}_{l=1}^L$, where $M^l$ is a diagonal matrix with entries $Sp(m_i^l)$, $m_i^l \in \mathbb{R}$, and $Sp(x) = \frac{1}{2}\log(1+\exp(2x))$.

  • SVD $G_{\tau,k}^l$ into $U_{\tau,k}^l \Sigma_{\tau,k}^l (V_{\tau,k}^l)^T$,
    and $\widetilde{G}_{\tau,k}^l = U_{\tau,k}^l (M^l \Sigma_{\tau,k}^l)(V_{\tau,k}^l)^T$

  • Reshape $\widetilde{G}_{\tau,k}^l$ back to its original tensor form.

  • Preconditioned gradient descent of GAP becomes:
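A sketch of the resulting update, writing $\widetilde{\mathcal{G}}_{\tau,k}^l$ for the folded-back tensor. Since $U(M^l\Sigma)V^T = (U M^l U^T)\, U\Sigma V^T$, the step is an ordinary PGD step whose preconditioner $P_{GAP} = U_{\tau,k}^l M^l (U_{\tau,k}^l)^T$ is symmetric positive-definite (because $Sp(\cdot) > 0$), hence a Riemannian metric:

$$W_{\tau,k+1}^l = W_{\tau,k}^l - \alpha\, \widetilde{\mathcal{G}}_{\tau,k}^l, \qquad \widetilde{G}_{\tau,k}^l = \big(U_{\tau,k}^l M^l (U_{\tau,k}^l)^T\big)\, G_{\tau,k}^l$$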

3.1.2 Outer-loop Optimization

  • GAP learns two parameter sets $\theta, \phi$:
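A sketch of the outer-loop update, mirroring the MAML outer loop but differentiating with respect to both parameter sets ($\beta$ is the meta learning rate, and $D^{val}$ again denotes the query set):

$$(\theta, \phi) \leftarrow (\theta, \phi) - \beta\, \nabla_{(\theta,\phi)} \Sigma_{\tau\sim P(\tau)} L_{\tau}^{out}(\theta_{\tau,K}; D^{val})$$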

3.1.3 Training procedure
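A minimal PyTorch-style sketch of meta-training under the above definitions. This is a toy instance under stated assumptions, not the paper's implementation: a single linear layer stands in for the $L$-layer network, the synthetic regression tasks and all dimensions are placeholders, and each $m_i^l$ has one entry per singular value of the unfolded gradient.

```python
import torch
import torch.nn.functional as F

def sp(x):
    # Sp(x) = (1/2) log(1 + exp(2x)); strictly positive, so M is SPD
    return 0.5 * F.softplus(2.0 * x)

def precondition(grad, m):
    """GAP step for one layer: SVD the mode-1 unfolded gradient and
    scale its singular values by the meta-learned diagonal Sp(m)."""
    shape = grad.shape
    G = grad.reshape(shape[0], -1)                     # mode-1 unfolding
    U, S, Vh = torch.linalg.svd(G, full_matrices=False)
    G_tilde = U @ torch.diag(sp(m) * S) @ Vh           # U (M Sigma) V^T
    return G_tilde.reshape(shape)                      # fold back to tensor

# Toy setup (assumptions): one 5x3 linear layer, synthetic regression tasks
torch.manual_seed(0)
theta = torch.randn(5, 3, requires_grad=True)          # initialization W
phi = torch.zeros(3, requires_grad=True)               # m_i, one per singular value
alpha, beta, K = 0.1, 0.01, 3

meta_opt = torch.optim.SGD([theta, phi], lr=beta)
for step in range(100):
    meta_opt.zero_grad()
    for _ in range(4):                                 # batch of tasks
        W_true = torch.randn(5, 3)
        x_tr, x_val = torch.randn(10, 3), torch.randn(10, 3)
        y_tr, y_val = x_tr @ W_true.T, x_val @ W_true.T

        w = theta
        for k in range(K):                             # inner loop with GAP
            loss_in = F.mse_loss(x_tr @ w.T, y_tr)
            g = torch.autograd.grad(loss_in, w, create_graph=True)[0]
            w = w - alpha * precondition(g, phi)
        # outer loop: query loss accumulates gradients for both theta and phi
        F.mse_loss(x_val @ w.T, y_val).backward()
    meta_opt.step()
```

Note the `create_graph=True` in the inner loop: it keeps the inner updates differentiable so that the outer `backward()` can reach both $\theta$ and $\phi$ through the SVD-based preconditioning.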
