Note that, has used instead of , on the MAML paper.
Preconditioned gradient descent (PGD) often reduces the effect of pathological curvature and speeds up optimization.
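As a rough illustration of this claim, the sketch below compares plain gradient descent with a preconditioned variant on a small ill-conditioned quadratic. The quadratic, the diagonal preconditioner, the learning rates, and the helper name `run` are all assumptions chosen for the example; they are not taken from the paper.

```python
import numpy as np

# Ill-conditioned quadratic f(x) = 0.5 * x^T A x (curvatures 100 and 1).
A = np.diag([100.0, 1.0])

def run(precond, lr, steps=50):
    """Run `steps` updates x <- x - lr * P @ grad and return the final loss."""
    x = np.array([1.0, 1.0])
    for _ in range(steps):
        grad = A @ x
        x = x - lr * precond @ grad
    return 0.5 * x @ A @ x

# Plain GD: the step size is capped by the largest curvature (lr < 2/100),
# so progress along the flat direction is slow.
plain = run(np.eye(2), lr=0.01)

# Preconditioned GD: P = diag(A)^{-1} equalizes the curvature,
# so a much larger step size is stable in every direction.
pgd = run(np.diag(1.0 / np.diag(A)), lr=0.5)

print(plain, pgd)
```

With the same number of steps, the preconditioned run ends at a far lower loss because both directions contract at the same rate.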
Tensors and tensor decompositions
Riemannian manifold and metric
L-layer neural network with parameters $W$
Let the gradient be $G_W$
Reshape G into a matrix with mode-1 unfolding (mode-1 performs the best)
Define additional parameters where
SVD into
and
Reshape back to its original tensor form
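The steps above can be sketched in NumPy as follows. This is a minimal sketch, not the authors' implementation: the function names `unfold_mode1` and `precondition_gradient` are hypothetical, and the way the additional parameters `m` enter (as softplus-positive scales on the singular values) is an assumption, since these notes do not spell out their definition.

```python
import numpy as np

def unfold_mode1(g):
    """Mode-1 unfolding: keep the first axis and flatten the remaining ones.

    The column order follows NumPy's C-order reshape; other conventions only
    permute columns, which leaves the singular values unchanged.
    """
    return g.reshape(g.shape[0], -1)

def precondition_gradient(g, m):
    """Rescale the singular values of the unfolded gradient, then fold back.

    `m` holds one raw parameter per singular value (assumed form); softplus
    keeps the resulting scales positive.
    """
    g_mat = unfold_mode1(g)
    u, s, vt = np.linalg.svd(g_mat, full_matrices=False)
    scales = np.logaddexp(0.0, m)          # softplus(m) = log(1 + exp(m))
    g_tilde = (u * (scales * s)) @ vt      # U diag(scales * s) V^T
    return g_tilde.reshape(g.shape)        # back to the original tensor form

rng = np.random.default_rng(0)
grad = rng.standard_normal((4, 3, 2, 2))   # stand-in for one layer's gradient
m = np.full(4, np.log(np.e - 1.0))         # softplus(m) == 1 for every entry
out = precondition_gradient(grad, m)
```

With every scale equal to 1 the function returns the original gradient, which is a quick sanity check that the unfold, SVD, and fold-back steps round-trip correctly.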
Preconditioned gradient descent of GAP becomes: