[REVIEW] Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

SHIN · May 29, 2023

1. Introduction

 Proposing a general, model-agnostic meta-learning algorithm that is compatible with any model trained with gradient descent.

2. Model-Agnostic Meta-Learning(MAML)

 2.1. Meta learning scenario

  Meta-training part

  1. A task $T_i$ is drawn from the task distribution $p(T)$.
  2. With $T_i$, the model is trained on only $K$ samples and feedback from the loss $L_{T_i}$.
  3. The model is then tested on new samples from $T_i$, and this test error is used when improving the model $f$ with respect to its parameters ('model $f$' is a generic expression, since the meta-learnable object differs across methods, from learning algorithms to parameter vectors). Thus the per-task test error serves as the training error of the meta-learning process.

  Meta-testing part

  1. New tasks are sampled from $p(T)$, and meta-performance is measured after the model learns from only $K$ samples. (Tasks used for meta-testing are held out during meta-training; see the sketch below.)
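
As a concrete instance of this scenario, here is a minimal Python sketch using the paper's sinusoid regression task distribution (amplitude in $[0.1, 5.0]$, phase in $[0, \pi]$, inputs in $[-5.0, 5.0]$); the episode helpers are my own illustration, not the authors' code:

```python
import math
import random

def sample_task():
    """Draw one task T_i ~ p(T): regress a sine wave with random amplitude/phase."""
    amplitude = random.uniform(0.1, 5.0)
    phase = random.uniform(0.0, math.pi)
    return lambda x: amplitude * math.sin(x - phase)

def make_episode(task, K=10):
    """K support samples the model adapts on, plus K query samples from the
    same task whose error serves as the (meta-)training signal."""
    xs = [random.uniform(-5.0, 5.0) for _ in range(2 * K)]
    pairs = [(x, task(x)) for x in xs]
    return pairs[:K], pairs[K:]          # (support, query)

# Tasks used for meta-testing are fresh draws, held out from meta-training:
meta_train_tasks = [sample_task() for _ in range(1000)]
meta_test_tasks = [sample_task() for _ in range(100)]
```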

2.2. A Model-Agnostic Meta-Learning Algorithm

 Intuition

  • Some internal representations are more transferable than others.
    -> Let neural networks learn such features, so that they are broadly applicable to all tasks in $p(T)$ rather than tied to a single task.

 Problems

  1. Do such representations actually exist? (Personal question)
  2. How can we tell which representation is the transferable one?

 Approach

  1. Transferability is a relative matter: we are not looking for an absolutely transferable representation, but for a relatively more transferable one.
  2. We find it by searching for the model that makes the most rapid adjustment to new tasks from $p(T)$, i.e., a model able to evoke a large improvement from a small parameter change. That is what 'more transferable' means here.

 Algorithm

  Inner Loop : Adapting to a new task

  • Compute adapted parameters $\theta_i' = \theta - \alpha\,\nabla_\theta L_{T_i}(f_\theta)$ for each task $T_i$ (a different set of parameters is generated per task), where $\alpha$ is the inner-loop step size.

  Outer Loop : Filtering the most transferable representation

  • Compute the loss of each adapted model $f_{\theta_i'}$ on its task, sum these losses across all tasks, compute the gradient of the sum, and update $\theta$: $\theta \leftarrow \theta - \beta\,\nabla_\theta \sum_{T_i \sim p(T)} L_{T_i}(f_{\theta_i'})$, where $\beta$ is the meta step size.
  • Because $\theta$ is updated with the gradient of the summed losses, tasks producing large gradients influence the update most, so the adjusted $\theta$ is pulled toward the more transferable representations gained across tasks.

    Consequently, the meta-objective is as follows (a code sketch of the full loop appears below):

    $$\min_\theta \sum_{T_i \sim p(T)} L_{T_i}\big(f_{\theta_i'}\big) = \sum_{T_i \sim p(T)} L_{T_i}\Big(f_{\theta - \alpha \nabla_\theta L_{T_i}(f_\theta)}\Big)$$
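
A minimal PyTorch (≥ 2.0) sketch of this inner/outer loop, written as my own illustration rather than the authors' code; the `task.support()`/`task.query()` helpers are hypothetical:

```python
import torch
import torch.nn as nn

def maml_step(model, tasks, meta_opt, inner_lr=0.01):
    """One meta-update over a batch of tasks: inner adaptation, then outer update."""
    loss_fn = nn.MSELoss()
    meta_loss = 0.0
    for task in tasks:
        x_s, y_s = task.support()   # K samples of T_i used for adaptation
        x_q, y_q = task.query()     # held-out samples of T_i for the meta-update

        # Inner loop: one gradient step theta -> theta_i'
        inner_loss = loss_fn(model(x_s), y_s)
        grads = torch.autograd.grad(inner_loss, model.parameters(),
                                    create_graph=True)  # keep graph for 2nd-order MAML
        theta_i = {name: p - inner_lr * g
                   for (name, p), g in zip(model.named_parameters(), grads)}

        # Loss of the adapted model f_{theta_i'} on the query set
        preds = torch.func.functional_call(model, theta_i, (x_q,))
        meta_loss = meta_loss + loss_fn(preds, y_q)

    # Outer loop: update theta with the gradient of the summed query losses
    meta_opt.zero_grad()
    meta_loss.backward()
    meta_opt.step()
    return float(meta_loss)
```

Passing `create_graph=False` in the inner step drops the second-order terms and gives the first-order approximation the paper also evaluates.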

3. Species of MAML

 3.1. Supervised Regression and Classification

   MSE loss for regression

   CE (cross-entropy) loss for classification; the standard forms of both are given below.
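
For reference, these are the usual forms (following the paper's Section 3.1, with $f_\phi$ the model being evaluated and the sums running over the task's sampled input/output pairs; the CE loss is written here with its conventional leading minus sign):

$$L_{T_i}(f_\phi) = \sum_{x^{(j)},\, y^{(j)} \sim T_i} \big\| f_\phi(x^{(j)}) - y^{(j)} \big\|_2^2$$

$$L_{T_i}(f_\phi) = -\sum_{x^{(j)},\, y^{(j)} \sim T_i} \Big[ y^{(j)} \log f_\phi(x^{(j)}) + \big(1 - y^{(j)}\big) \log\big(1 - f_\phi(x^{(j)})\big) \Big]$$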

  MAML for Few-Shot Supervised Learning

  • Note that $K$-shot classification tasks use $K$ input/output pairs from each class, thus $NK$ data points for $N$-way classification; see the sketch below.
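
A sketch of how one such $N$-way $K$-shot task might be assembled (my illustration; `dataset` is assumed to be a list of `(x, label)` pairs):

```python
import random
from collections import defaultdict

def make_few_shot_task(dataset, N=5, K=1):
    """Sample N classes, then K examples per class: N*K support points total."""
    by_class = defaultdict(list)
    for x, label in dataset:
        by_class[label].append(x)
    classes = random.sample(list(by_class), N)        # this task's own N classes
    support = [(x, new_label)                         # remap labels to 0..N-1
               for new_label, c in enumerate(classes)
               for x in random.sample(by_class[c], K)]
    return support                                    # len(support) == N * K
```

For example, `N=5, K=1` gives the 5-way 1-shot setting evaluated in the paper.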

 3.2. Reinforcement Learning

  Definitions

  1. $T = \{\, L(x_1, a_1, \ldots, x_H, a_H),\ q(x_1),\ q(x_{t+1} \mid x_t, a_t),\ H \,\}$
    $T$ : a task (each learning problem)
  2. $L(x_t, a_t)$ : the loss function, given observation $x_t$ and output $a_t$
  3. $q(x_1)$ : the distribution over initial observations
  4. $q(x_{t+1} \mid x_t, a_t)$ : the transition distribution
  5. $H$ : the episode length (horizon). At each time step $t$, the model chooses an output $a_t$, so a rollout generates a sample trajectory of length $H$; see the loss form below.
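
Under these definitions, the RL loss is the negative expected reward over a horizon-$H$ episode (the paper's formulation, with $R_i$ denoting the reward function of task $T_i$):

$$L_{T_i}(f_\phi) = -\,\mathbb{E}_{x_t,\, a_t \sim f_\phi,\, q_{T_i}} \left[ \sum_{t=1}^{H} R_i(x_t, a_t) \right]$$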

 Algorithm

  The structure mirrors the supervised case: the inner loop samples trajectories with $f_\theta$ and uses a policy-gradient estimate to compute the adapted parameters $\theta_i'$, then new trajectories are sampled with the adapted policy $f_{\theta_i'}$ for the outer meta-update.

Terminology

  • task : (classification, for example) each individual problem in which the model must classify among a specific subset of classes; this subset may not cover the whole class range, so there can be more than one task under one dataset. For instance, a 10-class dataset yields many distinct 5-way tasks.