[ECCV 2018] Memory Aware Synapses: Learning what (not) to forget

Ruffy · February 3, 2023

transfer learning

1. Introduction

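Existing continual learning methods are typically evaluated on just two tasks, or assume a model with plenty of spare capacity.
In real-world settings, however, a model with limited capacity has to handle a stream of tasks that keep arriving.
Rather than having the model try to remember everything about previous tasks, the paper proposes forgetting less important information while preserving what matters.
Concretely, an importance value is computed for each network parameter (weight), and a regularization term based on these importances discourages updates to the important weights.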

3. Background

Catastrophic forgetting

: The phenomenon in which a neural network's performance on previously learned tasks degrades when it is trained on a different kind of task

Continual Learning

: An approach introduced to address catastrophic forgetting, in which a single model is upgraded incrementally so that it can handle multiple tasks

4. Our Approach

4.1. Estimating parameter importance

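The importance weight of a parameter is the sensitivity of the learned function to a change in that parameter.
For a given data point, the change in the output function under a small perturbation of the parameters is approximated to first order by the gradient of the output with respect to each parameter times its perturbation.
The importance weight is then the average, over all data points, of the magnitude of this gradient.
When the network's output function is multi-dimensional, a separate gradient must be computed for each output dimension, which becomes expensive.
As an efficient alternative, the gradient of the squared L2 norm of the learned function's output, which is a single scalar, is computed instead.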
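The estimate above can be sketched on a toy model. This is only an illustration under stated assumptions, not the authors' code: the model is a hypothetical linear map F(x) = W x, chosen because the gradient of the squared L2 norm of its output has the closed form 2 * F(x)[i] * x[j], so no autograd library is needed. The names `forward` and `mas_importance` are made up for this example.

```python
def forward(W, x):
    """Output of the toy model F(x) = W x (one dot product per output unit)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def mas_importance(W, data):
    """Average absolute gradient of ||F(x)||_2^2 with respect to each weight."""
    omega = [[0.0 for _ in row] for row in W]
    for x in data:
        y = forward(W, x)
        for i, row in enumerate(W):
            for j in range(len(row)):
                # Closed-form gradient for the linear toy model (assumption).
                grad = 2.0 * y[i] * x[j]
                omega[i][j] += abs(grad)
    n = len(data)
    return [[v / n for v in row] for row in omega]


W = [[1.0, 0.0], [0.0, 0.5]]
data = [[1.0, 0.0], [2.0, 0.0]]  # inputs only; this estimate uses no labels
omega = mas_importance(W, data)
print(omega)
```

In this toy data the second input feature is always zero, so every weight attached to it ends up with zero importance, while the one weight that actually drives the output gets a large value.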

4.2. Learning a new task

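When training on the N-th task, the loss function is the new task's loss plus a regularization term that penalizes each parameter's squared change from its previous value, weighted by that parameter's importance.
Important weight = high importance weight -> the new parameter cannot move far from the old parameter.
Forgettable weight = low importance weight -> the new parameter is free to move far from the old parameter.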
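A minimal sketch of this regularized loss, assuming the parameters are flattened into plain lists and that a hyperparameter `lam` (hypothetical name, not mentioned in this post) scales the penalty. The function name `mas_loss` and the numbers are illustrative only.

```python
def mas_loss(task_loss, theta, theta_prev, omega, lam):
    """New-task loss plus importance-weighted squared change of each parameter."""
    penalty = sum(
        o * (t - p) ** 2 for t, p, o in zip(theta, theta_prev, omega)
    )
    return task_loss + lam * penalty


theta_prev = [1.0, 1.0]  # parameters after the previous task
omega = [5.0, 0.0]       # first weight is important, second is forgettable

# Move each weight by the same amount and compare the resulting penalty.
moved_important = mas_loss(0.0, [2.0, 1.0], theta_prev, omega, lam=1.0)
moved_forgettable = mas_loss(0.0, [1.0, 2.0], theta_prev, omega, lam=1.0)
print(moved_important, moved_forgettable)
```

Moving the important weight by 1 costs 5.0, while moving the forgettable weight by the same amount costs nothing, which is the behavior described in the two lines above.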

4.3. Connection to Hebbian learning

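Instead of the whole network, the approximation is applied locally to the output function F of each layer.
This borrows the concept of a synapse, as in Hebbian learning.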
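A rough sketch of what such a local, per-layer importance could look like. It assumes a single ReLU layer and that the local importance reduces to the average product of the input activation and the output activation on each connection (constant factors dropped); this is my reading of the Hebbian connection and is not spelled out in this post. `local_importance` is a hypothetical name.

```python
def local_importance(W, data):
    """Hebbian-style importance: mean of (output activation * input activation)."""
    omega = [[0.0 for _ in row] for row in W]
    for x in data:
        # Output activations of a single ReLU layer (assumed architecture).
        y = [max(0.0, sum(w * x_j for w, x_j in zip(row, x))) for row in W]
        for i in range(len(W)):
            for j in range(len(x)):
                omega[i][j] += y[i] * x[j]
    n = len(data)
    return [[v / n for v in row] for row in omega]


W = [[1.0, -1.0], [0.5, 0.5]]
data = [[1.0, 0.0], [0.0, 1.0]]  # non-negative input activations
local_omega = local_importance(W, data)
print(local_omega)
```

A connection only accumulates importance when its input and its output are active together, which is the "fire together, wire together" intuition behind Hebbian learning.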

5. Experiments

5.1. Object Recognition

Experiments use an AlexNet pretrained on ImageNet as the base model.
Task datasets: MIT Scenes, Caltech-UCSD Birds, Oxford Flowers

5.3. Longer Sequence

Experiments are run on a sequence of 8 tasks.
Five additional task datasets are used: Stanford Cars, FGVC-Aircraft, VOC Actions, Letters, and SVHN.
Sequence: Flower → Scenes → Birds → Cars → Aircraft → Actions → Letters → SVHN

5.2. Facts Learning

Experiments on a fact-learning task, in which an embedding space is learned by learning facts from natural images.
A fact consists of a Subject (S), an Object (O), and a Predicate (P).

6. Discussion

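Broadly, three families of techniques can be used in continual learning:
1. Regularization: update the weights conservatively.
2. Dynamic Structure: change the structure depending on the task.
3. Memory
Carrying forward the important parameters, and the important information from earlier tasks, seems to be the core of continual learning.
From the experiments section alone, it was hard to understand the concrete experimental design.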

