[CV] Training Technique

흐어어 · March 28, 2023

BoostAI Knowledge distillation augmentation transfer learning


Training Techniques in CV

Data Augmentation
Leveraging pre-trained information
Leveraging unlabeled datasets for training

Data Augmentation

There is a gap between the training set and the real data distribution, and in most cases the training set is biased. Visualized, the real data would form a dense distribution with no empty regions, whereas the training data would appear as scattered points with many gaps between them. A model trained on such data becomes biased toward those points. Augmentation can be seen as a way to narrow the gap between the training dataset's distribution and the real data distribution.

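Representative examples of image augmentation

Brightness: changes the brightness
Rotate, flip, affine transformation: geometric transformations
Crop: cuts out part of the image
CutMix: mixes two different images and mixes their labels in the same proportion, e.g. [1,0] + [0,1] → [0.7, 0.3]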
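
To make the label mixing in CutMix concrete, here is a minimal framework-free sketch. It assumes images are plain lists of rows and that the patch position is given explicitly; the function name `cutmix` and the `box` argument are hypothetical, and practical implementations usually sample the patch at random.

```python
def cutmix(img_a, img_b, label_a, label_b, box):
    """Paste the `box` region of img_b onto img_a and mix the one-hot labels by area."""
    top, left, height, width = box
    rows, cols = len(img_a), len(img_a[0])
    mixed = [row[:] for row in img_a]  # copy so the input image is not modified
    for r in range(top, top + height):
        for c in range(left, left + width):
            mixed[r][c] = img_b[r][c]
    # lam is the share of pixels that still come from img_a.
    lam = 1.0 - (height * width) / (rows * cols)
    label = [lam * a + (1.0 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed, label


# Two toy 10x10 "images": all zeros (class 0) and all ones (class 1).
img_a = [[0] * 10 for _ in range(10)]
img_b = [[1] * 10 for _ in range(10)]

# Replace the top 3 rows (30% of the pixels) of img_a with img_b.
mixed, label = cutmix(img_a, img_b, [1, 0], [0, 1], box=(0, 0, 3, 10))
```

With 30% of the pixels taken from the second image, the mixed label comes out as approximately [0.7, 0.3], matching the example in the list.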

RandAugment

Many other image augmentation operations exist, including the following.

identity
rotate
posterize
sharpness
translate-x
autocontrast
solarize
contrast
shear-x
translate-y
equalize
color
brightness
shear-y

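A common approach today is RandAugment, which applies various augmentation operations at random in order to find a well-performing combination. A combination of augmentation operations is called a policy, and each policy has a magnitude, i.e. how strongly its operations are applied. RandAugment is applied by searching for the best policy and magnitude.

(Related paper: Randaugment: Practical automated data augmentation with a reduced search space)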
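
The sampling step can be sketched as follows. In the cited paper the search space is reduced to two numbers: how many operations are applied per image (N) and a shared magnitude (M). The four operations below are simplified stand-ins written for this sketch, not the paper's implementations, and all function names are hypothetical.

```python
import random

# Toy operations on a grayscale image stored as a list of rows with values in [0, 255].
# Each takes a magnitude in [0, 1] and returns a new image.
def identity(img, magnitude):
    return [row[:] for row in img]

def brightness(img, magnitude):
    return [[min(255, int(p * (1.0 + magnitude))) for p in row] for row in img]

def translate_x(img, magnitude):
    shift = int(magnitude * len(img[0]))  # shift right, padding with zeros
    return [[0] * shift + row[:len(row) - shift] for row in img]

def solarize(img, magnitude):
    threshold = int(255 * (1.0 - magnitude))  # invert pixels at or above the threshold
    return [[255 - p if p >= threshold else p for p in row] for row in img]

OPS = [identity, brightness, translate_x, solarize]

def rand_augment(img, num_ops, magnitude, rng):
    """Apply `num_ops` randomly chosen operations, all at the same magnitude."""
    for op in rng.choices(OPS, k=num_ops):
        img = op(img, magnitude)
    return img


rng = random.Random(0)
image = [[10, 50, 100, 200] for _ in range(4)]
augmented = rand_augment(image, num_ops=2, magnitude=0.5, rng=rng)
```

Finding good values then amounts to trying a few `(num_ops, magnitude)` pairs and keeping the pair that gives the best validation accuracy.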

Techniques for using pre-trained models

Transfer learning

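A learning method used when training must be done on a small dataset (few samples).
It uses a model that has already been pre-trained on a different dataset. (It can be seen as reusing the knowledge of the pre-trained model.)
It can reflect the characteristics of multiple datasets. (If a model trained on dataset A is trained further on dataset B, it can be seen as having learned features from both A and B.)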

Transfer learning approaches

The first approach keeps the weights of the pre-trained model fixed and trains only the layers added after it. The pre-trained model serves purely as a feature extractor, and the FC layer at the end makes the prediction. This approach can be used when little data is available.

The second approach, unlike the first, updates the weights of every layer. It can be used when relatively more data is available than in the first case. Here the learning rate is set low for the pre-trained part and higher for the newly added layers, so that training does not change the pre-trained weights too much.

Approach 1: Freeze weights (pre-trained model) + update weights (FC layer)
Approach 2: Fine-tuning the whole model
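
The difference between the two approaches comes down to the learning rate each part of the network receives. The sketch below shows this with a plain SGD step on made-up numbers; the group names `backbone` and `head`, the weights, the gradients, and the learning rates are all illustrative assumptions. Deep learning frameworks usually express the same idea by disabling gradients for frozen layers or by giving the optimizer per-group learning rates.

```python
def sgd_step(params, grads, lrs):
    """One plain SGD update; each parameter group has its own learning rate."""
    return {
        group: [w - lrs[group] * g for w, g in zip(params[group], grads[group])]
        for group in params
    }


# Hypothetical weights and gradients for a pre-trained backbone and a new FC head.
params = {"backbone": [0.5, -1.0], "head": [0.1, 0.2]}
grads = {"backbone": [1.0, 1.0], "head": [1.0, 1.0]}

# Approach 1: freeze the backbone (learning rate 0) and train only the head.
frozen = sgd_step(params, grads, lrs={"backbone": 0.0, "head": 0.1})

# Approach 2: fine-tune everything, with a much smaller rate for the backbone.
finetuned = sgd_step(params, grads, lrs={"backbone": 0.001, "head": 0.1})
```

In `frozen` the backbone weights are unchanged, while in `finetuned` they move only slightly compared with the head.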

Knowledge distillation

Teacher Model (pre-trained)
Student Model (not trained)

The Student Model is trained so that it learns the knowledge of the pre-trained Teacher Model (only the knowledge is extracted, i.e. the knowledge is distilled). This can be applied to both unlabeled and labeled datasets.

Knowledge distillation on an unlabeled dataset

The unlabeled dataset is first fed into the Teacher Model, and the Student Model is trained to reproduce the Teacher Model's output. In other words, the Student Model uses the Teacher Model's outputs as the labels for the unlabeled dataset. Weight updates happen only in the Student Model. In effect, the Teacher Model generates labels for the unlabeled dataset, and the Student Model is then trained on them in a supervised manner. (The loss is KLD(teacher, student), called the distillation loss.)
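
As a rough illustration of the distillation loss, the sketch below computes the KL divergence between the softmax outputs of a teacher and a student for a single unlabeled sample. The logits are made-up numbers, and in real training only the student's parameters would receive gradients from this loss.

```python
import math

def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


teacher_logits = [4.0, 1.0, 0.5]  # hypothetical output of the frozen teacher
student_logits = [2.0, 1.5, 0.5]  # hypothetical output of the student being trained

# The teacher's distribution plays the role of the label.
distillation_loss = kl_divergence(softmax(teacher_logits), softmax(student_logits))
```

The loss is zero only when the student's distribution matches the teacher's, so minimizing it pulls the student toward the teacher's predictions.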

Knowledge distillation on a labeled dataset

For a labeled dataset, a student loss is added. The labeled data is first fed into the Teacher Model to obtain its output, and the distillation loss is computed between that output and the Student Model's output. In addition, a loss between the ground-truth labels and the Student Model's output is computed. As a result, two losses are used: the distillation loss and the student loss. Note that the outputs of both models pass through a softmax, and everywhere except the student loss the logits are divided by a temperature T before the softmax is applied. (This keeps the output distribution from becoming extremely peaked.)
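
A minimal sketch of how the two losses might be combined for one labeled sample is given below. The weighting factor `alpha`, the default `temperature=4.0`, and the logits are assumptions made for illustration; the text only states that both losses are used and that T is applied everywhere except the student loss.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by `temperature`; a larger T gives a softer output."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature):
    """KL divergence between the temperature-softened teacher and student outputs."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def student_loss(student_logits, true_class):
    """Cross-entropy against the ground-truth class, using a plain softmax (T = 1)."""
    return -math.log(softmax(student_logits)[true_class])

def total_loss(teacher_logits, student_logits, true_class, temperature=4.0, alpha=0.5):
    # alpha and the default temperature are illustrative assumptions.
    soft = distillation_loss(teacher_logits, student_logits, temperature)
    hard = student_loss(student_logits, true_class)
    return alpha * soft + (1.0 - alpha) * hard


teacher_logits = [4.0, 1.0, 0.5]  # hypothetical teacher output
student_logits = [2.0, 1.5, 0.5]  # hypothetical student output
loss = total_loss(teacher_logits, student_logits, true_class=0)

sharp = softmax(teacher_logits)       # T = 1: probability mass concentrated on one class
soft = softmax(teacher_logits, 4.0)   # T = 4: flatter distribution
```

Comparing `sharp` and `soft` shows the effect of T: the largest probability shrinks and the smaller ones grow, so the student also sees how the teacher ranks the non-target classes.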

Semi-supervised learning

This approach uses a small labeled dataset together with a larger unlabeled dataset. First, 1) a model is trained on the labeled dataset, then 2) the unlabeled dataset is fed into it to build a pseudo-labeled dataset. The model is then re-trained on both the original labeled dataset and the pseudo-labeled dataset. In this way, the labeled and unlabeled datasets are used together.
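
The three stages can be sketched with a toy model. The example assumes 1-D features and a nearest-centroid classifier, chosen only because it fits in a few lines; the data, the class names, and the function names are all hypothetical.

```python
def train_centroids(samples):
    """Fit a nearest-centroid classifier on (feature, label) pairs with 1-D features."""
    sums, counts = {}, {}
    for x, y in samples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


labeled = [(0.0, "cat"), (1.0, "cat"), (9.0, "dog"), (10.0, "dog")]
unlabeled = [0.5, 2.0, 3.0, 7.0, 8.0, 9.5]

# 1) Train on the labeled dataset only.
model = train_centroids(labeled)

# 2) Build a pseudo-labeled dataset from the model's predictions.
pseudo_labeled = [(x, predict(model, x)) for x in unlabeled]

# 3) Re-train on the labeled and pseudo-labeled datasets together.
model = train_centroids(labeled + pseudo_labeled)
```

After re-training, the centroids reflect all ten samples rather than only the four labeled ones, which is the point of bringing the unlabeled data in.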

Self-training

Augmentation + Teacher-Student networks + semi-supervised learning

(Related paper: Self-Training With Noisy Student Improves ImageNet Classification)

This technique combines augmentation, Teacher-Student networks, and semi-supervised learning. Augmentation is applied during training, while the teacher and student models take the role of the model in semi-supervised learning. The procedure is as follows.

1. Train the teacher model on the labeled data.
2. Use the trained teacher model to create pseudo-labels (the teacher model's outputs) for the unlabeled dataset.
3. Train the student model on the pseudo-labeled dataset built by the teacher model together with the original labeled dataset.
4. Use the trained student model as the new teacher model.
5. Repeat steps 2 to 4. → Unlike the earlier approaches, the student model keeps getting larger.
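
The loop can be sketched as follows. This is a toy under strong assumptions: the "model" is a 1-D nearest-centroid classifier, random input noise stands in for augmentation, and the `size` field is only a counter that mimics a growing student rather than real model capacity. All names are hypothetical.

```python
import random

def train(samples, size, rng, noise=0.0):
    """Toy nearest-centroid 'model'; `size` is only recorded to mimic a larger student."""
    sums, counts = {}, {}
    for x, y in samples:
        x = x + rng.uniform(-noise, noise)  # input noise stands in for augmentation
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {"size": size, "centroids": {y: sums[y] / counts[y] for y in sums}}

def predict(model, x):
    centroids = model["centroids"]
    return min(centroids, key=lambda y: abs(x - centroids[y]))

def noisy_student(labeled, unlabeled, rounds, rng):
    teacher = train(labeled, size=1, rng=rng)  # train the teacher on labeled data
    for _ in range(rounds):
        # Pseudo-label the unlabeled data with the current teacher.
        pseudo = [(x, predict(teacher, x)) for x in unlabeled]
        # Train a larger, noised student on labeled plus pseudo-labeled data.
        student = train(labeled + pseudo, teacher["size"] + 1, rng, noise=0.5)
        # The student becomes the next teacher.
        teacher = student
    return teacher


labeled = [(0.0, "cat"), (1.0, "cat"), (9.0, "dog"), (10.0, "dog")]
unlabeled = [0.5, 2.0, 3.0, 7.0, 8.0, 9.5]
final_model = noisy_student(labeled, unlabeled, rounds=3, rng=random.Random(0))
```

Each round produces a student one step "larger" than its teacher, so after three rounds `final_model["size"]` is 4.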

Reference

Boostcourse Boostcamp AI lectures
