Relevance-CAM: Your Model Already Knows Where to Look

이은상 · April 8, 2024

Paper Review

📄Relevance-CAM: Your Model Already Knows Where to Look

written by Jeong Ryong Lee, Sewon Kim, Inyong Park, Taejoon Eo, and Dosik Hwang

Introduction

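As neural networks advance and are applied in more areas, the ability to explain (interpret) a model has become increasingly important. In computer vision there are several families of model-analysis methods, such as Class Activation Map (CAM) based methods and decomposition-based methods.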

CAM-based methods visualize a model's decision by computing a weighted linear summation of the last convolutional feature maps.

However, gradient-based CAMs have several drawbacks:

They are vulnerable to the shattered gradient problem
Their explanatory quality varies with the position of the layer

This paper introduces the Relevance-weighted Class Activation Map (Relevance-CAM), which uses LRP to overcome the drawbacks of gradient-based CAMs

Background

CAM

explanation method for visualizing class specific regions through a linearly weighted combination of the last convolutional layer output before the global pooling layer

Summarizes the content of each channel and computes the output from those summaries

Grad-CAM

Designed so that it can be used regardless of the CNN architecture

motivation

Activation maps are the feature maps extracted at a certain layer
The importance of an activation map to a class is defined by the gradient of the class score with respect to that activation map

A: activation map in the k-th channel of the last convolutional layer
y: the model output for the class c
alpha: weighting component of Grad-CAM
GP: Global Pooling function
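
The equation these symbols belong to did not survive in this post. As a reconstruction from the standard Grad-CAM formulation (not copied from the post), writing A^k for the k-th channel map, y^c for the class-c output and alpha_k^c for the weight, it reads roughly as:

```latex
\alpha_k^c = \mathrm{GP}\!\left(\frac{\partial y^c}{\partial A^k}\right),
\qquad
L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\!\left(\sum_k \alpha_k^c \, A^k\right)
```

The weight of each channel is the pooled gradient of the class score, which is why noisy gradients translate directly into noisy weights.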

Layer-wise Relevance Propagation(LRP)

Explains a model through layer-wise decomposition of its structure, propagating the relevance score from the output to the input in a layer-wise manner

Propagation of the score is governed by the following definitions:

a relevance score is conservative if the sum of assigned relevance in the pixel space corresponds to the total relevance detected by the model
a relevance score is positive if all values forming the heatmap are greater than or equal to zero

Resulting relevance
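
The propagation equation is missing from this post, so the following is only a minimal sketch of one commonly used LRP rule, the z+ rule, for a single linear layer. Which rule the post had in mind is an assumption; the function name `lrp_zplus`, the toy weights and the `eps` stabilizer are all hypothetical. It illustrates the two definitions above: the redistributed relevance is conservative (sums match) and positive.

```python
def lrp_zplus(x, w, relevance_out, eps=1e-9):
    """Redistribute relevance from a linear layer's outputs to its inputs.

    ASSUMPTION: z+ rule; inputs are non-negative (e.g. post-ReLU).
    x: input activations, length n_in
    w: weights as w[i][j], connecting input i to output j
    relevance_out: relevance of each output neuron, length n_out
    """
    n_in, n_out = len(x), len(relevance_out)
    # Positive contribution of input i to output j: z_ij = x_i * max(w_ij, 0)
    z = [[x[i] * max(w[i][j], 0.0) for j in range(n_out)] for i in range(n_in)]
    # Per-output normalizer; eps only guards against division by zero
    z_sum = [sum(z[i][j] for i in range(n_in)) + eps for j in range(n_out)]
    # Each input receives its share of every output's relevance
    return [
        sum(z[i][j] / z_sum[j] * relevance_out[j] for j in range(n_out))
        for i in range(n_in)
    ]


# Toy example (made-up numbers)
x = [1.0, 2.0, 0.5]
w = [[0.5, -1.0], [0.25, 0.5], [-2.0, 1.0]]
r_out = [3.0, 1.0]
r_in = lrp_zplus(x, w, r_out)
print(r_in, sum(r_in), sum(r_out))
```

Note that an output neuron with no positive contributions would lose its relevance in this sketch; real implementations handle that case explicitly.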

Contrastive Layer-wise Relevance Propagation(CLRP)

Addresses a drawback of LRP
Reduces the relevance of non-target classes at the final layer (increasing the heatmap's sensitivity to the target class)
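
As a rough illustration of what "reducing the relevance of non-target classes" can look like, here is a sketch of one contrastive initialization of the final-layer relevance: the target keeps its logit, and every other class receives an equal negative share. This particular form is an assumption on my part (the original CLRP paper builds its contrast differently), and `clrp_initial_relevance` is a hypothetical name.

```python
def clrp_initial_relevance(logits, target):
    """Contrastive starting relevance for the output layer.

    ASSUMPTION: target class gets z_t, each of the other N-1 classes
    gets -z_t / (N - 1), so the initial relevance sums to zero.
    """
    n = len(logits)
    z_t = logits[target]
    return [z_t if i == target else -z_t / (n - 1) for i in range(n)]


# Toy logits (made-up numbers), target class index 2
r = clrp_initial_relevance([2.0, 0.5, 6.0, 1.0], target=2)
print(r)
```

Starting the backward pass from this vector instead of a one-hot target means that evidence shared by all classes tends to cancel out, leaving mostly target-specific relevance.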

Gradient Issue

Noisiness and discontinuity
As the network gets deeper, gradients become noisy and discontinuous
= the shattered gradient problem occurs
Explanation of sensitivity
Grad-CAM can measure sensitivity, but it cannot measure the activation values that determine importance → false confidence

These problems are solved by using LRP
(the LRP relevance score is used as the weighting component of class activation mapping)

Relevance-weighted Class Activation Map

The last convolutional layer is not the only layer that affects the model's output
However, existing CAMs could only analyze the last layer

Unlike these existing CAMs, Relevance-CAM can also obtain information from shallow layers

To resolve the gradient issue, the relevance score obtained through LRP is used as the weighting component, and CLRP is applied to obtain class sensitivity.

Relevance-CAM equation

alpha is the sum of relevance, i.e. the importance/contribution of the k-th channel activation map to the target class output score.
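
The equation itself is missing from this post. Based on the description of alpha here, a plausible reconstruction (my notation, not copied from the post) is the following, where R_k^c(i, j) denotes the relevance assigned to spatial position (i, j) of channel k for target class c, and A^k is the k-th channel activation map of the chosen layer:

```latex
\alpha_k^c = \sum_{i,j} R_k^{c}(i, j),
\qquad
L^c_{\text{Relevance-CAM}} = \sum_k \alpha_k^c \, A^k
```

The structure mirrors Grad-CAM, with the pooled gradient replaced by the summed relevance.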

Relevance-CAM can be computed with only one forward propagation and one backpropagation
The relevance value itself can be interpreted as a contribution to the target class output
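
To make the weighting step concrete, here is a minimal pure-Python sketch that takes the activation maps and relevance maps of one layer (both assumed to be already computed, shape K x H x W as nested lists) and combines them into a heatmap. The function name `relevance_cam`, the toy values, and the optional clipping of negative values for display are my assumptions, not details from the post.

```python
def relevance_cam(activations, relevance, clip_negative=True):
    """Combine per-channel activation maps using summed relevance as weights.

    activations, relevance: nested lists of shape [K][H][W] for one layer.
    clip_negative: ASSUMPTION, clips negative heatmap values for display.
    """
    # alpha_k: total relevance assigned to channel k
    weights = [sum(sum(row) for row in r_k) for r_k in relevance]
    height, width = len(activations[0]), len(activations[0][0])
    # Weighted linear sum of the channel activation maps
    heatmap = [
        [
            sum(weights[k] * activations[k][i][j] for k in range(len(weights)))
            for j in range(width)
        ]
        for i in range(height)
    ]
    if clip_negative:
        heatmap = [[max(v, 0.0) for v in row] for row in heatmap]
    return weights, heatmap


# Toy example: 2 channels of size 2x2 (made-up numbers)
acts = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
rel = [[[0.5, 0.0], [0.0, 1.5]], [[0.0, -0.5], [-0.5, 0.0]]]
weights, heatmap = relevance_cam(acts, rel)
print(weights)
print(heatmap)
```

In the toy example the second channel receives negative total relevance, so the positions it dominates are suppressed in the final map.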

Experiment

Depth-wise visualization

Plotting the heatmaps layer by layer:
At layer 2, Grad-CAM and Grad-CAM++ fail to localize a specific region, whereas Score-CAM and Relevance-CAM do so well
In addition, the Relevance-CAM heatmap is cleaner than the Score-CAM one

The Relevance-CAM heatmaps also show that class-specific information can be obtained even from shallow layers

Objective faithfulness was measured with Average Drop (A.D.) and Average Increase (A.I.):

At layer4 all methods perform similarly, but at layer2 Relevance-CAM performs noticeably better
Lower A.D. is better, and higher A.I. is better
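
For readers unfamiliar with these two metrics, here is a small sketch of how they are usually computed. It assumes the common definition: for each image, the class score on the full image is compared with the score on the image masked by the explanation map. The function name and the scores are made up for illustration.

```python
def average_drop_increase(full_scores, masked_scores):
    """Return (Average Drop %, Average Increase %).

    full_scores: class scores on the original images
    masked_scores: class scores on the images masked by the heatmap
    ASSUMPTION: all full_scores are strictly positive.
    """
    n = len(full_scores)
    pairs = list(zip(full_scores, masked_scores))
    # A.D.: mean relative drop in score, counting only decreases
    drop = sum(max(0.0, y - o) / y for y, o in pairs) / n * 100
    # A.I.: share of images whose score goes up after masking
    increase = sum(1 for y, o in pairs if o > y) / n * 100
    return drop, increase


# Toy scores for four images (made-up numbers)
ad, ai = average_drop_increase([0.8, 0.5, 0.9, 0.4], [0.4, 0.6, 0.9, 0.2])
print(ad, ai)
```

A faithful heatmap keeps the evidence the model relies on, so masking with it should cost little confidence (low A.D.) and sometimes even raise it by removing distractors (high A.I.).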

Furthermore, ResNet extracts class features at lower layers better than VGG does

Evaluation for selectivity

With Grad-CAM, the shattered gradient problem can be seen at the lower layers; Relevance-CAM, however, is robust to the shattered gradient problem

Looking at the heatmaps, the noisy weighting component of Grad-CAM fails to pick out the important feature maps. In contrast, Relevance-CAM provides good weights for the important channels

Evaluation for Localization

The localization ability of an attention map matters because a saliency map can be used for localization tasks, so it was measured as well

Relevance-CAM separates the object from the background well (even when the object is small)

Measured with the IoU metric as well, Relevance-CAM showed little change in performance as the layer became shallower

A higher IoU means better localization
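
As a quick reminder of what IoU measures in this setting, the sketch below thresholds a heatmap into a binary mask and compares it with a ground-truth mask. The threshold of 0.5, the function names and the toy arrays are assumptions for illustration; the post does not state how the heatmaps were binarized.

```python
def binarize(heatmap, threshold):
    """Turn a heatmap into a 0/1 mask (threshold is a hypothetical choice)."""
    return [[1 if v >= threshold else 0 for v in row] for row in heatmap]


def mask_iou(pred, target):
    """Intersection over Union of two binary masks of the same shape."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks have no overlap to measure; report 0.0
    return inter / union if union else 0.0


# Toy 3x3 heatmap and ground-truth object mask (made-up numbers)
heatmap = [[0.9, 0.7, 0.1], [0.6, 0.2, 0.0], [0.1, 0.0, 0.0]]
ground_truth = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
iou = mask_iou(binarize(heatmap, 0.5), ground_truth)
print(iou)
```

Here three of the four object pixels are covered and nothing outside the object is highlighted, giving an IoU of 0.75.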

Class Sensitivity Test

Relevance-CAM separates objects well starting from the layer 1 heatmap of ResNet50
This shows that the lower layers extract not only local features but also class-specific information

Sanity check for Relevance-CAM

Relevance-CAM was evaluated with the cascading randomization test.
The saliency map was destroyed by parameter randomization.

In other words, the method is sensitive to the model parameters, so it passes the sanity check

Evidence that class specific information is extracted from shallow layers

As noted throughout, objects of different classes can be localized separately even at low-level layers
However, although class-specific features can be extracted from shallow layers, higher-level features are extracted as depth increases

Conclusion

Relevance-CAM is robust to problems such as the shattered gradient problem and false confidence

Thanks to these advantages, shallow layers can be analyzed, and class-specific features can be extracted from them

These properties suggest it could be used in areas such as transfer learning and model pruning.
Relevance-CAM should enable deeper analysis of deep learning models.
