[논문 리뷰] Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization

한의진 · September 22, 2024

Goal

Produce ‘visual explanations’ for decisions from CNN-based models without sacrificing model performance

Motivation

Models should be able to explain why they predict what they predict in order to build trust in intelligent systems

Prior CAM work trades off model complexity and performance for more transparency into the workings of the model

Contribution

Propose Grad-CAM, which generates visual explanations for any CNN-based network without requiring architectural changes

Apply Grad-CAM to existing top-performing classification, captioning, and VQA models

Present a proof-of-concept of how interpretable Grad-CAM visualizations help diagnose failure modes by uncovering biases in datasets

Present Grad-CAM visualizations for ResNets applied to image classification and VQA

Show that Grad-CAM helps untrained users discern a ‘stronger’ network from a ‘weaker’ one, even when both make identical predictions

Grad-CAM

The last convolutional layers are expected to offer the best compromise between high-level semantics and detailed spatial information

Grad-CAM uses the gradient information flowing into the last convolutional layer of the CNN to assign importance values to each neuron for a particular decision of interest

First, compute the gradient of the score $y^c$ for class $c$ (before the softmax) with respect to the feature-map activations $A^k$ of a convolutional layer, i.e. $\partial y^c / \partial A^k$

These gradients are global-average-pooled over the spatial dimensions to obtain the neuron importance weights
$$\alpha_k^c = \frac{1}{Z}\sum_i\sum_j \frac{\partial y^c}{\partial A_{ij}^k}$$
Each weight $\alpha_k^c$ represents a partial linearization of the deep network downstream from $A$, and captures the importance of feature map $k$ for target class $c$



The localization map is a ReLU over the weighted linear combination of forward feature maps:
$$L^c_{\text{Grad-CAM}} = \text{ReLU}\Big(\sum_k \alpha_k^c A^k\Big)$$
The ReLU keeps only features with a positive influence on the class of interest; without it, maps tend to highlight more than the desired class and perform worse at localization
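Given the feature maps $A^k$ and the gradients $\partial y^c / \partial A^k$ (assumed precomputed, e.g. via a backward hook in an autodiff framework), the two steps above reduce to a global average pool followed by a ReLU-ed weighted sum. A minimal numpy sketch (function name and array shapes are my own choices):

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Compute a Grad-CAM localization map.

    feature_maps: (K, H, W) activations A^k of the last conv layer.
    gradients:    (K, H, W) gradients dy^c/dA^k for the target class c
                  (assumed already computed by an autodiff framework).
    """
    # alpha_k^c: global-average-pool the gradients over spatial dims
    alphas = gradients.mean(axis=(1, 2))               # shape (K,)
    # weighted combination of forward feature maps, then ReLU
    cam = np.tensordot(alphas, feature_maps, axes=1)   # shape (H, W)
    return np.maximum(cam, 0.0)
```

In practice the resulting coarse map is upsampled to the input resolution and overlaid on the image.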
Grad-CAM generalizes CAM

In CAM-style architectures, feature maps are spatially pooled with global average pooling (GAP) and linearly transformed to produce a score $Y^c$ for each class $c$:
$$Y^c = \sum_k w_k^c \cdot \frac{1}{Z}\sum_i\sum_j A_{ij}^k$$

Let $F^k = \frac{1}{Z}\sum_i\sum_j A_{ij}^k$ denote the GAP output, so CAM computes the final score as $Y^c = \sum_k w_k^c F^k$

Taking the gradient of the score with respect to $F^k$ and substituting $\partial F^k / \partial A_{ij}^k = 1/Z$ gives
$$w_k^c = \frac{\partial Y^c}{\partial F^k} = \frac{\partial Y^c}{\partial A_{ij}^k}\cdot Z$$

Summing both sides over all pixels $(i, j)$ — note that $w_k^c$ and $Z$ do not depend on $(i, j)$ — and dividing by $Z$:
$$w_k^c = \sum_i\sum_j \frac{\partial Y^c}{\partial A_{ij}^k}$$
where $Z$ is the number of pixels in the feature map. Up to the proportionality constant $1/Z$, this is exactly the Grad-CAM weight $\alpha_k^c$, so Grad-CAM is a strict generalization of CAM
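This identity can be checked numerically on a toy GAP + linear head, for which the gradient has the closed form $\partial Y^c / \partial A_{ij}^k = w_k^c / Z$ (all shapes and values below are illustrative):

```python
import numpy as np

# Toy CAM architecture: Y^c = sum_k w_k^c * F^k, F^k = (1/Z) sum_ij A_ij^k
rng = np.random.default_rng(0)
K, H, W = 3, 4, 4
Z = H * W

A = rng.normal(size=(K, H, W))   # feature maps A^k (illustrative values)
w = rng.normal(size=K)           # CAM class weights w_k^c

# For this architecture the gradient is constant over (i, j):
# dY^c/dA_ij^k = w_k^c / Z
dY_dA = np.broadcast_to(w[:, None, None] / Z, A.shape)

# Grad-CAM weights: alpha_k^c = (1/Z) * sum_ij dY^c/dA_ij^k
alpha = dY_dA.sum(axis=(1, 2)) / Z

# alpha_k^c recovers w_k^c up to the proportionality constant 1/Z
print(np.allclose(Z * alpha, w))
```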

This allows Grad-CAM to generate visual explanations from CNN-based models that cascade convolutional layers with more complex interactions (e.g., image captioning and VQA)

Evaluating Localization Ability

Weakly-supervised Localization

Obtain class predictions from the network, generate a Grad-CAM map for each predicted class, binarize it at 15% of the max intensity, and draw a bounding box around the single largest connected segment
Grad-CAM localization with a pre-trained VGG-16 on ILSVRC-15: Grad-CAM achieves lower localization error than c-MWP
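The map-to-box step can be sketched as follows; this is a minimal numpy version where the BFS connected-components helper is an illustrative stand-in for a library routine such as `scipy.ndimage.label` (the 15% threshold is from the paper, the rest of the names are my own):

```python
import numpy as np
from collections import deque

def gradcam_to_bbox(cam, thresh_frac=0.15):
    """Binarize a Grad-CAM heatmap at a fraction of its max intensity,
    then return the bounding box (r0, c0, r1, c1) of the largest
    4-connected segment of above-threshold pixels."""
    mask = cam >= thresh_frac * cam.max()
    labels = np.zeros(cam.shape, dtype=int)
    best_label, best_size, next_label = 0, 0, 1
    for r, c in zip(*np.nonzero(mask)):
        if labels[r, c]:
            continue
        # BFS flood fill labels one connected segment
        q, size = deque([(r, c)]), 0
        labels[r, c] = next_label
        while q:
            y, x = q.popleft()
            size += 1
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = next_label
                    q.append((ny, nx))
        if size > best_size:
            best_size, best_label = size, next_label
        next_label += 1
    rs, cs = np.nonzero(labels == best_label)
    return rs.min(), cs.min(), rs.max(), cs.max()
```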

Weakly-supervised Segmentation

Segment objects using only image-level annotations
Prior work proposed a new loss function for training weakly-supervised image segmentation models
The algorithm is sensitive to the choice of the weak localization seed
Previously, CAM maps were used as seeds; these were replaced with Grad-CAM maps obtained from a standard VGG-16 network
This obtained an Intersection over Union (IoU) score of 49.6, versus 44.6 with CAM seeds
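For reference, the IoU between a predicted mask and a ground-truth mask is the ratio of their overlap to their union; a small numpy sketch (function name is my own):

```python
import numpy as np

def iou(pred, gt):
    """Intersection over Union of two boolean segmentation masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 0.0
```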

Evaluating Visualizations

Evaluating Class Discrimination

Obtain category-specific visualizations using Deconvolution and Guided Backpropagation, along with their Grad-CAM-weighted counterparts (Deconvolution Grad-CAM and Guided Grad-CAM); human studies show the Grad-CAM variants are more class-discriminative
Evaluating Trust
With Guided Backpropagation, VGG-16 received only a slightly higher reliability score than AlexNet; with Guided Grad-CAM the margin was larger, indicating that users could clearly identify VGG-16 as the more reliable model even when both made identical predictions


Experimental Results


This is a significant paper that lays the cornerstone for visualizing what an AI's detection results are based on. I plan to experiment with various image datasets and look for a way to fine-tune this together with the CheXNet model.
