[논문 리뷰] Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization

한의진 · September 22, 2024

Goal

Produce ‘visual explanations’ for decisions from CNN-based models without sacrificing model performance

Motivation

Models should be able to explain why they predict what they predict in order to build trust in intelligent systems

Prior CAM work trades off model complexity and performance for more transparency into the workings of the model

Contribution

Propose Grad-CAM, which generates visual explanations for any CNN-based network without requiring architectural changes

Apply Grad-CAM to existing top-performing classification, captioning, and VQA models

Present a proof-of-concept of how interpretable Grad-CAM visualizations help diagnose failure modes by uncovering biases in datasets

Present Grad-CAM visualizations for ResNets applied to image classification and VQA

Show that Grad-CAM helps untrained users discern a ‘stronger’ network from a ‘weaker’ one, even when both make identical predictions

Grad-CAM

The last convolutional layers are expected to offer the best compromise between high-level semantics and detailed spatial information

Grad-CAM uses the gradient information flowing into the last convolutional layer of the CNN to assign importance values to each neuron for a particular decision of interest

First, compute the gradient of the score $y^c$ for class $c$ (before the softmax) with respect to the feature-map activations $A^k$ of a convolutional layer, i.e. $\partial y^c / \partial A^k$

These gradients are global-average-pooled over the spatial dimensions to obtain the neuron importance weights
$$\alpha_k^c = \frac{1}{Z}\sum_i\sum_j \frac{\partial y^c}{\partial A_{ij}^k}$$
Each weight $\alpha_k^c$ represents a partial linearization of the deep network downstream from $A$, and captures the importance of feature map $k$ for target class $c$



The localization map is a ReLU over the weighted linear combination of forward feature maps:
$$L^c_{\text{Grad-CAM}} = \text{ReLU}\Big(\sum_k \alpha_k^c A^k\Big)$$
The ReLU keeps only features with a positive influence on the class of interest; without it, maps tend to highlight more than the desired class and perform worse at localization
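Given the feature maps $A^k$ and the gradients $\partial y^c / \partial A^k$ (assumed precomputed, e.g. via a backward hook in an autodiff framework), the two steps above reduce to a global average pool followed by a ReLU-ed weighted sum. A minimal numpy sketch (function name and array shapes are my own choices):

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Compute a Grad-CAM localization map.

    feature_maps: (K, H, W) activations A^k of the last conv layer.
    gradients:    (K, H, W) gradients dy^c/dA^k for the target class c
                  (assumed already computed by an autodiff framework).
    """
    # alpha_k^c: global-average-pool the gradients over spatial dims
    alphas = gradients.mean(axis=(1, 2))               # shape (K,)
    # weighted combination of forward feature maps, then ReLU
    cam = np.tensordot(alphas, feature_maps, axes=1)   # shape (H, W)
    return np.maximum(cam, 0.0)
```

In practice the resulting coarse map is upsampled to the input resolution and overlaid on the image.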
Grad-CAM generalizes CAM

In CAM-style architectures, feature maps are spatially pooled with global average pooling (GAP) and linearly transformed to produce a score $Y^c$ for each class $c$:
$$Y^c = \sum_k w_k^c \cdot \frac{1}{Z}\sum_i\sum_j A_{ij}^k$$

Let $F^k = \frac{1}{Z}\sum_i\sum_j A_{ij}^k$ denote the GAP output, so CAM computes the final score as $Y^c = \sum_k w_k^c F^k$

Taking the gradient of the score with respect to $F^k$ and substituting $\partial F^k / \partial A_{ij}^k = 1/Z$ gives
$$w_k^c = \frac{\partial Y^c}{\partial F^k} = \frac{\partial Y^c}{\partial A_{ij}^k}\cdot Z$$

Summing both sides over all pixels $(i, j)$ — note that $w_k^c$ and $Z$ do not depend on $(i, j)$ — and dividing by $Z$:
$$w_k^c = \sum_i\sum_j \frac{\partial Y^c}{\partial A_{ij}^k}$$
where $Z$ is the number of pixels in the feature map. Up to the proportionality constant $1/Z$, this is exactly the Grad-CAM weight $\alpha_k^c$, so Grad-CAM is a strict generalization of CAM
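This identity can be checked numerically on a toy GAP + linear head, for which the gradient has the closed form $\partial Y^c / \partial A_{ij}^k = w_k^c / Z$ (all shapes and values below are illustrative):

```python
import numpy as np

# Toy CAM architecture: Y^c = sum_k w_k^c * F^k, F^k = (1/Z) sum_ij A_ij^k
rng = np.random.default_rng(0)
K, H, W = 3, 4, 4
Z = H * W

A = rng.normal(size=(K, H, W))   # feature maps A^k (illustrative values)
w = rng.normal(size=K)           # CAM class weights w_k^c

# For this architecture the gradient is constant over (i, j):
# dY^c/dA_ij^k = w_k^c / Z
dY_dA = np.broadcast_to(w[:, None, None] / Z, A.shape)

# Grad-CAM weights: alpha_k^c = (1/Z) * sum_ij dY^c/dA_ij^k
alpha = dY_dA.sum(axis=(1, 2)) / Z

# alpha_k^c recovers w_k^c up to the proportionality constant 1/Z
print(np.allclose(Z * alpha, w))
```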

This allows Grad-CAM to generate visual explanations from CNN-based models that cascade convolutional layers with more complex interactions (e.g., image captioning and VQA)

Evaluating Localization Ability

Weakly-supervised Localization

Obtain class predictions from the network, generate a Grad-CAM map for each predicted class, binarize it at 15% of the max intensity, and draw a bounding box around the single largest connected segment
Grad-CAM localization with a pre-trained VGG-16 on ILSVRC-15: Grad-CAM achieves lower localization error than c-MWP
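The map-to-box step can be sketched as follows; this is a minimal numpy version where the BFS connected-components helper is an illustrative stand-in for a library routine such as `scipy.ndimage.label` (the 15% threshold is from the paper, the rest of the names are my own):

```python
import numpy as np
from collections import deque

def gradcam_to_bbox(cam, thresh_frac=0.15):
    """Binarize a Grad-CAM heatmap at a fraction of its max intensity,
    then return the bounding box (r0, c0, r1, c1) of the largest
    4-connected segment of above-threshold pixels."""
    mask = cam >= thresh_frac * cam.max()
    labels = np.zeros(cam.shape, dtype=int)
    best_label, best_size, next_label = 0, 0, 1
    for r, c in zip(*np.nonzero(mask)):
        if labels[r, c]:
            continue
        # BFS flood fill labels one connected segment
        q, size = deque([(r, c)]), 0
        labels[r, c] = next_label
        while q:
            y, x = q.popleft()
            size += 1
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = next_label
                    q.append((ny, nx))
        if size > best_size:
            best_size, best_label = size, next_label
        next_label += 1
    rs, cs = np.nonzero(labels == best_label)
    return rs.min(), cs.min(), rs.max(), cs.max()
```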

Weakly-supervised Segmentation

Segment objects using only image-level annotations
Prior work proposed a new loss function for training weakly-supervised image segmentation models
The algorithm is sensitive to the choice of the weak localization seed
Previously, CAM maps were used as seeds; these were replaced with Grad-CAM maps obtained from a standard VGG-16 network
This obtained an Intersection over Union (IoU) score of 49.6, versus 44.6 with CAM seeds
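For reference, the IoU between a predicted mask and a ground-truth mask is the ratio of their overlap to their union; a small numpy sketch (function name is my own):

```python
import numpy as np

def iou(pred, gt):
    """Intersection over Union of two boolean segmentation masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 0.0
```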

Evaluating Visualizations

Evaluating Class Discrimination

Obtain category-specific visualizations using Deconvolution and Guided Backpropagation, along with their Grad-CAM-weighted counterparts (Deconvolution Grad-CAM and Guided Grad-CAM); human studies show the Grad-CAM variants are more class-discriminative
Evaluating Trust
With Guided Backpropagation, VGG-16 received only a slightly higher reliability score than AlexNet; with Guided Grad-CAM the margin was larger, indicating that users could clearly identify VGG-16 as the more reliable model even when both made identical predictions


Experimental Results


This is a significant paper that lays the cornerstone for visualizing what an AI's detection results are based on. I plan to experiment with various image datasets and look for a way to fine-tune this together with the CheXNet model.
