[Paper Review] Explaining the black box model: A survey of local interpretation methods for deep neural networks

흔한 직딩이의 벨로그 · November 23, 2023

XAI

Paper Review


Paper link (publication info)
https://www.sciencedirect.com/science/article/abs/pii/S0925231220312716
Journal : Neurocomputing 419 (2021)

Summary

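Global interpretation : explains every prediction the model makes, based on an understanding of the model's overall logic
- Every prediction must be explainable from the model's structure (e.g. a decision tree) -> the model is interpretable by itself
Local interpretation : explains only a specific decision or a single prediction
- The scope to explain is narrower than in the global case, so the cost is lower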

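Taxonomy of local interpretation
- Data-driven : the model itself is treated as a black box; the input data is modified and the change in the output is interpreted
  - Perturbation-based method : mask the input image -> feed it to the model and look at the prediction (ZFnet, LIME, SHAP, Cxplain, etc.)
  - Adversarial-based method : combine the pixels considered important with other adversarial images -> feed them to the model and look at the prediction (Anchor, etc.)
  - Concept-based method : every object has its own (human-defined) concepts (e.g. tiger -> the tiger's stripes) (TCAV, Network Dissection)
- Model-driven : look directly at the model's components and analyze them
  - Gradient-based method : analyze internal components (e.g. neurons and weights) (scoring function, guided backpropagation, etc.)
  - Correlation-score : compute a correlation score for each input pixel (LRP, DeepLIFT)
  - Class activation map : CAM
Conclusion
- Evaluation standard : there is no agreed standard yet for evaluating interpretations
- Most interpretation methods focus on CNN architectures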

Future directions
- Fine-grained interpretation
- Model innovation
- Robustness
- Expansion of derivatives

Introduction

Global interpretation

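Explains every prediction the model makes, based on an understanding of the model's logic
Pros : the training process of a DNN model can be analyzed
Cons : the heavy computation limits the analysis to certain simple models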

Local interpretation

Explains only a specific decision or a single prediction
Divided into data-driven interpretation and model-driven interpretation

Data driven Interpretation

Does not try to understand the model's working mechanism; only the input data (and the resulting change in output) is analyzed

A. Perturbation based Interpretation

Mask the original data
Feed the masked data into the model and obtain its prediction

(1)ZFnet

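Best known for deconvolution visualization, but it is also the original idea behind perturbation-based interpretation.
At test time, part of the test image is masked with a gray square -> the effect of the masked region on the result is analyzed
Cons : masking loses information and can lead to wrong interpretations; applied only to Alexnet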
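
The gray-square masking described above can be sketched roughly as follows. This is a minimal NumPy sketch, not the original ZFnet code: `occlusion_map`, `toy_predict`, the patch size and the fill value are all hypothetical names and settings, and the "model" is a toy function standing in for a trained network.

```python
import numpy as np

def occlusion_map(predict, image, patch=2, fill=0.5):
    """Slide a gray patch over `image` and record how much the score drops.

    `predict` is any function mapping an HxW array to a scalar class score
    (an assumption: it stands in for the trained network).
    """
    base = predict(image)
    h, w = image.shape
    heat = np.zeros((h, w))
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            occluded = image.copy()
            occluded[top:top + patch, left:left + patch] = fill  # gray square
            # A large drop means the masked region mattered for the prediction.
            heat[top:top + patch, left:left + patch] = base - predict(occluded)
    return heat

# Toy "model": the score is the mean brightness of the top-left 2x2 corner.
def toy_predict(img):
    return float(img[:2, :2].mean())

image = np.ones((4, 4))
heat = occlusion_map(toy_predict, image)
print(heat)
```

Only the top-left corner gets a non-zero value here, because the toy score depends on nothing else.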

(2)LIME

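Looks only at a local region around one input of a classification model and uses that region to find out why the prediction was made
Applicable to various data types such as images and text
Procedure
- input -> result y1 : why did the model output y1?
- input -> superpixels (preprocessing) -> perturbed images similar to the input
- perturbed images -> fed into the original model to extract logits
- dataset of (perturbed image, logit) pairs -> a new linear model is trained on it to approximate y1 locally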
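
The procedure above can be sketched with NumPy alone. This is a simplified illustration under stated assumptions, not the `lime` library: each entry of `x` plays the role of one superpixel, "removing" a superpixel means setting it to 0, and `lime_weights`, `black_box` and the kernel width are hypothetical.

```python
import numpy as np

def lime_weights(predict, x, n_samples=500, kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear surrogate around the instance `x`."""
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    # 1 keeps a superpixel, 0 removes it.
    masks = rng.integers(0, 2, size=(n_samples, d))
    masks[0] = 1  # always include the unperturbed instance
    perturbed = masks * x
    # Query the black-box model on every perturbed sample.
    logits = np.array([predict(p) for p in perturbed])
    # Samples that keep more superpixels are closer to the original.
    distance = 1.0 - masks.mean(axis=1)
    proximity = np.exp(-(distance ** 2) / kernel_width ** 2)
    # Weighted least squares with an intercept column.
    design = np.hstack([masks, np.ones((n_samples, 1))])
    root = np.sqrt(proximity)[:, None]
    coef, *_ = np.linalg.lstsq(design * root, logits * root[:, 0], rcond=None)
    return coef[:-1]  # one weight per superpixel

# Hypothetical black box: depends on the first two superpixels only.
def black_box(v):
    return 3.0 * v[0] - 2.0 * v[1]

x = np.array([1.0, 1.0, 1.0])
weights = lime_weights(black_box, x)
print(weights)
```

Because the toy black box is itself linear, the surrogate recovers its coefficients; for a real network the weights only describe the neighborhood of `x`.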

(3)ZFnet VS LIME

Unlike ZFnet, LIME uses a local surrogate model
LIME also gives different weights to the perturbed data

(4)SHAP

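Built on the Shapley value from game theory (game : the ML model, players : the features used by the model)
- The core ideas are the Shapley value and independence between features
- The Shapley value expresses numerically how much each participant contributed to the overall outcome
- The contribution of a feature is the change in the overall outcome when that feature is excluded
  - e.g. if the house that A found is estimated to be expensive because it sits by a river, predict how the price would change if the distance from the river were forcibly increased, then subtract that from the original price; the difference is inferred to be how much the river contributes to the price
LIME weights each instance by its similarity to the original instance
SHAP weights each sampled instance by the weight its coalition receives in the Shapley value estimation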
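
To make the "contribution = change when the feature is excluded" idea concrete, here is a small exact Shapley value computation. It is a sketch under assumptions, not the `shap` library: the house-price model, the instance and the baseline are made up, and an excluded feature is replaced by its baseline value.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values; value_fn(frozenset) is the payoff of a coalition."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Probability weight of a coalition of this size.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i to coalition s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi

# Hypothetical instance and baseline: (near_river, area, rooms).
instance = (1.0, 30.0, 3.0)
baseline = (0.0, 20.0, 2.0)

def price_model(z):
    river, area, rooms = z
    return 50.0 * river + 2.0 * area + 10.0 * rooms + 1.0 * river * area

def coalition_value(coalition):
    # Features outside the coalition fall back to the baseline value.
    z = [instance[i] if i in coalition else baseline[i] for i in range(3)]
    return price_model(z)

phi = shapley_values(coalition_value, 3)
print(phi)  # approximately [75.0, 25.0, 10.0]
```

The three values add up to `price_model(instance) - price_model(baseline)`, and the river-area interaction term is shared between the two features involved.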

(5)Cxplain

The overall idea is similar to LIME and SHAP
After perturbing a region, a causal objective function is provided to define its marginal contribution
The causal objective function reduces the amount of computation and is more expressive
Usable only on image data

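(6) Fong's three obvious proxies for replacing the region

The goal is to find the mask that affects the output for the original input image in the worst possible way
Three ways to replace a region : a constant value, injected noise, a Gaussian blur operation
- I : average color
- u : each pixel
- m : mask with values in [0,1]; m(u) is the scalar value at pixel u
- ${\mu}$ : mean pixel value
- ${\eta}$ (u) : gaussian noise samples for each pixel
- ${g_\theta}$ : the Gaussian blur kernel

Find an m that stays close to 1, so that as little of the input image as possible is deleted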
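
The three replacement strategies can be sketched as below. This is a simplified illustration, not Fong's implementation: `perturb` and `gaussian_blur` are hypothetical helpers, and the blur variant blends the image with a blurred copy instead of varying the blur width per pixel.

```python
import numpy as np

def gaussian_blur(image, sigma=1.0):
    """Separable Gaussian blur with edge padding."""
    radius = max(1, int(3 * sigma))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-offsets ** 2 / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(image, radius, mode="edge")
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def perturb(image, mask, mode, rng=None):
    """mask(u) = 1 keeps pixel u, mask(u) = 0 replaces it by the reference."""
    if mode == "constant":
        reference = np.full_like(image, image.mean(), dtype=float)
    elif mode == "noise":
        if rng is None:
            rng = np.random.default_rng(0)
        reference = rng.normal(image.mean(), image.std() + 1e-8, size=image.shape)
    elif mode == "blur":
        reference = gaussian_blur(image)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return mask * image + (1.0 - mask) * reference

image = np.arange(16.0).reshape(4, 4)
keep_all = np.ones_like(image)
drop_all = np.zeros_like(image)
print(perturb(image, drop_all, "constant"))
```

With a mask of all ones the image is returned unchanged in every mode, which is why an optimizer that prefers m close to 1 deletes as little as possible.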

(7) Dabkowski's SSR and SDR

To describe deletion and preservation precisely, SSR (Smallest Sufficient Region) and SDR (Smallest Destroying Region) are introduced
SSR : the smallest region of the image that alone allows a confident classification
SDR : the smallest region of the image whose removal prevents a confident classification

(8)FIDO

Based on the concepts of SSR and SDR
Uses a generative model to produce better perturbations
FIDO marginalizes over the perturbed region : the region is filled in by a generative model conditioned on the rest of the image
The goal is to find the region that, once filled in, changes the classifier's decision the most

B. Adversarial based interpretation

Derived from the perturbation-based methods
The aim is to give robust interpretations (interpretations that are less affected by small changes or outliers)

cf.) poor generalization?

Images are very sensitive, so some perturbations have a large impact on the result
That is, a very small perturbation can change the classifier's prediction
- This comes from the interference information that the perturbed data itself carries

(1) Fong's two methods for robust interpretation

Does not rely on a single learned mask as the standard
- Uses a large number of random masks to reach the optimal solution
Regularizes the mask with a total-variation norm and upsamples it from a low-resolution version
Pros
- Helps in setting an appropriate mask size
- Effectively reduces bias, because random constant colors combined with high-frequency noise are drawn
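
A rough sketch of the two mask regularizers mentioned above follows. `total_variation` and `upsample` are hypothetical helper names, the total variation is the simple anisotropic form (sum of absolute neighbor differences), and the upsampling is nearest-neighbor, so this is an illustration rather than the paper's exact formulation.

```python
import numpy as np

def total_variation(mask):
    """Sum of absolute differences between vertically and horizontally adjacent pixels."""
    vertical = np.abs(np.diff(mask, axis=0)).sum()
    horizontal = np.abs(np.diff(mask, axis=1)).sum()
    return float(vertical + horizontal)

def upsample(mask, factor):
    """Nearest-neighbor upsampling of a low-resolution mask."""
    return np.kron(mask, np.ones((factor, factor)))

coarse = np.array([[1.0, 0.0],
                   [1.0, 1.0]])      # low-resolution mask that is optimized
fine = upsample(coarse, 4)           # 8x8 mask applied to the image
print(fine.shape, total_variation(fine))  # (8, 8) 8.0
```

A mask built this way is piecewise constant, so it cannot contain isolated single-pixel artifacts, and the total-variation penalty further discourages jagged borders.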

(2) Wagner's filtering gradients

Fong's two methods introduce a large number of hyperparameters, which have to be tuned by hand before the methods work well
- hard to make fine-grained, and the procedure is ambiguous
Proposes gradient filtering on top of an optimization-based interpretation method
Uses adversarial ("confrontation") data to separate core features from ordinary features
- gradient filtering preserves the valuable features

(3) Anchor's rule-based interpretation

Uses a large amount of adversarial data
Changing anything other than the anchor does not affect the result
Candidate interpretation, adversarial adaptation
To refine the interpretation that Anchor provides, a large set of candidate interpretations is generated jointly by multiple methods
If a generated candidate interpretation is robust, its prediction does not change across different adversarial environments
- The target of the interpretation is blended into various adversarial environments; if the prediction stays the same in all of them, the interpretation is highly robust

C. Concept-based interpretation

concept : some characteristic that a class has
Specified by humans
TCAV, Network Dissection

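(1) TCAV (post-training explanation)

Inputs : a human-chosen image dataset that expresses a characteristic of the class (object), and a random image dataset
Goal : numerically measure how important a human-chosen concept is to the model.
Procedure
- Stripe images and random images form two classes, label1 and label2
- Feed them into the trained model and extract the feature maps of an intermediate layer
- Train a linear classifier that separates the label1 and label2 feature maps
- The unit vector that is orthogonal to the decision boundary and points toward the stripe side is the concept activation vector (CAV)
  -> as a feature map moves in the direction of this unit vector, it converges toward the characteristics of stripes (a vector that describes the concept)
  
  In short, it is a measurement that defines a concept mathematically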
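
The procedure above can be sketched on synthetic activations. Everything below is an assumption for illustration: the activations are random vectors rather than real feature maps, the linear classifier is a least-squares fit, `head` is a made-up logit function, and a finite difference stands in for the gradient of the class logit.

```python
import numpy as np

def concept_activation_vector(concept_acts, random_acts):
    """Unit normal of a linear classifier separating concept from random activations."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), -np.ones(len(random_acts))])
    design = np.hstack([X, np.ones((len(X), 1))])  # add a bias column
    w, *_ = np.linalg.lstsq(design, y, rcond=None)
    normal = w[:-1]  # points toward the concept side (label +1)
    return normal / np.linalg.norm(normal)

def tcav_score(head, class_acts, cav, eps=1e-4):
    """Fraction of inputs whose logit increases when moving along the CAV."""
    gains = np.array([(head(a + eps * cav) - head(a)) / eps for a in class_acts])
    return float(np.mean(gains > 0))

rng = np.random.default_rng(0)
# Synthetic layer activations: "stripe" images are shifted along dimension 0.
concept_acts = rng.normal(0.0, 1.0, (50, 4)) + np.array([3.0, 0.0, 0.0, 0.0])
random_acts = rng.normal(0.0, 1.0, (50, 4))
cav = concept_activation_vector(concept_acts, random_acts)

# Hypothetical remainder of the network: maps activations to the class logit.
def head(a):
    return 2.0 * a[0] - 0.5 * a[1]

class_acts = rng.normal(0.0, 1.0, (20, 4))
score = tcav_score(head, class_acts, cav)
print(cav, score)
```

A score near 1 means that moving activations toward the concept raises the class logit for almost every input, i.e. the concept matters for that class.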

(2) Network Dissection

Measures the association between hidden units (neurons) and various concepts
Dataset : Broden (broadly and densely labeled dataset)
Computes the IoU between the binary segmentation of each CNN hidden unit (neuron) and that of each concept
Procedure

Step1 : Identify a broad set of human-labeled visual concepts(Broden)
- human-labeled visual concepts
- visual concepts labeled as pixel-wise binary segmentation maps
- images are labeled at the pixel level (except textures and scenes)
  - multiple labels per pixel (left front black cat leg -> cat, leg, black)
- every pixel of an image has a color label
- 63,305 images with 1197 visual concepts (textures, colors, materials, parts, objects, and scenes)
Step2 : Gather the response of the hidden variables to known concepts
- obtain the binary segmentation of each convolutional hidden unit
  - forward-pass Broden through the model, then build a binary segmentation map for each convolutional unit
Step3 : Quantify alignment of hidden variable-concept pairs
- the relation between the binary map of a human-labeled concept and the binary map of a hidden unit is computed with the IoU

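Intersection over Union score (IoU)
${IoU_{k,c} = \frac{\sum |M_k(x) \cap L_c(x)|}{\sum |M_k(x) \cup L_c(x)|}}$
- x : input image (the sums run over all images in the dataset)
- $M_k(x)$ : binary map of unit k (its activation map upsampled with bilinear interpolation, then binarized)
- $L_c(x)$ : human-labeled binary segmentation map of concept c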
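
The score can be computed as in the sketch below, assuming the unit maps have already been upsampled and binarized. `dataset_iou` and the tiny 2x2 masks are made up for illustration.

```python
import numpy as np

def dataset_iou(unit_masks, concept_masks):
    """IoU accumulated over a dataset of (unit mask, concept mask) pairs."""
    pairs = list(zip(unit_masks, concept_masks))
    intersection = sum(np.logical_and(m, l).sum() for m, l in pairs)
    union = sum(np.logical_or(m, l).sum() for m, l in pairs)
    return float(intersection / union) if union else 0.0

# Two toy images: binary map of unit k and human-labeled map of concept c.
unit_masks = [np.array([[1, 1], [0, 0]], dtype=bool),
              np.array([[0, 0], [0, 1]], dtype=bool)]
concept_masks = [np.array([[1, 0], [0, 0]], dtype=bool),
                 np.array([[0, 0], [1, 1]], dtype=bool)]

print(dataset_iou(unit_masks, concept_masks))  # 0.5
```

Intersections and unions are summed over the whole dataset before dividing, so the score is not an average of per-image IoUs.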

Model-Driven interpretation

Does not analyze the working mechanism of the whole model
Analyzes internal components (e.g. neurons and weights)
Easy to implement, low computational cost

Gradient-based interpretation

Based on backpropagation
- Saliency map
- ZFnet

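Idea : visualize the CNN to inspect its intermediate stages and find ways to improve it
The feature maps of intermediate CNN layers are hard to interpret on their own, so they are mapped back onto the input image for analysis.
- Conv : feature map -> Convolution (filter) -> activation (ReLU) -> pooling -> smaller spatial size
- Deconv : Unpooling -> activation (ReLU) -> Deconvolution
  - Max pooling discards the location of the maximum -> "switches" are introduced to store the locations (used during unpooling)
  - After ReLU there is no way to restore the values that were set to 0, but leaving them unrestored is not a big problem
  - Deconvolution : applies the transposed versions of the learned filters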
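
The switch mechanism can be illustrated with a small NumPy sketch. `max_pool_with_switches` and `unpool` are hypothetical helper names, and the input is a single 4x4 feature map with non-overlapping 2x2 windows.

```python
import numpy as np

def max_pool_with_switches(x, size=2):
    """Max pooling that also records where each maximum came from."""
    h, w = x.shape[0] // size, x.shape[1] // size
    pooled = np.zeros((h, w))
    switches = np.zeros((h, w), dtype=int)
    for i in range(h):
        for j in range(w):
            window = x[i * size:(i + 1) * size, j * size:(j + 1) * size]
            switches[i, j] = window.argmax()  # flat index inside the window
            pooled[i, j] = window.max()
    return pooled, switches

def unpool(pooled, switches, size=2):
    """Put every pooled value back at its recorded location; the rest stays 0."""
    out = np.zeros((pooled.shape[0] * size, pooled.shape[1] * size))
    for i in range(pooled.shape[0]):
        for j in range(pooled.shape[1]):
            di, dj = divmod(int(switches[i, j]), size)
            out[i * size + di, j * size + dj] = pooled[i, j]
    return out

x = np.array([[1, 5, 2, 0],
              [3, 4, 7, 1],
              [0, 0, 9, 8],
              [6, 2, 3, 4]], dtype=float)
pooled, switches = max_pool_with_switches(x)
restored = unpool(pooled, switches)
print(pooled)    # [[5. 7.] [6. 9.]]
print(restored)
```

Only the four maxima survive in `restored`; every other position is zero, which is the information max pooling throws away.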

(1) Simonyan's scoring function

Ignoring non-linearity, the score of each pixel is obtained from a linear scoring function
A non-linear network is approximated by a Taylor expansion
The magnitude of the gradient is the importance of the pixel

${f_c(I) = w_c^TI + b_c}$
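
As a worked version of the Taylor step: around a given image $I_0$ (the symbol $I_0$ is introduced here for illustration), a first-order expansion of the non-linear score gives the linear form above, with the gradient playing the role of the weights.

$$
f_c(I) \approx f_c(I_0) + w_c^T (I - I_0) = w_c^T I + b_c,
\qquad
w_c = \left.\frac{\partial f_c}{\partial I}\right|_{I_0},
\qquad
b_c = f_c(I_0) - w_c^T I_0
$$

The saliency of a pixel is then read off from the magnitude of the corresponding entries of $w_c$ (for a color image, typically the largest magnitude over the channels).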

(2) Springenberg's visualization algorithm based on guided backpropagation

(3) Yosinski's open-source tool with regularized optimization (L2 decay, Gaussian blur, clipping pixels with small norm, clipping pixels with small contribution)

(4) Smilkov's method for visually sharpening gradient-based sensitivity maps (SmoothGrad)

Take the image of interest $\rightarrow$ add noise $\rightarrow$ sample similar images $\rightarrow$ compute the average sensitivity map over the sampled images
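
The pipeline above can be sketched as follows. This is a toy illustration under assumptions: `score` is a made-up differentiable function rather than a network, a central finite difference stands in for backpropagation, and the noise level and sample count are arbitrary.

```python
import numpy as np

def numerical_gradient(score, x, eps=1e-5):
    """Central finite-difference gradient of a scalar score w.r.t. every pixel."""
    grad = np.zeros_like(x, dtype=float)
    for idx in np.ndindex(x.shape):
        step = np.zeros_like(x, dtype=float)
        step[idx] = eps
        grad[idx] = (score(x + step) - score(x - step)) / (2 * eps)
    return grad

def smooth_grad(score, x, n_samples=50, sigma=0.1, seed=0):
    """Average the sensitivity maps of noisy copies of the image."""
    rng = np.random.default_rng(seed)
    total = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        noisy = x + rng.normal(0.0, sigma, size=x.shape)  # a similar image
        total += numerical_gradient(score, noisy)
    return total / n_samples

# Hypothetical class score: a weighted sum of squared pixel values.
w = np.array([[1.0, 0.0], [0.0, 2.0]])
def score(img):
    return float((w * img ** 2).sum())

x = np.array([[1.0, -2.0], [0.5, 0.0]])
sensitivity = smooth_grad(score, x)
print(sensitivity)
```

Pixels that the score ignores (zero weight) keep a sensitivity of 0, while the others end up close to the noise-free gradient `2 * w * x`.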

Correlation-score interpretation

Both gradient-based and correlation-score methods are based on backpropagation
However, correlation-score methods compute a correlation score for each input pixel instead of a gradient

(1) LRP

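Each neuron carries a certain amount of contribution (relevance)
Relevance is redistributed top-down, from the output toward the input
The total relevance is conserved during redistribution
When the decomposed relevance reaches the original image, the relative contribution of each pixel is displayed on it, which interprets the model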

input sample x = (x1, x2, ... , xi, ... , xd)
f(x) = ${\sum_{i=1}^d R_i}$
- The logit is decomposed into the contribution (relevance score) of each input
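
The conservation property can be checked on a tiny network. The sketch below is an assumption-laden illustration: a two-layer ReLU network without biases, made-up weights, and the basic z-rule with a small stabilizer; `lrp_linear` is a hypothetical helper.

```python
import numpy as np

def lrp_linear(activations, weights, relevance_out, eps=1e-9):
    """Redistribute relevance from a layer's outputs to its inputs (z-rule).

    weights has shape (n_inputs, n_outputs).
    """
    z = activations[:, None] * weights   # contribution of input i to output j
    totals = z.sum(axis=0)
    totals = totals + eps * np.where(totals >= 0, 1.0, -1.0)  # avoid division by 0
    return (z / totals * relevance_out).sum(axis=1)

x = np.array([1.0, 2.0, -1.0])
W1 = np.array([[1.0, -1.0],
               [0.5, 1.0],
               [-1.0, 0.5]])
W2 = np.array([[2.0],
               [1.0]])

hidden = np.maximum(0.0, x @ W1)   # ReLU layer
output = hidden @ W2               # logit f(x)

relevance_hidden = lrp_linear(hidden, W2, output)
relevance_input = lrp_linear(x, W1, relevance_hidden)
print(output, relevance_input)     # logit 6.5, relevances approximately [1, 4, 1.5]
```

The input relevances add up to the logit, which is the decomposition f(x) = sum of R_i written above.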

(2) DeepLIFT

Class activation map

CAM
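- Convolutional layers preserve spatial information in their outputs, whereas FC layers lose it because of flattening
  - So the last convolutional layer holds rich spatial and semantic information, but this information is hard to exploit
  - CAM exploits it through GAP (global average pooling)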
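
With GAP followed by a single linear layer, the class activation map is a weighted sum of the last convolutional feature maps. The sketch below uses made-up feature maps and class weights; `class_activation_map` is a hypothetical helper.

```python
import numpy as np

def class_activation_map(feature_maps, class_weights):
    """Weighted sum of K feature maps (K, H, W) with the class weights (K,)."""
    return np.tensordot(class_weights, feature_maps, axes=1)

# Toy last-conv-layer output: 2 feature maps of size 3x3.
feature_maps = np.arange(18.0).reshape(2, 3, 3)
class_weights = np.array([0.5, -1.0])   # weights of one class after GAP

cam = class_activation_map(feature_maps, class_weights)

# The class logit of the GAP + linear head equals the spatial mean of the CAM.
logit = class_weights @ feature_maps.mean(axis=(1, 2))
print(cam.shape, cam.mean(), logit)  # (3, 3) -11.0 -11.0
```

Because averaging and the weighted sum commute, each location of the map shows how much that position contributes to the class logit.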

Discussion

How to evaluate an interpretation is ambiguous
That is, there is no standard for evaluating interpretations
A quantitative standard is needed so that interpretation methods can be compared by score
Future directions
- Fine-grained interpretation
  - from interpretation of only a general scope -> toward fine-grained interpretation
- Model innovation
  - Interpretation can lead not only to an understanding of the internal structure but also to new DNN models
- Robustness
  - Various ideas provide robust interpretations, but problems of instability and variability remain
  - More robust interpretation methods are needed
- Expansion of derivatives
  - Extending interpretation methods to time-delay neural networks and reinforcement learning networks is an important trend
