A Unified Approach to Interpreting Model Predictions

Eun · February 1, 2023

Explainable AI: explaining a model's outputs
LIME: a method for explaining individual predictions
SHAP: a principled method for assigning feature importance values to a prediction

As of January 24, 2023, the paper had 10,823 citations (impressive!)

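Understanding why a model makes a particular prediction is very important.
(As important as the model's accuracy!)

However, most of the large models that achieve state-of-the-art results are hard to interpret.

Many XAI methods have been proposed, but it is unclear how these methods relate to one another and when one is preferable to another.

To address this, the paper proposes SHAP, a unified framework for interpreting predictions.

SHAP assigns each feature an importance value for a particular prediction.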

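Novel components
1) The identification of a new class of additive feature importance measures
2) Theoretical results showing that this class has a unique solution with a set of desirable properties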

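The new class unifies six existing methods.

Building on this unification, SHAP presents new methods that show improved computational performance and better consistency with human intuition than previous approaches.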

1. Introduction

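The ability to correctly interpret a prediction model's output is extremely important. It provides:
1) user trust
2) insight into how the model can be improved
3) understanding of the process being modeled

In some applications, simple models (e.g., linear models) are often preferred for their ease of interpretation, even when they are less accurate.

However, the growing availability of big data has increased the benefits of using complex models, so the trade-off between a model's accuracy and its interpretability has become important.

The paper therefore presents a new unified approach to interpreting model predictions.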
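1) It views any explanation of a model's prediction as a model itself.
This is called the explanation model.
This view makes it possible to define the class of additive feature attribution methods.

2) It shows that game theory results guaranteeing a unique solution apply to the entire class of additive feature attribution methods.
It proposes SHAP values as a unified measure of feature importance that the various methods approximate.

3) It proposes new SHAP value estimation methods.
Experimental results

The new methods agree better with human intuition.
They discriminate among model output classes more effectively than several existing methods.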

2. Additive Feature Attribution Method

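Complex models such as ensembles or deep networks are hard to explain in terms of themselves, so a simpler model is used to explain them.
This simpler model is defined as the explanation model.
The paper introduces a new class of explanation models, the additive feature attribution methods (AFAM).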

Notation

$f$ : the original prediction model to be explained

$g$ : the simplified explanation model used to explain $f$

$x$ : the original input to $f$

$f(x)$ : the output for input $x$

$x'$ : the simplified form of $x$, used as the input to $g$

$h_x$ : the function that maps $x'$ back to $x$, $x=h_x(x')$

explanation model = surrogate model

Local methods, as proposed in LIME
: methods designed to explain a prediction $f(x)$ based on a single input $x$
~~Explanation models typically use a simplified input $x'$ that maps to the original input $x$ through a mapping function $h_x$.~~
They aim to satisfy $f(x)\approx g(x')$.
More precisely, $g$ is a reasonable local explanation model if $g(z')\approx f(h_x(z'))$ holds whenever $z'\approx x'$ (that is, for the $z'$ close to $x'$).

Definition 1
Additive feature attribution methods have an explanation model that is a linear function of binary variables:
$g(z')=\phi_0+\sum_{i=1}^{M}{\phi_iz_i'}$
where $z' \in \{0,1\}^M$, $\phi_i \in \mathbb{R}$, and $M$ is the number of simplified input features.

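$g(z')$ approximates $f(x)$.
Each $z'_i$ takes the value 0 or 1.
That is, $z'_i=1$ means simplified feature $i$ is present and $z'_i=0$ means it is absent. Each $\phi_i$ is the effect attributed to feature $i$, and summing the effects of the present features approximates the output of the original model, which shows which features matter.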
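
As a minimal sketch of Definition 1, the snippet below evaluates $g(z')$ for a binary vector $z'$. The function name `additive_explanation` and the attribution values are hypothetical, chosen only for illustration; they do not come from the paper.

```python
import numpy as np

def additive_explanation(phi0, phi, z_prime):
    """Evaluate g(z') = phi0 + sum_i phi_i * z'_i for a binary vector z'."""
    phi = np.asarray(phi, dtype=float)
    z_prime = np.asarray(z_prime, dtype=float)
    if z_prime.shape != phi.shape:
        raise ValueError("z' and phi must both have length M")
    if not np.all((z_prime == 0) | (z_prime == 1)):
        raise ValueError("z' must contain only 0s and 1s")
    return float(phi0 + phi @ z_prime)

# Hypothetical attributions for M = 3 simplified features.
phi0 = 0.5
phi = [0.2, -0.1, 0.4]

g_all_present = additive_explanation(phi0, phi, [1, 1, 1])   # every feature present
g_none_present = additive_explanation(phi0, phi, [0, 0, 0])  # only the base value phi0
g_first_only = additive_explanation(phi0, phi, [1, 0, 0])    # base value plus phi_1
```

Switching a feature off in $z'$ simply removes its $\phi_i$ from the sum, which is what makes the explanation additive.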

The existing methods below all belong to the class of additive feature attribution methods.

2.1 LIME

2.2 DeepLIFT

2.3 Layer-Wise Relevance Propagation

2.4 Classic Shapley Value Estimation

Methods based on cooperative game theory
1) Shapley regression values
2) Shapley sampling values
3) Quantitative Input Influence

1) Shapley regression values
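These are feature importances for linear models in the presence of multicollinearity.
Each feature is assigned an importance value according to how much including it in training affects the model's prediction.
To compute this, the model is retrained on every feature subset $S \subseteq F$.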

$\phi_i = \sum_{S \subseteq F \setminus \left\{ i \right\}} \frac{|S|! (|F| - |S| - 1)!}{|F|!} \left[ f_{S \cup \left\{ i \right\}} (x_{S \cup \left\{ i \right\}}) - f_S(x_S) \right]$

$F$ : the set of all features
$S$ : a subset of $F$, $S\subseteq F$
$f_S$ : the model trained using only the features in $S$
$x_S$ : the values of the input features in $S$

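$h_x$ maps 1 or 0 to the original input space:
1 means the input is included in the model,
0 means the input is excluded from the model
(that is, it encodes whether each feature is included).
Letting $\phi_0=f_{ \varnothing}( \varnothing)$, Shapley regression values match the equation in Definition 1 and are therefore an additive feature attribution method.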
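
The retraining procedure can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the model is ordinary least squares fitted with `numpy.linalg.lstsq`, the data are synthetic, and the helper names `fit_predict` and `shapley_regression_values` are hypothetical. Because every subset is refitted, it is only practical for a handful of features.

```python
import itertools
import math
import numpy as np

def fit_predict(X, y, x, subset):
    """Retrain least squares on the features in `subset` and predict for x.

    With an empty subset the model is intercept-only and returns mean(y),
    which plays the role of f_emptyset(emptyset).
    """
    cols = list(subset)
    A = np.column_stack([np.ones(len(X))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(coef[0] + sum(c * x[j] for c, j in zip(coef[1:], cols)))

def shapley_regression_values(X, y, x):
    """Weighted average of f_{S+i}(x_{S+i}) - f_S(x_S) over all S in F minus {i}."""
    n_features = X.shape[1]
    phi = np.zeros(n_features)
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            weight = (math.factorial(size)
                      * math.factorial(n_features - size - 1)
                      / math.factorial(n_features))
            for S in itertools.combinations(others, size):
                with_i = fit_predict(X, y, x, sorted(S + (i,)))
                without_i = fit_predict(X, y, x, S)
                phi[i] += weight * (with_i - without_i)
    phi0 = fit_predict(X, y, x, ())
    return phi0, phi

# Synthetic data with two strongly correlated features (an assumption for the demo).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 1] = 0.8 * X[:, 0] + 0.2 * X[:, 1]
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(scale=0.1, size=200)

x = X[0]
phi0, phi = shapley_regression_values(X, y, x)
full_prediction = fit_predict(X, y, x, (0, 1, 2))
```

In this sketch `phi0 + phi.sum()` reproduces `full_prediction`, the prediction of the model trained on all features, up to floating-point error.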

2) Shapley sampling values
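Shapley sampling values explain a model by (1) applying sampling approximations to the equation above and
(2) approximating the effect of removing a variable from the model by integrating over samples from the training dataset.
This eliminates the need to retrain the model and allows fewer than $2^{|F|}$ differences to be computed.
Because the explanation model form of Shapley sampling values is the same as that of Shapley regression values,
it is also an additive feature attribution method.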
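
A rough sketch of the sampling idea, assuming a permutation-based Monte Carlo estimator in which "removed" features take their values from a randomly drawn background row. The function name `shapley_sampling_values`, the toy linear model, and the random background data are all hypothetical.

```python
import numpy as np

def shapley_sampling_values(f, x, background, n_samples=2000, seed=0):
    """Monte Carlo estimate of Shapley values without retraining f.

    Each sample draws a random feature ordering and a random background row,
    then switches features from the background value to x one at a time and
    credits each feature with the resulting change in f.
    """
    rng = np.random.default_rng(seed)
    M = len(x)
    phi = np.zeros(M)
    for _ in range(n_samples):
        order = rng.permutation(M)
        z = background[rng.integers(len(background))].copy()
        prev = f(z)
        for i in order:
            z[i] = x[i]          # feature i is now "present"
            cur = f(z)
            phi[i] += cur - prev
            prev = cur
    return phi / n_samples

# Toy linear model (an assumption for the demo).
w = np.array([2.0, -1.0, 0.5])

def linear_model(z):
    return float(w @ z + 1.0)

rng = np.random.default_rng(1)
background = rng.normal(size=(100, 3))
x = np.array([1.0, 2.0, 3.0])

phi = shapley_sampling_values(linear_model, x, background)
# For a linear model the target value is w_i * (x_i - mean of background feature i).
expected = w * (x - background.mean(axis=0))
```

The cost is `n_samples * (M + 1)` model evaluations, rather than one retrained model per subset.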

3) Quantitative Input Influence
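Quantitative Input Influence is a broader framework that addresses more than feature attributions.
However, as part of its method it independently proposes a sampling approximation to Shapley values that is nearly identical to Shapley sampling values. It is therefore another additive feature attribution method.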

3. Simple Properties Uniquely Determine Additive Feature Attributions

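A notable attribute of the AFAM class is that it contains a single unique solution with three desirable properties.
These properties are familiar from the classical Shapley value estimation methods, but were previously unknown for other additive feature attribution methods.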

Property 1. Local accuracy
$f(x) =g(x')=\phi_0+\sum_{i=1}^M\phi_ix_i'$
The explanation model $g(x')$ matches the original model $f(x)$ when $x = h_x(x')$

Property 2. Missingness
$x_i'=0 \Rightarrow \phi_i=0$
Missingness constrains features where $x'_i = 0$ to have no attributed impact.

Property 3. Consistency
Let $f_x(z')=f(h_x(z'))$ and let $z'-\{i\}$ denote setting $z'_i=0$. For any two models $f$ and $f'$, if
$f'_x(z')-f'_x(z'-\{i\}) \geq f_x(z')-f_x(z'-\{i\})$
for all inputs $z' \in \{0,1\}^M$, then $\phi_i(f', x) \geq \phi_i(f, x)$.

Theorem 1
Only one possible explanation model $g$ follows Definition 1 and satisfies Properties 1, 2, and 3
$\phi_i(f, x) = \sum_{z' \subset x'} \frac{|z'|!(M-|z'|-1)!}{M!}[f_x(z')-f_x(z'-\{i\})]$
$z'\subset x'$ : a subset of the non-zero entries in $x'$

Theorem 1 follows from combined cooperative game theory results.
The values $\phi_i$ are the Shapley values.
Methods that are not based on Shapley values violate local accuracy (Property 1) and/or consistency (Property 3).
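
To make the properties concrete, here is a small brute-force sketch that computes exact Shapley values and checks local accuracy and consistency on toy models. It uses the classic Shapley weighting from the Shapley regression formula, with the sum running over subsets that exclude feature $i$. Two assumptions are mine: $h_x$ substitutes a fixed `baseline` value for absent features, and the models `model` and `model_prime` are made up for the demo.

```python
import itertools
import math
import numpy as np

def exact_shapley(f, x, baseline):
    """Exact Shapley values for f_x(z') = f(h_x(z')).

    Assumption: h_x keeps x_i where z'_i = 1 and substitutes baseline_i
    where z'_i = 0.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    M = len(x)

    def f_x(present):
        z = baseline.copy()
        idx = np.array(list(present), dtype=int)
        z[idx] = x[idx]
        return f(z)

    phi = np.zeros(M)
    for i in range(M):
        others = [j for j in range(M) if j != i]
        for size in range(M):
            weight = (math.factorial(size) * math.factorial(M - size - 1)
                      / math.factorial(M))
            for S in itertools.combinations(others, size):
                phi[i] += weight * (f_x(S + (i,)) - f_x(S))
    return f_x(()), phi

def model(z):
    return float(z[0] * z[1] + z[2])

def model_prime(z):
    # Same as `model`, but feature 2 always contributes at least as much.
    return float(z[0] * z[1] + 2.0 * z[2])

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]

phi0, phi = exact_shapley(model, x, baseline)
phi0_prime, phi_prime = exact_shapley(model_prime, x, baseline)

local_accuracy_gap = abs(phi0 + phi.sum() - model(np.array(x)))
```

For this toy model the attributions work out to (1, 1, 3): the interaction term is split evenly between features 0 and 1. Since `model_prime` never gives feature 2 a smaller marginal contribution than `model`, consistency requires `phi_prime[2] >= phi[2]`.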

4. SHAP (SHapley Additive exPlanation) Values

SHAP Value: unified measure of feature importance

SHAP values provide the unique additive feature importance measure that adheres to Properties 1-3 and uses conditional expectations to define simplified inputs.

Implicit in this definition of SHAP values is a simplified input mapping, $h_x(z') = z_S$, where $z_S$ has missing values for features not in the set $S$.

Since most models cannot handle arbitrary patterns of missing input values, we approximate $f(z_S)$ with $E[f(z) \mid z_S]$.
-> So what exactly does "missing value" mean here?
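
One way to read "missing": the features outside $S$ are not set to some special value but are averaged out. The sketch below estimates $E[f(z) \mid z_S]$ by fixing the features in $S$ to their values in $x$ and averaging the model over background rows for the rest. This replaces the conditional expectation with a marginal one, which assumes the features are independent. The function name, the toy model, and the two-row background are hypothetical.

```python
import numpy as np

def expected_value_given_subset(f, x, S, background):
    """Estimate E[f(z) | z_S = x_S] from background data.

    Assumption: features are independent, so "missing" features can be
    filled in from background rows regardless of the values in x_S.
    """
    Z = np.array(background, dtype=float, copy=True)
    idx = np.array(list(S), dtype=int)
    Z[:, idx] = np.asarray(x, dtype=float)[idx]   # known features take x's values
    return float(np.mean([f(z) for z in Z]))

def toy_model(z):
    return float(z[0] + 2.0 * z[1])

background = [[0.0, 0.0], [2.0, 4.0]]
x = [10.0, 20.0]

value_none = expected_value_given_subset(toy_model, x, [], background)      # all features missing
value_first = expected_value_given_subset(toy_model, x, [0], background)    # only feature 0 known
value_both = expected_value_given_subset(toy_model, x, [0, 1], background)  # nothing missing
```

With no features known the estimate is the average prediction over the background, and with all features known it is simply $f(x)$.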

The exact computation of SHAP values is challenging.
However, by combining insights from current additive feature attribution methods, we can approximate them.

4.1 Model-Agnostic Approximations

Kernel SHAP (Linear LIME + Shapley values)

Theorem 2 Shapley kernel

new!

Shapley sampling values
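
A toy sketch of the Kernel SHAP idea: fit the linear explanation model from Definition 1 by weighted least squares, using the Shapley kernel weight $\pi_{x'}(z') = \frac{M-1}{\binom{M}{|z'|}\,|z'|\,(M-|z'|)}$ from the paper's Theorem 2. Several simplifications are my own assumptions: every coalition is enumerated instead of sampled, absent features take a fixed `baseline` value, and the infinite weights at $|z'|=0$ and $|z'|=M$ are replaced by a large finite constant. The names `shapley_kernel_weight` and `kernel_shap_exhaustive` are hypothetical and unrelated to any library API.

```python
import itertools
import math
import numpy as np

def shapley_kernel_weight(M, size, big=1e6):
    """Shapley kernel weight for a coalition with `size` present features."""
    if size == 0 or size == M:
        return big   # stand-in for the infinite weight at the two endpoints
    return (M - 1) / (math.comb(M, size) * size * (M - size))

def kernel_shap_exhaustive(f, x, baseline):
    """Weighted linear regression of f(h_x(z')) on z' over all 2^M coalitions."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    M = len(x)
    Z = np.array(list(itertools.product([0, 1], repeat=M)), dtype=float)
    y = np.array([f(np.where(z == 1, x, baseline)) for z in Z])
    w = np.array([shapley_kernel_weight(M, int(z.sum())) for z in Z])
    A = np.column_stack([np.ones(len(Z)), Z])   # intercept column gives phi_0
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return float(coef[0]), coef[1:]

def model(z):
    return float(z[0] * z[1] + z[2])

phi0, phi = kernel_shap_exhaustive(model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
```

On this three-feature toy model the regression coefficients agree with the exact Shapley values (1, 1, 3) to within the small error introduced by the finite endpoint weight. A practical implementation would sample coalitions rather than enumerate them.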

4.2 Model-Specific Approximations

While Kernel SHAP improves the sample efficiency of model-agnostic estimations of SHAP values, by restricting our attention to specific model types, we can develop faster model-specific approximation methods.