XAI | Concepts and Taxonomy of Explainable AI

standing-o · August 23, 2022


  • This post covers the concept of explainable AI (XAI) and how XAI methods are classified.
  • Keywords : XAI, CAM, LIME, RISE

Supervised (deep) learning

  • Supervised (deep) learning has made huge progress, but deep learning models are extremely complex
    • End-to-end learning turns the model into a black box
    • Problems arise when such models are applied to make critical decisions

What is explainability & Interpretability?

  • Interpretability is the degree to which a human can understand the cause of a decision
  • Interpretability is the degree to which a human can consistently predict the model's results.
  • An explanation is the answer to a why-question.

Taxonomy of XAI methods

  • Local vs. Global
    • Local : describes an individual prediction
    • Global : describes entire model behavior
  • White-box vs. Black-box
    • White-box : the explainer can access the inside of the model
    • Black-box : the explainer can access only the output
  • Intrinsic vs. Post-hoc
    • Intrinsic : restricts the model complexity before training
    • Post-hoc : Applies after the ML model is trained
  • Model specific vs. Model agnostic
    • Model-specific : restricted to specific model classes
    • Model-agnostic : can be used for any model

Examples

  • Linear model, simple decision tree

➔ Global, white-box, intrinsic, model-specific

  • Grad-CAM

➔ Local, white-box, post-hoc, model-agnostic

Simple Gradient method

  • Simply use the gradient as the explanation (importance)

    • Interpretation of f at $x_0$ (for the i-th input/feature/pixel)
    $R_i=(\nabla f(x)|_{x_0})_i$
    • Shows how sensitive a function value is to each input
  • Examples : the gradient maps are visualized for the highest-scoring class

  • Strength : easy to compute (via back-propagation)

  • Weakness : becomes noisy (due to the shattered gradients problem)
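Below is a minimal PyTorch sketch of this simple gradient method; the classifier interface (an image tensor of shape (1, C, H, W) mapped to class logits) is an assumption for illustration.

```python
import torch

def simple_gradient_map(model, x, target_class):
    """Saliency R_i = (∇f(x)|_x0)_i : gradient of the class logit w.r.t. the input."""
    x = x.clone().requires_grad_(True)
    logits = model(x)                    # forward pass, shape (1, num_classes)
    logits[0, target_class].backward()   # back-propagate the class logit to the input
    # Collapse the channel dimension so each pixel gets a single importance value.
    return x.grad.detach().abs().max(dim=1)[0]   # shape (1, H, W)
```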

SmoothGrad

  • A simple method to address the noisy gradients

    • Add some noise to the input and average
    $\nabla_{\text{smooth}}f(x)=\mathbb{E}_{\epsilon\sim\mathcal{N}(0,\sigma^{2}I)}[\nabla f(x+\epsilon)]$
    • Averaging gradients of slightly perturbed input would smoothen the interpretation
    • Typical heuristics
      • Expectation is approximated with Monte-Carlo (around 50 runs)
      • σ is set to 10-20% of $x_{\max}-x_{\min}$
  • Strength

    • Clearer interpretation via simple averaging
    • Applicable to most saliency (sensitivity) maps
  • Weakness

    • Computationally expensive
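A sketch of SmoothGrad under the same assumed classifier interface as above; the sample count and noise scale follow the heuristics listed (about 50 Monte-Carlo runs, σ as a fraction of x_max - x_min).

```python
import torch

def smoothgrad_map(model, x, target_class, n_samples=50, noise_frac=0.15):
    """Average the input gradients over noisy copies of the input."""
    sigma = noise_frac * (x.max() - x.min())       # sigma as 10-20% of (x_max - x_min)
    grad_sum = torch.zeros_like(x)
    for _ in range(n_samples):                     # Monte-Carlo estimate of the expectation
        noisy = (x + sigma * torch.randn_like(x)).requires_grad_(True)
        score = model(noisy)[0, target_class]      # logit of the class to explain
        grad_sum += torch.autograd.grad(score, noisy)[0]
    return (grad_sum / n_samples).abs().max(dim=1)[0]
```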

Class activation map (CAM)

  • Method
    • Global average pooling (GAP) must be applied right before the final softmax (FC) layer
    • The resulting class activation map is upsampled to match the size of the input image
  • Alternative view of CAM
    • The logit of the class c for CAM is represented by:

(GAP-FC model)

$Y^c=\sum_k w_k^c \frac{1}{Z}\sum_{i,j} A^k_{ij}$
  • Result
    • CAM can localize objects in image
    • Segment the regions whose values are above 20% of the CAM's maximum value and take the bounding box of that region
  • Strength
    • It clearly shows what objects the model is looking at
  • Weakness
    • Model-specific: it can be applied only to models with limited architecture
    • It can only be obtained at the last convolutional layer and this makes the interpretation resolution coarse
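A sketch of the CAM computation for a GAP-FC style network; `features` (the last conv activations $A^k$) and `fc_weight` (the final FC layer's weight matrix) are assumed to be extracted from such a model, e.g. with a forward hook.

```python
import torch
import torch.nn.functional as F

def class_activation_map(features, fc_weight, target_class, out_size):
    """CAM for class c: sum_k w_k^c * A^k, upsampled to the input resolution.

    features:  last conv feature maps, shape (1, K, h, w)
    fc_weight: weights of the final FC layer, shape (num_classes, K)
    """
    weights = fc_weight[target_class]                      # w_k^c, shape (K,)
    cam = torch.einsum('k,bkhw->bhw', weights, features)   # channel-wise weighted sum
    cam = F.interpolate(cam.unsqueeze(1), size=out_size,   # upsample to the input image size
                        mode='bilinear', align_corners=False)
    return cam.squeeze(1)
```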

Grad-CAM

  • Method
    • To compute the channel-wise weighted sum, Grad-CAM substitutes the FC weights with the average-pooled gradients
  • Strength
    • Model-agnostic: can be applied to models with various output types
  • Weakness
    • The averaged gradient is sometimes not accurate
  • Result
    • Debugging the training with Grad-CAM
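A matching Grad-CAM sketch: the only change from CAM is that the channel weights are the spatially average-pooled gradients of the class logit with respect to a chosen conv layer's activations (assumed to be kept in the autograd graph, e.g. via a hook).

```python
import torch
import torch.nn.functional as F

def grad_cam(features, logits, target_class, out_size):
    """Grad-CAM: alpha_k^c = GAP of dY^c/dA^k, map = ReLU(sum_k alpha_k^c * A^k)."""
    grads = torch.autograd.grad(logits[0, target_class], features,
                                retain_graph=True)[0]      # dY^c/dA, shape (1, K, h, w)
    weights = grads.mean(dim=(2, 3), keepdim=True)         # average-pooled gradients
    cam = F.relu((weights * features).sum(dim=1))          # weighted sum over channels
    cam = F.interpolate(cam.unsqueeze(1), size=out_size,
                        mode='bilinear', align_corners=False)
    return cam.squeeze(1).detach()
```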

LIME

  • Local interpretable model-agnostic explanations (LIME)
    • Can explain the predictions of any classifier by approximating it locally with an interpretable model
    • Model-agnostic, black-box
    • General overview of the interpretations
  • Perturb the super-pixels and obtain the local interpretation model near the given example
  • Explaining an image classification prediction made by Google's inception neural network
  • Strength
    • Black-box interpretation
  • Weakness
    • Computationally expensive
    • Hard to apply to certain kinds of models
    • The local linear approximation breaks down when the underlying model is still non-linear even locally
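A hand-rolled sketch of the LIME idea for images (perturb super-pixels, fit a weighted linear surrogate around the given example); the segmentation, kernel width, and defaults are illustrative, and this is not the API of the official `lime` package.

```python
import numpy as np
from skimage.segmentation import slic
from sklearn.linear_model import Ridge

def lime_image_explanation(image, predict_fn, target_class,
                           n_samples=1000, n_segments=50):
    """Returns the super-pixel segmentation and one importance weight per super-pixel."""
    segments = slic(image, n_segments=n_segments)         # super-pixel segmentation
    seg_ids = np.unique(segments)
    # Binary interpretable features: which super-pixels are kept in each perturbed sample.
    z = np.random.randint(0, 2, size=(n_samples, len(seg_ids)))
    perturbed = []
    for row in z:
        img = image.copy()
        for keep, sid in zip(row, seg_ids):
            if not keep:
                img[segments == sid] = image.mean()        # gray out removed super-pixels
        perturbed.append(img)
    probs = predict_fn(np.stack(perturbed))[:, target_class]
    # Weight samples by proximity to the original (all super-pixels kept) representation.
    distances = 1.0 - z.mean(axis=1)
    weights = np.exp(-(distances ** 2) / 0.25)
    # Coefficients of the weighted linear surrogate are the super-pixel importances.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(z, probs, sample_weight=weights)
    return segments, surrogate.coef_
```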

RISE

  • Randomized input sampling for explanation (RISE)
    • Sub-sampling the input image via random masks
    • Record the model's response to each of the masked images
  • Comparison to LIME
    • The saliency map of LIME relies on super-pixels, which may not capture the correct regions
  • Strength
    • Much clearer saliency maps
  • Weakness
    • High computational complexity
    • Noisy due to sampling
  • RISE sometimes produces noisy importance maps
    • This is due to the sampling (Monte-Carlo) approximation, especially in the presence of objects with varying sizes
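A minimal RISE sketch: average random binary masks weighted by the model's score on the corresponding masked images. The grid size, number of masks, and keep probability are illustrative; the original paper additionally uses smooth (bilinear) mask upsampling with random shifts, simplified here to nearest-neighbour upsampling.

```python
import numpy as np

def rise_saliency(image, predict_fn, target_class,
                  n_masks=2000, grid=7, p_keep=0.5):
    """Importance map = E[score * mask] / p_keep over random binary masks."""
    H, W = image.shape[:2]
    saliency = np.zeros((H, W))
    cell_h, cell_w = H // grid + 1, W // grid + 1
    for _ in range(n_masks):
        small = (np.random.rand(grid, grid) < p_keep).astype(float)  # low-res binary mask
        mask = np.kron(small, np.ones((cell_h, cell_w)))[:H, :W]     # upsample to input size
        masked = image * mask[..., None]                             # mask out image regions
        score = predict_fn(masked[None])[0, target_class]            # response to the masked image
        saliency += score * mask
    return saliency / (n_masks * p_keep)
```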

Understanding black-box predictions via influence functions

  • Different approach for XAI
    • Identify the most influential training data points for a given prediction
  • Influence function
    • Measure the effect of removing a training sample on the test loss value
  • Influence function-based explanation can show the difference between the models
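For completeness, the influence of up-weighting a training point $z$ on the loss at a test point $z_{test}$ is usually written in the following standard form (this formula is added here for reference and is not stated explicitly in the notes above):

$\mathcal{I}(z, z_{\text{test}}) = -\nabla_\theta L(z_{\text{test}}, \hat{\theta})^{\top} H_{\hat{\theta}}^{-1} \nabla_\theta L(z, \hat{\theta})$

where $H_{\hat{\theta}}$ is the Hessian of the training loss at the learned parameters $\hat{\theta}$; up-weighting $z$ by a small $\epsilon$ changes the test loss by approximately $\epsilon\,\mathcal{I}(z, z_{\text{test}})$.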

Metrics

Human-based visual assessment

  • AMT (Amazon mechanical turk) test
    • Want to know: can a human predict the model's prediction from the interpretation?
  • Weakness
    • Obtaining human assessment is very expensive

Human annotation

  • Some metrics employ human annotations (localization and semantic segmentation) as ground truth and compare them with the interpretation

    • Pointing game
    • Weakly supervised semantic segmentation
  • Pointing game (a minimal sketch is given after this section)

    • For given human-annotated bounding boxes $\{B^i\}_{i=1,...,N}$ and interpretations $\{h^i\}_{i=1,...,N}$, the mean accuracy of the pointing game is defined by:
    $Acc=\frac{1}{N}\sum^N_{i=1}\mathbb{1}_{[p^{(i)}\in B^{(i)}]}$
    • where $p^i$ is the pixel s.t. $p^i=\arg\max_p(h^i_p)$
    • $\mathbb{1}_{[p^i\in B^i]}$ is an indicator function whose value is 1 if the pixel with the highest interpretation score is located inside the bounding box
  • Weakly supervised semantic segmentation

    • Setting : Pixel-level label is not given during training
    • This metric measures the mean IoU between interpretation and semantic segmentation label
  • Weakness

    • Human annotations are expensive to obtain
    • Such localization and segmentation labels are not a true ground truth for interpretation
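A minimal sketch of the pointing-game accuracy described above; the (x_min, y_min, x_max, y_max) box convention is an assumption of this sketch.

```python
import numpy as np

def pointing_game_accuracy(saliency_maps, boxes):
    """Fraction of images whose highest-scoring saliency pixel lies inside the annotated box."""
    hits = 0
    for h, (x_min, y_min, x_max, y_max) in zip(saliency_maps, boxes):
        # p^i = argmax_p h^i_p, converted to (row, col) image coordinates.
        row, col = np.unravel_index(np.argmax(h), h.shape)
        if x_min <= col <= x_max and y_min <= row <= y_max:
            hits += 1
    return hits / len(saliency_maps)
```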

Pixel perturbation

  • Motivation
    • If we remove an important area of the image, the logit value for the class should decrease
  • AOPC (Area over the MoRF perturbation curve)
    • AOPC measures the decrease of the logit as input patches are replaced in MoRF (most relevant first) order
$AOPC=\frac{1}{L+1}\mathbb{E}_{x\sim p(x)}\left[\sum^L_{k=0} f(x^{(0)}_{\text{MoRF}})-f(x^{(k)}_{\text{MoRF}})\right]$

where $f$ is the logit for the true label

  • Insertion and deletion
    • Both measure the AUC of each curve
      • In the deletion curve, the x-axis is the percentage of removed pixels in MoRF order, and the y-axis is the class probability of the model
      • In the insertion curve, the x-axis is the percentage of recovered pixels in MoRF order, starting from a gray image
  • Weakness
    • Violates one of the key assumptions in ML that the training and evaluation data come from the same distribution
    • The perturbed inputs no longer come from the distribution the model of interest was trained on, even though that model is what is deployed and explained at test time
    • Perturbation can also introduce new features for the model, e.g., the model may tend to predict a perturbed input as "balloon"
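A sketch of the deletion metric described above: pixels are removed in MoRF order and the AUC of the class-probability curve is reported. Replacing removed pixels with the image mean is just one of several possible baselines and is an assumption here.

```python
import numpy as np

def deletion_auc(image, saliency, predict_fn, target_class, n_steps=50):
    """AUC of class probability vs. fraction of pixels removed in MoRF order."""
    H, W = saliency.shape
    order = np.argsort(saliency.ravel())[::-1]              # most relevant first (MoRF)
    step = max(1, order.size // n_steps)
    probs = []
    for k in range(0, order.size + 1, step):
        rows, cols = np.unravel_index(order[:k], (H, W))
        perturbed = image.astype(float).copy()
        perturbed[rows, cols] = image.mean()                 # remove the k most relevant pixels
        probs.append(predict_fn(perturbed[None])[0, target_class])
    return np.trapz(probs, dx=1.0 / (len(probs) - 1))        # area under the deletion curve
```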

ROAR

  • ROAR removes a portion of the pixels in the training data in the order of the original model's interpretation values (highest first), and retrains a new model
  • Weakness
    • Retraining every time is computationally expensive

Sanity checks

Model randomization

  • Interpretation = Edge detector?
    • Some interpretation methods produce saliency maps that are strikingly similar to those created by an edge detector
  • Model randomization test
    • This test randomly re-initializes the parameters, either in a cascading fashion or one independent layer at a time
    • Some interpretations are not sensitive to this randomization, e.g., Guided Backprop, LRP, and pattern attribution
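A sketch of the cascading model-randomization test: re-initialize layers starting from the output and recompute the explanation after each step; `explain_fn(model, x, target_class)` stands for any of the saliency methods above (hypothetical signature).

```python
import torch

def cascading_randomization(model, explain_fn, x, target_class):
    """Returns explanations after progressively re-initializing layers from the top down."""
    maps = [explain_fn(model, x, target_class)]       # explanation of the trained model
    layers = [m for m in model.modules() if hasattr(m, 'reset_parameters')]
    for layer in reversed(layers):                    # from the output layer towards the input
        layer.reset_parameters()                      # randomly re-initialize the weights
        maps.append(explain_fn(model, x, target_class))
    # A sound interpretation method should change substantially as layers are randomized.
    return maps
```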

Adversarial attack

  • Geometry is to blame
    • Proposed the adversarial attack on interpretation:
      $\mathcal{L}=\|h(x_{adv})-h^t\|^2+\gamma\|g(x_{adv})-g(x)\|^2$
    • Proposed a smoothing method to undo attack
      • Using a softplus activation with high beta can undo the interpretation attack
    • Provided theoretical bound of such attack
  • Results
    • The manipulated image is attacked with a target interpretation $h^t$
    • For both gradient and LRP, the manipulated interpretation for the network with ReLU activations is similar to the target interpretation, but the one with softplus is not manipulated
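A sketch of the manipulation loss above in PyTorch; `explain_fn` must return an input-differentiable saliency map h(x) (so optimizing this loss requires double back-propagation), and the function names and gamma value are illustrative.

```python
import torch

def interpretation_attack_loss(model, explain_fn, x_adv, x, target_map, gamma=1e6):
    """L = ||h(x_adv) - h^t||^2 + gamma * ||g(x_adv) - g(x)||^2"""
    # First term: push the explanation of x_adv towards the target map h^t.
    h_term = ((explain_fn(model, x_adv) - target_map) ** 2).sum()
    # Second term: keep the model output on x_adv close to the original output.
    g_term = ((model(x_adv) - model(x).detach()) ** 2).sum()
    return h_term + gamma * g_term
```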

Adversarial model manipulation

  • Adversarial model manipulation
    • Two models can produce totally different interpretations while having similar accuracy.
  • Findings
    • Negligible model accuracy drop
    • Fooling generalizes across the validation set
    • Fooling transfers to different interpretations
    • AOPC analysis confirms true foolings

References

  • This post was written based on what I learned while participating in the LG Aimers program. (It does not reproduce the entire course content.)


[1] LG Aimers AI Essential Course, Module 4: Explainable AI, Prof. Taesup Moon, Seoul National University