XAI | Concepts and Taxonomy of Explainable AI
- This post covers the concept of explainable AI (XAI) and how XAI methods are categorized.
- Keywords: XAI, CAM, LIME, RISE
Supervised (deep) learning
- Supervised (deep) learning has made huge progress, but deep learning models are extremely complex
- End-to-end learning turns the model into a black box
- This becomes a problem when models are applied to make critical decisions
What are explainability & interpretability?
- Interpretability is the degree to which a human can understand the cause of a decision
- Interpretability is the degree to which a human can consistently predict the model's results.
- An explanation is the answer to a why-question.
Taxonomy of XAI methods
- Local vs. Global
- Local : describes an individual prediction
- Global : describes entire model behavior
- White-box vs. Black-box
- White-box : the explainer can access the inside of the model
- Black-box : the explainer can access only the model's output
- Intrinsic vs. Post-hoc
- Intrinsic : restricts the model complexity before training
- Post-hoc : Applies after the ML model is trained
- Model specific vs. Model agnostic
- Model-specific : the method is restricted to a specific model class
- Model-agnostic : the method can be used with any model
Examples
- Linear model, simple decision tree
➔ Global, white-box, intrinsic, model-specific
- Gradient-based saliency (e.g., the simple gradient method below)
➔ Local, white-box, post-hoc, model-agnostic
Simple Gradient method
- Simply use the gradient as the explanation (importance)
- Interpretation of $f$ at $x_0$ (for the $i$-th input/feature/pixel):
$$R_i = \big(\nabla f(x)\big|_{x_0}\big)_i$$
- Shows how sensitive a function value is to each input
- Example : the gradient maps are visualized for the highest-scoring class
- Strength : easy to compute (via back-propagation)
- Weakness : becomes noisy (due to the shattered gradients problem)
SmoothGrad
- SmoothGrad reduces this noise by averaging the gradient maps of several noisy copies of the input
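Below is a minimal PyTorch sketch of both ideas, assuming a pretrained classifier `model` in eval mode and an input tensor `x` of shape (1, C, H, W); the function names, the noise level `sigma`, and the sample count are illustrative choices rather than values from the lecture.

```python
import torch

def gradient_saliency(model, x, target_class):
    """Vanilla gradient: R_i = (d f_c / d x_i), evaluated at the given input."""
    x = x.clone().requires_grad_(True)
    score = model(x)[0, target_class]      # logit of the target class
    score.backward()
    return x.grad.abs().squeeze(0)         # (C, H, W) importance map

def smoothgrad_saliency(model, x, target_class, n_samples=25, sigma=0.15):
    """SmoothGrad: average the gradient maps of noisy copies of the input."""
    total = torch.zeros_like(x.squeeze(0))
    for _ in range(n_samples):
        noisy = x + sigma * torch.randn_like(x)
        total += gradient_saliency(model, noisy, target_class)
    return total / n_samples
```

The absolute gradient is used as the importance map here; SmoothGrad simply repeats the same computation on noisy inputs and averages the results, which suppresses the shattered-gradient noise.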
Class activation map (CAM)
- Method
- Upsample the CAM to match the size of the input image
- Global average pooling (GAP) should be implemented before the softmax layer
- Alternative view of CAM
- In a GAP-FC model, the logit of class c is represented by:
$$Y^c = \sum_k w_k^c \cdot \frac{1}{Z} \sum_{i,j} A_{ij}^k$$
- where $A_{ij}^k$ is the activation of the $k$-th feature map at spatial location $(i, j)$, $Z$ is the number of spatial locations, and $w_k^c$ is the FC weight connecting channel $k$ to class $c$
- Result
- CAM can localize objects in image
- Segment the regions whose values are above 20% of the CAM's maximum value and take the bounding box of that region
- Strength
- It clearly shows what objects the model is looking at
- Weakness
- Model-specific: it can be applied only to models with limited architecture
- It can only be obtained at the last convolutional layer and this makes the interpretation resolution coarse
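A minimal CAM sketch, assuming a GAP-FC architecture; torchvision's ResNet-18 is used here purely for illustration, and the hook and normalization details are my own simplifications rather than part of the original method.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Any GAP-FC network works; ResNet-18 is just one example of such an architecture.
model = models.resnet18(weights="IMAGENET1K_V1").eval()

feats = {}
model.layer4.register_forward_hook(
    lambda module, inp, out: feats.update(A=out))   # cache last conv feature maps A^k

def cam(x, target_class):
    """CAM: weight the last conv feature maps by the FC weights of the target class."""
    with torch.no_grad():
        model(x)                                    # fills feats["A"], shape (1, K, h, w)
        A = feats["A"].squeeze(0)                   # (K, h, w)
        w = model.fc.weight[target_class]           # w_k^c, shape (K,)
        heatmap = torch.einsum("k,khw->hw", w, A)   # sum_k w_k^c * A_ij^k
        heatmap = F.relu(heatmap)                   # keep positive evidence for visualization
        heatmap = F.interpolate(heatmap[None, None], size=x.shape[-2:],
                                mode="bilinear", align_corners=False)[0, 0]
        return heatmap / (heatmap.max() + 1e-8)     # normalize to [0, 1]
```

The normalized map can then be overlaid on the input image, and the 20%-of-max thresholding above can be applied directly to it for localization.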
Grad-CAM
- Method
- To calculate the channel-wise weighted sum, Grad-CAM replaces the FC weights with the average-pooled gradients
- Strength
- Model-agnostic: can be applied to models with various types of outputs
- Weakness
- The average-pooled gradient is sometimes inaccurate
- Result
- Debugging the training with Grad-CAM
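A minimal Grad-CAM sketch, assuming a CNN classifier whose last convolutional block can be hooked (e.g., `model.layer4` for a torchvision ResNet); the function and variable names are illustrative.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, target_layer, x, target_class):
    """Grad-CAM: weight each feature map A^k by its average-pooled gradient alpha_k^c
    and apply ReLU to the weighted sum."""
    acts = {}
    handle = target_layer.register_forward_hook(
        lambda module, inp, out: acts.update(A=out))
    score = model(x)[0, target_class]                    # logit y^c
    handle.remove()

    A = acts["A"]                                        # (1, K, h, w)
    dA = torch.autograd.grad(score, A)[0]                # d y^c / d A^k
    alpha = dA.mean(dim=(2, 3))                          # alpha_k^c: GAP of the gradients
    cam = F.relu(torch.einsum("nk,nkhw->nhw", alpha, A))[0]   # ReLU(sum_k alpha_k^c A^k)
    cam = F.interpolate(cam[None, None], size=x.shape[-2:],
                        mode="bilinear", align_corners=False)[0, 0]
    return cam / (cam.max() + 1e-8)
```

For example, `grad_cam(model, model.layer4, x, target_class)` returns a normalized (H, W) map; unlike CAM, no GAP-FC head is required.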
LIME
- Local interpretable model-agnostic explanations (LIME)
- Can explain the predictions of any classifier by approximating it locally with an interpretable model
- Model-agnostic, black-box
- General overview of the interpretations
- Perturb the super-pixels and obtain the local interpretation model near the given example
- Explaining an image classification prediction made by Google's inception neural network
- Strength
- Weakness
- Computationally expensive
- Hard to apply to certain kinds of models
- When the underlying model is still locally non-linear
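A simplified sketch of the idea (not the official `lime` package API): segment the image into super-pixels, sample binary keep/drop masks, query the black-box model, and fit a locally weighted linear surrogate. The SLIC settings, the exponential proximity kernel, and the Ridge surrogate are simplifications chosen here for brevity.

```python
import numpy as np
from skimage.segmentation import slic
from sklearn.linear_model import Ridge

def lime_image(predict_fn, image, target_class, n_samples=1000, kernel_width=0.25):
    """LIME-style explanation: returns super-pixel labels and a weight per super-pixel.
    `image` is an (H, W, 3) float array in [0, 1]; `predict_fn(batch)` maps an
    (N, H, W, 3) array to (N, num_classes) probabilities."""
    segments = slic(image, n_segments=50)            # super-pixel segmentation
    n_seg = segments.max() + 1

    # 1. Sample binary masks over super-pixels (1 = keep, 0 = gray out).
    masks = np.random.randint(0, 2, size=(n_samples, n_seg))
    masks[0] = 1                                     # include the unperturbed image

    # 2. Build perturbed images and query the black-box model.
    perturbed = []
    for m in masks:
        img = image.copy()
        img[~m[segments].astype(bool)] = 0.5         # gray out dropped super-pixels
        perturbed.append(img)
    probs = predict_fn(np.stack(perturbed))[:, target_class]

    # 3. Weight samples by proximity to the original (all-ones) mask.
    dist = 1.0 - masks.mean(axis=1)                  # fraction of dropped super-pixels
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)

    # 4. Fit an interpretable linear surrogate on the binary super-pixel features.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(masks, probs, sample_weight=weights)
    return segments, surrogate.coef_                 # importance of each super-pixel
```

Super-pixels with large positive coefficients are the ones that locally push the prediction toward `target_class`.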
RISE
- Randomized input sampling for explanation (RISE)
- Sub-sampling the input image via random masks
- Record its response to each of the masked images
- Comparison to LIME
- LIME's saliency relies on super-pixels, which may not capture the correct regions
- Strength
- Weakness
- High computational complexity
- Noisy due to sampling
- RISE sometimes produces noisy importance maps
- This is due to the Monte Carlo sampling approximation, especially in the presence of objects with varying sizes
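A minimal sketch of the core computation: each pixel's importance is (an estimate of) the expected class score over random masks that keep it visible. The original method also randomly shifts the upsampled masks before applying them; that detail is omitted, and the grid size, keep-probability, and mask count below are illustrative.

```python
import torch
import torch.nn.functional as F

def rise(model, x, target_class, n_masks=4000, grid=7, p=0.5):
    """RISE-style saliency for a (1, C, H, W) input; `model` returns class logits."""
    _, _, H, W = x.shape
    saliency = torch.zeros(H, W)
    with torch.no_grad():
        for _ in range(n_masks):
            # Low-resolution binary mask, smoothed by bilinear upsampling.
            m = (torch.rand(1, 1, grid, grid) < p).float()
            m = F.interpolate(m, size=(H, W), mode="bilinear", align_corners=False)
            prob = torch.softmax(model(x * m), dim=1)[0, target_class]
            saliency += prob * m[0, 0]               # accumulate score-weighted masks
    return saliency / (n_masks * p)                  # Monte Carlo estimate
```

Increasing `n_masks` reduces the Monte Carlo noise mentioned above, at the cost of proportionally more forward passes.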
Understanding black-box predictions via influence functions
- Different approach for XAI
- Identify the most influential training data points for a given prediction
- Influence function
- Measure the effect of removing a training sample on the test loss value
- Influence function-based explanations can reveal differences between models
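For reference, the commonly used formulation of Koh & Liang (2017), which appears to be the approach referenced here, approximates the influence of a training point $z$ on the loss at a test point $z_{\mathrm{test}}$ as:

$$\mathcal{I}_{\mathrm{up,loss}}(z, z_{\mathrm{test}}) = -\nabla_\theta L(z_{\mathrm{test}}, \hat{\theta})^{\top} H_{\hat{\theta}}^{-1}\, \nabla_\theta L(z, \hat{\theta}), \qquad H_{\hat{\theta}} = \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^{2} L(z_i, \hat{\theta})$$

so that removing $z$ changes the test loss by approximately $-\frac{1}{n}\mathcal{I}_{\mathrm{up,loss}}(z, z_{\mathrm{test}})$, without retraining the model.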
Metrics
Human-based visual assessment
- AMT (Amazon mechanical turk) test
- Want to know: can a human predict the model's output from the interpretation?
- Weakness
- Obtaining human assessment is very expensive
Human annotation
- Some metrics employ human annotations (localization and semantic segmentation) as a ground truth and compare them with the interpretation
- Pointing game
- Weakly supervised semantic segmentation
- Pointing game
- For given human-annotated bounding boxes $B^{(i)}, i=1,\dots,N$ and interpretations $h^{(i)}, i=1,\dots,N$, the mean accuracy of the pointing game is defined by:
$$\mathrm{Acc} = \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}\big[p^{(i)} \in B^{(i)}\big]$$
- where $p^{(i)}$ is the pixel s.t. $p^{(i)} = \arg\max_{p} h^{(i)}_{p}$
- $\mathbb{1}[p^{(i)} \in B^{(i)}]$ is an indicator function whose value is 1 if the pixel with the highest interpretation score is located inside the bounding box (and 0 otherwise)
- Weakly supervised semantic segmentation
- Setting : Pixel-level label is not given during training
- This metric measures the mean IoU between the interpretation and the semantic segmentation label (a sketch of both metrics follows this list)
- Weakness
- It is hard to produce such human annotations
- Such localization and segmentation labels are not a ground truth for the interpretation itself
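A small NumPy sketch of both metrics for a single sample, assuming `saliency` is a 2-D interpretation map, `bbox = (x1, y1, x2, y2)` is the annotated box, and `seg_mask` is a boolean segmentation mask; the 50%-of-max threshold used to binarize the interpretation is an assumption, not a prescribed value.

```python
import numpy as np

def pointing_game_hit(saliency, bbox):
    """Pointing game for one sample: hit (1.0) if the argmax pixel of the
    interpretation lies inside the annotated bounding box."""
    y, x = np.unravel_index(np.argmax(saliency), saliency.shape)
    x1, y1, x2, y2 = bbox
    return float(x1 <= x <= x2 and y1 <= y <= y2)

def interpretation_iou(saliency, seg_mask, threshold=0.5):
    """IoU between a thresholded interpretation and the segmentation label."""
    pred = saliency >= threshold * saliency.max()
    inter = np.logical_and(pred, seg_mask).sum()
    union = np.logical_or(pred, seg_mask).sum()
    return inter / (union + 1e-8)
```

Averaging these per-sample values over the dataset gives the pointing-game accuracy and the mean IoU, respectively.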
Pixel perturbation
- Motivation
- If we remove an important area from the image, the logit value for the class should decrease
- AOPC (Area over the MoRF perturbation curve)
- AOPC measures the decrease of the logit as input patches are replaced in MoRF (most relevant first) order:
$$\mathrm{AOPC} = \frac{1}{L+1}\,\mathbb{E}_{x \sim p(x)}\left[\sum_{k=0}^{L} f\big(x_{\mathrm{MoRF}}^{(0)}\big) - f\big(x_{\mathrm{MoRF}}^{(k)}\big)\right]$$
- where $f$ is the logit for the true label and $x_{\mathrm{MoRF}}^{(k)}$ is the input after the $k$ most relevant patches have been replaced
- Insertion and deletion
- Both measure the AUC of the corresponding curve (a sketch of the deletion metric follows this section)
- In the deletion curve, the x axis is the percentage of pixels removed in MoRF order, and the y axis is the class probability of the model
- In the insertion curve, the x axis is the percentage of pixels recovered in MoRF order, starting from a gray image
- Weakness
- Violates one of the key assumptions in ML that the training and evaluation data come from the same distribution
- The perturbed inputs no longer follow the data distribution on which the model of interest is deployed and explained at test time
- Perturbation can also create new features for the model, e.g., the model may tend to predict a perturbed input as "balloon"
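A minimal sketch of the deletion metric, assuming `saliency` is an (H, W) map for the (1, C, H, W) input `x`; the number of steps, the constant fill value, and the mean-based AUC estimate are illustrative simplifications (the insertion metric is analogous, starting from a gray or blurred image and restoring pixels in the same order).

```python
import torch

def deletion_metric(model, x, saliency, target_class, n_steps=50, baseline=0.0):
    """Deletion curve: remove pixels in MoRF (most relevant first) order, record the
    class probability at each step, and return (approximately) the area under the curve."""
    _, C, H, W = x.shape
    order = torch.argsort(saliency.flatten(), descending=True)   # MoRF pixel order
    step = (H * W) // n_steps
    probs = []
    x_cur = x.clone()
    with torch.no_grad():
        for k in range(n_steps + 1):
            p = torch.softmax(model(x_cur), dim=1)[0, target_class]
            probs.append(p.item())
            idx = order[k * step:(k + 1) * step]                 # next batch of pixels
            x_cur.view(1, C, -1)[..., idx] = baseline            # "delete" them
    return sum(probs) / len(probs)                               # ≈ AUC; lower is better
```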
ROAR
- ROAR removes a portion of the pixels in the training data, in decreasing order of the original model's interpretation values, and retrains a new model; if the interpretation is faithful, the retrained model's accuracy should drop sharply
- Weakness
- Retraining every time is computationally expensive
Sanity checks
Model randomization
- Interpretation = Edge detector?
- Some interpretation methods produce saliency maps that are strikingly similar to those created by an edge detector
- Model randomization test
- This experiment randomly re-initializes the parameters, either in a cascading fashion or a single independent layer at a time
- Some interpretations are not sensitive to this randomization, e.g., Guided Backprop, LRP, and pattern attribution
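A minimal sketch of the cascading variant, assuming `saliency_fn(model, x, target)` is any interpretation method from the sections above; selecting only Conv/Linear layers and calling `reset_parameters()` are my own simplifications (BatchNorm statistics, for instance, are left untouched).

```python
import copy
import torch.nn as nn

def cascading_randomization(model, saliency_fn, x, target_class):
    """Model-randomization sanity check: re-initialize parameterized layers one by one,
    from the output side toward the input, recomputing the saliency map after each step."""
    randomized = copy.deepcopy(model)
    layers = [m for m in randomized.modules()
              if isinstance(m, (nn.Conv2d, nn.Linear))]
    maps = []
    for layer in reversed(layers):          # start from the layer closest to the logits
        layer.reset_parameters()            # cascading re-initialization
        maps.append(saliency_fn(randomized, x, target_class))
    return maps
```

If the maps barely change as more layers are randomized, the interpretation method fails this sanity check.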
Adversarial attack
- Geometry is to blame
- Proposed an adversarial attack on the interpretation:
$$\mathcal{L} = \big\| h(x_{\mathrm{adv}}) - h^{t} \big\|^{2} + \gamma \big\| g(x_{\mathrm{adv}}) - g(x) \big\|^{2}$$
- where $h$ is the interpretation, $h^{t}$ is the target interpretation, and $g$ is the network output (so the prediction stays nearly unchanged)
- Proposed a smoothing method to undo the attack
- Using a softplus activation with a high beta can undo the interpretation attack
- Provided a theoretical bound on such attacks
- Results
- The manipulated image is attacked with a target interpretation $h^{t}$
- For both gradient and LRP, the manipulated interpretation of the network with ReLU activations is similar to the target interpretation, but the one with softplus is not manipulated
Adversarial model manipulation
- Adversarial model manipulation
- Two models can produce totally different interpretations while having similar accuracy
- Attack on the model rather than the input
- Negligible model accuracy drop
- Fooling generalizes across validation set
- Fooling transfers to different interpretations
- AOPC analysis confirms true foolings
References
- This post is based on what I learned while participating in the LG Aimers program (it does not cover the entire content).
➔ Go to LG Aimers
[1] LG Aimers AI Essential Course, Module 4: Explainable AI, Prof. Taesup Moon (Seoul National University)