Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) Review

AIDuck · May 6, 2022

Abstract

Traditional convolutional neural networks operate on low-level features. The problem is that low-level features are not human-friendly. To handle this problem, the authors propose Concept Activation Vectors (CAVs) and the Testing with CAVs (TCAV) method.

Introduction

Most ML models operate on low-level features, such as pixel values, that do not correspond to the high-level concepts humans easily understand. Also, a model's internal values, such as neural activations, can seem incomprehensible. To handle this problem, the authors first express it mathematically. The ML model's vector space is denoted $E_m$, spanned by basis vectors $e_m$, and the vector space of human-understandable concepts is denoted $E_h$, spanned by basis vectors $e_h$. From this point of view, an "interpretation" of an ML model can be seen as a function $g : E_m \rightarrow E_h$.

This paper introduces a new idea, the Concept Activation Vector (CAV), as a way of translating between $E_m$ and $E_h$. In other words, a CAV maps low-level features to human-understandable concepts. After generating CAVs, we then check how much the model relies on each concept for its predictions. This method is called Testing with Concept Activation Vectors (TCAV). TCAV was pursued with the following goals.

- Accessibility: Requires little to no ML expertise of the user.
- Customization: Adapts to any concept and is not limited to concepts considered during training.
- Plug-in readiness: Works without any retraining or modification of the ML model.
- Global quantification: Can interpret entire classes or sets of examples with a single quantitative measure, not just explain individual data inputs.

Background

There are two ways to interpret deep neural networks. The first is to use only inherently interpretable models. The second is to post-process trained models in a way that yields insights. With increasing demand for more explainable ML, there is a growing need for methods that can be applied without retraining or modifying the network. TCAV is capable of interpreting networks without modifying them.

Saliency methods are among the most popular local explanation methods for image classification. However, saliency maps show their limitations when two different maps are compared. If one cat image shows the cat's ears much brighter than another picture does, can we assess how important the ears were in the prediction of "cats"?

Linear combinations of neurons can encode meaningful and insightful information. TCAV extends this idea and computes directional derivatives along these learned directions in order to gauge the importance of each direction for a model's prediction.

Methods

In order to understand the CAV method, a concept that users can understand must be defined first. As a concept, low-level information such as the color or texture of an image can be used. For example, when analyzing a picture of a zebra, we can use the concept of stripes. The set of example data that matches a specific concept is called $P_C$, and the set of random data unrelated to the concept is called $N$.

In order to separate these two sets for a specific network, the distributions of the activation vectors of $P_C$ and $N$ at the $l$-th layer must be separable. We train a linear classifier that divides these two distributions and define the direction orthogonal to its decision boundary as the CAV $v^l_C$. Here, the orthogonal direction is chosen so that the activations of $P_C$ lie on the positive side of the linear classifier.

Through the above process, we obtain the vector $v^l_C$ that represents the specific concept we are interested in.
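
As a rough illustration of this step, the sketch below trains a linear classifier on layer-$l$ activations and takes the vector normal to its decision boundary as the CAV. The function names, array shapes, and the choice of logistic regression are assumptions for the sketch, not the authors' implementation.

```python
# Minimal CAV sketch (assumed names/shapes; not the paper's code).
# acts_concept: layer-l activations of concept images P_C, one flattened vector per image.
# acts_random:  layer-l activations of random counterexamples N.
import numpy as np
from sklearn.linear_model import LogisticRegression

def learn_cav(acts_concept, acts_random):
    """Return a unit-norm CAV v_C^l for one concept at one layer."""
    X = np.vstack([acts_concept, acts_random])
    y = np.hstack([np.ones(len(acts_concept)),    # 1 = concept set P_C
                   np.zeros(len(acts_random))])   # 0 = random set N
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    v = clf.coef_[0]          # normal to the decision boundary, pointing toward P_C
    return v / np.linalg.norm(v)

# Stand-in activations just to show the call (64 images, 2048-d layer):
rng = np.random.default_rng(0)
cav = learn_cav(rng.normal(size=(64, 2048)), rng.normal(size=(64, 2048)))
```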

The new linear interpretability method built on CAVs is called TCAV (Testing with Concept Activation Vectors). TCAV calculates the model's prediction sensitivity to the defined concept $P_C$ using a directional derivative.

The saliency map method calculates the effect of each pixel $(a, b)$ on the logit $h_k(x)$ using the following formula.

$$\frac{\partial h_k(x)}{\partial x_{a,b}}$$

Similarly, the conceptual sensitivity $S_{C,k,l}(x)$ with respect to a CAV is obtained as follows.

$$S_{C,k,l}(x) = \lim_{\epsilon \to 0} \frac{h_{l,k}(f_l(x) + \epsilon v^l_C) - h_{l,k}(f_l(x))}{\epsilon} = \nabla h_{l,k}(f_l(x)) \cdot v^l_C$$

Through the dot product between $\nabla h_{l,k}(f_l(x))$, obtained by passing the input image through the network, and $v^l_C$, the conceptual sensitivity $S_{C,k,l}(x)$ between the concept $C$ and the input image $x$ can be found.
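
A minimal sketch of how this gradient might be obtained is given below, assuming a PyTorch model split into a bottom part `f_l` (layers up to $l$) and a top part `h_l` (layers above $l$); these names and the single-image batch shape are assumptions, not the paper's code.

```python
# Sketch of ∇h_{l,k}(f_l(x)); f_l / h_l are assumed callables, x has batch size 1.
import torch

def grad_wrt_layer(f_l, h_l, x, k):
    """Gradient of the class-k logit h_{l,k} with respect to the layer-l activation f_l(x)."""
    a = f_l(x).detach()        # f_l(x): activation at layer l, treated as a leaf tensor
    a.requires_grad_(True)
    logit = h_l(a)[0, k]       # h_{l,k}(f_l(x)): logit of class k
    logit.backward()
    return a.grad.flatten()    # ∇ h_{l,k}(f_l(x)), flattened to match the CAV
```

Dotting this flattened gradient with $v^l_C$ then gives $S_{C,k,l}(x)$, as in the formula above.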

For example, suppose the concept $C$ is "striped" and the class $k$ is "zebra". Let $X_k$ be the set of all input images belonging to class $k$. After computing $S_{C,k,l}(x)$ for all of these images, the overall TCAV score is obtained as the fraction of images whose sensitivity is positive.

$$\textrm{TCAV}_{Q_{C,k,l}} = \frac{|\{x \in X_k : S_{C,k,l}(x) > 0\}|}{|X_k|}$$

By computing, over the data in $X_k$, the fraction of inputs whose conceptual sensitivity to concept $C$ is positive, the global influence of concept $C$ on the class label can be quantified.
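
Putting the two previous formulas together, a sketch of the score computation might look like the following; `grads` and `cav` are assumed inputs (the per-image gradients from the helper above, converted to NumPy, and the unit CAV $v^l_C$).

```python
# Minimal TCAV-score sketch (assumed inputs; not the paper's implementation).
import numpy as np

def conceptual_sensitivity(grad, cav):
    """S_{C,k,l}(x) = ∇h_{l,k}(f_l(x)) · v_C^l."""
    return float(np.dot(grad, cav))

def tcav_score(grads, cav):
    """Fraction of class-k inputs whose conceptual sensitivity is positive."""
    return float(np.mean([conceptual_sensitivity(g, cav) > 0 for g in grads]))
```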

Results

We can sort images by concept using a CAV and cosine similarity. In Figure 2 of the paper, images of the "Stripes" concept are sorted using a CAV learned for the "CEO" concept. Looking at the figure, the most similar striped images show patterns suitable for a suit or tie that a CEO might wear, and the least similar striped images show patterns unlikely to be associated with a CEO.

The right side shows the result of sorting "Necktie" images using a CAV trained for the "Model Women" concept. Again, women wearing ties appear among the most similar necktie images, and men wearing ties appear among the least similar ones.
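
A minimal sketch of this sorting, assuming `acts` holds one layer-$l$ activation vector per image and `cav` is a unit-norm CAV as above:

```python
# Sort images by cosine similarity to a concept's CAV (assumed inputs).
import numpy as np

def sort_by_concept(acts, cav):
    """Return image indices ordered from most to least similar to the concept."""
    acts = np.asarray(acts)
    cos = acts @ cav / (np.linalg.norm(acts, axis=1) * np.linalg.norm(cav))
    return np.argsort(-cos)    # descending cosine similarity
```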

CAVs can also be inspected with empirical DeepDream. Empirical DeepDream optimizes an input pattern that activates the CAV as much as possible and compares the result with the semantic meaning of the concept. In Figure 3 of the paper, the first image is the DeepDream result for "Knitted Texture", the second for "Corgis", and the last for "Siberian Husky". These figures show that a CAV can capture and visualize the features or patterns associated with a concept.

TCAV confirms concepts that we already expect to be important. For example, for the "fire engine" class, the result matches the expectation that the concept "red" acts as an important concept. TCAV not only produces a relative score for each concept, but also reveals that the model is sensitive to gender and race concepts even though it was never explicitly trained on them. For the "ping-pong ball" class, the "East Asian" concept has a higher TCAV score than the "African" or "Latino" concepts. Likewise, for the "apron" class, the "women" concept has a higher TCAV score than the "Caucasian" or "baby" concepts.

Figure 5 of the paper shows which layers learn each concept well. High-level concepts are learned well in the later layers, while low-level concepts such as colors and patterns are learned well in the earlier layers.

Conclusion

Using the new notion of a CAV, TCAV calculates how much influence a concept has on the model's predictions and thus gives a numerical answer to how well an appropriate domain concept was chosen. The TCAV method can be considered to offer adequate interpretability, since it provides explanations that are convincing even to people who have not studied artificial intelligence.

Own Review

This paper presented a new concept for explaining what a model bases its predictions on. I think it is a very well-written paper in that it develops the new idea with sound logic and makes an otherwise ambiguous notion quantitatively measurable. In addition, the paper demonstrated the validity of its argument by presenting appropriate experimental results in its support. Personally, I was most impressed with how the reasoning around CAVs is developed so that the reader can fully understand it.

References

[1] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, and Rory Sayres. Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV).
