Traditional convolutional neural networks operate on low-level features, and the problem is that low-level features are not human friendly. To handle this problem, the authors propose Concept Activation Vectors (CAVs) and the Testing with Concept Activation Vectors (TCAV) method.
Most ML models operate on low-level features, such as pixel values, that do not correspond to high-level concepts that humans easily understand. Also, a model's internal values, such as neural activations, can seem incomprehensible. To handle this problem, the authors first express it mathematically. The state of the ML model is viewed as a vector space $E_m$ spanned by basis vectors $e_m$ that correspond to neural activations, and the space of concepts humans understand is denoted $E_h$, spanned by implicit vectors $e_h$ corresponding to human-interpretable concepts. From this point of view, an "interpretation" of an ML model can be seen as a function $g: E_m \to E_h$.
This paper introduces a new concept, the Concept Activation Vector (CAV), as a way of translating between $E_m$ and $E_h$. In other words, a CAV maps low-level features to human-understandable concepts. After generating CAVs, we can check how much the model relies on each concept when making a prediction; this method is called Testing with Concept Activation Vectors (TCAV). TCAV was pursued with the following goals:
Accessibility: Requires little to no ML expertise of the user.
Customization: Adapts to any concept and is not limited to concepts considered during training.
Plug-in readiness: Works without any retraining or modification of the ML model.
Global quantification: Can interpret entire classes or sets of examples with a single quantitative measure, and not just explain individual data inputs.
There are two broad ways to interpret deep neural networks. The first is to restrict ourselves to inherently interpretable models; the second is to post-process our models in a way that yields insights. With the increasing demand for explainable ML, there is a growing need for methods that can be applied without retraining or modifying the network. TCAV is capable of interpreting networks without modifying them.
Saliency methods are among the most popular local explanation methods for image classification. However, saliency maps show their limitations when we compare two different maps: if one cat image shows the cat's ears much brighter than another picture does, can we really assess how important the ears were in the prediction of "cat"?
Linear combinations of neurons encode meaningful and insightful information. TCAV extends this idea and computes directional derivatives along these learned directions in order to gauge the importance of each direction for a model's prediction.
In order to understand the CAV method, a concept that users can understand must be defined first. As a concept, visual information such as the color or texture of an image can be used. For example, when analyzing a picture of a zebra, we can use the concept of stripes. The set of examples that exhibit a specific concept is called the concept set $P_C$, and a set of random examples unrelated to the concept is called the random set $N$.
In order to separate these two sets for a specific network, the distributions of the activation vectors of $P_C$ and $N$ should be separable at the $l$-th layer. We train a linear classifier that divides these two distributions and define the direction orthogonal to its decision boundary as the CAV $v_C^l$, oriented toward the positive (concept) side of the classifier.
Through the above process, the vector that represents the specific concept we are interested in, the CAV $v_C^l$, can be obtained.
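To make the procedure concrete, here is a minimal sketch of this CAV derivation. It assumes the layer-$l$ activations for $P_C$ and $N$ have already been extracted into arrays; how they are extracted, and the choice of logistic regression as the linear classifier, are assumptions of this sketch rather than details fixed by the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def learn_cav(concept_acts: np.ndarray, random_acts: np.ndarray) -> np.ndarray:
    """Fit a linear classifier between concept and random activations at
    layer l and return the direction orthogonal to its decision boundary."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)),   # 1 = concept set P_C
                        np.zeros(len(random_acts))])  # 0 = random set N
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    cav = clf.coef_.ravel()              # orthogonal to the boundary,
    return cav / np.linalg.norm(cav)     # pointing toward the concept class
```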
The new linear interpretation method built on CAVs is called TCAV (Testing with Concept Activation Vectors). TCAV calculates the model's prediction sensitivity to the defined concept using a directional derivative.
The saliency map method calculates the effect of each pixel on the class-$k$ logit $h_k(x)$ using the following formula: $\frac{\partial h_k(x)}{\partial x_{a,b}}$, where $x_{a,b}$ is the pixel at position $(a, b)$.
Similarly, the conceptual sensitivity through a CAV is obtained as the directional derivative $S_{C,k,l}(x) = \lim_{\epsilon \to 0} \frac{h_{l,k}(f_l(x) + \epsilon v_C^l) - h_{l,k}(f_l(x))}{\epsilon} = \nabla h_{l,k}(f_l(x)) \cdot v_C^l$, where $f_l(x)$ is the activation of layer $l$ for input $x$ and $h_{l,k}$ maps that activation to the class-$k$ logit.
Through the dot product between $\nabla h_{l,k}(f_l(x))$, obtained after passing the input image through the network, and the CAV $v_C^l$, the sensitivity $S_{C,k,l}(x)$ between the concept and the input image can be found.
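As a sketch, the conceptual sensitivity reduces to a dot product once the gradient of the class-$k$ logit with respect to the layer-$l$ activations is available; the names below (`grad_logit_wrt_act`, `h_lk`) are placeholders assumed for illustration, not the paper's code.

```python
import numpy as np

def conceptual_sensitivity(grad_logit_wrt_act: np.ndarray, cav: np.ndarray) -> float:
    """S_{C,k,l}(x): gradient of h_{l,k} at f_l(x), dotted with the CAV v_C^l."""
    return float(np.dot(grad_logit_wrt_act.ravel(), cav))

def conceptual_sensitivity_fd(h_lk, act: np.ndarray, cav: np.ndarray,
                              eps: float = 1e-4) -> float:
    """Finite-difference form of the same directional derivative, where
    h_lk maps layer-l activations to the class-k logit."""
    return (h_lk(act + eps * cav) - h_lk(act)) / eps
```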
For example, when the concept $C$ obtained in this way is "striped" and the class $k$ is "zebra", $X_k$ denotes the set of all input images belonging to class $k$. After computing $S_{C,k,l}(x)$ for every image in $X_k$, we count how many of them have a positive sensitivity. The ratio of class-$k$ inputs whose conceptual sensitivity toward $C$ is positive, $\mathrm{TCAV}_{Q_{C,k,l}} = \frac{|\{x \in X_k : S_{C,k,l}(x) > 0\}|}{|X_k|}$, measures the global influence of the concept $C$ on the label $k$.
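A sketch of the score itself is then only a few lines, assuming the sensitivities $S_{C,k,l}(x)$ have already been computed for every image in $X_k$.

```python
import numpy as np

def tcav_score(sensitivities: np.ndarray) -> float:
    """Fraction of class-k inputs whose conceptual sensitivity is positive."""
    return float(np.mean(sensitivities > 0))

# e.g. sensitivities = np.array([conceptual_sensitivity(g, cav) for g in grads])
# tcav_score(sensitivities) -> value in [0, 1]
```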
We can also sort images by concept using CAVs and cosine similarity. In Figure 2, images of the "Stripes" concept are sorted using a CAV learned for the "CEO" concept. Looking at the figure, the most similar striped images have patterns suited to a CEO's suit or tie, while the least similar striped images have patterns that are unlikely to be associated with a CEO.
The right side shows the result of sorting "Necktie" images using a CAV trained for the "Model Women" concept. Again, women wearing ties appear in the most similar necktie images, while men wearing ties appear in the least similar ones.
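A hedged sketch of this sorting, assuming the layer-$l$ activations of the images to be sorted are available as an array, might look like the following.

```python
import numpy as np

def sort_by_cav_similarity(acts: np.ndarray, cav: np.ndarray) -> np.ndarray:
    """Return image indices ordered from most to least cosine-similar to the CAV."""
    sims = acts @ cav / (np.linalg.norm(acts, axis=1) * np.linalg.norm(cav))
    return np.argsort(-sims)
```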
CAVs can also be checked with Empirical Deep Dream. Empirical Deep Dream optimizes an input pattern so that it activates the CAV as much as possible, and the result is compared with the semantic meaning of the concept. In Figure 3, the first image is Deep Dream for the "Knitted Texture" CAV, the second for "Corgis", and the last for "Siberian Husky". From these figures, it can be seen that CAVs can capture and visualize features or patterns within an image.
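As an illustration of the idea (not the authors' exact procedure), one can run gradient ascent on an input image so that its layer-$l$ activations project strongly onto the CAV; `features` below is an assumed callable from image to layer-$l$ activations.

```python
import torch

def deepdream_along_cav(features, cav: torch.Tensor,
                        shape=(1, 3, 224, 224), steps=200, lr=0.05) -> torch.Tensor:
    """Gradient ascent on the input so layer-l activations align with the CAV."""
    x = torch.rand(shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        act = features(x).flatten(1)   # layer-l activations of the current image
        score = (act * cav).sum()      # projection onto the CAV direction
        (-score).backward()            # maximize the projection
        opt.step()
        with torch.no_grad():
            x.clamp_(0, 1)             # keep pixel values in a valid range
    return x.detach()
```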
TCAV confirms results we already expect to be important. For example, for the "fire engine" class, the expected result and the actual result agree that red is likely to act as an important concept. TCAV not only yields a relative value for each concept, but also reveals sensitivity to gender and race even though the model was never explicitly trained on them. Looking at the "ping-pong ball" class in the figure, the "East Asian" concept has a higher TCAV score than "African" or "Latino". It can also be seen that the "apron" class has a higher score for the "women" concept than for "Caucasian" or "baby".
Figure 5 is a graph showing which layer learns each concept well. It can be seen that high-level concepts are captured well in later layers, while low-level concepts such as colors and patterns are captured well in earlier layers.
Using the new notion of a CAV, TCAV calculates how much influence a concept has on the model's predictions and numerically reports how well an appropriate domain concept was selected. The method can be considered to provide adequate interpretability, since it gives explanations convincing even to people without a background in artificial intelligence.
This paper presented a new way to explain which parts of a model's prediction process a prediction is based on. I think this is a very well-written paper in that it develops new concepts with sound logic and allows ambiguous notions to be quantified. In addition, the paper demonstrates the validity of its reasoning with appropriate experimental results. Personally, I was most impressed with how the argument was developed, using the concept of a CAV, so that the reader could fully follow it.
[1] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viégas, and Rory Sayres. Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV). ICML 2018.