[Paper Review] Attention U-Net: Learning Where to Look for the Pancreas

용권순 · May 3, 2023


Summary

This paper adds attention gates (AGs) to the U-Net architecture and reports improved segmentation performance on medical imaging data.

The AGs are applied to the skip connections.

Attention is computed with the latent vector (gating signal) as the query and the skip-connection features as the key, which makes use of a property of self-attention.

The linear projections inside the AG are channel-wise 1x1x1 convolutions.

Structure

The overall structure is shown in the figure above. Compared with a plain U-Net, an Attention Gate now sits on each skip connection from the encoder to the decoder. The rest of this post focuses on how the AG works.

Attention Gate (AG)

The paper says it introduces a novel self-attention gating module for medical image analysis. So what is self-attention? In the self-attention proposed in "Attention Is All You Need", Q, K, and V are all derived from the same vector, hence the prefix "self". (In seq2seq attention, by contrast, K and V come from the encoder's hidden-state matrix and Q comes from the decoder's hidden state.)

In self-attention, Q, K, and V are all computed from the same vector $x_i$.
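The difference between self-attention and seq2seq-style attention can be sketched in a few lines. This is a minimal illustration, assuming plain scaled dot-product attention with the learned Q/K/V projections left out; the function names and the toy vectors are hypothetical and are not taken from the paper.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        # One score per key: similarity between this query and every key.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Toy sequence of three 2-d vectors (made-up numbers).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Self-attention: Q, K, and V all come from the same x.
self_out = attention(x, x, x)

# Seq2seq-style attention: Q comes from a different source (a decoder state).
decoder_state = [[1.0, 1.0]]
cross_out = attention(decoder_state, x, x)
```

In the first call every output row is a weighted mix of the rows of `x` itself, which is the sense in which the attention is "self".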

Then why is the AG in Attention U-Net a form of self-attention? Consider the overall structure together with the figure below.

$g$ : the gating signal (the latent vector from the preceding layer)
$x^l$ : the skip-connection features
The gating signal $g$ is itself produced from $x^l$ by a chain of operations (Conv -> max-pool -> UpConv), and $x^l$ is then used again to compute the attention, so the AG can be viewed as a form of self-attention.

Other papers define the procedure of joining two vectors into a single vector and then operating on it as vector concatenation-based attention.

Let us express the figure above as equations.

First, the notation:
$\sigma_1(x_{i,c}) = \max(0, x_{i,c})$ : ReLU
$\sigma_2(x_{i,c}) = \frac{1}{1+\exp(-x_{i,c})}$ : sigmoid
$x_c^l =\sigma_1\left(\sum_{c'\in F_l}x^{l-1}_{c'}*k_{c',c}\right)$ : convolution output with activation function ($*$ is convolution, $c$ indexes channels, and the pixel index $i$ is omitted)
$x_i^l\in \mathbb{R}^{F_l}$ : feature vector at pixel $i$ of layer $l$; $N$ denotes the batch size
$\psi\in \mathbb{R}^{F_{int}\times1}, \quad W_x \in \mathbb{R}^{F_l\times F_{int}}, \quad W_g \in \mathbb{R}^{F_g\times F_{int}}$ : learnable linear parameters
$b_\psi\in \mathbb{R}, \quad b_g \in \mathbb{R}^{F_{int}}$ : bias terms
$\theta_{att}$ : the set of learnable parameters $(\psi,W_x, W_g ,b_g,b_\psi)$
where $F_{int}, F_g, F_x, F_l$ are filter (channel) dimensions. Keep in mind that these operations act over the filters (channels), not over the image width, height, and so on.
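For reference, equations (1) and (2) discussed next can be written as follows. This is the form given in the Attention U-Net paper, expressed with the symbols defined above; $g_i$ is the gating signal at pixel $i$.

```latex
\begin{aligned}
q_{att}^l &= \psi^T \left( \sigma_1\left( W_x^T x_i^l + W_g^T g_i + b_g \right) \right) + b_\psi && (1) \\
\alpha_i^l &= \sigma_2\left( q_{att}^l(x_i^l, g_i; \theta_{att}) \right) && (2)
\end{aligned}
```

Because $x_i^l$ and $g_i$ are projected separately and then added, rather than multiplied together, this is an additive form of attention.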

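Starting with (1): the skip-connection features $x_i^l$ and the gating signal (latent vector) $g_i$ are each linearly projected, by $W_x$ and $W_g$ respectively, then summed and passed through ReLU (this plays the role of the attention score). The result is then dot-multiplied with the learnable parameter $\psi$ (this plays the role of the attention value).
For a single pixel, $q_{att}^l$ is a scalar. Because the AG operates over the filter dimension, stacking it over a batch of $N$ samples and all $W \times H$ pixels gives $q_{att} \in \mathbb{R}^{N\times1\times W\times H}$. In other words, every sample gets its own attention values, and each image has one attention value per pixel.
The paper calls these linear projections channel-wise $1 \times 1\times 1$ convolutions.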
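To make the "operates over filters, not over width and height" point concrete, here is a minimal sketch of a 1x1 convolution written as a per-pixel linear map over channels. It assumes a 2D feature map stored as nested lists for brevity (the paper works on 3D volumes); `conv1x1` and the toy numbers are hypothetical.

```python
def conv1x1(feature_map, weight, bias):
    """Apply a 1x1 convolution to a [C_in][H][W] nested list.

    weight has shape [C_in][C_out] (the same layout as W_x in the text)
    and bias has shape [C_out]. Every pixel gets the same linear map
    applied to its channel vector; neighbouring pixels never mix.
    """
    c_in = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    c_out = len(bias)
    # Start every output channel at its bias value.
    out = [[[bias[o] for _ in range(width)] for _ in range(height)]
           for o in range(c_out)]
    for o in range(c_out):
        for h in range(height):
            for w in range(width):
                for c in range(c_in):
                    out[o][h][w] += weight[c][o] * feature_map[c][h][w]
    return out

# Toy input: 2 channels, 2x2 pixels (made-up numbers).
x = [[[1.0, 2.0], [3.0, 4.0]],
     [[0.5, 0.5], [0.5, 0.5]]]
psi = [[1.0], [2.0]]  # maps 2 channels down to 1 channel
q = conv1x1(x, psi, [0.0])
```

The output `q` has a single channel with the same 2x2 spatial size as the input, which mirrors why the attention map ends up with one value per pixel.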

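(2) applies a sigmoid to the output of (1) to squash it into the range 0 to 1, giving $\alpha_i^l$, which is then multiplied elementwise with the skip-connection features $x^l_i$.
That is, $\hat{x}^l_i = \alpha_i^l \cdot x_i^l$
※ Because $\alpha$ is the output of a sigmoid, $\hat{x}^l_i$ keeps the features that attention identified as meaningful (where $\alpha$ is close to 1) and suppresses the rest (where $\alpha$ is close to 0).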
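The two steps described above can be sketched for a single pixel as follows. This is an illustrative sketch, not the authors' implementation: it assumes plain Python lists, weight matrices stored as `[input_dim][F_int]`, and hand-picked toy parameters; `matvec_t` and `attention_gate_pixel` are hypothetical helper names.

```python
import math

def matvec_t(weight, vec):
    """Compute weight^T @ vec for weight stored as [len(vec)][out_dim]."""
    out_dim = len(weight[0])
    return [sum(weight[r][o] * vec[r] for r in range(len(vec)))
            for o in range(out_dim)]

def attention_gate_pixel(x_i, g_i, W_x, W_g, b_g, psi, b_psi):
    """Return (alpha_i, gated x_i) for one pixel."""
    proj_x = matvec_t(W_x, x_i)  # project skip features to F_int
    proj_g = matvec_t(W_g, g_i)  # project gating signal to F_int
    # Additive combination followed by sigma_1 (ReLU).
    hidden = [max(0.0, px + pg + b) for px, pg, b in zip(proj_x, proj_g, b_g)]
    # Dot product with psi plus bias gives the scalar q_att.
    q_att = sum(p * h for p, h in zip(psi, hidden)) + b_psi
    # sigma_2 (sigmoid) squashes q_att into (0, 1).
    alpha = 1.0 / (1.0 + math.exp(-q_att))
    # The same scalar alpha scales every channel of this pixel.
    return alpha, [alpha * v for v in x_i]

# Toy parameters with F_l = F_g = F_int = 2 (made-up numbers).
identity = [[1.0, 0.0], [0.0, 1.0]]
alpha, x_hat = attention_gate_pixel(
    x_i=[1.0, 2.0], g_i=[1.0, 0.0],
    W_x=identity, W_g=identity, b_g=[0.0, 0.0],
    psi=[1.0, -1.0], b_psi=0.0,
)
```

With these toy values the two hidden units cancel, so `q_att` is 0 and the gate passes exactly half of each feature through.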

Backpropagation

The partial-derivative equations will be written up later.
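As a starting point, one step follows directly from the gating equation $\hat{x}_i^l = \alpha_i^l \cdot x_i^l$. The sketch below assumes the skip features are written as $x_i^l = f(x_i^{l-1}; \Phi^{l-1})$, where $f$ and the layer parameters $\Phi^{l-1}$ are symbols introduced here for illustration, and that $\alpha_i^l$ also depends on $\Phi^{l-1}$. Applying the product rule gives:

```latex
\frac{\partial \hat{x}_i^l}{\partial \Phi^{l-1}}
  = \frac{\partial \left( \alpha_i^l \, f(x_i^{l-1}; \Phi^{l-1}) \right)}{\partial \Phi^{l-1}}
  = \alpha_i^l \, \frac{\partial f(x_i^{l-1}; \Phi^{l-1})}{\partial \Phi^{l-1}}
  + \frac{\partial \alpha_i^l}{\partial \Phi^{l-1}} \, x_i^l
```

The first term is the ordinary gradient scaled by $\alpha_i^l$, which suggests that updates coming from pixels with small $\alpha$ are down-weighted.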

용권순

I am a master's student in the School of Mathematics and Computing.


