📦 LimitNet: Progressive, Content-Aware Image Offloading for Extremely Weak Devices & Networks

Bard · June 25, 2025

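The problem addressed here is the same as in Progressive Neural Compression.
In low-power wide-area network (LPWAN) environments, where image transmission faces limited bandwidth and high latency, the goal is to send the important features of an image first and the less important features later.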

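The paper proposes LimitNet, a progressive, content-aware encoder that combines a lightweight saliency detector with a new Gradual Scoring mechanism.
LimitNet is a very lightweight image encoder with only 15K parameters, and it can run even on extremely constrained devices such as the ARM Cortex-M series.
The paper measures classification and object detection performance on the ARM Cortex-M33 and M7, evaluates RAM, flash, CPU usage, and power consumption, and compares the results against the state of the art (SOTA).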

The paper uses an asymmetric autoencoder made up of a lightweight encoder $\text{ENC}_{\theta_{\text{ENC}}}$ and a large, deep decoder $\text{DEC}_{\theta_{\text{DEC}}}$.

$$
X^{C \times H \times W} \to \text{ENC}_{\theta_{\text{ENC}}}(X^{C \times H \times W}) \to Z^{L \times K \times K} \tag{1}
$$

Saliency detection is used to identify the important features in an image.
However, embedded devices struggle to run complex models such as ROI detection, explainable AI, or saliency detection.
The paper therefore uses a lightweight model that mimics the output of such a complex saliency detector.
It uses a 4-layer $\text{SalDet}_{\theta_{\text{SalDet}}}$ and trains it by knowledge distillation from BASNet, one of the SOTA saliency detection models.
This branch takes the latent tensor $Z^{L \times K \times K}$ as input and outputs a saliency map $I^{K \times K}$.

$$
Z^{L \times K \times K} \rightarrow \text{SalDet}_{\theta_{\text{SalDet}}}(Z^{L \times K \times K}) \rightarrow I^{K \times K} \tag{2}
$$

This saliency detector uses only 5K parameters.
That is just 0.001% of the original saliency detection model.
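
As a rough illustration of what distilling from a large saliency model into a $K \times K$ student output could look like, here is a minimal NumPy sketch. The post does not say which loss or which downsampling the paper uses, so the mean-squared-error loss, the average pooling of the teacher map, and the function names `average_pool_to` and `distillation_loss` are all assumptions made for this example.

```python
import numpy as np

def average_pool_to(teacher_map: np.ndarray, k: int) -> np.ndarray:
    """Downsample an (H, W) teacher saliency map to (k, k) by block averaging.

    Assumption: H and W are multiples of k.
    """
    h, w = teacher_map.shape
    assert h % k == 0 and w % k == 0, "H and W must be divisible by k"
    # Split each axis into k blocks, then average inside every block.
    return teacher_map.reshape(k, h // k, k, w // k).mean(axis=(1, 3))

def distillation_loss(student_map: np.ndarray, teacher_map: np.ndarray) -> float:
    """MSE between the student's (K, K) map and the pooled teacher map (assumed loss)."""
    k = student_map.shape[0]
    target = average_pool_to(teacher_map, k)
    return float(np.mean((student_map - target) ** 2))

# Toy teacher output: the top-left quadrant is salient.
teacher = np.zeros((4, 4))
teacher[:2, :2] = 1.0

perfect_student = np.array([[1.0, 0.0], [0.0, 0.0]])
blank_student = np.zeros((2, 2))

loss_perfect = distillation_loss(perfect_student, teacher)
loss_blank = distillation_loss(blank_student, teacher)
```

In a real training loop the teacher map would come from BASNet and the student map from SalDet applied to $Z$; only the student's parameters would be updated.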

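With saliency-based scoring alone, only the important regions (the foreground) are prioritized and sent first, so the decoder lacks contextual information.
To make more effective use of the saliency map, the paper proposes Gradual Scoring.
With Gradual Scoring applied, performance improves by as much as 40 percentage points in the a3 case.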

$$
I^{K \times K} \rightarrow \text{GS}(I^{K \times K}) \rightarrow S^{L \times K \times K} \tag{3}
$$

The Gradual Scoring mechanism introduces $G_{Factor}$ and applies gradually decreasing scores to the saliency map.

$$
S_{i,[0:K],[0:K]} = I_{[0:K],[0:K]} + G_{Factor} \times i, \quad \forall i \in \{0, 1, 2, \ldots, L\} \tag{4}
$$
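
Equation (4) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `gradual_scoring` is made up for this example, the channel index runs over $i = 0, \ldots, L-1$ so that the result has the $L \times K \times K$ shape of Eq. (3), and $G_{Factor}$ is taken to be negative so that scores decrease with the channel index, as the text describes.

```python
import numpy as np

def gradual_scoring(saliency: np.ndarray, num_channels: int, g_factor: float) -> np.ndarray:
    """Expand a (K, K) saliency map into (L, K, K) scores: S[i] = I + g_factor * i."""
    # One offset per channel, shape (L,).
    offsets = g_factor * np.arange(num_channels, dtype=np.float64)
    # Broadcast (1, K, K) + (L, 1, 1) -> (L, K, K).
    return saliency[None, :, :] + offsets[:, None, None]

# Toy 2x2 saliency map: top-left is foreground, top-right is background.
saliency = np.array([[0.9, 0.1],
                     [0.5, 0.2]])

# Assumed negative factor, so deeper channels receive lower scores.
scores = gradual_scoring(saliency, num_channels=3, g_factor=-0.25)
```

In this toy example the first channel of the mid-saliency pixel (score 0.5) outranks the third channel of the most salient pixel (score 0.9 - 0.5 = 0.4), so some context gets sent before the foreground is fully transmitted.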

This is also applied during training: a random $p$ between 0 and 100 is drawn, and all latent data outside the top $p$% is filled with zeros.

$$
Z^{L \times K \times K}, S^{L \times K \times K} \xrightarrow{\text{Dropping}} Z'^{L \times K \times K} \tag{5}
$$

$$
Z'_{i,j,k} = \begin{cases} Z_{i,j,k} & \text{if } S_{i,j,k} \geq p^{\text{th}} \text{ largest value of } S^{L \times K \times K} \\ 0 & \text{otherwise} \end{cases} \tag{6}
$$
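
The dropping step in Eqs. (5) and (6) can be sketched as a threshold on the score tensor. This is a minimal NumPy sketch, not the paper's implementation: the function name `drop_by_score` is hypothetical, and the "top-$p$%" is interpreted as keeping the `round(p/100 * L*K*K)` highest-scoring entries, which is an assumption about how the percentage maps to a count.

```python
import numpy as np

def drop_by_score(z: np.ndarray, scores: np.ndarray, keep_percent: float) -> np.ndarray:
    """Zero every latent entry whose score falls outside the top keep_percent %."""
    n_keep = int(round(z.size * keep_percent / 100.0))
    if n_keep <= 0:
        return np.zeros_like(z)
    # The n_keep-th largest score acts as the threshold in Eq. (6).
    threshold = np.sort(scores, axis=None)[-n_keep]
    # Ties at the threshold are all kept, so slightly more than n_keep may survive.
    return np.where(scores >= threshold, z, 0.0)

# Toy latent tensor (L=2, K=2) and a matching score tensor.
z = np.arange(1.0, 9.0).reshape(2, 2, 2)
scores = np.array([[[0.90, 0.10], [0.50, 0.20]],
                   [[0.65, -0.15], [0.25, -0.05]]])

z_half = drop_by_score(z, scores, keep_percent=50)

# During training, p would be sampled fresh for every batch (assumed uniform).
rng = np.random.default_rng(0)
p = rng.uniform(0.0, 100.0)
z_train = drop_by_score(z, scores, keep_percent=p)
```

With `keep_percent=50`, the four highest scores (0.90, 0.65, 0.50, 0.25) survive, so `z_half` keeps the values 1, 3, 5, and 7 and zeros the rest.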

After encoding, the latent data is quantized to 6 bits and the saliency map to 5 bits.
The saliency map is sent first, before the encoded data. (This map is at most 40 B, so its overhead is negligible.)
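
To make the bit budget concrete, here is a small NumPy sketch of uniform quantization and the resulting payload size. The post does not state the quantizer or the value of $K$, so the uniform scheme, the value range $[0, 1]$, the helper names, and $K = 8$ are all assumptions; with $K = 8$, a 5-bit map takes $8 \times 8 \times 5 = 320$ bits, which is 40 B and matches the quoted maximum.

```python
import numpy as np

def quantize_uniform(x: np.ndarray, bits: int, lo: float, hi: float) -> np.ndarray:
    """Map values in [lo, hi] to integer codes in [0, 2**bits - 1] (assumed scheme)."""
    levels = (1 << bits) - 1
    x = np.clip(x, lo, hi)
    return np.round((x - lo) / (hi - lo) * levels).astype(np.uint8)

def dequantize_uniform(q: np.ndarray, bits: int, lo: float, hi: float) -> np.ndarray:
    """Inverse of quantize_uniform, up to rounding error."""
    levels = (1 << bits) - 1
    return q.astype(np.float64) / levels * (hi - lo) + lo

def packed_size_bytes(num_values: int, bits: int) -> int:
    """Bytes needed when codes are bit-packed back to back."""
    return (num_values * bits + 7) // 8

K = 8  # assumed latent resolution, not stated in the post
saliency = np.linspace(0.0, 1.0, K * K).reshape(K, K)

q_map = quantize_uniform(saliency, bits=5, lo=0.0, hi=1.0)
restored = dequantize_uniform(q_map, bits=5, lo=0.0, hi=1.0)
map_bytes = packed_size_bytes(K * K, bits=5)
max_error = float(np.abs(restored - saliency).max())
```

The reconstruction error of this scheme is bounded by half a quantization step, which is $0.5 / 31$ for 5 bits on a unit range.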
The decoder and classifier then operate as follows.