Remote Sensing Image Change Detection With Transformers, Part 1

이준석 · October 21, 2022


Remote Sensing Image Change Detection With Transformers

Abstract

Modern change detection (CD) has achieved remarkable success thanks to the powerful discriminative ability of deep convolutions. However, high-resolution remote sensing CD remains challenging because of the complexity of the objects in a scene.

Objects with the same semantic concept may show distinct spectral characteristics at different times and spatial locations. Most recent CD pipelines built on pure convolutions still struggle to relate long-range concepts in space-time. Nonlocal self-attention approaches show promising performance by modeling dense relationships among pixels, yet they are computationally inefficient.

Here, we propose a bitemporal image transformer (BIT) to model contexts within the spatial-temporal domain efficiently and effectively. Our intuition is that the high-level concepts of the change of interest can be represented by a few visual words, that is, semantic tokens.
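To make the efficiency argument concrete, the following rough count compares how many query-key scores dense pixel-level attention needs with how many a token-based route needs. The feature-map size (64 x 64) and the four tokens per image are assumptions chosen only for illustration; the abstract does not state them. The accounting of the token route (self-attention among tokens, plus each pixel attending to its own image's tokens) is likewise a simplification, not the paper's cost analysis.

```python
def attention_pairs(num_queries: int, num_keys: int) -> int:
    """Number of query-key similarity scores one attention layer computes."""
    return num_queries * num_keys


# Hypothetical sizes, chosen only for illustration.
height, width = 64, 64        # spatial size of each feature map
tokens_per_image = 4          # "a few" semantic tokens per image

pixels = height * width       # pixels in one feature map

# Dense nonlocal attention: every pixel of both images attends to every pixel.
dense = attention_pairs(2 * pixels, 2 * pixels)

# Token route (simplified): self-attention among the tokens of both images,
# then each pixel attends to the tokens of its own image.
token_self = attention_pairs(2 * tokens_per_image, 2 * tokens_per_image)
pixel_to_token = 2 * attention_pairs(pixels, tokens_per_image)
compact = token_self + pixel_to_token

print(f"dense pixel attention : {dense:,} scores")
print(f"token-based attention : {compact:,} scores")
print(f"reduction factor      : about {dense // compact}x")
```

Under these assumed sizes the dense variant scores tens of millions of pairs, while the token route stays in the tens of thousands, which is the gap the "few visual words" intuition relies on.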

To achieve this, we express the bitemporal image as a few tokens and use a transformer encoder to model contexts in the compact token-based space-time. The learned context-rich tokens are then fed back to the pixel space to refine the original features via a transformer decoder. We incorporate BIT into a deep feature differencing-based CD framework.
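The four steps described above (tokenize, encode, decode, difference) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`tokenize`, `attend`, `bit_sketch`) are hypothetical, the attention is single-head with no learned projections, the tokenizer weights are random, and taking the absolute difference of the refined features is an assumed form of feature differencing. The official code is in the repository linked in the abstract.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def tokenize(features, w_tok):
    """Pool (N, C) pixel features into L tokens using spatial attention maps.

    w_tok has shape (C, L); it stands in for a learned projection.
    """
    attn = softmax(features @ w_tok, axis=0)   # (N, L), each column sums to 1 over pixels
    return attn.T @ features                   # (L, C)


def attend(queries, keys_values):
    """Single-head scaled dot-product attention without learned projections."""
    scale = np.sqrt(queries.shape[-1])
    weights = softmax(queries @ keys_values.T / scale, axis=-1)
    return weights @ keys_values


def bit_sketch(feat_a, feat_b, w_tok):
    """Tokenize both images, encode tokens jointly, decode per image, then difference."""
    tok_a = tokenize(feat_a, w_tok)
    tok_b = tokenize(feat_b, w_tok)
    tokens = np.concatenate([tok_a, tok_b], axis=0)   # (2L, C) token-based space-time
    tokens = tokens + attend(tokens, tokens)          # encoder: residual self-attention
    ctx_a, ctx_b = np.split(tokens, 2, axis=0)        # context-rich tokens per image
    ref_a = feat_a + attend(feat_a, ctx_a)            # decoder: pixels query tokens
    ref_b = feat_b + attend(feat_b, ctx_b)
    return np.abs(ref_a - ref_b)                      # assumed differencing step


# Usage with random stand-in features (all sizes are illustrative assumptions).
rng = np.random.default_rng(0)
n_pixels, channels, n_tokens = 16 * 16, 32, 4
feat_a = rng.normal(size=(n_pixels, channels))
feat_b = rng.normal(size=(n_pixels, channels))
w_tok = rng.normal(size=(channels, n_tokens))

diff = bit_sketch(feat_a, feat_b, w_tok)
print(diff.shape)  # (256, 32)
```

In a full model the difference features would go to a prediction head that outputs a change map; that part is omitted here. Passing the same features for both dates yields a difference of zero, which is a quick sanity check on the sketch.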

Extensive experiments on three CD datasets demonstrate the effectiveness and efficiency of the proposed method. Notably, our BIT-based model significantly outperforms the purely convolutional baseline with computational costs and model parameters that are three times lower.

Based on a naive backbone (ResNet18) without sophisticated structures (e.g., feature pyramid network (FPN) and UNet), our model surpasses several state-of-the-art CD methods, including four recent attention-based methods, in terms of both efficiency and accuracy. Our code is available at https://github.com/justchenhao/BIT_CD.

I. INTRODUCTION
