[Paper Review] CPO: Change Robust Panorama to Point Cloud Localization

김경준 · August 24, 2022


Updating a 3D map with the latest information every time the scene changes is costly, so a localization model that stays robust to such changes is essential.
This paper therefore proposes a fast, change-robust algorithm built on regional color distributions.

Given a point cloud $P=\{X,C\}$, where $X$ holds the point coordinates and $C$ the point colors, the goal is to estimate the camera pose (rotation and translation) of a query panorama image.
The overall pipeline is as follows.
1. Compare the color histograms of the point cloud rendered at various poses with the color histograms of the query image.
2. Record the consistency between the two sets of histograms in a 2D score map $M_{2D} \in \mathbb{R}^{N \times 1}$ and a 3D score map $M_{3D} \in \mathbb{R}^{N \times 1}$.
3. Select candidate poses using the score maps, refine them, and pick the best pose.

To compensate for differences in color distribution caused by illumination changes or camera white balance, the inputs are first preprocessed with color histogram matching.
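Histogram matching is a standard technique, but the post does not say which variant CPO uses or which side is matched to which. The sketch below assumes classic per-channel CDF matching on 8-bit colors, remapping one set of pixels (e.g., the query) toward the color distribution of another (e.g., the point cloud colors). All function names are hypothetical.

```python
from bisect import bisect_left

def channel_cdf(values, levels=256):
    """Normalized cumulative histogram of integer values in [0, levels)."""
    counts = [0] * levels
    for v in values:
        counts[v] += 1
    cdf, running = [], 0
    for c in counts:
        running += c
        cdf.append(running / len(values))
    return cdf

def match_channel(source, reference, levels=256):
    """Remap source values so that their distribution follows the reference."""
    src_cdf = channel_cdf(source, levels)
    ref_cdf = channel_cdf(reference, levels)
    # For each source level, take the smallest reference level whose CDF reaches it.
    lut = [min(bisect_left(ref_cdf, q), levels - 1) for q in src_cdf]
    return [lut[v] for v in source]

def match_histograms(source_rgb, reference_rgb):
    """Per-channel histogram matching on lists of (r, g, b) tuples."""
    channels = []
    for ch in range(3):
        src = [px[ch] for px in source_rgb]
        ref = [px[ch] for px in reference_rgb]
        channels.append(match_channel(src, ref))
    return list(zip(*channels))
```

Matching each channel independently is the simplest choice; it corrects global brightness and white-balance shifts but not spatially varying lighting.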
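After preprocessing, color histograms are built for the same patches of the query image and of a synthetic projection of the point cloud, and then compared.
Because synthetic projections can be generated from a very large range of camera poses, histograms pre-computed at one view are reused for other views to reduce computation.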
Concretely, let $\mathcal{S}_o=\{S_i^o\}$ be the image patches of the original view $I_o$ and $\mathcal{C}_o=\{c_i^o\}$ their centroids, with $\mathcal{S}_n$ and $\mathcal{C}_n$ defined the same way for a new view $I_n$. Each patch centroid $c_i^n$ of $I_n$ is projected into $I_o$:
$p_i = \Pi(R_{rel}\,\Pi^{-1}(c_i^n) + t_{rel})$
where $R_{rel}, t_{rel}$ is the relative pose between the two views, $\Pi^{-1}(\cdot):\mathbb{R}^2 \rightarrow\mathbb{R}^3$ is the inverse projection that maps 2D coordinates to 3D coordinates, and $\Pi$ is the forward projection back to 2D. The color histogram of $S_i^n$ is then estimated by the histogram of the $I_o$ patch whose centroid $c_*$ is closest to $p_i$:
$c_* = \arg\min_{c\in\mathcal{C}_o} \|c - p_i\|_2$
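The reuse step can be sketched as follows. This assumes an equirectangular panorama, a known depth for each new-view centroid (e.g., taken from the rendered point cloud) so that $\Pi^{-1}$ is defined, and $R_{rel}, t_{rel}$ mapping new-view coordinates into the original view's frame. The axis convention and all function names are assumptions, and the horizontal wrap-around of the panorama is ignored when measuring centroid distances.

```python
import math

def pixel_to_point(u, v, depth, width, height):
    """Pi^-1: equirectangular pixel (u, v) plus depth -> 3D point (assumed convention)."""
    lon = (u / width) * 2 * math.pi - math.pi
    lat = math.pi / 2 - (v / height) * math.pi
    return (depth * math.cos(lat) * math.sin(lon),
            depth * math.sin(lat),
            depth * math.cos(lat) * math.cos(lon))

def point_to_pixel(x, y, z, width, height):
    """Pi: 3D point -> equirectangular pixel, the inverse of pixel_to_point."""
    lon = math.atan2(x, z)
    s = y / math.sqrt(x * x + y * y + z * z)
    lat = math.asin(max(-1.0, min(1.0, s)))  # clamp against rounding error
    return ((lon + math.pi) / (2 * math.pi) * width,
            (math.pi / 2 - lat) / math.pi * height)

def apply_pose(R, t, p):
    """R p + t for a 3x3 nested-list rotation and a 3-vector translation."""
    return tuple(sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3))

def reuse_histograms(new_centroids, new_depths, R_rel, t_rel,
                     old_centroids, old_hists, width, height):
    """Give each patch S_i^n of I_n the histogram of the nearest I_o centroid c_*."""
    reused = []
    for (u, v), depth in zip(new_centroids, new_depths):
        # p_i = Pi(R_rel Pi^-1(c_i^n) + t_rel): centroid of I_n projected into I_o
        point = apply_pose(R_rel, t_rel, pixel_to_point(u, v, depth, width, height))
        p_i = point_to_pixel(*point, width, height)
        # c_* = argmin_{c in C_o} ||c - p_i||_2 (wrap-around ignored)
        nearest = min(range(len(old_centroids)),
                      key=lambda k: math.dist(old_centroids[k], p_i))
        reused.append(old_hists[nearest])
    return reused
```

With an identity relative pose every new patch inherits the histogram of the co-located original patch, while a 180° yaw swaps the left and right halves of the panorama; no new rendering or histogram computation is needed in either case.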

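The patch color histograms of the query image and of the synthetic views are compared to build a 2D score map and a 3D score map, which are then aggregated.
Denote the patches of the query image and of a synthetic view by $\mathcal{S}_Q=\{S_i^Q\}$ and $\mathcal{S}_Y=\{S_i^Y\}$, respectively; the color histogram of patch $i$ uses $B$ bins for each channel:
$h_i(\cdot):\mathbb{R}^{H \times W\times 3} \rightarrow \mathcal{S}_i \rightarrow\mathbb{R}^{B\times3}$
In words, $h_i$ crops patch $S_i$ out of an $H \times W$ RGB image and returns its $B$-bin histogram for each of the three color channels.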
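A minimal sketch of $h_i$, assuming square patches given by their top-left corner, 8-bit RGB pixels stored as nested lists, and per-channel normalization (the post does not say whether the histograms are normalized). The function name is hypothetical.

```python
def patch_histogram(image, top, left, size, bins=8):
    """h_i: crop the size x size patch S_i at (top, left) and return a B x 3 histogram."""
    hist = [[0.0, 0.0, 0.0] for _ in range(bins)]
    count = 0
    for row in image[top:top + size]:
        for pixel in row[left:left + size]:
            for channel, value in enumerate(pixel):
                # map an 8-bit value in [0, 255] to one of `bins` equal-width bins
                hist[value * bins // 256][channel] += 1
            count += 1
    # normalize so that each channel's column sums to 1 (an assumption)
    return [[v / count for v in bin_row] for bin_row in hist]
```

Normalizing makes histograms of differently sized or partially visible patches directly comparable.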

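The 2D score map $M_{2D}$ is computed as the maximum histogram intersection between co-located patches.
With $\Lambda(\cdot,\cdot)$ denoting histogram intersection, $I_Q$ the query image, and $\mathcal{Y}$ the set of synthetic views, the patch scores $\mathcal{M}=\{M_i\}$ of $M_{2D}$ are
$M_i = \max_{Y\in\mathcal{Y}} \Lambda(h_i(I_Q), h_i(Y))$
In regions with scene change, the 2D score stays low for every synthetic view, so the 3D score map is used alongside it.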
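Taking histogram intersection $\Lambda$ as the sum of bin-wise minima (averaged over the three channels here, an assumption so that scores fall in [0, 1] for normalized histograms), the 2D score map is a per-patch maximum over synthetic views. Each histogram is a B x 3 nested list; the function names are hypothetical.

```python
def histogram_intersection(h1, h2):
    """Lambda: sum of bin-wise minima, divided by 3 channels (range [0, 1])."""
    total = sum(min(a, b) for row1, row2 in zip(h1, h2) for a, b in zip(row1, row2))
    return total / 3.0

def score_map_2d(query_hists, view_hists):
    """M_i = max over synthetic views Y of Lambda(h_i(I_Q), h_i(Y)).

    query_hists[i] is h_i(I_Q); view_hists[y][i] is h_i(Y) for view y.
    """
    return [max(histogram_intersection(q, view[i]) for view in view_hists)
            for i, q in enumerate(query_hists)]
```

A patch whose content changed since the scan matches no rendered view well, so its maximum stays low; this is what lets $M_{2D}$ flag changed regions.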

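The 3D score map is measured by comparing colors on a per-point basis: the intersection scores are back-projected onto the corresponding point cloud locations.
For a given synthetic view $Y\in\mathcal{Y}$, let $B_Y\in\mathbb{R}^N$ hold, for each point, the patch-based intersection score between $Y$ and $I_Q$; the 3D score map is the per-point average over the synthetic views:
$M_{3D} = \frac{1}{|\mathcal{Y}|}\sum_{Y\in\mathcal{Y}} B_Y$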
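A sketch of building $M_{3D}$ from per-view patch scores. It assumes a hypothetical precomputed lookup telling which patch each point falls into in each view, and, as a further assumption, averages only over the views in which a point is visible (points never visible get 0).

```python
def score_map_3d(point_patch_ids, patch_scores):
    """Per-point average of B_Y over the synthetic views.

    point_patch_ids[y][p]: patch index that point p projects into in view y,
                           or None if it is occluded or out of view (hypothetical input).
    patch_scores[y][i]:    Lambda(h_i(I_Q), h_i(Y)) for patch i of view y.
    """
    num_points = len(point_patch_ids[0])
    sums = [0.0] * num_points
    hits = [0] * num_points
    for ids, scores in zip(point_patch_ids, patch_scores):
        for p, i in enumerate(ids):
            if i is not None:
                sums[p] += scores[i]  # B_Y[p]: patch score back-projected onto point p
                hits[p] += 1
    return [s / h if h else 0.0 for s, h in zip(sums, hits)]
```

Points that consistently land in poorly matching patches end up with low scores, which is what later down-weights them during pose refinement.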

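Before pose optimization, initial poses have to be selected efficiently from the color distributions, as follows.
1. Choose $N_t$ 3D locations and render one synthetic view at each, giving $N_t$ synthetic views.
2. Among the $N_t \times N_r$ poses, find the $K$ candidate poses with the largest histogram intersection. Rotations are sampled uniformly ($N_r$ per location), and the patch-wise histograms for all rotations at a location are obtained from its single rendered view by reusing the pre-computed histograms.
When scoring the $N_t \times N_r$ poses, patches with small histogram intersection are likely to contain scene change and therefore receive small weights; the similarity between the query image $I_Q$ and a synthetic view $Y$ is the sum of these weighted scores.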
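The selection of the $K$ candidates can be sketched as a weighted sum followed by a top-K pick. The $N_t \times N_r$ poses are flattened into one index here, and the patch weights are assumed to be the 2D scores $M_i$; the post only says that low-intersection patches get small weights, so the exact weighting is an assumption, as is the function name.

```python
import heapq

def select_candidates(pose_patch_scores, patch_weights, k):
    """Return indices of the K poses with the largest weighted similarity.

    pose_patch_scores[j][i]: Lambda(h_i(I_Q), h_i(Y_j)) for candidate pose j.
    patch_weights[i]: weight of query patch i (assumed: its 2D score M_i),
                      so patches that look changed contribute less.
    """
    def weighted_score(j):
        return sum(w * s for w, s in zip(patch_weights, pose_patch_scores[j]))
    return heapq.nlargest(k, range(len(pose_patch_scores)), key=weighted_score)
```

`heapq.nlargest` avoids sorting all $N_t \times N_r$ scores when $K$ is small.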

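The $K$ candidate poses are then optimized with a weighted sampling loss:
$\mathcal{L}_{sampling}(R, t) = \| M_{3D} \odot (\Gamma(\Pi(RX + t); I_Q) - C) \|_2$
Here $\Pi$ projects the point cloud to 2D coordinates, and $\Gamma$ maps 2D coordinates to the pixel values sampled from $I_Q$.
The 3D score map $M_{3D}$ is multiplied element-wise ($\odot$) so that points likely affected by scene change receive small weights.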
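A sketch that only evaluates the weighted sampling loss for one pose, assuming an equirectangular camera model and a nearest-neighbour sampler for $\Gamma$. Actual refinement would need a differentiable (e.g., bilinear) sampler and gradient-based updates of $R, t$, which are not shown; all names are hypothetical.

```python
import math

def project(point, width, height):
    """Pi: camera-frame 3D point -> equirectangular pixel (assumed camera model)."""
    x, y, z = point
    lon = math.atan2(x, z)
    s = y / math.sqrt(x * x + y * y + z * z)
    lat = math.asin(max(-1.0, min(1.0, s)))
    return ((lon + math.pi) / (2 * math.pi) * width,
            (math.pi / 2 - lat) / math.pi * height)

def sample(image, u, v):
    """Gamma: nearest-neighbour colour lookup in I_Q (not differentiable)."""
    height, width = len(image), len(image[0])
    col = int(u) % width  # wrap horizontally around the panorama
    row = min(max(int(v), 0), height - 1)
    return image[row][col]

def weighted_sampling_loss(R, t, points, colors, m3d, image):
    """|| M_3D * (Gamma(Pi(R X + t); I_Q) - C) ||_2 with one weight per point."""
    height, width = len(image), len(image[0])
    total = 0.0
    for X, C, w in zip(points, colors, m3d):
        cam = tuple(sum(R[r][c] * X[c] for c in range(3)) + t[r] for r in range(3))
        sampled = sample(image, *project(cam, width, height))
        total += sum((w * (s - c)) ** 2 for s, c in zip(sampled, C))
    return math.sqrt(total)
```

A point with weight 0 contributes nothing even if its color disagrees with the query, which is how changed regions are kept from pulling the pose.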

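CPO performs well on OmniScenes and Structured3D, which contain changes in both scene layout and lighting.
Visualizing the 2D and 3D score maps confirms that regions containing scene change receive low scores.
CPO also achieves state-of-the-art results when there is no scene change, and it works well on semantic inputs too.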