Poselib Library 소개 및 p3P, p6L 구현 코드 분석

HEEJOON MOON·2023년 2월 13일

PoseLib

Introduction

Poselib Library는 카메라 pose estimation을 위한 minimal solver를 모아둔 collection입니다. Main focus는 여러 종류의 correspondences(e.g point-point, point-line, line-point, line-line)을 이용하여 calibrated된 카메라의 absolute pose estimation입니다.

위 프로젝트의 목표로는

Fast, Robust한 최신 SOTA solvers의 구현

Different solvers간의 일관적인 calling 인터페이스

프로그램 의존성을 낮추면서, 다른 frameworks에 통합되기 쉽도록 만든다

Lo-RANSAC을 기반으로 하여 Robust estimate를 진행

Camera models

Poselib은 COLMAP과 호환되는 카메라 모델들을 사용합니다.(자세한 건 colmap_models.h를 참고) 지원되는 종류로는

SIMPLE PINHOLE
PINHOLE
SIMPLE RADIAL
RADIAL
OPENCV

가 지원됩니다. Camera 구조체에는 width, height을 포함하지만, 코드에서 사용할 일은 없으며, 기본적으로 COLMAP과 동일하게 받아온다고 합니다. Camera class는 helper function initialize_from_txt(str)함수를 가지고 있으며, camera.txt에 있는 정보(line by line)을 읽어옵니다. 또한 simple pinhole과 pinhole의 차이점으로는 simple pinhole을 distortion(radial-distortion 등)을 고려하지 않는다는 점입니다.

Python binding

Poselib은 기본적으로 c++로 작성되었습니다. 이를 python에서 사용하기 위해 pybind을 이용하며, 모든 solver에 대해서 python binding이 되었다고 합니다. Minimal solver 뿐만 아니라 robust estimator도 가능하다고 합니다. Robust estimator의 예시는 아래와 같습니다.

camera = {'model': 'SIMPLE_PINHOLE', 'width': 1200, 'height': 800, 'params': [960, 600, 400]}

pose, info = poselib.estimate_absolute_pose(p2d, p3d, camera, {'max_reproj_error': 16.0}, {})

혹은

F, info = poselib.estimate_fundamental_matrix(p2d_1, p2d_2, {'max_epipolar_error': 0.75, 'progressive_sampling': True}, {})

info: Return 값은 robust estimation(inliers, iteration 횟수등)에 대한 정보를 담고 있습니다. 마지막 2개의 option으로는 RansacOptions and BundleOptions이 있으며, 나머지 생략된 정보는 default로 설정이 되어있다고 합니다.

poselib.CameraPose

python binding은 poselib.CameraPose class를 드러내며, 이는 많은 method의 return type입니다. 즉, 카메라 포즈에 관한 클래스이며, pose를 q, t를 이용하여 표현합니다. 또한 R(3x3), Rt(3x4)의 표현을 보면, pose.R=Rnew처럼 계속 update할 수 있습니다.

struct CameraPose {
   Eigen::Vector4d q;
   Eigen::Vector3d t;
};

Minimal Solvers

Naming

Solver method의 이름은 다음과 같은 rule을 따랐다고 합니다

Xp - 2D point to 3D point, Ex) p3p
Ypl - 2D point to 3D line, Ex) p5pl
Zlp - 2D line to 3D point,
Wll - 2D line to 3D line.

또한 prefix u는 upright solver를, g는 generalized camera solvers을 의미합니다. Solvers 중에서 focal length를 측정하는 것은 f가, similarity는 s가 붙어있습니다.

Solvers basis

q, t는 [R, t]를 나타내며, world coordinate을 camera coordinate system으로 변환합니다.

2D point to 3D point

2D-3D correspondences image points들은 단위길이 벡터로 표현이 가능하며, R,t를 이용하면 아래 식을 만족합니다

$\lambda * x[i] = R * X[i] + t$

$\lambda$ 는 depth를 의미하며, $x[i]$ 는 2D normalized coordinate을 의미하고, $X[i]$ 는 world frame에서 정의된 3차원 point를 가리킨다. 즉, R, t에 의해 X[i]가 변환된 점은 normalized vector $x[i]$ 에 depth를 곱한 값과 같다.
p3p의 예제 경우, lambda 값이 음수인 것을 filtering한다.

2D lines

2D lines에서 constraints가 있는 경우, lines은 homogenous coordinates로 표현되어 있다. 2D line to 3D point contraints에서는, 반환된 카메라 pose는 아래의 식을 만족한다

$l[i].transpose() * (R * X[i] + t) = 0$ : 3차원 점이 직선 l을 지나는 것이다.

3D lines

3D lines에서 constraints가 있는 경우, lines은 world좌표계로 표현된 3D point x와 방향 벡터 v로 표현되어 있다. 2D point to 3D point contraints에서는, 반환된 카메라 pose는 아래의 식을 만족한다

$\lambda * x[i] = R * (X[i] + mu * V[i]) + t$ : 직선위의 점을 표시한 것이다

Generalized Cameras

Generalize 카메라에서는 ralys를 offset p와 bearing vector v로 나타낼 수 있다. point-to-point 대응관계에서 다음과 같은 식이 만족한다. offset p는 개인적은 추측으로는 perspective point가 없기 때문에(ray가 모이지 않음), 각 ray가 나오는 point를 p[i]로 명명한 것 같다

$p[i] + \lambda * x[i] = R * X[i] + t$

Unknown scale $\alpha$ 를 고려하면
$\alpha * p[i] + lambda * x[i] = R * X[i] + t$

예를 들어 4-points를 이용하여 얻은 generalized pose & scale solver는

int gp4ps(const std::vector<Eigen::Vector3d> &p, const std::vector<Eigen::Vector3d> &x,
              const std::vector<Eigen::Vector3d> &X, std::vector<CameraPose> *output);

p3p

$\lambda \times x = R*X +t$ 식을 기반으로 camera pose(R,t)를 추정. 이때, $x$ 는 normalized된 2D points(x,y,1 형태)를 가정한다.
ECCV 2018에 발표된 Lambda Twist: An Accurate Fast Robust Perspective Three Point (P3P) Solver를 재구현하였다.
Python binding을 이용하여 사용하면 아래와 같다

poselib.p3p(points_2D, points_3D)

<p3p.h>

int p3p(const std::vector<Eigen::Vector3d> &x, const std::vector<Eigen::Vector3d> &X, std::vector<CameraPose> *output);}

Input
x(points_2D): normalized된 2D points(x,y,1)이므로 (N,3) numpy array가 들어간다
X(points_3D): World coordinate에 정의된 point가 들어간다
Output
-> result(pose) list: quartanin과 translation vector [q,t] list가 출력된다. Solution이 여러개 있을 수 있으므로 2개 이상의 리스트가 있을 수 있다.

gen_realpose_6pt

2개의 generalized camera의 6 point correspondences를 이용하여 두 카메라간의 상대적인 pose를 추정한다
$p[i] + lambda * x[i] = R * X[i] + t$ 를 기반한 solver. p는 offset을 의미하며, x는 bearing
Larsson et al. CVPR 2017에 있는 method를 이용
Python binding을 이용하여 사용하면 아래와 같다

poselib.gen_relpose_6pt(p1, x1, p2, x2)

<gen_relpose_6pt.h>

int gen_relpose_6pt(const std::vector<Eigen::Vector3d> &p1, const std::vector<Eigen::Vector3d> &x1,
                    const std::vector<Eigen::Vector3d> &p2, const std::vector<Eigen::Vector3d> &x2,
                    std::vector<CameraPose> *output);
};
// Return: output.size()

Input
p1: camera1의 offset을 의미한다
x1(points_2D): 1번째 generalized camera에서 관측되는 normalized된 2D points(x1,y1,1)
p2: camera2의 offset을 의미한다
x2(points_2D): 2번째 generalized camera에서 관측되는 normalized된 2D points(x2,y2,1)
Output
-> result(pose): quartanin과 translation vector [q,t]가 출력된다.