Computer Vision Concepts, Summarized!

김민솔 · April 27, 2024

I will keep updating this post as I organize more computer vision concepts! If you need a particular concept, searching within the post is probably the easiest way to find it!

2D Points

2D points in inhomogeneous coordinates

\mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix} \in \mathbb{R}^2

2D points in homogeneous coordinates

\tilde{\mathbf{x}} = \begin{pmatrix} \tilde{x} \\ \tilde{y} \\ \tilde{w} \end{pmatrix} \in \mathbb{R}^3

In homogeneous coordinates, vectors that differ only by scale represent the same point! e.g. (2, 2, 2) = (1, 1, 1)

2D points in homogeneous coordinates ↔ inhomogeneous coordinates

\bar{\mathbf{x}} = \begin{pmatrix} \mathbf{x} \\ 1 \end{pmatrix} = \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \frac 1 {\tilde{w}}\tilde{\mathbf{x}} = \frac 1 {\tilde{w}} \begin{pmatrix} \tilde{x} \\ \tilde{y} \\ \tilde{w} \end{pmatrix} = \begin{pmatrix} {\tilde{x}} / \tilde{w} \\ \tilde{y} / \tilde{w} \\ 1 \end{pmatrix}
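The conversion above can be sketched in a few lines of NumPy. The function names `to_homogeneous` and `from_homogeneous` are my own, not from the lecture, and the check for $\tilde{w} = 0$ (points at infinity) is an added assumption about how to handle that case.

```python
import numpy as np

def to_homogeneous(x):
    """Append a 1 to an inhomogeneous point (the augmented vector x_bar)."""
    return np.append(np.asarray(x, dtype=float), 1.0)

def from_homogeneous(x_tilde):
    """Divide by the last element w and drop it."""
    x_tilde = np.asarray(x_tilde, dtype=float)
    w = x_tilde[-1]
    if np.isclose(w, 0.0):
        # w = 0 is a point at infinity and has no inhomogeneous form
        raise ValueError("point at infinity has no inhomogeneous representation")
    return x_tilde[:-1] / w

# (2, 2, 2) and (1, 1, 1) map to the same inhomogeneous point
p = from_homogeneous([2.0, 2.0, 2.0])
q = from_homogeneous([1.0, 1.0, 1.0])
```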

2D lines

\{\bar{\mathbf{x}}|\tilde{\mathbf{l}}^T\bar{\mathbf{x}}=0 \} \iff \{x, y|ax+by+c=0 \}

$\tilde{\mathbf{l}} = (a, b, c)^T$
normalize! → $\tilde{\mathbf{l}} = (n_x, n_y, d)^T = (\mathbf{n}, d)^T$ , $||\mathbf{n}||_2 = 1$

2D Line Arithmetic

intersection of two lines: $\tilde{\mathbf{x}} = \tilde{\mathbf{l}}_1 \times \tilde{\mathbf{l}}_2$
line joining two points: $\tilde{\mathbf{l}} = \bar{\mathbf{x}}_1 \times \bar{\mathbf{x}}_2$
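Both operations are plain cross products, so a small NumPy sketch is enough to try them out. The helper names and the two example lines are made up for illustration.

```python
import numpy as np

def line_through(p1, p2):
    """Homogeneous line (a, b, c) joining two 2D points given as (x, y)."""
    return np.cross(np.append(p1, 1.0), np.append(p2, 1.0))

def intersect(l1, l2):
    """Homogeneous intersection point of two lines (a, b, c)."""
    return np.cross(l1, l2)

l1 = line_through([0.0, 0.0], [1.0, 1.0])   # the line y = x
l2 = line_through([0.0, 2.0], [2.0, 0.0])   # the line x + y = 2
x_tilde = intersect(l1, l2)                 # homogeneous, arbitrary scale
x = x_tilde[:2] / x_tilde[2]                # inhomogeneous: the point (1, 1)
```

For parallel lines the third component of the result is 0, i.e. the intersection is a point at infinity and the final division is not defined.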

2D conics

The curve where a plane intersects a 3D cone!

Useful for multi-view geometry and camera calibration!

\{\bar{\mathbf{x}}|\bar{\mathbf{x}}^T{\mathbf{Q}}\bar{\mathbf{x}}=0 \}

3D Points

3D points in inhomogeneous coordinates

\mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} \in \mathbb{R}^3

3D points in homogeneous coordinates

\tilde{\mathbf{x}} = \begin{pmatrix} \tilde{x} \\ \tilde{y} \\ \tilde{z} \\ \tilde{w} \end{pmatrix} \in \mathbb{R}^4

3D planes

\{\bar{\mathbf{x}}|\tilde{\mathbf{m}}^T\bar{\mathbf{x}}=0 \} \iff \{x, y,z|ax+by+cz+d=0 \}

$\tilde{\mathbf{m}} = (a, b, c, d)^T$
normalize! → $\tilde{\mathbf{m}} = (n_x, n_y, n_z, d)^T = (\mathbf{n}, d)^T$ , $||\mathbf{n}||_2 = 1$

3D lines

\{{\mathbf{x}}|{\mathbf{x}} = (1 - \lambda){\mathbf{p}}+\lambda {\mathbf{q}} \land \lambda \in \mathbb{R} \}

A line can be expressed as a linear combination of two points p, q lying on it.
A two-plane parameterization is also possible.

3D Quadrics

The 3D analog of 2D conics is the quadric surface. Used in multi-view geometry

\{\bar{\mathbf{x}}|\bar{\mathbf{x}}^T{\mathbf{Q}}\bar{\mathbf{x}}=0 \}

2D Transformations

Translation

Coordinate shift / 2 DoF (Degrees of Freedom)
Using homogeneous (projective) coordinates, transformations can be chained and inverted with matrix products and inverses!

\mathbf{x}' = \mathbf{x} + \mathbf{t} \iff \bar{\mathbf{x}}' = \begin{bmatrix} \mathbf{I} & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix} \bar{\mathbf{x}}

Euclidean

Translation + rotation / 3 DoF (Degrees of Freedom)
$\mathbf{R} \in SO(2)$ : orthonormal rotation matrix! ( $\mathbf{R}\mathbf{R}^T = \mathbf{I}$ , $det(\mathbf{R}) = 1$ )
Euclidean transformations preserve Euclidean distances!

\mathbf{x}' = \mathbf{R}\mathbf{x} + \mathbf{t} \iff \bar{\mathbf{x}}' = \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix} \bar{\mathbf{x}}

Similarity

Translation + rotation with scale / 4 DoF (Degrees of Freedom)
Preserves the angle between two lines!

\mathbf{x}' = s\mathbf{R}\mathbf{x} + \mathbf{t} \iff \bar{\mathbf{x}}' = \begin{bmatrix} s\mathbf{R} & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix} \bar{\mathbf{x}}

Affine

Any 2D linear map plus a translation! / 6 DoF (Degrees of Freedom)
$\mathbf{A} \in \mathbb{R}^{2\times2}$ : an arbitrary 2 x 2 matrix
Parallel lines stay parallel!

\mathbf{x}' = \mathbf{A}\mathbf{x} + \mathbf{t} \iff \bar{\mathbf{x}}' = \begin{bmatrix} \mathbf{A} & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix} \bar{\mathbf{x}}

Projective

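Homography! / 8 DoF (Degrees of Freedom)
$\tilde{\mathbf{H}} \in \mathbb{R}^{3\times3}$ : an arbitrary homogeneous matrix (defined only up to scale, hence 8 rather than 9 DoF)
Straight lines stay straight!
The projective transformation of a co-vector (e.g. a line) can also be expressed, using the inverse transpose!
- $\tilde{\mathbf{l}}' = \tilde{\mathbf{H}}^{-T}\tilde{\mathbf{l}}$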

\tilde{\mathbf{x}}' = \tilde{\mathbf{H}}\tilde{\mathbf{x}}
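To see why the homogeneous form is convenient, here is a small NumPy sketch that builds two transforms, chains them with a matrix product, inverts the result, and transforms a line with the inverse transpose. The helper `similarity` and all numeric values are my own choices for illustration.

```python
import numpy as np

def similarity(theta, t, s=1.0):
    """3x3 homogeneous matrix [[sR, t], [0, 1]]; s = 1 gives a Euclidean transform."""
    c, sn = np.cos(theta), np.sin(theta)
    T = np.eye(3)
    T[:2, :2] = s * np.array([[c, -sn], [sn, c]])
    T[:2, 2] = t
    return T

T1 = similarity(np.pi / 2, [1.0, 0.0])       # rotate by 90 degrees, then shift by (1, 0)
T2 = similarity(0.0, [0.0, 2.0], s=2.0)      # scale by 2, then shift by (0, 2)
T = T2 @ T1                                  # chaining: T1 is applied first, then T2

x_bar = np.array([1.0, 0.0, 1.0])            # the point (1, 0)
x_new = T @ x_bar                            # the point (2, 4)
x_back = np.linalg.inv(T) @ x_new            # inverting the chain recovers x_bar

# Lines are co-vectors and transform with the inverse transpose
l = np.array([0.0, 1.0, 0.0])                # the line y = 0, which contains x_bar
l_new = np.linalg.inv(T).T @ l               # l_new . x_new is still 0
```

The last two lines show that incidence is preserved: a point that was on the line before the transformation is on the transformed line afterwards.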

3D Transformations

(Omitted, since they are analogous to the 2D transformations!)

Direct Linear Transform

A method for estimating a homography from pairs of corresponding homogeneous vectors.

$\mathcal{X} = \{\tilde{\mathbf{x}}_i, \tilde{\mathbf{x}}_i' \}^N_{i=1}$ : a set of N 2D-to-2D correspondences related by $\tilde{\mathbf{x}}_i'= \tilde{\mathbf{H}}\tilde{\mathbf{x}}_i$ (homography)
Since the two vectors point in the same direction, $\tilde{\mathbf{x}}_i'\times \tilde{\mathbf{H}}\tilde{\mathbf{x}}_i = \mathbf{0}$ holds. → This can be written as the equation below!

\begin{bmatrix} \mathbf{0}^T & -\tilde{w}'_i\tilde{\mathbf{x}}_i^T & \tilde{y}_i'\tilde{\mathbf{x}}_i^T \\ \tilde{w}'_i\tilde{\mathbf{x}}_i^T & \mathbf{0}^T & -\tilde{x}_i'\tilde{\mathbf{x}}_i^T \\ -\tilde{y}_i'\tilde{\mathbf{x}}_i^T & \tilde{x}_i'\tilde{\mathbf{x}}_i^T & \mathbf{0}^T \end{bmatrix} \begin{bmatrix} \tilde{\mathbf{h}}_1 \\ \tilde{\mathbf{h}}_2 \\ \tilde{\mathbf{h}}_3 \end{bmatrix} = \mathbf{0}

$\mathbf{A}_i = \begin{bmatrix} \mathbf{0}^T & -\tilde{w}'_i\tilde{\mathbf{x}}_i^T & \tilde{y}_i'\tilde{\mathbf{x}}_i^T \\ \tilde{w}'_i\tilde{\mathbf{x}}_i^T & \mathbf{0}^T & -\tilde{x}_i'\tilde{\mathbf{x}}_i^T \\ -\tilde{y}_i'\tilde{\mathbf{x}}_i^T & \tilde{x}_i'\tilde{\mathbf{x}}_i^T & \mathbf{0}^T \end{bmatrix}$
$\tilde{\mathbf{h}} = \begin{bmatrix} \tilde{\mathbf{h}}_1 \\ \tilde{\mathbf{h}}_2 \\ \tilde{\mathbf{h}}_3 \end{bmatrix}$ / $\tilde{\mathbf{h}}_k^T$ : the k-th row of the homography matrix!

Stack the $\mathbf{A}_i$ into a 2N x 9 matrix $\mathbf{A}$ (only two of the three rows of each $\mathbf{A}_i$ are linearly independent), then estimate the homography by solving the constrained least squares problem below!! + The optimal solution is the right singular vector of $\mathbf{A}$ for the smallest singular value, i.e. the eigenvector of $\mathbf{A}^T\mathbf{A}$ with the smallest eigenvalue! If you are curious, see the blog post "특이값 분해(SVD)" (Singular Value Decomposition) by gaussian37.

\tilde{\mathbf{h}}^* = \argmin _{\tilde{\mathbf{h}}} ||{\mathbf{A}}\tilde{\mathbf{h}}||^2_2 + \lambda(||\tilde{\mathbf{h}}||^2_2-1) \\ = \argmin _{\tilde{\mathbf{h}}} \tilde{\mathbf{h}}^T{\mathbf{A}}^T{\mathbf{A}}\tilde{\mathbf{h}} + \lambda(\tilde{\mathbf{h}}^T\tilde{\mathbf{h}}-1)
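A minimal NumPy sketch of the DLT steps: build the first two rows of each constraint block from $\tilde{\mathbf{x}}_i' \times \tilde{\mathbf{H}}\tilde{\mathbf{x}}_i = \mathbf{0}$, stack them, and take the right singular vector of the smallest singular value. The function name, the ground-truth matrix `H_true`, and the test points are hypothetical values chosen for a synthetic check, not something from the lecture.

```python
import numpy as np

def estimate_homography(x, x_prime):
    """Estimate H (up to scale) from N >= 4 correspondences x'_i ~ H x_i.

    x, x_prime: (N, 3) arrays of homogeneous 2D points.
    """
    zero = np.zeros(3)
    rows = []
    for xi, (xp, yp, wp) in zip(x, x_prime):
        # Two independent rows per correspondence; the third is redundant
        rows.append(np.concatenate([zero, -wp * xi, yp * xi]))
        rows.append(np.concatenate([wp * xi, zero, -xp * xi]))
    A = np.asarray(rows)                     # shape (2N, 9)
    _, _, Vt = np.linalg.svd(A)
    h = Vt[-1]                               # singular vector of the smallest singular value
    return h.reshape(3, 3)                   # rows are h_1^T, h_2^T, h_3^T

# Synthetic check with a made-up homography and noise-free points
H_true = np.array([[1.0, 0.2, 3.0],
                   [0.1, 0.9, -2.0],
                   [0.001, 0.002, 1.0]])
pts = np.array([[0, 0, 1], [1, 0, 1], [1, 1, 1], [0, 1, 1], [2, 3, 1]], dtype=float)
pts_prime = pts @ H_true.T                   # x'_i = H x_i for every row
H_est = estimate_homography(pts, pts_prime)
H_est = H_est / H_est[2, 2]                  # fix the free scale before comparing
```

With noisy real data the point coordinates are usually normalized first (zero mean, unit average scale) to improve conditioning; that step is left out here to keep the sketch short.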

Projection Models

Orthographic Projection

Used when the focal length is long (e.g. astronomical observation)
Every ray is parallel to the z-axis of the camera coordinate system

Perspective Projection

Light rays pass through the camera center (the focal point)
The apparent object size changes with distance

Orthographic Projection

Orthographic projection is applied by dropping the z component of the coordinates.

{\mathbf{x}}_s = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} {\mathbf{x}}_c \iff \bar{\mathbf{x}}_s = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \bar{\mathbf{x}}_c

A scale can also be applied. $s$ : px/m or px/mm (for 3D points → pixels)

Perspective Projection

Perspective projection projects a point by dividing by its z component.

\begin{pmatrix} x_s \\ y_s \end{pmatrix} = \begin{pmatrix} fx_c/z_c \\ fy_c/z_c \end{pmatrix} \iff \tilde{\mathbf{x}}_s = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \bar{\mathbf{x}}_c

Principal Point Offset

To keep pixel coordinates positive, set an offset $c$ (the principal point)! → moves the coordinate origin to a corner of the image plane. $s$ is the skew.

\begin{pmatrix} x_s \\ y_s \end{pmatrix} = \begin{pmatrix} f_xx_c/z_c + sy_c/z_c + c_x \\ f_yy_c/z_c + c_y \end{pmatrix} \iff \tilde{\mathbf{x}}_s = \begin{bmatrix} f_x & s & c_x & 0 \\ 0 & f_y & c_y & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \bar{\mathbf{x}}_c

$\mathbf{K} = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$ : calibration matrix K , the parameters of K = camera intrinsics
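As a quick numerical check of the perspective model with a calibration matrix, the sketch below projects the same lateral offset at two depths. The intrinsics (500 px focal length, zero skew, principal point at (320, 240)) are made-up values, and `project` is my own helper name.

```python
import numpy as np

def project(K, x_c):
    """Project a 3D point in camera coordinates to pixel coordinates."""
    x_tilde = K @ np.asarray(x_c, dtype=float)   # same as [K | 0] applied to (x_c, 1)
    return x_tilde[:2] / x_tilde[2]              # divide by the depth z_c

# Hypothetical intrinsics
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

near = project(K, [0.5, 0.2, 2.0])   # pixel (445.0, 290.0)
far = project(K, [0.5, 0.2, 4.0])    # pixel (382.5, 265.0)
```

Doubling the depth halves the offset from the principal point, which is the "object size changes" property of perspective projection.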

Transformations Chain

$\mathbf{K}$ : calibration matrix(intrinsics)
$[\mathbf{R}|\mathbf{t}]$ : camera pose(extrinsics)

\tilde{\mathbf{x}}_s = \begin{bmatrix} {\mathbf{K}} & {\mathbf{0}} \end{bmatrix} \bar{\mathbf{x}}_c = \begin{bmatrix} {\mathbf{K}} & {\mathbf{0}} \end{bmatrix} \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix} \bar{\mathbf{x}}_w = \mathbf{P}\bar{\mathbf{x}}_w

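With the full-rank 4 x 4 form $\tilde{\mathbf{P}} = \begin{bmatrix} \mathbf{K} & \mathbf{0} \\ \mathbf{0}^T & 1 \end{bmatrix} \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}$, $\tilde{\mathbf{x}}_s$ becomes a 4D vector that is normalized by its 3rd entry: $\bar{\mathbf{x}}_s = \tilde{\mathbf{x}}_s / {z}_s = (x_s/z_s, y_s/z_s,1,1/z_s)^T$ → 4th value: inverse depth
- If the inverse depth is known, the 3D point can be recovered from the pixel coordinates!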
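The inverse-depth remark can be checked numerically. The sketch below uses the full-rank 4 x 4 version of the chain (intrinsics and pose each padded to 4 x 4), projects a world point, and then recovers it from the pixel plus inverse depth. All numeric values are hypothetical.

```python
import numpy as np

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])           # hypothetical intrinsics
R = np.eye(3)                             # hypothetical pose: no rotation
t = np.array([0.0, 0.0, 5.0])             # world origin sits 5 units in front of the camera

K4 = np.eye(4)
K4[:3, :3] = K                            # [[K, 0], [0^T, 1]]
Rt = np.eye(4)
Rt[:3, :3] = R
Rt[:3, 3] = t                             # [[R, t], [0^T, 1]]
P = K4 @ Rt                               # full-rank 4x4 projection matrix

x_w = np.array([1.0, -1.0, 0.0, 1.0])     # augmented world point
x_tilde = P @ x_w
x_bar = x_tilde / x_tilde[2]              # (u, v, 1, 1/z) = (420, 140, 1, 0.2)

# Back-projection: pixel + inverse depth -> world point
x_rec = np.linalg.inv(P) @ x_bar
x_rec = x_rec / x_rec[3]                  # normalize the homogeneous coordinate
```

Because the 4 x 4 matrix is invertible, nothing is lost as long as the inverse depth is kept; with the 3 x 4 form the depth is discarded and only a ray can be recovered.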

Lens Distortion

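Because of camera lens distortion, real images deviate from the linear projection model. Below, $x = x_c/z_c$ , $y = y_c/z_c$ and $\gamma^2 = x^2 + y^2$ (squared distance from the optical axis).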

\mathbf{x}' = (1 + \kappa_1\gamma^2+\kappa_2\gamma^4)\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 2\kappa_3xy+\kappa_4(\gamma^2+2x^2) \\ 2\kappa_4xy+\kappa_3(\gamma^2+2y^2) \end{pmatrix} = \begin{pmatrix} x' \\ y' \end{pmatrix}

\mathbf{x}_s = \begin{pmatrix} f_xx' + c_x \\ f_yy' + c_y \end{pmatrix}

Radial Distortion: $(1 + \kappa_1\gamma^2+\kappa_2\gamma^4)$
Tangential Distortion: $\begin{pmatrix} 2\kappa_3xy+\kappa_4(\gamma^2+2x^2) \\ 2\kappa_4xy+\kappa_3(\gamma^2+2y^2) \end{pmatrix}$
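The distortion formula translates almost directly into code. In this sketch `r2` plays the role of $\gamma^2$, assumed to be $x^2 + y^2$ on normalized coordinates, and `k1`..`k4` stand for $\kappa_1$..$\kappa_4$; the function names and example coefficients are my own.

```python
def distort(x, y, k1, k2, k3=0.0, k4=0.0):
    """Apply radial (k1, k2) and tangential (k3, k4) distortion to normalized coordinates."""
    r2 = x * x + y * y                       # gamma^2 in the formula (assumed x^2 + y^2)
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    x_d = radial * x + 2.0 * k3 * x * y + k4 * (r2 + 2.0 * x * x)
    y_d = radial * y + 2.0 * k4 * x * y + k3 * (r2 + 2.0 * y * y)
    return x_d, y_d

def to_pixels(x_d, y_d, fx, fy, cx, cy):
    """Map distorted normalized coordinates to pixel coordinates."""
    return fx * x_d + cx, fy * y_d + cy

# Hypothetical barrel distortion (k1 < 0) pulls the point toward the center
x_d, y_d = distort(0.1, 0.2, k1=-0.2, k2=0.0)
u, v = to_pixels(x_d, y_d, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

Going the other way (undistorting a pixel) has no closed form for this model and is typically done iteratively, which is outside this sketch.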

Photometric Image Formation

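The sections above dealt with where points land in the image; from here on, the concepts concern how pixel intensities and colors arise from light being emitted, reflected, or refracted.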

Rendering Equation

L_{out}(p, v, \lambda) = L_{emit}(p,v,\lambda) + \int_\Omega BRDF(p,s,v,\lambda) \cdot L_{in}(p,s,\lambda) \cdot (-n^Ts)ds

$\Omega$ : unit hemisphere at normal n
BRDF: Bidirectional Reflectance Distribution Function → defines how light is reflected at an opaque surface
$(−n^Ts)$ : attenuation term (if light arrives parallel to the surface, i.e. perpendicular to the normal, there is no reflectance at all; if it arrives at a shallow angle, less light is reflected).
$L_{emit} > 0$ : only when the surface emits light!

BRDFs also have characteristic properties like those above.

Fresnel Effect

The amount of light reflected from a surface depends on the viewing angle

Global Illumination

Used when rendering with a single direct light is insufficient because of effects such as occlusion

Camera Lenses

Using only a pinhole without a lens blurs the image! (large pinhole → averaging; small pinhole → diffraction + a very long shutter time, which causes motion blur)
So a lens is used to gather enough light to form the image!
However, focus, vignetting, and aberration then have to be handled

Thin Lens Model

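Green box: ratio from the red triangles
Red box: ratio from the green triangles
$f$ : focal length of the lens.

\frac {x_s} {x_c} = \frac {z_s-f} {f} \quad \text{and} \quad \frac {x_s} {x_c} = \frac {z_s} {z_c} \;\Rightarrow\; \frac {z_s-f} {f} = \frac {z_s} {z_c} \;\Rightarrow\; \frac {1} {z_s} + \frac 1 {z_c} = \frac {1} {f}

The thin lens model is mainly used as an approximation.
Rays parallel to the z-axis pass through the focal point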
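Solving the thin lens equation for $z_s$ gives the sensor distance at which an object at depth $z_c$ is sharp. A tiny sketch, with a made-up 50 mm lens and the function name `image_distance` as my own choice:

```python
def image_distance(f, z_c):
    """Solve 1/z_s + 1/z_c = 1/f for the in-focus image distance z_s."""
    if z_c <= f:
        # At or inside the focal length the lens forms no real image
        raise ValueError("object must be farther away than the focal length")
    return 1.0 / (1.0 / f - 1.0 / z_c)

# 50 mm lens, object 2 m away (all lengths in metres)
z_s = image_distance(f=0.05, z_c=2.0)

# A very distant object focuses almost exactly at the focal length
z_far = image_distance(f=0.05, z_c=1e9)
```

The closer the object, the farther behind the lens its image forms, which is why focusing on near objects moves the lens away from the sensor.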

Depth of Field

The image is in focus when $\frac {1} {z_s} + \frac 1 {z_c} = \frac {1} {f}$ holds.
When out of focus, a 3D point projects to a circle of confusion $c$ .
The range of depths over which $c$ stays within an allowed limit is the depth of field; it is a function of the focus distance and the lens aperture. → adjust the lens aperture (diameter $d$ below) to control the size of $c$ !

N = \frac f d \quad \text{(often denoted as f/N)}

Chromatic Aberration

The tendency of light of different colors to come into focus at slightly different distances.
Fix: build the lens from several different glass materials.

Vignetting

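Brightness falls off toward the image edges.
- Natural cause: foreshortening of the object surface + of the lens aperture
- Mechanical cause: the shaded part of the light beam (see figure) never reaches the image
Can be corrected through calibration.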

Image Sensing Pipeline

The image sensing pipeline consists of the following three stages.

Physical light transport (in the camera body)
Photon measurement (on the sensor chip)
Image signal processing

Shutter

Shutter speed → controls how much light reaches the sensor
The shutter determines the brightness, blur, and noise level of the image.

Sensor

CCD: transfers charge from pixel to pixel.
The charge is converted to voltage at the output node
CMOS: converts charge to voltage inside each pixel

Color Filter Arrays

Used to obtain the color values of the image! (Not possible with the sensor alone.)

Gamma Compression

Human eyes are more sensitive to differences in dark tones → intensities and colors have to be transformed nonlinearly.
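The post does not give a formula, so the sketch below assumes the common power-law form with exponent $1/\gamma$ and $\gamma = 2.2$. Both the exponent and the function names are assumptions for illustration, not the lecture's definition.

```python
def gamma_encode(linear, gamma=2.2):
    """Compress a linear intensity in [0, 1] with a power law (assumed gamma = 2.2)."""
    return linear ** (1.0 / gamma)

def gamma_decode(encoded, gamma=2.2):
    """Undo the compression to get back a linear intensity."""
    return encoded ** gamma

dark = gamma_encode(0.1)      # dark tones are lifted, so they get more code values
bright = gamma_encode(0.9)    # bright tones barely move
roundtrip = gamma_decode(gamma_encode(0.25))
```

After encoding, the dark end of the range occupies a larger share of the available values, matching the idea that the eye distinguishes dark tones more finely.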

Image Compression

The image compression we are all familiar with
Uses 8 x 8 patch-based discrete cosine or wavelet transforms!
Discrete Cosine Transform: similar to applying PCA to natural images
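To make the 8 x 8 patch idea concrete, here is a NumPy sketch that builds an orthonormal DCT-II basis by hand, transforms one patch, zeroes the small coefficients, and transforms back. The patch (a smooth vertical ramp) and the 0.05 threshold are made up, and the thresholding is only a crude stand-in for the quantization step of a real codec.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis as an n x n matrix (each row is one basis vector)."""
    k = np.arange(n)[:, None]                # frequency index
    i = np.arange(n)[None, :]                # sample index
    D = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    D[0, :] = np.sqrt(1.0 / n)               # the constant (DC) basis vector
    return D

D = dct_matrix(8)
patch = np.outer(np.linspace(0.0, 1.0, 8), np.ones(8))   # hypothetical smooth 8x8 patch
coeffs = D @ patch @ D.T                                  # 2D DCT of the patch
kept = np.where(np.abs(coeffs) >= 0.05, coeffs, 0.0)      # drop small coefficients
restored = D.T @ kept @ D                                 # inverse 2D DCT
```

For smooth patches like this one, most of the energy lands in a handful of low-frequency coefficients, which is why discarding the rest costs little.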

Reference

[1] Andreas Geiger, Computer vision (2023)
