[code] Grounding Image Matching in 3D with MASt3R

FSA · August 8, 2024

vision


1. Getting Started

1.1. Installation

Clone the MASt3R repository.

git clone --recursive https://github.com/naver/mast3r
cd mast3r
# If you have already cloned mast3r:
# git submodule update --init --recursive

Create the environment with conda.

conda create -n mast3r python=3.11 cmake=3.14.0
conda activate mast3r
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia  # use the CUDA version that matches your system
pip install -r requirements.txt
pip install -r dust3r/requirements.txt
# Optional: install additional packages to:
# - add support for HEIC images
# - add the packages required by visloc.py
pip install -r dust3r/requirements_optional.txt
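
After the installation, it can help to confirm that PyTorch is importable and that it sees a GPU before going further. The snippet below is a minimal sketch, not part of the MASt3R repository; the function name describe_environment is my own, and it only relies on torch.__version__ and torch.cuda.is_available().

```python
import importlib.util


def describe_environment():
    """Report whether PyTorch is importable and whether it can see a CUDA device."""
    info = {"torch_installed": False, "torch_version": None, "cuda_available": False}
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in the active environment.
        return info
    import torch

    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    info["cuda_available"] = bool(torch.cuda.is_available())
    return info


print(describe_environment())
```

If cuda_available comes back False, the demo can still be pointed at another device with --device, but the CUDA-specific steps will not apply.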

Optionally, compile the CUDA kernels for RoPE (as in CroCo v2).

# DUSt3R relies on RoPE positional embeddings, for which you can compile CUDA kernels to speed up execution.
cd dust3r/croco/models/curope/
python setup.py build_ext --inplace
cd ../../../../

1.2. Checkpoints

You can obtain the checkpoints in two ways:
1. Use the huggingface_hub integration, which downloads the model automatically.
2. Otherwise, download one of the provided pre-trained models:

Model name	Training resolutions	Head	Encoder	Decoder
MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric	512x384, 512x336, 512x288, 512x256, 512x160	CatMLP+DPT	ViT-L	ViT-B

The hyperparameters used to train these models are listed in the 'Our Hyperparameters' section.
- Be sure to check the licenses of the datasets that were used.
To download a specific model, for example MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.pth:

mkdir -p checkpoints/
wget https://download.europe.naverlabs.com/ComputerVision/MASt3R/MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.pth -P checkpoints/

For this checkpoint, you must comply with the licenses of all the training datasets that were used, as well as CC-BY-NC-SA 4.0.
The mapfree dataset license in particular is very restrictive.
- See CHECKPOINTS_NOTICE for details.
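
If you would rather fetch the checkpoint from Python than with wget, something like the following should work. This is only a sketch using the standard library; the helper names (checkpoint_url, checkpoint_path, ensure_checkpoint) are hypothetical, and the URL is simply the one from the wget command above.

```python
import os
import urllib.request

# Base URL and model name taken from the wget command above.
BASE_URL = "https://download.europe.naverlabs.com/ComputerVision/MASt3R/"
MODEL_NAME = "MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric"


def checkpoint_url(model_name=MODEL_NAME):
    """Build the download URL for a checkpoint file."""
    return BASE_URL + model_name + ".pth"


def checkpoint_path(model_name=MODEL_NAME, directory="checkpoints"):
    """Local path where the checkpoint is expected to live."""
    return os.path.join(directory, model_name + ".pth")


def ensure_checkpoint(model_name=MODEL_NAME, directory="checkpoints"):
    """Download the checkpoint only if it is not already on disk."""
    path = checkpoint_path(model_name, directory)
    if not os.path.exists(path):
        os.makedirs(directory, exist_ok=True)
        urllib.request.urlretrieve(checkpoint_url(model_name), path)
    return path


# Usage (downloads a large file, so it is left commented out):
# weights = ensure_checkpoint()
```

The returned path can then be passed to demo.py through --weights.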

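1.3. Interactive Demo

A simple demo is available as a Hugging Face Space; it runs the new sparse global alignment on small scenes:
- https://huggingface.co/spaces/naver/MASt3R
- Upload one or more images (wait until the upload has fully completed before pressing the run button).
- We tested with up to 18 images before hitting the allocated-time limit (3 minutes).
- Your mileage may vary.
- Examples are provided at the bottom of the page.
- Clicking one loads 7 small images of the NAVER LABS Europe tower from the cache and runs the 3D reconstruction.
- If you want to try larger image collections:
  - Instructions for running a more complete version of this demo locally, along with further details, are at github.com/naver/mast3r.
  - The checkpoint used in this demo is available at huggingface.co/naver/MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.
There are two demos that you can run locally:
1. demo.py is the updated demo for MASt3R.
- It uses the new sparse global alignment method, which can reconstruct larger scenes.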

python3 demo.py --model_name MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric

# Use --weights to load a checkpoint from a local file, e.g. --weights checkpoints/MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.pth
# Use --local_network to make the demo accessible on the local network, or --server_name to specify the URL manually.
# Use --server_port to change the port; by default it searches for an available port starting at 7860.
# Use --device to select a different device; the default is "cuda".

2. demo_dust3r_ga.py is the same demo as in dust3r, and it is compatible with MASt3R models.
See the dust3r repository for details.

1.4. Interactive Demo with Docker

To run MASt3R with Docker, including NVIDIA CUDA support, follow these steps:
1. Install Docker: if you have not already done so, download and install Docker and Docker Compose from the Docker website.
  - https://www.docker.com/get-started
2. Install the NVIDIA Container Toolkit: for GPU support, install the toolkit from the NVIDIA website.
  - https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
3. Build and run the Docker image: go to the ./docker directory and run the following commands:

cd docker
bash run.sh --with-cuda --model_name="MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric"

To run the demo without CUDA support, run the following commands instead:

cd docker
bash run.sh --model_name="MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric"

By default, demo.py is launched with the --local_network option.
To access the web UI, open http://localhost:7860/ (or replace localhost with the machine's name to access it from the network).
run.sh
- launches docker-compose using either the docker-compose-cuda.yml (https://github.com/naver/mast3r/blob/main/docker/docker-compose-cuda.yml) or the docker-compose-cpu.yml configuration file, and then
- starts the demo using entrypoint.sh.

1.5. Usage

from mast3r.model import AsymmetricMASt3R
from mast3r.fast_nn import fast_reciprocal_NNs

import mast3r.utils.path_to_dust3r
from dust3r.inference import inference
from dust3r.utils.image import load_images

if __name__ == '__main__':
    device = 'cuda'
    schedule = 'cosine'
    lr = 0.01
    niter = 300

    model_name = "naver/MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric"
    # You can pass the path to a local checkpoint as model_name if needed.
    model = AsymmetricMASt3R.from_pretrained(model_name).to(device)
    images = load_images(['dust3r/croco/assets/Chateau1.png', 'dust3r/croco/assets/Chateau2.png'], size=512)
    output = inference([tuple(images)], model, device, batch_size=1, verbose=False)

    # At this stage, you have the raw dust3r predictions.
    view1, pred1 = output['view1'], output['pred1']
    view2, pred2 = output['view2'], output['pred2']

    desc1, desc2 = pred1['desc'].squeeze(0).detach(), pred2['desc'].squeeze(0).detach()

    # Find 2D-2D matches between the two images.
    matches_im0, matches_im1 = fast_reciprocal_NNs(desc1, desc2, subsample_or_initxy1=8,
                                                   device=device, dist='dot', block_size=2**13)

    # Ignore a small border around the edge of each image.
    H0, W0 = view1['true_shape'][0]
    valid_matches_im0 = (matches_im0[:, 0] >= 3) & (matches_im0[:, 0] < int(W0) - 3) & (
        matches_im0[:, 1] >= 3) & (matches_im0[:, 1] < int(H0) - 3)

    H1, W1 = view2['true_shape'][0]
    valid_matches_im1 = (matches_im1[:, 0] >= 3) & (matches_im1[:, 0] < int(W1) - 3) & (
        matches_im1[:, 1] >= 3) & (matches_im1[:, 1] < int(H1) - 3)

    valid_matches = valid_matches_im0 & valid_matches_im1
    matches_im0, matches_im1 = matches_im0[valid_matches], matches_im1[valid_matches]

    # Visualize a few matches.
    import numpy as np
    import torch
    import torchvision.transforms.functional
    from matplotlib import pyplot as pl

    n_viz = 20
    num_matches = matches_im0.shape[0]
    match_idx_to_viz = np.round(np.linspace(0, num_matches - 1, n_viz)).astype(int)
    viz_matches_im0, viz_matches_im1 = matches_im0[match_idx_to_viz], matches_im1[match_idx_to_viz]

    image_mean = torch.as_tensor([0.5, 0.5, 0.5], device='cpu').reshape(1, 3, 1, 1)
    image_std = torch.as_tensor([0.5, 0.5, 0.5], device='cpu').reshape(1, 3, 1, 1)

    viz_imgs = []
    for i, view in enumerate([view1, view2]):
        rgb_tensor = view['img'] * image_std + image_mean
        viz_imgs.append(rgb_tensor.squeeze(0).permute(1, 2, 0).cpu().numpy())

    H0, W0, H1, W1 = *viz_imgs[0].shape[:2], *viz_imgs[1].shape[:2]
    img0 = np.pad(viz_imgs[0], ((0, max(H1 - H0, 0)), (0, 0), (0, 0)), 'constant', constant_values=0)
    img1 = np.pad(viz_imgs[1], ((0, max(H0 - H1, 0)), (0, 0), (0, 0)), 'constant', constant_values=0)
    img = np.concatenate((img0, img1), axis=1)
    pl.figure()
    pl.imshow(img)
    cmap = pl.get_cmap('jet')
    for i in range(n_viz):
        (x0, y0), (x1, y1) = viz_matches_im0[i].T, viz_matches_im1[i].T
        pl.plot([x0, x1 + W0], [y0, y1], '-+', color=cmap(i / (n_viz - 1)), scalex=False, scaley=False)
    pl.show(block=True)

This program uses the MASt3R model to
- find correspondences between two images based on the features extracted from them, and
- visualize the resulting matches.
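
To make the idea of reciprocal nearest-neighbor matching concrete, here is a tiny brute-force sketch in pure Python. It is not the implementation of fast_reciprocal_NNs, which operates on dense descriptor maps and, judging by its arguments (subsample_or_initxy1, block_size), subsamples starting points and works in blocks. The function names and toy descriptors below are made up for illustration; only the dot-product similarity mirrors the dist='dot' argument used above.

```python
def dot(a, b):
    """Dot-product similarity between two descriptors."""
    return sum(x * y for x, y in zip(a, b))


def nearest(query, candidates):
    """Index of the candidate with the highest similarity to the query."""
    return max(range(len(candidates)), key=lambda j: dot(query, candidates[j]))


def reciprocal_matches(desc_a, desc_b):
    """Return (i, j) pairs where b[j] is a[i]'s best match and a[i] is b[j]'s best match."""
    a_to_b = [nearest(d, desc_b) for d in desc_a]
    b_to_a = [nearest(d, desc_a) for d in desc_b]
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]


# Toy descriptors: three in image A, two in image B.
desc_a = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
desc_b = [[0.0, 0.9], [0.9, 0.1]]
print(reciprocal_matches(desc_a, desc_b))  # [(0, 1), (1, 0)]
```

The third descriptor of A prefers B's second descriptor, but that one prefers A's first, so the pair is rejected. This mutual check is what keeps one-sided, ambiguous matches out of the result.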

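1.5.1. Main Steps and Their Roles

Model loading and preparation
- Load the pre-trained MASt3R model and move it to the chosen device.
- Load the example images.
Model inference
- Run the MASt3R model to extract features from the images.
- Retrieve the descriptor maps of the two images from the inference output.
Feature matching
- Use the fast_reciprocal_NNs function to perform 2D-2D feature matching between the two images.
- Filter the matches by discarding those that fall within a small border around the image edges.
Match visualization
- Visualize a subset of the matches to inspect the result.
- Use Matplotlib to show the two images and the matched features side by side.
- Use a color map to color the lines connecting matched features.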
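
The border filtering and the choice of which matches to draw can be written out in plain Python as a rough sketch. The helper names are hypothetical, the points are (x, y) tuples rather than tensors, and evenly_spaced_indices only approximates the rounded np.linspace call in the script above.

```python
def inside_border(x, y, width, height, border=3):
    """True if pixel (x, y) lies at least `border` pixels away from every image edge."""
    return border <= x < width - border and border <= y < height - border


def filter_matches(pts0, pts1, size0, size1, border=3):
    """Keep only the pairs whose two endpoints are both inside their image's border."""
    (w0, h0), (w1, h1) = size0, size1
    kept = [(p, q) for p, q in zip(pts0, pts1)
            if inside_border(p[0], p[1], w0, h0, border)
            and inside_border(q[0], q[1], w1, h1, border)]
    return [p for p, _ in kept], [q for _, q in kept]


def evenly_spaced_indices(count, n_viz):
    """Pick n_viz indices spread evenly over range(count)."""
    if n_viz == 1:
        return [0]
    return [round(i * (count - 1) / (n_viz - 1)) for i in range(n_viz)]


pts0 = [(1, 10), (50, 40), (100, 100)]
pts1 = [(20, 20), (60, 45), (510, 200)]
kept0, kept1 = filter_matches(pts0, pts1, (512, 384), (512, 384))
print(kept0, kept1)                  # [(50, 40)] [(60, 45)]
print(evenly_spaced_indices(10, 4))  # [0, 3, 6, 9]
```

A pair is dropped as soon as either endpoint is too close to an edge, which corresponds to the logical AND of valid_matches_im0 and valid_matches_im1 in the script.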

2. Visual Localization

2.1. Dataset Preparation

See the Visloc section of DUSt3R. (https://github.com/naver/dust3r/blob/main/dust3r_visloc/README.md#dataset-preparation)

2.2. Example Commands

With visloc.py, you can run the visual localization experiments on Aachen-Day-Night, InLoc, Cambridge Landmarks and 7 Scenes.

2.2.1. Aachen-Day-Night-v1.1:

scene can be set to 'day', 'night', or 'all'.

python3 visloc.py --model_name MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric --dataset "VislocAachenDayNight('/path/to/prepared/Aachen-Day-Night-v1.1/', subscene='${scene}', pairsfile='fire_top50', topk=20)" --pixel_tol 5 --pnp_mode poselib --reprojection_error_diag_ratio 0.008 --output_dir /path/to/output/Aachen-Day-Night-v1.1/${scene}/loc

Or run it in coarse-to-fine mode:

python3 visloc.py --model_name MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric --dataset "VislocAachenDayNight('/path/to/prepared/Aachen-Day-Night-v1.1/', subscene='${scene}', pairsfile='fire_top50', topk=20)" --pixel_tol 5 --pnp_mode poselib --reprojection_error_diag_ratio 0.008 --output_dir /path/to/output/Aachen-Day-Night-v1.1/${scene}/loc --coarse_to_fine --max_batch_size 48 --c2f_crop_with_homography
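
Both commands pass --reprojection_error_diag_ratio 0.008. My reading of the flag name is that the PnP reprojection-error threshold is expressed as a fraction of the image diagonal rather than as a fixed pixel count; treat that as an assumption and check the help text of visloc.py to confirm. Under that assumption, the threshold in pixels would work out as below (the function name and the example image size are made up).

```python
import math


def reprojection_threshold(width, height, diag_ratio=0.008):
    """Pixel threshold, ASSUMING the ratio is applied to the image diagonal."""
    return diag_ratio * math.hypot(width, height)


# Hypothetical 1600x1200 query image: the diagonal is 2000 px.
print(reprojection_threshold(1600, 1200))
```

For that example the threshold comes out at about 16 px, and it scales with the image size, which is presumably why a ratio is used for datasets with mixed resolutions.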

2.2.2. InLoc

python3 visloc.py --model_name MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric --dataset "VislocInLoc('/path/to/prepared/InLoc/', pairsfile='pairs-query-netvlad40-temporal', topk=20)" --pixel_tol 5 --pnp_mode poselib --reprojection_error_diag_ratio 0.008 --output_dir /path/to/output/InLoc/loc

Or run it in coarse-to-fine mode:

python3 visloc.py --model_name MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric --dataset "VislocInLoc('/path/to/prepared/InLoc/', pairsfile='pairs-query-netvlad40-temporal', topk=20)" --pixel_tol 5 --pnp_mode poselib --reprojection_error_diag_ratio 0.008 --output_dir /path/to/output/InLoc/loc --coarse_to_fine --max_image_size 1200 --max_batch_size 48 --c2f_crop_with_homography

2.2.3. 7-scenes:

scene can be set to 'chess', 'fire', 'heads', 'office', 'pumpkin', 'redkitchen', or 'stairs'.

python3 visloc.py --model_name MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric --dataset "VislocSevenScenes('/path/to/prepared/7-scenes/', subscene='${scene}', pairsfile='APGeM-LM18_top20', topk=1)" --pixel_tol 5 --pnp_mode poselib --reprojection_error_diag_ratio 0.008 --output_dir /path/to/output/7-scenes/${scene}/loc

2.2.4. Cambridge Landmarks:

scene can be set to 'ShopFacade', 'GreatCourt', 'KingsCollege', 'OldHospital', or 'StMarysChurch'.

python3 visloc.py --model_name MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric --dataset "VislocCambridgeLandmarks('/path/to/prepared/Cambridge_Landmarks/', subscene='${scene}', pairsfile='APGeM-LM18_top50', topk=20)" --pixel_tol 5 --pnp_mode poselib --reprojection_error_diag_ratio 0.008 --output_dir /path/to/output/Cambridge_Landmarks/${scene}/loc
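
Since ${scene} has to be substituted by hand in the commands above, a small script can generate one command per scene. The sketch below does this for Cambridge Landmarks; the flags and the dataset string are copied from the command above, while the helper name cambridge_command and the two root-path parameters are my own. The same pattern should carry over to the other datasets by swapping the scene list and the dataset string.

```python
import shlex

MODEL = "MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric"
SCENES = ["ShopFacade", "GreatCourt", "KingsCollege", "OldHospital", "StMarysChurch"]


def cambridge_command(scene,
                      data_root="/path/to/prepared/Cambridge_Landmarks/",
                      out_root="/path/to/output/Cambridge_Landmarks"):
    """Build the visloc.py argument list for one Cambridge Landmarks scene."""
    dataset = (f"VislocCambridgeLandmarks('{data_root}', subscene='{scene}', "
               "pairsfile='APGeM-LM18_top50', topk=20)")
    return ["python3", "visloc.py",
            "--model_name", MODEL,
            "--dataset", dataset,
            "--pixel_tol", "5",
            "--pnp_mode", "poselib",
            "--reprojection_error_diag_ratio", "0.008",
            "--output_dir", f"{out_root}/{scene}/loc"]


commands = [cambridge_command(scene) for scene in SCENES]
for cmd in commands:
    # shlex.join quotes the dataset string so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

To execute the commands instead of printing them, each list can be handed to subprocess.run(cmd, check=True) from the mast3r directory.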

3. Experimental Results
