Using Grounded SAM 1 and 2


1. Grounded SAM 1

1.1. Install without Docker

  • Set the environment variables (local GPU environment):
export AM_I_DOCKER=False
export BUILD_WITH_CUDA=True
export CUDA_HOME=/usr/local/cuda-12.1/
  • Install Segment Anything:
python -m pip install -e segment_anything
  • Install Grounding DINO:
pip install --no-build-isolation -e GroundingDINO
  • The following optional dependencies are necessary for
    • mask post-processing,
    • saving masks in COCO format,
    • the example notebooks, and
    • exporting the model in ONNX format.
  • jupyter is also required to run the example notebooks.
pip install opencv-python pycocotools matplotlib onnxruntime onnx ipykernel
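
As a quick sanity check that both editable installs worked, the packages should now import cleanly (import names as defined by the two repos):

import groundingdino      # from the editable GroundingDINO install
import segment_anything   # from the editable segment_anything install
print("Grounded SAM 1 imports OK")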

2. Grounded SAM 2

2.1. Installation

  • Download the pretrained SAM 2 checkpoints:
cd checkpoints
bash download_ckpts.sh
  • Download the pretrained Grounding DINO checkpoints:
cd gdino_checkpoints
bash download_ckpts.sh

2.1.1. Installation without Docker

  • Install the PyTorch environment first. The environment used to run this demo:
    • python=3.10
    • torch >= 2.3.1
    • torchvision>=0.18.1
    • cuda-12.1
      pip3 install torch torchvision torchaudio (the recommended way)
  • Since the CUDA compilation environment is needed to compile the Deformable Attention operator used in Grounding DINO, check whether the CUDA environment variables are set correctly (refer to the Grounding DINO installation guide for details); a quick check is sketched after the export command below.
  • If you want to build a local GPU environment for Grounding DINO to run Grounded SAM 2, you can set the environment variable manually as follows:
export CUDA_HOME=/usr/local/cuda-12.1/
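
A minimal sanity check of the build environment from Python, assuming torch is already installed:

import os
import torch

# The Deformable Attention extension is compiled against this toolkit path.
print("CUDA_HOME =", os.environ.get("CUDA_HOME"))
# torch must be a CUDA build, roughly matching the toolkit (cuda-12.1 here).
print("torch", torch.__version__, "| built for CUDA", torch.version.cuda)
print("GPU available:", torch.cuda.is_available())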
  • Install Segment Anything 2:
pip install -e .
  • Install Grounding DINO:
pip install --no-build-isolation -e grounding_dino


Grounded SAM 2 Demos

Grounded SAM 2 Image Demo (with Grounding DINO)

  • Grounding DINO is already supported through HuggingFace.
  • So two options are provided for running the Grounded SAM 2 model (a rough sketch of the Option 1 flow follows the two options below).
  • [Option 1] Use the HuggingFace API to run Grounding DINO inference (simple and clear):
python grounded_sam2_hf_model_demo.py

[!NOTE]
🚨 If you encounter network issues while using the HuggingFace model, you can resolve them by setting the appropriate mirror source as export HF_ENDPOINT=https://hf-mirror.com

  • [Option 2] Load a local pretrained Grounding DINO checkpoint and run inference with the original Grounding DINO API
    • (make sure you've already downloaded the pretrained checkpoint):
python grounded_sam2_local_demo.py
  • TODO: study this later
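
For reference, the Option 1 demo roughly chains HuggingFace Grounding DINO detection into the SAM 2 image predictor. A minimal sketch of that flow, assuming the transformers zero-shot detection API and the SAM 2 image-predictor API; the model id, config name, checkpoint path, image path, prompt, and thresholds are illustrative placeholders rather than the script's exact values:

import numpy as np
import torch
from PIL import Image
from transformers import AutoModelForZeroShotObjectDetection, AutoProcessor
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1) Grounding DINO via HuggingFace: text prompt -> boxes.
gd_id = "IDEA-Research/grounding-dino-tiny"  # illustrative model id
processor = AutoProcessor.from_pretrained(gd_id)
gd_model = AutoModelForZeroShotObjectDetection.from_pretrained(gd_id).to(device)

image = Image.open("notebooks/images/truck.jpg").convert("RGB")  # illustrative path
text = "car. tire."  # lower-case phrases, each ending with a period
inputs = processor(images=image, text=text, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = gd_model(**inputs)
results = processor.post_process_grounded_object_detection(
    outputs, inputs.input_ids,
    box_threshold=0.4, text_threshold=0.3,
    target_sizes=[image.size[::-1]],
)
boxes = results[0]["boxes"].cpu().numpy()  # (N, 4) boxes in xyxy format

# 2) SAM 2 image predictor: boxes -> masks.
sam2_model = build_sam2("sam2_hiera_l.yaml", "./checkpoints/sam2_hiera_large.pt",
                        device=device)
predictor = SAM2ImagePredictor(sam2_model)
predictor.set_image(np.array(image))
masks, scores, _ = predictor.predict(box=boxes, multimask_output=False)
print("masks:", masks.shape)  # one mask per detected box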

Grounded SAM 2 Image Demo (with Grounding DINO 1.5 & 1.6)

  • Grounding DINO 1.5 & 1.6 are the most capable open-set detection models.
  • You can apply for an API token first and run Grounded SAM 2 with Grounding DINO 1.5 as follows:
  • Install the latest DDS cloudapi:
pip install dds-cloudapi-sdk
python grounded_sam2_gd1.5_demo.py

Grounded SAM 2 Video Object Tracking Demo

Based on the strong tracking capability of SAM 2, we can combine it with Grounding DINO for open-set object segmentation and tracking. You can run the following script to get tracking results with Grounded SAM 2:

python grounded_sam2_tracking_demo.py
  • The tracking results for each frame will be saved in ./tracking_results.
  • The video will be saved as children_tracking_demo_video.mp4.
  • You can adapt this file with different text prompts and video clips to get more tracking results.
  • For simplicity, we only prompt the first video frame with Grounding DINO here (a rough sketch of this loop follows).
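
A rough sketch of this prompt-and-propagate loop, assuming the sam2 video-predictor API; the config name, checkpoint path, frame directory, and box coordinates are illustrative (init_state expects a directory of JPEG frames):

import torch
from sam2.build_sam import build_sam2_video_predictor

device = "cuda" if torch.cuda.is_available() else "cpu"
predictor = build_sam2_video_predictor("sam2_hiera_l.yaml",
                                       "./checkpoints/sam2_hiera_large.pt",
                                       device=device)

# The video predictor reads a directory of JPEG frames (illustrative path).
state = predictor.init_state(video_path="./assets/children_frames")

# Prompt only the first frame, e.g. with one Grounding DINO box (xyxy, illustrative).
predictor.add_new_points_or_box(state, frame_idx=0, obj_id=1,
                                box=[300.0, 0.0, 500.0, 400.0])

# Propagate the mask through the remaining frames.
video_segments = {}
for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
    video_segments[frame_idx] = {
        obj_id: (mask_logits[i] > 0.0).cpu().numpy()  # binarize the logits
        for i, obj_id in enumerate(obj_ids)
    }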

Support for Various Prompt Types for Tracking

We support different types of prompts for the Grounded SAM 2 tracking demo:

  • Point Prompt: To get stable segmentation results, we re-use the SAM 2 image predictor to get the prediction mask for each object based on the Grounding DINO box outputs, then uniformly sample points from the prediction mask as point prompts for the SAM 2 video predictor (see the sampling sketch after this list).
  • Box Prompt: We directly use the box outputs from Grounding DINO as box prompts for the SAM 2 video predictor.
  • Mask Prompt: We use the SAM 2 mask prediction results based on the Grounding DINO box outputs as mask prompts for the SAM 2 video predictor.
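
The point-prompt path boils down to sampling foreground pixels uniformly from the predicted mask; a minimal sketch of that step (the helper name and the default of 10 points are illustrative):

import numpy as np

def sample_points_from_mask(mask: np.ndarray, num_points: int = 10) -> np.ndarray:
    """Uniformly sample (x, y) point prompts from a binary mask of shape (H, W)."""
    ys, xs = np.nonzero(mask)  # coordinates of all foreground pixels
    chosen = np.random.choice(len(xs), size=min(num_points, len(xs)), replace=False)
    return np.stack([xs[chosen], ys[chosen]], axis=1)  # (num_points, 2), xy order

Each sampled point would then be passed to the SAM 2 video predictor as a positive (label 1) point prompt.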

Grounded SAM 2 Tracking Pipeline (pipeline diagram omitted)

Grounded SAM 2 Video Object Tracking Demo (with Grounding DINO 1.5 & 1.6)

We also support a video object tracking demo based on the stronger Grounding DINO 1.5 model and SAM 2; you can try the following demo after applying for the API keys to run Grounding DINO 1.5:

python grounded_sam2_tracking_demo_with_gd1.5.py

Grounded SAM 2 Video Object Tracking Demo with Custom Video Input (with Grounding DINO)

Users can upload their own video file (e.g. assets/hippopotamus.mp4) and specify their custom text prompts for grounding and tracking with Grounding DINO and SAM 2 by using the following script:

python grounded_sam2_tracking_demo_custom_video_input_gd1.0_hf_model.py

Grounded SAM 2 Video Object Tracking Demo with Custom Video Input (with Grounding DINO 1.5 & 1.6)

Users can upload their own video file (e.g. assets/hippopotamus.mp4) and specify their custom text prompts for grounding and tracking with Grounding DINO 1.5 and SAM 2 by using the following script:

python grounded_sam2_tracking_demo_custom_video_input_gd1.5.py

You can specify the params in this file:

VIDEO_PATH = "./assets/hippopotamus.mp4"
TEXT_PROMPT = "hippopotamus."
OUTPUT_VIDEO_PATH = "./hippopotamus_tracking_demo.mp4"
API_TOKEN_FOR_GD1_5 = "Your API token" # api token for G-DINO 1.5
PROMPT_TYPE_FOR_VIDEO = "mask" # using SAM 2 mask prediction as prompt for video predictor

After running the demo code, the tracking visualization results will be saved automatically in OUTPUT_VIDEO_PATH (result video omitted).

[!WARNING]
We initialize the box prompts on the first frame of the input video. If you want to start from a different frame, you can adjust ann_frame_idx yourself in our code.

Grounded-SAM-2 Video Object Tracking with Continuous ID (with Grounding DINO)

In the demos above, we only prompt Grounded SAM 2 on a specific frame, which is not well suited to finding new objects that appear later in the video. In this demo, we try to find new objects and assign them new IDs across the whole video; this feature is still under development and is not very stable yet.
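
A self-contained sketch of the "is this detection a new object?" test that such a loop needs; the IoU matching rule and the 0.5 threshold are assumptions for illustration, not necessarily the repo's exact logic:

import numpy as np

def iou_xyxy(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

def find_new_objects(detections, tracked_boxes, iou_thresh=0.5):
    """Keep detections that do not overlap any already-tracked box."""
    return [d for d in detections
            if all(iou_xyxy(d, t) < iou_thresh for t in tracked_boxes)]

Detections that survive this filter would be registered with fresh object IDs and propagated forward as in the earlier demos.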

Users can upload their own video files and specify custom text prompts for grounding and tracking using the Grounding DINO and SAM 2 frameworks. To do this, execute the script:

python grounded_sam2_tracking_demo_with_continuous_id.py

You can customize various parameters including:

  • text: The grounding text prompt.
  • video_dir: Directory containing the video files.
  • output_dir: Directory to save the processed output.
  • output_video_path: Path for the output video.
  • step: Frame stepping for processing.
  • box_threshold: Box threshold for the Grounding DINO model.
  • text_threshold: Text threshold for the Grounding DINO model.
    Note: This method supports only the mask prompt type.

After running the demo code, you can view the tracking results in the video saved at output_video_path (result video omitted).

If you want to try the Grounding DINO 1.5 model, you can run the following script after setting your API token:

python grounded_sam2_tracking_demo_with_continuous_id_gd1.5.py

Grounded-SAM-2 Video Object Tracking with Continuous ID plus Reverse Tracking (with Grounding DINO)

By also tracking backward from the prompted frame, this method can cover the whole lifetime of an object.

python grounded_sam2_tracking_demo_with_continuous_id_plus.py