DETR

ODD · October 8, 2024


Backbone: ResNet-50

Extracts features from the input image
image -> lower-resolution feature map (2048 channels at 1/32 of the input height and width)
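As a rough illustration of the shape change, the sketch below computes the size of the final ResNet-50 feature map for a given input. The function name `backbone_feature_shape` is made up for this note, and the ceiling division assumes the usual padded stride-2 layers.

```python
import math

def backbone_feature_shape(height, width, stride=32, channels=2048):
    """Return (C, H, W) of the final ResNet-50 feature map (hypothetical helper)."""
    # Assumption: every stride-2 layer is padded so it rounds up, which is
    # equivalent to a single ceiling division by the total stride of 32.
    return channels, math.ceil(height / stride), math.ceil(width / stride)

print(backbone_feature_shape(800, 1216))  # (2048, 25, 38)
```

So an 800 x 1216 image becomes a 25 x 38 grid, i.e. 950 feature vectors that the transformer then treats as a sequence.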

Transformer

Models relationships between different parts of the image (spatial relationships)

Multi-Head Attention
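A minimal pure-Python sketch of multi-head self-attention, to show how every position can look at every other position. It is a simplification: the learned query/key/value and output projections of a real transformer layer are left out, and the function names are made up for this note.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def multi_head_self_attention(tokens, num_heads):
    """Split each token into num_heads chunks, attend per head, concatenate."""
    dim = len(tokens[0])
    assert dim % num_heads == 0, "feature size must be divisible by num_heads"
    head_dim = dim // num_heads
    result = [[] for _ in tokens]
    for h in range(num_heads):
        chunk = [t[h * head_dim:(h + 1) * head_dim] for t in tokens]
        for row, head_out in zip(result, attention(chunk, chunk, chunk)):
            row.extend(head_out)
    return result

tokens = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
out = multi_head_self_attention(tokens, num_heads=2)
print(len(out), len(out[0]))  # 3 4
```

Each head works on its own slice of the feature vector, so different heads can attend to different relationships between image regions.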

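Positional Encoding

Where are objects located in the image? Attention by itself ignores the order of its inputs, so positional encodings are added to tell the model where each feature comes from.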
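A simplified sketch of a fixed 2D sine positional encoding, assuming half of the channels encode the row and the other half the column. The function names and the `temperature` default of 10000 are assumptions for illustration; coordinate normalization is omitted.

```python
import math

def sine_encoding_1d(position, num_feats, temperature=10000.0):
    """Sinusoidal encoding of a scalar position into num_feats values."""
    enc = []
    for i in range(num_feats):
        # Pairs of channels share a frequency; even index = sin, odd = cos.
        freq = temperature ** (2 * (i // 2) / num_feats)
        angle = position / freq
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc

def sine_encoding_2d(row, col, dim):
    """Concatenate a row encoding and a column encoding, dim // 2 values each."""
    half = dim // 2
    return sine_encoding_1d(row, half) + sine_encoding_1d(col, half)

pe = sine_encoding_2d(row=3, col=7, dim=8)
print(len(pe))  # 8
```

Every grid cell gets a distinct vector, so two identical-looking patches at different locations no longer look the same to the attention layers.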

Bounding Box Prediction

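Boxes are predicted by a small feed-forward network (a 3-layer MLP in the paper) as normalized center coordinates, width, and height; the class label is predicted by a linear layer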
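Since the predicted box is a normalized (center x, center y, width, height) tuple, it has to be converted before drawing or computing IoU. A small sketch of that conversion; the helper name `cxcywh_to_xyxy` is made up for this note.

```python
def cxcywh_to_xyxy(box, img_w, img_h):
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return ((cx - w / 2) * img_w, (cy - h / 2) * img_h,
            (cx + w / 2) * img_w, (cy + h / 2) * img_h)

print(cxcywh_to_xyxy((0.5, 0.5, 0.5, 0.25), img_w=640, img_h=480))
# (160.0, 180.0, 480.0, 300.0)
```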

Object detection set prediction loss

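DETR infers a fixed-size set of N predictions
N is the number of object queries, set much larger than the typical number of objects in an image; the ground-truth set is padded with "no object" up to size N
The Hungarian algorithm is used to find the one-to-one matching between predictions and ground truth with the lowest total matching cost
The matching cost is computed from the class prediction and the box of each prediction / ground-truth pair
Because an L1 bounding box loss is biased by box size (large boxes produce larger errors), a generalized IoU loss, which is scale-invariant, is combined with it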
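The matching step can be sketched on a toy example as below. This is not the reference implementation: it brute-forces every assignment with `itertools.permutations` instead of running the Hungarian algorithm (fine only for a handful of boxes), uses corner-format boxes for both the L1 and GIoU terms, and the weights `w_l1` and `w_giou` are illustrative assumptions. Predictions left unmatched play the role of "no object".

```python
from itertools import permutations

def giou(a, b):
    """Generalized IoU of two non-degenerate (x0, y0, x1, y1) boxes."""
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area_a + area_b - inter
    # Area of the smallest box enclosing both inputs.
    hull = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (hull - union) / hull

def match_cost(pred, target, w_l1=5.0, w_giou=2.0):
    """Cost of assigning one prediction to one ground-truth object."""
    class_cost = -pred["probs"][target["label"]]
    l1_cost = sum(abs(p - t) for p, t in zip(pred["box"], target["box"]))
    return class_cost + w_l1 * l1_cost - w_giou * giou(pred["box"], target["box"])

def best_matching(preds, targets):
    """Try every assignment of targets to distinct predictions (tiny N only)."""
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(match_cost(preds[p], t) for p, t in zip(perm, targets))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best  # best[j] is the prediction index matched to target j

preds = [
    {"probs": [0.1, 0.8, 0.1], "box": (0.5, 0.5, 0.9, 0.9)},
    {"probs": [0.7, 0.2, 0.1], "box": (0.1, 0.1, 0.4, 0.4)},
    {"probs": [0.05, 0.05, 0.9], "box": (0.0, 0.0, 1.0, 1.0)},
]
targets = [
    {"label": 0, "box": (0.1, 0.1, 0.4, 0.5)},
    {"label": 1, "box": (0.5, 0.5, 0.9, 1.0)},
]
print(best_matching(preds, targets))  # (1, 0)
```

Here target 0 is matched to prediction 1 and target 1 to prediction 0, while prediction 2 stays unmatched. For realistic N, an assignment solver such as `scipy.optimize.linear_sum_assignment` would replace the brute-force loop.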

Evaluation

Dataset: COCO 2017
Trained on 16 V100 GPUs for 3 days
