※ This post briefly summarizes the research and code of VQA and "text to bounding box" models; for some of the 2020–2021 works, performance is also compared.
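Referring expression comprehension ("text to bounding box") results such as those in the papers below are typically reported as accuracy at IoU ≥ 0.5 between the predicted and ground-truth box. A minimal, self-contained sketch of that metric (function and variable names are my own illustration, not taken from any of the papers):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def rec_accuracy(pred_boxes, gt_boxes, thresh=0.5):
    """Fraction of predictions whose IoU with the ground truth is >= thresh."""
    hits = sum(iou(p, g) >= thresh for p, g in zip(pred_boxes, gt_boxes))
    return hits / len(gt_boxes)
```

For example, a prediction that overlaps the ground truth by only a quarter of its area does not count as a hit under the usual 0.5 threshold.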
Paper: MAttNet: Modular Attention Network for Referring Expression Comprehension (2018, Cited by 265)
Code: github.com
Demo: Demo
Paper: Referring Expression Object Segmentation with Caption-Aware Consistency (10 Oct 2019, Cited by 15)
Code: github
※ This paper was also covered once in my velog series "Getting XAI Ideas for Object Detection...".
Paper: MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding (12 Oct 2020, Cited by 3)
Code: github
Paper: Referring Expression Comprehension: A Survey of Methods and Datasets
Paper: VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Code: github.com
Paper: MDETR - Modulated Detection for End-to-End Multi-Modal Understanding
Code: github.com
Cite: Referring Expression Comprehension | Papers with Code