Series

Multimodal Deep Learning

1. [Related Work] VQA - papers with code

※ This post is a rough roundup of available GitHub code and papers, compiled to find a VQA model that can serve as the stage preceding Video object segmentation. Written: 2021-05-30

July 26, 2021

※ This post summarizes research on locating objects in an image using text, that is, Text-Object-Detection work that takes text (a sentence, phrase, clause, word, etc.) and an image as input and returns a bounding box as output.

July 26, 2021

※ This post briefly summarizes the research and code for VQA or 'text to bounding box' models, and additionally compares performance for some of the 2020-2021 work.

July 26, 2021

※ This post covers a paper that applies an XAI approach to the Image Captioning task. Paper: EXplainable AI (XAI) approach to image captioning

July 26, 2021

A review of "MDETR - Modulated Detection for End-to-End Multi-Modal Understanding (2021)".

July 27, 2021

Survey paper for VQA (2020)

November 26, 2021

Paper: From Show to Tell: A Survey on Deep Learning-based Image Captioning

January 7, 2022

Paper review for "From Show to Tell: A Survey on Deep Learning-based Image Captioning" (focusing on the Language Model)

January 10, 2022

Paper review for X-Linear Attention blocks

January 29, 2022