[ARV] 논문 공부 리스트

FSA · November 20, 2024

VIDEO FOUNDATION MODEL

1. VideoChat: Chat-Centric Video Understanding

2. Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

3. Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models

4. MovieChat: From Dense Token to Sparse Memory for Long Video Understanding

5. VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking

6. mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video

7. Unmasked Teacher: Towards Training-Efficient Video Foundation Models

8. AIM: Adapting Image Models for Efficient Video Action Recognition

EVA: Visual Representation Fantasies from BAAI

9. SVFormer: Semi-supervised Video Transformer for Action Recognition

MOMENT RETRIEVAL

TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

UniVTG: Towards Unified Video-Language Temporal Grounding

Self-Chained Image-Language Model for Video Localization and Question Answering
