[Video Foundation Model] Paper Study List

FSA · November 20, 2024

VIDEO FOUNDATION MODEL

1. VideoChat: Chat-Centric Video Understanding

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling

2. Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

Revisiting Feature Prediction for Learning Visual Representations from Video

mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models

World Model on Million-Length Video And Language With Blockwise RingAttention

3. Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models

5. VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking

6. mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video

7. Unmasked Teacher: Towards Training-Efficient Video Foundation Models

8. AIM: ADAPTING IMAGE MODELS FOR EFFICIENT VIDEO ACTION RECOGNITION

EVA: Visual Representation Fantasies from BAAI

9. SVFormer: Semi-supervised Video Transformer for Action Recognition

MOMENT RETRIEVAL

TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

UniVTG: Towards Unified Video-Language Temporal Grounding

Self-Chained Image-Language Model for Video Localization and Question Answering
