[ARV] Paper Study List
FSA · November 20, 2024
Series: action recognition in videos (15/19)
VIDEO FOUNDATION MODELS
1. VideoChat: Chat-Centric Video Understanding
2024, 1
https://arxiv.org/pdf/2305.06355
https://github.com/OpenGVLab/Ask-Anything
3100 stars
https://velog.io/@hsbc/VideoChat-Chat-Centric-Video-Understanding
2. Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
2023, 10
https://arxiv.org/pdf/2306.02858
https://github.com/DAMO-NLP-SG/Video-LLaMA
2800 stars
3. Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models
2024, 6
https://arxiv.org/pdf/2306.05424
https://github.com/mbzuai-oryx/Video-ChatGPT
1200 stars
4. MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
2024, 3
https://arxiv.org/pdf/2307.16449
https://github.com/rese1f/MovieChat
528 stars
5. VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
2023, 4
https://arxiv.org/pdf/2303.16727
https://github.com/OpenGVLab/VideoMAEv2
520 stars
6. mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video
2023, 2
https://arxiv.org/pdf/2302.00402
https://github.com/alibaba/AliceMind/tree/main/mPLUG
The repo itself has 2000 stars; the implementation of this paper is included as one part of it.
7. Unmasked Teacher: Towards Training-Efficient Video Foundation Models
2024, 3
https://arxiv.org/pdf/2303.16058
https://github.com/OpenGVLab/unmasked_teacher
297 stars
8. AIM: Adapting Image Models for Efficient Video Action Recognition
2023, 2
https://arxiv.org/pdf/2302.03024
https://github.com/taoyang1122/adapt-image-models
278 stars
EVA: Visual Representation Fantasies from BAAI
https://github.com/baaivision/EVA
2300 stars
EVA
https://openaccess.thecvf.com/content/CVPR2023/papers/Fang_EVA_Exploring_the_Limits_of_Masked_Visual_Representation_Learning_at_CVPR_2023_paper.pdf
2023, 620 citations
EVA-02
https://arxiv.org/pdf/2303.11331
2024, 192 citations
EVA-CLIP
https://arxiv.org/pdf/2303.15389
2023, 365 citations
EVA-CLIP-18B
https://arxiv.org/pdf/2402.04252
2024, 24 citations
9. SVFormer: Semi-supervised Video Transformer for Action Recognition
https://openaccess.thecvf.com/content/CVPR2023/papers/Xing_SVFormer_Semi-Supervised_Video_Transformer_for_Action_Recognition_CVPR_2023_paper.pdf
https://github.com/ChenHsing/SVFormer
84 stars
MOMENT RETRIEVAL
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
2024, 72 citations
https://openaccess.thecvf.com/content/CVPR2024/papers/Ren_TimeChat_A_Time-sensitive_Multimodal_Large_Language_Model_for_Long_Video_CVPR_2024_paper.pdf
https://github.com/RenShuhuai-Andy/TimeChat
292 stars
UniVTG: Towards Unified Video-Language Temporal Grounding
2023, 82 citations
https://openaccess.thecvf.com/content/ICCV2023/papers/Lin_UniVTG_Towards_Unified_Video-Language_Temporal_Grounding_ICCV_2023_paper.pdf
https://github.com/showlab/UniVTG
323 stars
Self-Chained Image-Language Model for Video Localization and Question Answering
2023, 104 citations
https://proceedings.neurips.cc/paper_files/paper/2023/file/f22a9af8dbb348952b08bd58d4734b50-Paper-Conference.pdf
https://github.com/Yui010206/SeViLA
178 stars