Series

Paper Reviews

1. [Paper Review] ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision

\[2102.03334\] ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision. 1. Introduction. Vision-Language Model. Contents: Vision-and-L

September 23, 2024

2. [Paper Review] BEIT v1, v2, v3: Summary and Comparison

Self-supervised pre-trained models are now the standard. The core idea of this approach is to run self-supervised pre-training on a large amount of unlabeled data to obtain a model such as BERT or GPT, followed by simple Fine-t

October 7, 2024

3. [Paper Review] LLaVA: Large Language and Vision Assistant (Visual Instruction Tuning)

Visual Instruction Tuning; Improved Baselines with Visual Instruction Tuning. At the time of its release, LLaVA was the strongest open-source vision-language model. About one day of training on 8 A100 GPUs was enough

October 9, 2024

4. [Paper Review] HELPD: Mitigating Hallucination of LVLMs by Hierarchical Feedback Learning with Vision-enhanced Penalty Decoding

HELPD: Mitigating Hallucination of LVLMs by Hierarchical Feedback Learning with Vision-enhanced Penalty Decoding (Bai et al. 2024). LVLM (Large Vision-Lan

November 13, 2024

5. [Paper Review] VisionZip: Longer is Better but Not Necessary in Vision Language Models

VisionZip: Longer is Better but Not Necessary in Vision Language Models. dvlab-research/VisionZip: Official repo for "VisionZip: Longer is Better but No

December 9, 2024