Multi-Modal Paper Reviews

1. [Paper Review] CLIP: Learning Transferable Visual Models From Natural Language Supervision

2. [Paper Review] Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

3. [Paper Review] BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

4. [Paper Review] LLaVA: Visual Instruction Tuning
