Series

[Paper Review]

1. [Paper Review] Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion

Kandinsky: a generative model that creates an image from a text description of what you want

May 31, 2025

2. [Paper Review] CONTINUAL LEARNING AND CATASTROPHIC FORGETTING

Fine-tuning makes a model lose its existing performance?

April 26, 2025

3. [Paper Review] A Survey on Multimodal Large Language Models

A Survey on Multimodal Large Language Models

June 30, 2025

4. [Paper Review] BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

July 23, 2025

5. [Paper Review] Visual Instruction Tuning

Evolving from LLaMA to LLaVA!

September 7, 2025

6. [Paper Review] EscapeBench: Towards Advancing Creative Intelligence of Language Model Agents

An escape room game paper (one that only pretends to use images)

September 14, 2025

7. [Paper Review] CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models

The model imagines, as an image, how it will act before it acts??

September 21, 2025

8. [Paper Review] EscapeCraft: A 3D Room Escape Environment for Benchmarking Complex Multimodal Reasoning Ability

Let's try a real room escape in a 3D environment!!

September 23, 2025

9. [Paper Review] ORAK: A FOUNDATIONAL BENCHMARK FOR TRAINING AND EVALUATING LLM AGENTS ON DIVERSE VIDEO GAMES

NPCs that think are not far off.

November 2, 2025

10. [Paper Review] FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games

34 games, for real?

November 10, 2025

11. [Paper Review] T*: Re-thinking Temporal Search for Long-Form Video Understanding

A search that extends the temporal dimension into the spatial dimension??

November 16, 2025

12. [Paper Review] Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory

Let's become a model that remembers its experiences and keeps growing!!

January 10, 2026

13. [Paper Review] Agentic Learner with Grow-and-Refine Multimodal Semantic Memory

Storing images in long-term memory as well and putting them to use!

January 18, 2026

14. [Paper Review] Masking Strategies for Background Bias Removal in Computer Vision Models

Solving background bias in CV models through masking!

February 10, 2026