higher input image resolution
improved visual instruction tuning data mixture
better visual conversation for more scenarios
efficient deployment and inference with the SGLang framework
stronger & larger language models
collected and developed a new evaluation dataset, LLaVA-Bench (Wilder)
motivation