[2025/W04] 🤗 Weekly AI Research

Sky · January 25, 2025

Weekly AI Research Digest


This post introduces notable AI papers released in week 4 of 2025.

Reasoning and Optimization Techniques

Evolving Deeper LLM Thinking

Paper

This work proposes Mind Evolution, a new method for improving the reasoning of large language models (LLMs). It does not simply sample many attempts (Best-of-N) or revise a single answer step by step (Sequential Revision). Instead, it borrows the idea of an evolutionary algorithm: the language model repeatedly generates candidate answers, recombines them, and refines them. On planning problems such as travel planning and general natural-language planning, it solved more than 98% of problem instances with Gemini 1.5 Pro. This shows that the approach is effective for complex problem solving.
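The generate, recombine, and refine loop can be made concrete with a minimal genetic-algorithm sketch. It is not the paper's implementation. In Mind Evolution, an LLM writes, recombines, and refines natural-language plans, and an evaluator scores them. Here those LLM calls are replaced by hypothetical toy functions (`init`, `crossover`, `mutate`) over integer lists, and the fitness function is a made-up budget constraint.

```python
import random
from typing import Callable, List

Plan = List[int]

def evolve(
    fitness: Callable[[Plan], float],
    init: Callable[[random.Random], Plan],
    crossover: Callable[[Plan, Plan, random.Random], Plan],
    mutate: Callable[[Plan, random.Random], Plan],
    pop_size: int = 20,
    generations: int = 200,
    seed: int = 0,
) -> Plan:
    """Generic evolutionary search: keep the fittest, recombine, refine."""
    rng = random.Random(seed)
    population = [init(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Rank candidates by the evaluator and keep the top half as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        if fitness(parents[0]) == 0:  # toy convention: 0 means every constraint is met
            return parents[0]
        # Fill the rest of the population with recombined-then-refined children.
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=fitness)

# Hypothetical toy task: pick 5 daily costs in [0, 100] that sum to a budget of 250.
BUDGET, DAYS = 250, 5

def fitness(plan: Plan) -> float:
    return -abs(sum(plan) - BUDGET)

def init(rng: random.Random) -> Plan:
    return [rng.randint(0, 100) for _ in range(DAYS)]

def crossover(a: Plan, b: Plan, rng: random.Random) -> Plan:
    cut = rng.randint(1, DAYS - 1)  # one-point crossover
    return a[:cut] + b[cut:]

def mutate(plan: Plan, rng: random.Random) -> Plan:
    child = list(plan)  # never modify a parent in place
    i = rng.randrange(DAYS)
    child[i] = min(100, max(0, child[i] + rng.randint(-10, 10)))
    return child

best = evolve(fitness, init, crossover, mutate)
print(best, sum(best))
```

Keeping the top half unchanged each generation (elitism) means the best score never gets worse. The paper's full procedure has more machinery than this sketch shows.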

Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training

Paper, Project

Agent-R is a self-training framework that enables LLM-based AI agents to recognize and correct their own mistakes. Existing approaches simply reward correct outcomes and penalize incorrect ones. Agent-R instead uses Monte Carlo Tree Search (MCTS) to learn how to recover from an erroneous trajectory onto a correct one. Its goal is for the agent to catch and fix errors on the fly during execution, not only at the end of an episode. To do this, it finds the correct path closest to the agent's current state and trains on that recovery. In experiments, this approach outperformed baseline methods by 5.59%.
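The core data-construction step turns a failed rollout into a training example that demonstrates recovery. Below is a simplified illustration under stated assumptions:

- Trajectories are plain lists of action strings.
- Both trajectories come from the same search tree and share a prefix.
- The index of the first erroneous step is given; in Agent-R the model itself identifies it.
- `REVISION_SIGNAL` is a hypothetical placeholder for the reflection text inserted at the splice point.

```python
from typing import List

# Hypothetical placeholder for the reflection text inserted at the splice point.
REVISION_SIGNAL = "[revise] My previous action was a mistake; switching to a better path."

def shared_prefix_len(a: List[str], b: List[str]) -> int:
    """Number of leading steps two trajectories have in common (their branch point)."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

def splice_revision(bad: List[str], good: List[str], first_error: int) -> List[str]:
    """Bad path up to and including its first error, then a reflection,
    then the good path's continuation from the branch point."""
    branch = shared_prefix_len(bad, good)
    if not branch <= first_error < len(bad):
        raise ValueError("first_error must index a step of `bad` at or after the branch point")
    return bad[: first_error + 1] + [REVISION_SIGNAL] + good[branch:]

bad = ["look", "go to kitchen", "take knife", "go to garden"]
good = ["look", "go to kitchen", "take apple", "put apple in fridge"]
revision = splice_revision(bad, good, first_error=2)
print(revision)
```

Training on such spliced trajectories shows the model how to emit a correction mid-episode. The model then learns from more than just the final success or failure signal.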

Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback

Paper, Project

TPO (Test-time Preference Optimization) is a new framework that lets a large language model (LLM) align its outputs with user preferences at inference time, without any additional training. Instead of numerical rewards, it uses textual feedback to iteratively refine the model's responses. Experiments showed consistent gains across instruction following, preference alignment, safety, and mathematics. Notably, after only a few TPO steps, a model that had not been aligned surpassed an aligned model.
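The TPO loop can be written as ordinary control flow around model calls. In this sketch, `sample`, `reward`, `critique`, and `revise` are hypothetical callables. They stand in for the policy LLM, a reward model, and the LLM prompts that produce textual feedback and revised answers. The toy stubs at the bottom only exist to make the example executable. No model weights are updated anywhere; only the pool of candidate responses changes.

```python
from typing import Callable, List, Tuple

def tpo(
    prompt: str,
    sample: Callable[[str, int], List[str]],       # policy LLM: n initial responses
    reward: Callable[[str, str], float],           # reward model: scalar score per response
    critique: Callable[[str, str, str], str],      # LLM: why is `chosen` better than `rejected`?
    revise: Callable[[str, str, str], List[str]],  # LLM: new responses that follow the critique
    n: int = 4,
    steps: int = 2,
) -> str:
    # Cache of (score, response) pairs; it grows as revisions are added.
    pool: List[Tuple[float, str]] = [(reward(prompt, r), r) for r in sample(prompt, n)]
    for _ in range(steps):
        pool.sort(key=lambda item: item[0], reverse=True)
        chosen, rejected = pool[0][1], pool[-1][1]
        # "Textual loss": natural-language feedback contrasting the best and worst responses.
        feedback = critique(prompt, chosen, rejected)
        # "Textual gradient step": rewrite the best response following that feedback.
        for response in revise(prompt, chosen, feedback):
            pool.append((reward(prompt, response), response))
    return max(pool, key=lambda item: item[0])[1]

# Hypothetical toy stubs so the loop runs without any model.
def toy_sample(prompt: str, n: int) -> List[str]:
    return ["a", "ab", "abc", "abcd"][:n]

def toy_reward(prompt: str, response: str) -> float:
    return float(len(response))  # pretend longer answers are preferred

def toy_critique(prompt: str, chosen: str, rejected: str) -> str:
    return f"'{chosen}' is preferred over '{rejected}' because it is more detailed."

def toy_revise(prompt: str, chosen: str, feedback: str) -> List[str]:
    return [chosen + "!"]

answer = tpo("Explain TPO.", toy_sample, toy_reward, toy_critique, toy_revise)
print(answer)  # "abcd!!" after two steps
```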

Visual AI

MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

Paper, Project

MMVU is a new benchmark for evaluating expert-level video understanding in AI models. It consists of 3,000 expert-written questions spanning 27 subjects across science, healthcare, humanities and social sciences, and engineering. Existing benchmarks focus on basic visual perception; MMVU instead requires applying domain-specific knowledge and expert-level reasoning. In an evaluation of state-of-the-art models, even the best performers fell short of human experts. This indicates that AI still has substantial room for improvement in expert-domain video understanding.

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

Paper, Project

VideoLLaMA3 is a new AI model for better image and video understanding, built around a vision-centric approach. Rather than relying on large-scale video-text datasets, it is trained mainly on high-quality image-text data. Training runs in four stages: vision-centric alignment, vision-language pretraining, multi-task fine-tuning, and video-centric fine-tuning. To capture fine-grained image detail, it produces a variable number of vision tokens depending on the input image size. For video, it reduces the number of similar tokens to obtain a more precise and compact representation. Thanks to this vision-centric design, it achieves strong results on image and video understanding benchmarks.
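The two token-budget ideas above fit in a few lines. First, the number of vision tokens follows the input resolution. Second, video patches that barely change from the previous frame are dropped. The following are illustrative assumptions, not details from the paper:

- the patch size of 14;
- the mean L1 distance;
- the threshold;
- the representation of each patch as a short list of toy features.

```python
from typing import List, Sequence

PATCH = 14  # assumed ViT patch size (illustrative)

def num_vision_tokens(height: int, width: int, patch: int = PATCH) -> int:
    """Token count grows with resolution instead of resizing to a fixed grid."""
    return (height // patch) * (width // patch)

def prune_similar_patches(
    frames: Sequence[Sequence[Sequence[float]]],
    threshold: float,
) -> List[List[int]]:
    """Return, for each frame, the indices of the patches that are kept.

    The first frame is kept whole. In later frames, a patch is dropped when its
    mean absolute (L1) difference from the same patch in the previous frame is
    below `threshold`, i.e. when it adds little new information.
    """
    kept = [list(range(len(frames[0])))]
    for prev, cur in zip(frames, frames[1:]):
        keep = []
        for i, (p, c) in enumerate(zip(prev, cur)):
            dist = sum(abs(a - b) for a, b in zip(p, c)) / len(c)
            if dist >= threshold:
                keep.append(i)
        kept.append(keep)
    return kept

print(num_vision_tokens(224, 448))  # 16 * 32 = 512 tokens

# Three frames, two patches each (2-dim toy features).
frames = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [5.0, 5.0]],  # patch 1 changes a lot
    [[0.0, 0.1], [5.0, 5.0]],  # only a tiny change in patch 0
]
print(prune_similar_patches(frames, threshold=0.1))  # [[0, 1], [1], []]
```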

TokenVerse: Versatile Multi-concept Personalization in Token Modulation Space

Paper, Project

TokenVerse is a method for two tasks. First, it disentangles complex visual elements and attributes from as little as a single image. Second, it freely combines concepts extracted from multiple images to generate new ones. It builds on the observation that, in text-to-image models, text influences generation through both attention and modulation. Based on this, the authors develop an optimization-based framework that finds a distinct direction in the modulation space for each word. This makes it possible to compose objects, accessories, materials, poses, lighting, and other concepts at will, and it outperforms existing methods.
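Here is a rough picture of how per-word directions in the modulation space could be composed. Each text token gets its own modulation vector, and a token that names a learned concept gets that concept's direction added. The vector then scales and shifts normalized activations, AdaLN-style. This is an illustrative sketch only. The directions are hand-written toy vectors rather than the result of TokenVerse's optimization, and the `[scale..., shift...]` layout of the modulation vector is an assumption.

```python
import math
from typing import Dict, List

Vector = List[float]

def per_token_modulation(tokens: List[str], base: Vector, directions: Dict[str, Vector]) -> List[Vector]:
    """Give each text token its own modulation vector.

    Tokens with a learned concept direction get base + direction; all other
    tokens keep the shared base vector. Directions learned from different
    images can be mixed because each one is tied to its own word.
    """
    return [
        [b + d for b, d in zip(base, directions[t])] if t in directions else list(base)
        for t in tokens
    ]

def modulate(h: Vector, m: Vector, eps: float = 1e-6) -> Vector:
    """Apply m = [scale..., shift...] (assumed layout) to layer-normalized activations h."""
    d = len(h)
    scale, shift = m[:d], m[d:]
    mean = sum(h) / d
    std = math.sqrt(sum((x - mean) ** 2 for x in h) / d + eps)
    return [(1 + s) * (x - mean) / std + b for s, x, b in zip(scale, h, shift)]

# Toy directions "learned" from two different images (hypothetical values).
directions = {
    "dog": [0.5, 0.0, 0.1, 0.0],   # from image 1
    "hat": [0.0, 0.3, 0.0, -0.2],  # from image 2
}
base = [0.0, 0.0, 0.0, 0.0]
mods = per_token_modulation(["a", "dog", "wearing", "a", "hat"], base, directions)
print(mods[1], mods[4])
print(modulate([1.0, -1.0], mods[1]))
```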

GameFactory: Creating New Games with Generative Interactive Videos

Paper, Project

GameFactory is an AI-based framework for automatically generating new game content. Unlike existing game-generation approaches, it leverages pretrained video diffusion models to create games in diverse styles and scenes. It introduces a multi-phase training strategy that decouples learning the game style from learning action control. As a result, it can generate interactive game videos that respond to user actions while preserving open-domain diversity. Using a Minecraft-based dataset (GF-Minecraft), the authors show that it can generate game videos of unlimited length.

Multi-Agent Systems

FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces

Paper, Project

FilmAgent is an AI-based multi-agent framework that automates filmmaking in virtual 3D spaces. It simulates the roles of a film crew, such as director, screenwriter, actors, and cinematographer. It covers the entire production process, from idea development and scriptwriting to cinematography. The agents collaborate by exchanging feedback, which lets them verify intermediate scripts and reduce errors. In experiments, FilmAgent outperformed single-agent systems and received an average human rating of 3.98 out of 5. This demonstrates the potential of multi-agent collaboration in filmmaking.

SRMT: Shared Memory for Multi-agent Lifelong Pathfinding

Paper, Project

SRMT (Shared Recurrent Memory Transformer) is a new method for improving cooperation among agents in multi-agent reinforcement learning (MARL). Unlike prior approaches, it pools each agent's working memory and shares it globally. This allows agents to exchange information implicitly and coordinate their actions. SRMT was evaluated on a bottleneck navigation task, where agents must pass through a narrow corridor, and on the POGEMA benchmark. It outperformed existing RL baselines and remained effective even on longer corridors than those seen during training. These results suggest that incorporating shared recurrent memory into transformer-based architectures can improve coordination in decentralized multi-agent systems.
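The shared-memory read can be sketched as scaled dot-product attention. Every agent keeps a memory vector, all memory vectors are pooled, and each agent attends over the whole pool with its own query. The sketch omits what SRMT trains end to end: the learned query, key, and value projections, the transformer policy, and the recurrent memory update. Keys and values here are simply the raw memory vectors.

```python
import math
from typing import List

Vector = List[float]

def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def read_shared_memory(query: Vector, shared: List[Vector]) -> Vector:
    """One agent's scaled dot-product attention over every agent's memory."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, mem)) / math.sqrt(d) for mem in shared]
    weights = softmax(scores)
    return [sum(w * mem[j] for w, mem in zip(weights, shared)) for j in range(d)]

def shared_memory_step(queries: List[Vector], memories: List[Vector]) -> List[Vector]:
    # The shared memory is the pool of all agents' individual memories; every
    # agent reads from the same pool, so information flows between agents implicitly.
    return [read_shared_memory(q, memories) for q in queries]

memories = [[1.0, 0.0], [0.0, 1.0]]  # toy memory vectors of agent 0 and agent 1
queries = [[0.0, 10.0], [0.0, 0.0]]  # agent 0 seeks what agent 1 stored; agent 1 has no preference
reads = shared_memory_step(queries, memories)
print(reads)
```

In this toy run, agent 0 ends up reading almost entirely from agent 1's memory. No explicit message passing is involved.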

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

Paper, Project

UI-TARS is a new GUI agent model that takes only screenshots as input and performs human-like interactions such as keyboard and mouse operations. Existing approaches depend on commercial models such as GPT-4o. UI-TARS, by contrast, is a standalone end-to-end model, and it outperforms them. Its performance rests on four key innovations: 1) enhanced perception trained on large-scale GUI screenshot data, 2) unified action modeling across different platforms, 3) systematic reasoning that includes task decomposition and reflective thinking, and 4) iterative training that automatically collects and refines interaction data on hundreds of virtual machines. In experiments, it surpassed the previous state of the art on several benchmarks, including AndroidWorld and OSWorld.

Reinforcement Learning

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper, Project

DeepSeek-R1 is a new language model developed through large-scale reinforcement learning. Its predecessor, DeepSeek-R1-Zero, was trained with pure RL and no supervised fine-tuning. It showed strong reasoning ability but suffered from poor readability and language mixing. DeepSeek-R1 addresses these issues with multi-stage training and cold-start data. It reaches reasoning performance comparable to OpenAI's o1 (OpenAI-o1-1217). For the research community, both models were open-sourced. They were released together with six distilled dense models of various sizes (1.5B to 70B) built from them.
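The R1 paper trains with GRPO (Group Relative Policy Optimization) using rule-based rewards. These are an accuracy reward for a verifiably correct final answer and a format reward for placing the reasoning between `<think>` and `</think>` tags. The sketch covers only those two reward-side pieces: it scores a group of sampled completions and turns the scores into group-normalized advantages. Several things are simplifications or assumptions:

- The policy-gradient update, clipping, and KL term are omitted.
- The equal 1.0 weights are an assumption.
- Using the population standard deviation as the normalizer is one reasonable choice among several.

```python
import re
import statistics
from typing import List

# Reasoning inside <think>...</think>, followed by a final answer.
THINK_FORMAT = re.compile(r"^<think>.*?</think>\s*(.+)$", re.DOTALL)

def rule_based_reward(completion: str, reference: str) -> float:
    """Format reward + accuracy reward (the equal weights are an assumption)."""
    match = THINK_FORMAT.match(completion.strip())
    if match is None:
        return 0.0  # wrong format: no reward at all in this sketch
    format_reward = 1.0
    accuracy_reward = 1.0 if match.group(1).strip() == reference else 0.0
    return format_reward + accuracy_reward

def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """A_i = (r_i - mean(r)) / std(r) within one prompt's group of samples,
    so no learned value function (critic) is needed."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

group = [
    "<think>2 + 2 = 4</think> 4",
    "4",                          # correct answer but missing the <think> block
    "<think>just guessing</think> 5",
    "<think>two pairs make four</think> 4",
]
rewards = [rule_based_reward(c, "4") for c in group]
advantages = group_relative_advantages(rewards)
print(rewards)  # [2.0, 0.0, 1.0, 2.0]
print(advantages)
```

Because advantages are centered within each group, a completion is reinforced only if it beats its siblings for the same prompt.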

Kimi k1.5: Scaling Reinforcement Learning with LLMs

Paper, Project

Kimi k1.5 is a new multimodal language model trained with reinforcement learning (RL). Conventional language model training relies on next-token prediction. Kimi k1.5 goes beyond that, using RL so that the model learns to explore on its own, guided by rewards. Its core ingredients are long-context scaling and improved policy optimization methods. Together they yield a simple, effective RL framework that does not rely on more complex techniques such as Monte Carlo tree search, value functions, or process reward models. As a result, it achieved state-of-the-art performance across math (AIME, MATH 500), coding (Codeforces), and visual math (MathVista). In short-CoT reasoning in particular, it outperformed GPT-4o and Claude Sonnet 3.5 by up to 550%.

Model Architecture Optimization

Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models

Paper

This work improves how the load-balancing loss (LBL) is implemented when training Mixture-of-Experts (MoE) models. Existing implementations compute the LBL per micro-batch. This pushes the router to spread tokens evenly across all experts within each sequence, which hinders expert specialization. To fix this, the authors propose computing the LBL over the larger global batch. A global batch contains far more diverse sequences. The loss therefore enforces load balance at the corpus level while still letting individual experts specialize in particular domains. Experiments show improvements in both pretraining performance and downstream tasks.
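The difference between the two scopes is easy to see with the common form of the loss:

LBL = N_E · Σ_i f_i · p_i

Here f_i is the fraction of tokens routed to expert i, and p_i is the mean router probability for expert i. The sketch assumes top-1 routing and two micro-batches that each come from a single domain; all numbers are toy values. Per micro-batch, the routing looks badly imbalanced. Over the global batch, it is perfectly balanced. So only the micro-batch version penalizes this kind of domain specialization. Pooling all tokens, as done here, is the simplest way to get global statistics. In distributed training, the per-expert counts would instead be synchronized across workers.

```python
from typing import List

Probs = List[List[float]]  # one row of router probabilities per token

def load_balancing_loss(probs: Probs) -> float:
    """N_E * sum_i f_i * p_i with top-1 routing (equals 1.0 when tokens are spread evenly)."""
    n, num_experts = len(probs), len(probs[0])
    top1 = [max(range(num_experts), key=row.__getitem__) for row in probs]
    f = [top1.count(i) / n for i in range(num_experts)]                 # fraction of tokens per expert
    p = [sum(row[i] for row in probs) / n for i in range(num_experts)]  # mean router probability
    return num_experts * sum(fi * pi for fi, pi in zip(f, p))

def micro_batch_lbl(micro_batches: List[Probs]) -> float:
    # Balance is enforced inside every micro-batch, then the losses are averaged.
    return sum(load_balancing_loss(mb) for mb in micro_batches) / len(micro_batches)

def global_batch_lbl(micro_batches: List[Probs]) -> float:
    # Routing statistics are pooled over all micro-batches before computing the loss.
    return load_balancing_loss([row for mb in micro_batches for row in mb])

code_batch = [[0.9, 0.1]] * 4  # toy: every "code" token prefers expert 0
math_batch = [[0.1, 0.9]] * 4  # toy: every "math" token prefers expert 1
batches = [code_batch, math_batch]
print(micro_batch_lbl(batches))   # 1.8: each micro-batch looks imbalanced
print(global_batch_lbl(batches))  # 1.0: balanced over the global batch
```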

I'm Sky, and I'm interested in XR and AI.
