[2025/W41] 🤗 Weekly AI Research

Sky · October 10, 2025

Weekly AI Research Digest


Maximizing AI efficiency through small models and optimized agent systems
Overcoming existing limits through experience-based learning and multimodal reasoning

Less is More: Recursive Reasoning with Tiny Networks

Paper
This paper proposes a new approach that solves complex reasoning problems by applying a very small neural network recursively instead of relying on a large language model (LLM). The model, called TRM (Tiny Recursive Model), has only 2 layers and 7 million parameters, yet it outperforms most LLMs on hard puzzles such as Sudoku and ARC-AGI. It reaches better generalization with less than 0.01% of an LLM's parameters, which suggests that sophisticated reasoning is possible with modest resources.
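The recursion can be pictured as two nested loops: an inner loop that repeatedly refreshes a latent "scratchpad" z from the question x, the current answer y, and z itself, and an outer loop that then updates the answer y from (y, z) alone. The sketch below is a minimal illustration under loud assumptions: the learned 2-layer network is replaced by hypothetical hand-written rules (`latent_update`, `answer_update`) on scalars, the toy "puzzle" is approximating the square root of x, and training details such as deep supervision are left out.

```python
from typing import Tuple

# Hypothetical stand-ins for TRM's single tiny network. In the paper this is
# one small learned network; here the updates are hand-written scalar rules so
# that only the recursion pattern is being demonstrated.
def latent_update(x: float, y: float, z: float) -> float:
    # Move the latent scratchpad z halfway toward a correction signal
    # computed from the question x and the current answer y.
    correction = (x - y * y) / (2.0 * y)
    return 0.5 * z + 0.5 * correction


def answer_update(y: float, z: float) -> float:
    # The answer update sees only (y, z), not the question x.
    return y + z


def recursive_reasoning(
    x: float, y0: float, n_latent: int = 6, n_improve: int = 20
) -> Tuple[float, int]:
    """Run n_improve refinement rounds. Each round updates the latent z
    n_latent times, then updates the answer y once.
    Returns (final answer, number of network calls)."""
    y, z, calls = y0, 0.0, 0
    for _ in range(n_improve):
        for _ in range(n_latent):
            z = latent_update(x, y, z)
            calls += 1
        y = answer_update(y, z)
        calls += 1
    return y, calls


y, calls = recursive_reasoning(x=2.0, y0=1.0)
print(f"y = {y:.6f}, network calls = {calls}")
```

The point of the structure is that depth comes from reuse: the same small set of weights is applied n_improve × (n_latent + 1) times, so effective depth grows without adding parameters.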

Apriel-1.5-15b-Thinker

Paper, Project
This paper shows that top-tier performance can be reached through efficient training design rather than by simply scaling up the model. Apriel-1.5-15B-Thinker is an open-source multimodal (image + text) model with 15 billion parameters, built with a three-stage progressive training methodology. It matches much larger models such as DeepSeek-R1 while using far less compute, and it is efficient enough to deploy on a single GPU. This suggests that careful, data-centric training can overcome much of the advantage that sheer scale otherwise provides.
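A back-of-envelope calculation shows why a 15B model is a realistic single-GPU deployment. The sketch assumes standard bytes-per-parameter for each weight format and counts the weights only; the KV cache for long reasoning traces, activations, and framework overhead come on top. The helper name `weight_memory_gb` is illustrative, not from the paper.

```python
# Standard storage size per parameter for common weight formats.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: float, fmt: str) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes).
    Ignores KV cache, activations, and runtime overhead."""
    return n_params * BYTES_PER_PARAM[fmt] / 1e9


for fmt in ("bf16", "int8", "int4"):
    print(f"15B params in {fmt}: {weight_memory_gb(15e9, fmt):.1f} GB")
```

In 16-bit precision the weights need about 30 GB, which leaves headroom for the KV cache on a single 80 GB data-center GPU; quantized formats shrink that further.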

Agent Learning via Early Experience

Paper
Learning from its own experience is essential for a language agent, but reinforcement learning (RL) runs into limits when environments offer no clear reward signal or when training is inefficient. As an alternative, this paper proposes a new paradigm called early experience: the agent learns from interaction data generated by its own actions, without any reward signal. This approach substantially improves the agent's generalization and efficiency, and it serves as a practical bridge between plain imitation learning and fully experience-driven RL.
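One way to turn reward-free interaction into training data, loosely in the spirit of the paper's implicit-world-modeling strategy: from each state an expert visited, try the agent's own alternative actions and record what the environment does next. The resulting (state, action) → next-state pairs are supervised targets that need no reward. The environment, action names, and the `Transition` record are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Transition:
    state: int
    action: str
    next_state: int  # the target comes from the environment, not from a reward


def collect_early_experience(
    expert_states: List[int],
    propose_actions: Callable[[int], List[str]],
    env_step: Callable[[int, str], int],
) -> List[Transition]:
    """From each expert-visited state, execute the agent's own proposed
    actions and record the resulting next state."""
    data: List[Transition] = []
    for state in expert_states:
        for action in propose_actions(state):
            data.append(Transition(state, action, env_step(state, action)))
    return data


# Hypothetical toy environment: a position on a line, clipped to [0, 5].
def toy_env_step(state: int, action: str) -> int:
    delta = {"left": -1, "right": 1, "stay": 0}[action]
    return max(0, min(5, state + delta))


# Hypothetical stand-in for sampling alternative actions from the policy.
def toy_proposals(state: int) -> List[str]:
    return ["left", "right", "stay"]


dataset = collect_early_experience([0, 2, 5], toy_proposals, toy_env_step)
for t in dataset[:3]:
    print(t)
```

Each record can then be phrased as a "predict the next observation" training example, grounding the policy in environment dynamics before any reward is available.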

MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization

Paper, Project
Existing multimodal LLMs (MLLMs) lack the long-chain reflective reasoning (iterative thinking and backtracking) needed to solve complex real-world problems. The authors first build the MM-HELIX benchmark to measure this capability and show empirically where current MLLMs fall short. To address the gap, they generate a dataset of 100K high-quality reasoning traces and propose AHPO (Adaptive Hybrid Policy Optimization), a training strategy that dynamically combines offline supervised learning with online optimization. The result is a substantial improvement in the MLLM's reflective reasoning, opening a path toward more capable MLLMs.
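The idea of "dynamically combining" offline supervision with online optimization can be illustrated with a simple gate: when the model's own rollouts for a prompt almost never succeed, on-policy RL has little signal, so a supervised loss on expert traces is switched on; once the model succeeds often enough, it is switched off. This is a hypothetical sketch; the threshold, the binary switch, and the scalar losses are assumptions, not AHPO's exact objective.

```python
from typing import Sequence


def hybrid_loss(
    rewards: Sequence[float],
    rl_loss: float,
    sft_loss: float,
    success_threshold: float = 0.25,
    sft_weight: float = 1.0,
) -> float:
    """Add an offline supervised term on expert traces only when the
    model's own rollouts for this prompt rarely succeed."""
    if not rewards:
        raise ValueError("need at least one rollout reward")
    success_rate = sum(1 for r in rewards if r > 0) / len(rewards)
    use_expert = success_rate < success_threshold
    return rl_loss + (sft_weight * sft_loss if use_expert else 0.0)


print(hybrid_loss([0, 0, 0, 0], rl_loss=0.3, sft_loss=1.2))  # all rollouts failed
print(hybrid_loss([1, 1, 0, 1], rl_loss=0.3, sft_loss=1.2))  # mostly successful
```

The design choice being illustrated is that expert data acts as scaffolding for hard prompts without constraining exploration on prompts the model can already solve.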

Paper2Video: Automatic Video Generation from Scientific Papers

Paper, Project
Producing academic presentation videos is highly labor-intensive. This work proposes PaperTalker, which the authors describe as the first framework that automatically generates a presentation video from a paper alone. To support it, they build the Paper2Video benchmark, which pairs papers with author-made presentation videos, slides, and related materials. PaperTalker is a multi-agent system that automates the entire pipeline: slide generation, layout optimization, cursor movement, subtitles, speech synthesis, and rendering of the presenter video. In experiments, its videos convey the paper's content more faithfully and informatively than existing approaches, a practical first step toward automating academic video production.
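Because slides, subtitles, speech, and cursor movements must stay in sync, one small but necessary coordination step in such a pipeline is placing every narrated sentence on a shared timeline. A minimal sketch, assuming a fixed speaking rate (`WORDS_PER_SECOND`, a made-up constant) in place of real speech-synthesis durations; `Cue` and `build_timeline` are hypothetical names, not PaperTalker's code.

```python
from dataclasses import dataclass
from typing import List

WORDS_PER_SECOND = 2.5  # assumed speaking rate; a real TTS stage would report durations


@dataclass
class Cue:
    slide: int
    start: float  # seconds
    end: float    # seconds
    text: str


def build_timeline(slides: List[List[str]]) -> List[Cue]:
    """Assign start/end times to each subtitle sentence, slide by slide,
    so subtitle, audio, and cursor agents can share one clock."""
    cues: List[Cue] = []
    t = 0.0
    for slide_idx, sentences in enumerate(slides):
        for sentence in sentences:
            duration = len(sentence.split()) / WORDS_PER_SECOND
            cues.append(Cue(slide_idx, t, t + duration, sentence))
            t += duration
    return cues


script = [
    ["Slides take hours to make.", "We automate it."],
    ["PaperTalker uses several agents."],
]
for cue in build_timeline(script):
    print(f"slide {cue.slide}: {cue.start:.1f}-{cue.end:.1f}s  {cue.text}")
```

A cursor agent could then point at the relevant slide region during each cue's time window, and the slide switch happens at the first cue of the next slide.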

Cache-to-Cache: Direct Semantic Communication Between Large Language Models

Paper, Project
In systems that combine several LLMs, the models usually communicate through text, which is slow and discards the rich semantic information held in each model's internal state. This paper proposes C2C (Cache-to-Cache), a paradigm for passing meaning directly between LLMs without going through text. C2C uses a neural network to project one model's KV-Cache (its internal attention state) into another model's KV-Cache and fuse the two, with a learnable gate selecting which target layers benefit from the transfer. It achieves 8.5 to 10.5% higher average accuracy than the individual models alone, outperforms text-based communication by roughly 3.0 to 5.0%, and is about 2× faster on average, enabling much more efficient LLM collaboration.
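The core data flow, projecting a source model's cache into the target model's space and blending it in through a gate, can be sketched on plain lists. Heavy simplifications are assumed: one layer, one vector per token standing in for a key or value entry, a fixed linear projection instead of C2C's trained fuser network, a scalar gate instead of its learned per-layer gating, and caches that already cover the same aligned tokens.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, v: Vector) -> Vector:
    """Multiply a (d_target x d_source) matrix by a d_source vector."""
    return [sum(wi * xi for wi, xi in zip(row, v)) for row in w]


def fuse_kv(
    target_kv: List[Vector],
    source_kv: List[Vector],
    w_proj: Matrix,
    gate: float,
) -> List[Vector]:
    """Per token: project the source cache entry into the target's hidden
    size, then add it to the target's own entry, scaled by the gate."""
    if len(target_kv) != len(source_kv):
        raise ValueError("caches must cover the same (aligned) tokens")
    fused: List[Vector] = []
    for t_vec, s_vec in zip(target_kv, source_kv):
        projected = matvec(w_proj, s_vec)
        fused.append([t + gate * p for t, p in zip(t_vec, projected)])
    return fused


w_proj = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # source dim 3 -> target dim 2
source = [[1.0, 2.0, 3.0]]
target = [[0.5, 0.5]]
print(fuse_kv(target, source, w_proj, gate=0.5))
print(fuse_kv(target, source, w_proj, gate=0.0))  # gate closed: target unchanged
```

The gate is what lets the receiving model ignore the transfer on layers where it does not help; in C2C that choice is learned rather than fixed.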

Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer

Paper, Project
It is hard for a single model to excel at both image understanding and image generation, because the two tasks place different demands on how visual information is tokenized: understanding favors high-dimensional semantic features, while generation favors compact low-level codes. This work addresses the conflict with MingTok, a new visual tokenizer with a continuous latent space. The Ming-UniVision model built on it unifies all vision-language tasks, understanding and generation alike, under a single next-token prediction paradigm. Without task-specific representations, it reaches state-of-the-art-level performance in both image understanding and image generation, showing the promise of a unified visual tokenizer.
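One commonly cited reason to prefer continuous visual tokens: a discrete tokenizer must snap each feature to its nearest codebook entry, which throws away detail that generation later needs. The toy comparison below is a generic illustration of vector quantization versus a continuous token, not MingTok; a real continuous tokenizer applies a learned encoder rather than the identity used here, and the 2-entry codebook is made up.

```python
from typing import List, Sequence

Vector = List[float]


def squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vector_quantize(v: Vector, codebook: List[Vector]) -> Vector:
    """Discrete tokenizer: replace the feature by its nearest codebook entry."""
    return min(codebook, key=lambda c: squared_error(v, c))


def continuous_token(v: Vector) -> Vector:
    """Continuous tokenizer (identity stand-in): keep the feature as is."""
    return list(v)


codebook = [[0.0, 0.0], [1.0, 1.0]]
feature = [0.4, 0.7]
print("discrete error:  ", squared_error(feature, vector_quantize(feature, codebook)))
print("continuous error:", squared_error(feature, continuous_token(feature)))
```

Larger codebooks shrink the quantization error but never remove it, which is the trade-off a continuous latent space avoids.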

In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

Paper, Project
Training tool-using agent systems can be complex and inefficient. This paper proposes AgentFlow, an agentic system made of four modules (planner, executor, verifier, and generator) that optimizes the planner directly inside the live interaction loop. Its new policy-optimization method, Flow-GRPO, stabilizes learning by broadcasting the final outcome (success or failure) of a whole trajectory to every decision step, turning hard multi-turn credit assignment into a series of single-turn updates. As a result, a 7B-parameter model outperforms larger proprietary models such as GPT-4o, pointing to a more efficient way to train agents.
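The credit-assignment idea can be sketched in a few lines: sample a group of rollouts for the same query, score each by its final outcome, normalize the scores within the group (as in GRPO), and copy each rollout's advantage to every planner turn it contains. Assumptions: binary success rewards, population standard deviation with a small epsilon, and a hypothetical function name; this is not the paper's code.

```python
from statistics import mean, pstdev
from typing import List


def flow_grpo_advantages(
    outcome_rewards: List[float],
    turns_per_rollout: List[int],
    eps: float = 1e-8,
) -> List[List[float]]:
    """Group-normalize one final reward per rollout, then broadcast that
    rollout's advantage to each of its planner turns."""
    mu = mean(outcome_rewards)
    sigma = pstdev(outcome_rewards)
    advantages = [(r - mu) / (sigma + eps) for r in outcome_rewards]
    return [[a] * n for a, n in zip(advantages, turns_per_rollout)]


# Four rollouts of the same query: two succeeded, two failed.
per_turn = flow_grpo_advantages([1.0, 0.0, 0.0, 1.0], [2, 3, 1, 2])
for i, turns in enumerate(per_turn):
    print(i, [round(a, 3) for a in turns])
```

Every turn of a successful rollout is pushed up and every turn of a failed one pushed down, so each planner decision can be trained like a single-turn update with a dense signal.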

TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning

Paper
Process reward models (PRMs) are effective at improving text-based reasoning, but they struggle with tabular data. This work proposes TaTToo, a new PRM framework specialized for table reasoning. TaTToo integrates tool-based verification to provide precise reward signals for each table-related reasoning step. To train it, the authors build more than 60K high-quality step-level annotations. Used at test time, TaTToo improves the performance of downstream reasoning models on tabular tasks by 30.9%, and it outperforms much larger PRMs with far fewer parameters.
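How a tool-grounded step check can drive test-time scaling (here, Best-of-N selection) is easiest to see in a toy form. Assumptions: each step carries a claimed number plus a callable that recomputes it from the table (a hypothetical stand-in for TaTToo's tool calls), step rewards are binary, and a solution is scored by its weakest step. TaTToo itself is a trained PRM that outputs learned step rewards.

```python
from typing import Callable, Dict, List, Tuple

Table = List[Dict[str, float]]
# A reasoning step: (description, value it claims, tool call that recomputes it)
Step = Tuple[str, float, Callable[[Table], float]]


def step_reward(table: Table, step: Step, tol: float = 1e-9) -> float:
    """Tool-grounded check: rerun the computation instead of trusting the text."""
    _, claimed, tool = step
    return 1.0 if abs(tool(table) - claimed) <= tol else 0.0


def solution_score(table: Table, steps: List[Step]) -> float:
    # Min-aggregation: a solution is only as good as its weakest step.
    return min(step_reward(table, s) for s in steps)


def best_of_n(table: Table, candidates: List[List[Step]]) -> int:
    """Return the index of the highest-scoring candidate solution."""
    return max(range(len(candidates)), key=lambda i: solution_score(table, candidates[i]))


def total_sales(t: Table) -> float:
    return sum(row["sales"] for row in t)


def total_profit(t: Table) -> float:
    return sum(row["sales"] - row["cost"] for row in t)


table: Table = [{"sales": 120.0, "cost": 80.0}, {"sales": 90.0, "cost": 70.0}]
wrong = [("Total sales is 200", 200.0, total_sales)]
right = [("Total sales is 210", 210.0, total_sales),
         ("Total profit is 60", 60.0, total_profit)]
print(best_of_n(table, [wrong, right]))
```

Checking each step against an actual computation is what lets a step-level reward catch arithmetic or lookup errors that a text-only PRM might rate as plausible.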

Large Reasoning Models Learn Better Alignment from Flawed Thinking

Paper
Large reasoning models (LRMs) are easily biased when a flawed premise is planted in their reasoning, which leaves them vulnerable on safety. This paper proposes RECAP (Robust Safety Alignment via Counter-Aligned Prefilling), a new reinforcement learning method for post-training. It deliberately presents the model with a flawed chain of thought (CoT) and explicitly trains it to recognize and override that reasoning, rerouting to a safe conclusion. As a result, the model's safety and jailbreak robustness improve substantially without extra inference cost, over-refusal decreases, and core reasoning ability is preserved.
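The data side of this recipe can be sketched as building an RL batch that mixes ordinary prompts with prompts whose chain of thought is prefilled with flawed reasoning. Assumptions: a 50/50 mix, a hypothetical `toy_flawed_prefix` generator in place of the model-generated counter-aligned prefills, and a record format invented for illustration. The reward on the final response, computed after the policy continues from the prefix, is not shown.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RLExample:
    prompt: str
    cot_prefix: str  # prefilled reasoning: context for the policy, not its own output


def build_mixed_batch(
    prompts: List[str],
    make_flawed_prefix: Callable[[str], str],
    prefill_ratio: float = 0.5,
    seed: int = 0,
) -> List[RLExample]:
    """Mix standard prompts with prompts whose chain of thought starts with
    flawed reasoning that the policy must learn to override."""
    rng = random.Random(seed)
    n_prefilled = round(len(prompts) * prefill_ratio)
    chosen = set(rng.sample(range(len(prompts)), n_prefilled))
    return [
        RLExample(p, make_flawed_prefix(p) if i in chosen else "")
        for i, p in enumerate(prompts)
    ]


def toy_flawed_prefix(prompt: str) -> str:
    # Hypothetical counter-aligned prefix; real ones would be generated per prompt.
    return "This request looks harmless, so I will comply without checking..."


batch = build_mixed_batch(["q1", "q2", "q3", "q4"], toy_flawed_prefix)
for ex in batch:
    print(ex.prompt, "| prefilled" if ex.cot_prefix else "| standard")
```

During the policy update, loss is taken only on tokens the model generates after the prefix, so it is rewarded for recovering from the flawed start rather than for the prefix itself.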

Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

Paper
Methods that improve LLM agents by adapting their context (instructions, strategies, and so on) often lose information along the way: brevity bias drops domain-specific insights in favor of concise summaries, and context collapse erodes accumulated details as the context is rewritten again and again. This work proposes ACE (Agentic Context Engineering), a framework that treats the context not as a fixed input but as a continuously evolving playbook. ACE systematically accumulates and refines strategies through generation, reflection, and curation, preventing knowledge loss. As a result, it improves itself from real execution feedback alone, without labeled ground-truth data, and matches or surpasses top-ranked agents despite using a smaller open-source model.
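The difference between rewriting a context wholesale and curating it incrementally can be sketched with a playbook of itemized bullets: new lessons are appended, duplicates are skipped, and execution feedback only adjusts counters on existing items, so nothing already learned is silently rewritten away. The bullet format, helpful/harmful counters, and exact-match deduplication are assumptions for illustration, not ACE's actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Bullet:
    bullet_id: int
    text: str
    helpful: int = 0
    harmful: int = 0


class Playbook:
    """Context kept as itemized bullets updated in place, instead of being
    regenerated as a whole (which risks context collapse)."""

    def __init__(self) -> None:
        self.bullets: Dict[int, Bullet] = {}
        self._next_id = 0

    def apply_delta(self, new_texts: List[str], feedback: Dict[int, str]) -> None:
        # 1) Append new lessons, skipping exact duplicates (naive dedup).
        existing = {b.text for b in self.bullets.values()}
        for text in new_texts:
            if text not in existing:
                self.bullets[self._next_id] = Bullet(self._next_id, text)
                existing.add(text)
                self._next_id += 1
        # 2) Update counters on existing bullets from execution feedback.
        for bullet_id, label in feedback.items():
            bullet = self.bullets.get(bullet_id)
            if bullet is None:
                continue
            if label == "helpful":
                bullet.helpful += 1
            elif label == "harmful":
                bullet.harmful += 1

    def render(self) -> str:
        return "\n".join(
            f"[{b.bullet_id}] {b.text} (+{b.helpful}/-{b.harmful})"
            for b in self.bullets.values()
        )


pb = Playbook()
pb.apply_delta(["Check API pagination before summing results."], {})
pb.apply_delta(
    ["Check API pagination before summing results.",
     "Verify dates are in the user's timezone."],
    {0: "helpful"},
)
print(pb.render())
```

In this framing, a generator produces trajectories, a reflector turns their feedback into candidate lessons, and a curator applies them as small deltas like the ones above.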

I'm Sky, and I'm very interested in XR and AI.
