[2025/W35] 🤗 Weekly AI Research

Sky · August 29, 2025

Weekly AI Research Digest


Performance gains driven by reinforcement learning and self-rewarding, maximizing reasoning ability and efficiency
Applications expanding beyond agents and robotics, with a leap toward hard problems in specialized scientific fields

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper, Project
InternVL 3.5 is a next-generation open-source multimodal model that substantially improves versatility, reasoning ability, and efficiency. It introduces Cascade Reinforcement Learning (Cascade RL) to strengthen reasoning and a Visual Resolution Router (ViR) to improve efficiency, and with these it significantly improves both reasoning performance and inference speed over the previous version. It also supports new capabilities such as GUI interaction and narrows the performance gap with top-tier commercial models.

Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR

Paper, Project
The Beyond Pass@1 paper proposes Self-play with Variational problem Synthesis (SvS) to address a chronic problem of Reinforcement Learning with Verifiable Rewards (RLVR): the loss of generation diversity during training. The method takes correct solutions that the model itself produced and uses them to synthesize, online, new problems that keep the same answer but differ in form, then trains on those problems. As a result, the model avoids overfitting to a single solution path, keeps its generation diversity, and substantially improves Pass@k, its ability to reach a correct answer within k attempts.
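Since the summary above leans on Pass@k, a small sketch of how that metric is commonly estimated may help. It uses the widely used unbiased estimator, 1 - C(n-c, k) / C(n, k), computed from n sampled solutions of which c are correct. The source does not say which estimator the paper uses, so treat this as an illustration rather than the authors' evaluation code; the function name `pass_at_k` is chosen here for the example.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate Pass@k from n sampled solutions, c of which are correct.

    Returns the probability that at least one of k samples drawn
    without replacement from the n samples is correct.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: any draw of k contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Same number of correct samples, evaluated at different k.
    for k in (1, 4, 16):
        print(k, round(pass_at_k(n=32, c=2, k=k), 3))
```

With a fixed number of correct samples, the estimate grows with k, which is why a model that collapses onto one solution path can hold its Pass@1 while losing ground at larger k.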

AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Paper, Project
AgentFly presents a new paradigm in which LLM agents keep learning and adapting without the heavy cost of fine-tuning the LLM itself. The agent learns through an external memory that stores past experiences and a case-selection policy that decides which of them to reuse; in response to environment feedback it updates the memory rather than the LLM's weights. The approach is computationally very efficient while still performing strongly, and it shows especially strong adaptation to new, out-of-distribution tasks.
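The idea of updating memory instead of weights can be sketched in a few lines. Everything below is a hypothetical illustration, not AgentFly's implementation: the `Case` and `CaseMemory` names, the word-overlap similarity, and the running-average value update are all assumptions made for the example, and the source does not describe how the actual case-selection policy is learned.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Case:
    task: str
    plan: str
    value: float = 0.0  # running mean of rewards observed when this case was reused
    uses: int = 0


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity (an assumed stand-in for a learned retriever)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


@dataclass
class CaseMemory:
    cases: List[Case] = field(default_factory=list)

    def write(self, task: str, plan: str) -> Case:
        case = Case(task, plan)
        self.cases.append(case)
        return case

    def select(self, task: str, k: int = 1) -> List[Case]:
        # Rank by similarity to the new task plus the case's learned value.
        ranked = sorted(
            self.cases,
            key=lambda c: jaccard(task, c.task) + c.value,
            reverse=True,
        )
        return ranked[:k]

    def feedback(self, case: Case, reward: float) -> None:
        # Environment feedback changes the memory; no model weights are touched.
        case.uses += 1
        case.value += (reward - case.value) / case.uses


if __name__ == "__main__":
    mem = CaseMemory()
    flight = mem.write("book a flight to Paris", "call the travel tool")
    mem.write("summarize a research paper", "read the abstract first")
    print(mem.select("book a flight to Tokyo")[0].plan)
    mem.feedback(flight, reward=-1.0)  # the reused plan failed
    print(mem.select("book a flight to Tokyo")[0].plan)
```

After the negative feedback, the same query retrieves a different case, so the agent's behavior changes even though the underlying model is untouched.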

VibeVoice Technical Report

Paper, Project
VibeVoice is a new model that aims to synthesize long, natural-sounding conversational speech with multiple speakers. It combines next-token diffusion, an approach well suited to generating continuous data, with a new speech tokenizer that improves data compression by 80 times over previous tokenizers. With this combination it can generate up to 90 minutes of multi-speaker audio that feels like a real conversation, and it outperforms existing models.

Beyond Transcription: Mechanistic Interpretability in ASR

Paper
The Beyond Transcription paper applies interpretability methods that are actively studied for LLMs to automatic speech recognition (ASR) systems in order to uncover how these models work internally. Using techniques such as the logit lens and activation patching, the authors analyze how information is processed layer by layer in ASR models. They identify specific encoder-decoder interactions that cause repetition errors, and they find semantic biases hidden within the acoustic representations. The study is an important first step toward more transparent and reliable ASR models.

TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling

Paper, Project
TreePO is a new algorithm that targets the high computational cost and inefficient exploration of reinforcement-learning-based LLM alignment. It treats text generation as a search over a tree: the model generates segment by segment and branches dynamically at points of high uncertainty to explore diverse paths. By cutting redundant computation and pruning low-value paths early, it saves up to 43% of the GPU hours needed for training and also improves inference efficiency.
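To make "branch where uncertainty is high" concrete, here is a minimal sketch of one way such a rule could look. It is an assumption for illustration only: the source does not specify TreePO's branching heuristic, and the entropy-based rule, the function names, and the `max_branches` parameter below are all hypothetical.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def branches_for(probs: Sequence[float], max_branches: int = 4) -> int:
    """Pick how many continuations to sample at a segment boundary.

    Hypothetical rule: scale the branch count with the normalized
    entropy of the next-token distribution, from 1 (confident) up to
    max_branches (maximally uncertain).
    """
    if len(probs) < 2:
        return 1
    normalized = entropy(probs) / math.log(len(probs))  # roughly in [0, 1]
    return 1 + round(normalized * (max_branches - 1))


if __name__ == "__main__":
    print(branches_for([1.0, 0.0, 0.0, 0.0]))      # confident: single path
    print(branches_for([0.25, 0.25, 0.25, 0.25]))  # uncertain: widest branching
```

A rule of this shape spends sampling budget only where the model is unsure, which is the general intuition behind saving compute relative to sampling every rollout independently from the start.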

Self-Rewarding Vision-Language Model via Reasoning Decomposition

Paper, Project
Self-Rewarding Vision-Language Model proposes Vision-SR1, a self-rewarding framework that addresses two chronic problems of VLMs: visual hallucination and language shortcuts. It decomposes reasoning into two stages, "visual perception" and "language reasoning": the model first generates a text description from the image and then answers the question using only that description. Whether this process succeeds serves as an internal reward signal, which effectively strengthens the model's visual perception without relying on external data.
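The two-stage decomposition can be sketched as a reward function. This is a rough illustration under stated assumptions, not the Vision-SR1 training code: the callables `describe` and `answer_from_text` stand in for two prompts to the same model, the exact-match check against a `reference_answer` is an assumed way to decide "success", and all names are hypothetical.

```python
from typing import Callable, Dict


def perception_reward(
    image: object,
    question: str,
    reference_answer: str,
    describe: Callable[[object, str], str],
    answer_from_text: Callable[[str, str], str],
) -> Dict[str, object]:
    # Stage 1 (visual perception): turn the image into a text description.
    description = describe(image, question)
    # Stage 2 (language reasoning): answer from the description alone;
    # the image is deliberately withheld at this step.
    answer = answer_from_text(description, question)
    # The description earns reward only if it was sufficient to answer correctly.
    matched = answer.strip().lower() == reference_answer.strip().lower()
    return {
        "description": description,
        "answer": answer,
        "reward": 1.0 if matched else 0.0,
    }


if __name__ == "__main__":
    # Stub "models" so the sketch runs without any real VLM.
    def reason(desc: str, q: str) -> str:
        return "2" if "two" in desc else "unknown"

    good = perception_reward(None, "How many apples?", "2",
                             lambda img, q: "two red apples on a table", reason)
    vague = perception_reward(None, "How many apples?", "2",
                              lambda img, q: "some fruit", reason)
    print(good["reward"], vague["reward"])
```

Because the second stage never sees the image, a vague or hallucinated description cannot be rescued by language priors, so the reward reflects how much usable visual information the description carried.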

Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

Paper, Project
Pref-GRPO introduces a reward scheme based on pairwise preference to address reward hacking in reinforcement learning for text-to-image generation models. Instead of assigning an absolute score to each image, it compares two images and uses which one is preferred as the learning signal, which keeps the model from overfitting to tiny score differences. The approach stabilizes training, and its effectiveness is demonstrated on UniGenBench, a benchmark proposed alongside it.
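A pairwise signal can be turned into per-image rewards in several ways; the sketch below uses a simple win rate within a sampled group, followed by the group-wise normalization typical of GRPO-style methods. It is an illustration under assumptions: the `prefers` callable stands in for a preference model, and the function names and the `eps` constant are hypothetical rather than taken from the paper's code.

```python
from itertools import combinations
from statistics import mean, pstdev
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def win_rates(items: Sequence[T], prefers: Callable[[T, T], bool]) -> List[float]:
    """Reward each item by the fraction of pairwise comparisons it wins."""
    if len(items) < 2:
        raise ValueError("need at least two items to compare")
    wins = [0] * len(items)
    for i, j in combinations(range(len(items)), 2):
        if prefers(items[i], items[j]):
            wins[i] += 1
        else:
            wins[j] += 1
    return [w / (len(items) - 1) for w in wins]


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within the group: subtract the mean, divide by the std."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Toy "images" represented by a hidden quality score; the stub
    # preference function simply prefers the higher one.
    group = [3, 1, 2]
    rewards = win_rates(group, prefers=lambda a, b: a > b)
    print(rewards)
    print([round(a, 3) for a in group_advantages(rewards)])
```

Win rates depend only on the ordering within the group, so two images whose absolute scores differ by a hair are not pushed apart by an inflated normalized advantage, which is the failure mode the summary describes.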

CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics

Paper, Project
CMPhysBench is a new benchmark designed to rigorously evaluate the problem-solving ability of LLMs in condensed matter physics, a highly specialized scientific field. It consists of more than 520 graduate-level calculation problems. Rather than only judging whether the final answer is correct, it introduces the SEED score, which also measures the similarity of the mathematical expressions in the solution, enabling fine-grained evaluation. In the experiments, even the best-performing current models scored very low, which clearly shows that LLMs still have substantial limitations in specialized scientific domains.

ODYSSEY: Open-World Quadrupeds Exploration and Manipulation for Long-Horizon Tasks

Paper, Project
ODYSSEY is a unified mobile manipulation framework that enables a quadruped robot equipped with a manipulator to carry out complex, long-horizon tasks from language instructions. The system plans high-level tasks with a VLM-based hierarchical planner and executes them with a robust whole-body control policy, achieving agile locomotion and precise manipulation at the same time. Successful sim-to-real transfer demonstrates its practicality in unstructured environments and brings general-purpose robot assistants a step closer to reality.

I'm Sky, and I'm interested in XR and AI.
