[2025/W29] 🤗 Weekly AI Research

Sky · July 18, 2025

Weekly AI Research Digest


From adaptive computation to deep reasoning: exploring the intelligence and efficiency of next-generation generative models
From retrieval-reasoning fusion to visual intelligence and neural operating systems: opening new horizons for AI

TL;DR

  1. Test-Time Scaling with Reflective Generative Model
    Proposes MetaStone-S1, a generative model that merges the policy model and the reward model into one network to make inference more efficient, and that trades performance against cost by adjusting the amount of compute at test time.

  2. A Survey of Context Engineering for Large Language Models
    Systematically organizes the field of "context engineering" for improving LLM performance, and identifies a key research challenge: models are far better at understanding complex context than at generating equally sophisticated output.

  3. Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
    Uses a purpose-built, contamination-free benchmark to show that reinforcement-learning gains in LLMs may come from memorized data rather than genuinely improved reasoning, and calls for trustworthy evaluation.

  4. Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
    Develops the Open-Vision-Reasoner model, which transfers a language model's reasoning ability to the visual domain through a two-stage recipe that combines linguistic fine-tuning with multimodal reinforcement learning.

  5. NeuralOS: Towards Simulating Operating Systems via Neural Generative Models
    Proposes NeuralOS, a neural simulator that generates an operating system's graphical user interface (GUI) frames in real time in response to user input.

  6. Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
    Offers a unified view of RAG, which is factual but weak at reasoning, and pure reasoning, which reasons well but hallucinates, and identifies synergized frameworks in which an agent alternates between retrieval and reasoning as the key direction forward.

  7. Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation
    Proposes VFMTok, which repurposes vision foundation models built for image understanding as tokenizers for image generation, substantially improving both generation quality and training efficiency.

  8. Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
    Reaches large-model performance at lower cost with the Mixture-of-Recursions (MoR) framework, which reuses part of the model while allocating compute dynamically per token.

  9. CLiFT: Compressive Light-Field Tokens for Compute-Efficient and Adaptive Neural Rendering
    Proposes a neural rendering technique that represents a 3D scene as compressed tokens (CLiFT) and adjusts the token count at render time, so the balance between data size, quality, and speed can be chosen flexibly.

Test-Time Scaling with Reflective Generative Model

Paper, Project

The paper 'Test-Time Scaling with Reflective Generative Model' proposes MetaStone-S1, a new generative model. It merges the policy model and the reward model into a single network built around a Self-supervised Process Reward Model (SPRM), cutting the reward model's parameters by more than 99% and making inference considerably more efficient. This unified design enables test-time scaling (TTS), in which the model adjusts how long it thinks at inference time to balance performance against compute. Experiments show that, with a comparatively small 32 billion parameters, it performs on par with OpenAI's o3-mini.
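One common way to spend extra compute at test time is to sample several candidate reasoning trajectories and keep the one a process reward model rates highest. The sketch below illustrates only that selection step. It is not the MetaStone-S1 implementation: the geometric-mean aggregation, the `toy_step_scorer` stand-in, and every function name are assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence


def trajectory_score(step_scores: Sequence[float]) -> float:
    """Aggregate per-step scores with a geometric mean (an assumed rule)."""
    if not step_scores:
        return 0.0
    # Clamp to avoid log(0); scores are expected to lie in (0, 1].
    log_sum = sum(math.log(max(s, 1e-9)) for s in step_scores)
    return math.exp(log_sum / len(step_scores))


def best_of_n(candidates: List[List[str]],
              score_step: Callable[[str], float]) -> List[str]:
    """Return the candidate trajectory with the highest aggregated step score."""
    return max(
        candidates,
        key=lambda steps: trajectory_score([score_step(s) for s in steps]),
    )


def toy_step_scorer(step: str) -> float:
    """Hypothetical stand-in for a learned step scorer: distrust guesses."""
    return 0.2 if "guess" in step else 0.9


candidates = [
    ["expand the product", "guess the remainder"],
    ["expand the product", "collect terms", "check by substitution"],
]
best = best_of_n(candidates, toy_step_scorer)
```

Sampling more candidates raises cost roughly linearly, which is the knob a test-time scaling setup of this kind turns.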

A Survey of Context Engineering for Large Language Models

Paper, Project
'A Survey of Context Engineering for Large Language Models' establishes "context engineering" as a field: the systematic optimization of the information supplied to an LLM in order to maximize its performance. Drawing on more than 1,300 papers, it lays out a technical roadmap that runs from foundational components (context retrieval, generation, processing, and management) to composite systems such as RAG and multi-agent setups. The survey singles out a "comprehension-generation asymmetry" as the central research gap: current LLMs are strong at understanding complex context but show clear limits when asked to produce correspondingly sophisticated, long-form output. It stresses that closing this gap is an important task for future work.

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination

Paper
The paper 'Reasoning or Memorization?' questions the reliability of recent findings that reinforcement learning (RL) improves the reasoning ability of LLMs. The authors argue that the gains observed in certain models may not reflect better reasoning at all, but memorization caused by "data contamination", where benchmark data was included in the pre-training corpus. To test this, they built RandomCalculation, a synthetic dataset for which leakage is impossible by construction, and found that only accurate reward signals contributed to performance gains. The study therefore strongly urges the use of clean, uncontaminated benchmarks for trustworthy model evaluation.
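To make the idea of a leak-proof benchmark concrete, here is a minimal sketch of a generator in the spirit of RandomCalculation: problems are produced on the fly from a random number generator, so they cannot have appeared in any pre-training corpus. The operand ranges, the parenthesized left-to-right expression format, and the names `make_problem` and `exact_reward` are assumptions, not the paper's actual construction.

```python
import operator
import random
from typing import Tuple

# Supported binary operations for the toy generator.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def make_problem(rng: random.Random, n_terms: int = 4) -> Tuple[str, int]:
    """Build a fully parenthesized integer expression and its exact value."""
    value = rng.randint(1, 99)
    expr = str(value)
    for _ in range(n_terms - 1):
        symbol = rng.choice(sorted(OPS))
        operand = rng.randint(1, 99)
        expr = f"({expr} {symbol} {operand})"
        value = OPS[symbol](value, operand)
    return expr, value


def exact_reward(predicted: str, answer: int) -> float:
    """Accurate reward signal: 1.0 only for an exact integer match."""
    try:
        return 1.0 if int(predicted.strip()) == answer else 0.0
    except ValueError:
        return 0.0  # unparsable output earns nothing


rng = random.Random(0)
expr, answer = make_problem(rng)
```

Because the answer is computed alongside the expression, the reward is exact by construction, which is the property the summary above attributes to the useful training signal.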

Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning

Paper
'Open Vision Reasoner' presents a way to strengthen visual reasoning by transferring a language model's higher-order reasoning ability to a multimodal model. The study proposes a two-stage training paradigm: large-scale fine-tuning on language data, followed by multimodal reinforcement learning. The authors find that language training alone can transfer reasoning behavior to the visual domain through "linguistic mental imagery", and that the subsequent reinforcement learning plays a key role in selecting and reinforcing the visual patterns that actually work. The resulting OVR model achieves state-of-the-art results on several demanding reasoning benchmarks, including MATH500 (a text-only math benchmark) and visual reasoning benchmarks.

NeuralOS: Towards Simulating Operating Systems via Neural Generative Models

Paper, Project
'NeuralOS' introduces a neural framework that generates an operating system's GUI frames directly in response to the user's mouse and keyboard input. The system pairs a recurrent neural network (RNN), which tracks the computer's internal state, with a diffusion model that renders the screen image, yielding a simulator that behaves much like a real OS. Trained on a large-scale dataset of Ubuntu usage recordings, NeuralOS produces realistic GUI sequences and accurately predicts state changes such as mouse interactions and application launches. Fine-grained modeling of keyboard input is left as future work.

Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs

Paper, Project
The paper 'Towards Agentic RAG with Deep Reasoning' offers a unified perspective for combining the factual grounding of retrieval-augmented generation (RAG) with the strengths of pure reasoning. Conventional RAG is weak at multi-step reasoning, while pure reasoning is prone to distorting facts. The survey brings the two approaches together and analyzes (1) how reasoning strengthens each stage of RAG and (2) how retrieved knowledge supports complex reasoning. It then highlights "synergized RAG-Reasoning" frameworks, in which an agent-like LLM interleaves retrieval and reasoning to reach state-of-the-art results on knowledge-intensive tasks, as the key direction for the field. The paper also catalogs the relevant techniques, datasets, and open challenges, providing a research roadmap toward more effective and more reliable RAG-reasoning systems.
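The interleaving of retrieval and reasoning can be pictured as a loop in which each retrieved fact shapes the next query. The sketch below is a deliberately tiny illustration, not a framework from the survey: the dictionary corpus, the hop templates, and the functions `retrieve`, `plan_next_query`, and `answer` are all hypothetical stand-ins for a real retriever and an LLM reasoning step.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy corpus: query string -> retrieved fact.
CORPUS: Dict[str, str] = {
    "author of Dune": "Frank Herbert",
    "birthplace of Frank Herbert": "Tacoma",
}


def retrieve(query: str) -> Optional[str]:
    """Stand-in retriever: exact-match lookup in the toy corpus."""
    return CORPUS.get(query)


def plan_next_query(hops: List[str], evidence: List[str]) -> Optional[str]:
    """Stand-in reasoning step: fill the next hop with the latest evidence."""
    index = len(evidence)
    if index >= len(hops):
        return None  # every hop has been resolved
    template = hops[index]
    return template.format(prev=evidence[-1]) if evidence else template


def answer(hops: List[str], max_steps: int = 5) -> Tuple[Optional[str], List[str]]:
    """Alternate reasoning and retrieval until the hops are resolved."""
    evidence: List[str] = []
    for _ in range(max_steps):
        query = plan_next_query(hops, evidence)
        if query is None:
            break
        fact = retrieve(query)
        if fact is None:
            return None, evidence  # retrieval failed; give up without guessing
        evidence.append(fact)
    complete = len(evidence) == len(hops)
    return (evidence[-1] if complete else None), evidence


hops = ["author of Dune", "birthplace of {prev}"]
final, trace = answer(hops)
```

The second query cannot be written until the first fact is retrieved, which is why a single retrieve-then-generate pass struggles with multi-step questions.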

Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation

Paper
This study explores a new direction: taking powerful vision foundation models (VFMs) pre-trained for image understanding and using them as tokenizers for image generation. The proposed VFMTok uses a frozen VFM as its encoder and improves efficiency with two additions: region-adaptive quantization, which reduces redundant information, and a semantic reconstruction objective, which preserves semantic information. As a result, VFMTok substantially improves image generation quality and token efficiency over existing approaches. In particular, it speeds up training of autoregressive (AR) generation models by roughly 3x and helps them reach state-of-the-art performance.

Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation

Paper, Project
'Mixture-of-Recursions (MoR)' is a new framework that combines parameter reuse with adaptive computation to address the high cost of LLMs. MoR shrinks the model by applying a shared stack of layers repeatedly (parameter efficiency), while a lightweight router assigns a recursion depth to each token dynamically so that compute is concentrated on the tokens that matter (adaptive computation). With this approach, MoR achieves higher performance and throughput at a smaller model size for the same training compute (FLOPs), pointing to an efficient path for obtaining large-model quality at lower cost.
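The two ingredients described above, a shared block applied repeatedly and a per-token recursion depth, can be shown with scalars in place of hidden vectors. This is a toy sketch under loud assumptions: `shared_block` is an arbitrary affine map rather than a Transformer layer stack, and `route_depths` uses a made-up magnitude rule where the paper uses a learned router.

```python
from typing import List, Tuple


def shared_block(x: float) -> float:
    """Stand-in for one pass through the shared layer stack (same weights each pass)."""
    return 0.5 * x + 1.0


def route_depths(tokens: List[float], max_depth: int) -> List[int]:
    """Toy router (an assumption): larger-magnitude tokens get more recursion steps."""
    return [min(max_depth, 1 + int(abs(t))) for t in tokens]


def mixture_of_recursions(tokens: List[float],
                          max_depth: int = 3) -> Tuple[List[float], List[int]]:
    """Apply the shared block to each token for its own routed number of steps."""
    depths = route_depths(tokens, max_depth)
    hidden = list(tokens)
    for step in range(max_depth):
        for i, depth in enumerate(depths):
            if step < depth:  # token i is still active at this recursion step
                hidden[i] = shared_block(hidden[i])
    return hidden, depths


hidden, depths = mixture_of_recursions([0.2, 1.5, 4.0])
passes_used = sum(depths)            # block applications actually performed
passes_full = len(depths) * 3        # cost if every token ran to max depth
```

Only one set of block weights exists no matter how deep a token goes, and tokens that exit early skip the remaining passes, which is where the compute saving comes from.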

CLiFT: Compressive Light-Field Tokens for Compute-Efficient and Adaptive Neural Rendering

Paper, Project
The 'CLiFT' paper proposes rendering novel views by representing a 3D scene as an efficient set of units called compressive light-field tokens (CLiFTs). The method builds compressed tokens that carry the scene's appearance and geometry from multiple input images. Its key feature is that the number of tokens used at render time can be adjusted flexibly to fit the compute budget. As a result, CLiFT greatly reduces data size while keeping rendering quality high, and enables adaptive rendering in which the user chooses the trade-off between data size, rendering quality, and speed.
