[2025/W18] 🤗 Weekly AI Research

Sky · May 3, 2025

Weekly AI Research Digest


TL;DR

In AI reasoning and efficiency optimization, current work maximizes reasoning ability through reinforcement learning on a minimal amount of data (1-Shot RLVR), builds specialized retrievers that support complex reasoning (ReasonIR), and strengthens multimodal models that reason over several kinds of data at once (Skywork R1V2). Model compression and optimization is another major thread, for example quantizing the activations of 1-bit language models to 4 bits to improve memory and compute efficiency (BitNet v2).

In multimodal perception and generation, researchers are working on understanding and analyzing dynamic camera motion in video (Camera Motions) and on next-generation retrieval-augmented generation that integrates sources of differing modality and granularity, such as text, images, and video (UniversalRAG). Other work learns 4D world models that capture how 3D space changes over time (TesserAct), edits images precisely from natural-language instructions (In-Context Edit), generates realistic lip movements matched to audio (KeySync), and translates speech in real time while preserving the spatial audio cues of multiple speakers (Spatial Speech Translation).

In AI evaluation, reliability, and specialized applications, one study critically analyzes the potential biases and structural problems of a widely used AI leaderboard and proposes improvements (Leaderboard Illusion). Alongside general-purpose AI, another develops a compact model specialized for a language-specific problem, Arabic diacritization, and builds a new benchmark for the task to support the research community (Sadeed). Together they reflect parallel efforts to improve AI performance in specific applications and to establish rigorous evaluation.

AI Reasoning and Efficiency Optimization

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Paper, Project

The paper shows that reinforcement learning with verifiable rewards using only a single training example (1-shot RLVR) is highly effective at strengthening the reasoning ability of large language models (LLMs). With one math problem as the sole training example, the model's MATH500 accuracy rises from 36% to 73.6%, on par with training on a dataset of over a thousand examples. The authors also find that encouraging exploration during training, by adding an entropy loss, is important.
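To make the two ingredients named above concrete (a rule-checked reward and an entropy term that rewards exploration), here is a minimal sketch. It is not the paper's training code: the group-normalized advantage, the function names, the coefficient `entropy_coef=0.01`, and the reference answer used in the usage note are all assumptions chosen for illustration.

```python
import math


def verifiable_reward(model_answer: str, reference: str) -> float:
    """Rule-based check: 1.0 if the final answer matches the reference, else 0.0."""
    return 1.0 if model_answer.strip() == reference.strip() else 0.0


def group_advantages(rewards):
    """Normalize rewards within one group of sampled answers (assumed scheme)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    if std == 0.0:
        # All samples scored the same, so there is no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def surrogate_objective(log_probs, advantages, token_dists, entropy_coef=0.01):
    """Quantity to maximize: policy-gradient term plus an entropy bonus."""
    pg = sum(a * lp for a, lp in zip(advantages, log_probs)) / len(log_probs)
    ent = sum(entropy(d) for d in token_dists) / len(token_dists)
    return pg + entropy_coef * ent
```

For example, scoring four sampled answers against a hypothetical reference `"12.8"` with `verifiable_reward` gives rewards like `[1.0, 0.0, 1.0, 0.0]`, which `group_advantages` turns into positive weights for the correct samples and negative weights for the others. The entropy bonus makes a flatter token distribution score slightly higher, which is the exploration pressure the summary refers to.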

Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning

Paper, Project

Skywork R1V2, a multimodal reasoning model, introduces a hybrid reinforcement learning paradigm that combines reward-model guidance with rule-based strategies to strike the difficult balance between reasoning ability and generalization. It also applies a Selective Sample Buffer (SSB) mechanism to improve training efficiency, controls the visual hallucinations caused by excessive reinforcement signals, and reaches top-tier results on several major reasoning benchmarks, including OlympiadBench and AIME2024.

ReasonIR: Training Retrievers for Reasoning Tasks

Paper, Project

Existing retrievers are tuned for factual queries and handle complex reasoning poorly. To address this, the authors developed ReasonIR-8B, a retriever trained specifically for reasoning tasks. Training relies on a synthetic data generation method that produces challenging queries paired with plausible but incorrect documents that are easy to confuse with the right one (hard negatives). ReasonIR-8B achieves state-of-the-art results on the reasoning-focused BRIGHT benchmark and, when used in a RAG system, substantially improves performance on reasoning tasks such as MMLU and GPQA.
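Why hard negatives matter is easier to see with a loss function in hand. The sketch below assumes a standard InfoNCE-style contrastive objective over dot-product scores; the summary does not state the exact loss ReasonIR uses, and the toy 2-D embeddings and the temperature value are made up for illustration.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce(query, positive, negatives, temperature=0.05):
    """Contrastive loss: -log softmax score of the positive among all candidates."""
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]


query = [1.0, 0.0]
positive = [0.9, 0.1]
hard_negative = [0.8, 0.2]   # looks relevant, but is the wrong document
easy_negative = [-1.0, 0.0]  # obviously unrelated

loss_hard = info_nce(query, positive, [hard_negative])
loss_easy = info_nce(query, positive, [easy_negative])
```

With these toy vectors the easy negative contributes almost nothing to the loss, while the hard negative produces a much larger loss and therefore a stronger training signal. That is the intuition behind generating plausible but wrong documents on purpose.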

BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs

Paper, Project

BitNet v2 is proposed to solve the activation outlier problem that hinders efficient deployment of 1-bit LLMs. Its core module, H-BitLinear, applies an online Hadamard transformation immediately before activation quantization to smooth the activation distribution. This enables native 4-bit activation quantization for 1-bit LLMs with minimal performance degradation, greatly improving memory usage and computational efficiency.
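The effect of a Hadamard transform on outliers can be shown with a toy vector. The sketch below is an illustration under stated assumptions, not the H-BitLinear implementation: it uses a pure-Python fast Walsh-Hadamard transform, a simple symmetric absmax 4-bit quantizer (the paper's exact quantizer may differ), and an invented 16-element activation vector with one large outlier. The inverse transform is applied only to measure the error in the original space.

```python
import math


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    y = list(x)
    n = len(y)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in y]


def quantize_int4(x):
    """Symmetric absmax quantization to the int4 range [-8, 7], then dequantize."""
    amax = max(abs(v) for v in x)
    scale = amax / 7.0 if amax > 0 else 1.0
    q = [max(-8, min(7, round(v / scale))) for v in x]
    return [qi * scale for qi in q]


def squared_error(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


# Toy activations: fifteen small values and one large outlier.
x = [0.1] * 15 + [8.0]

# Direct 4-bit quantization: the outlier sets the scale and small values round to 0.
err_plain = squared_error(x, quantize_int4(x))

# Hadamard first: the outlier's energy is spread across all coordinates.
# The normalized transform is its own inverse, so fwht() also maps back.
err_hadamard = squared_error(x, fwht(quantize_int4(fwht(x))))
```

On this input `err_hadamard` comes out well below `err_plain`, because after the transform no single coordinate dominates the quantization scale. Real activation tensors are far larger and messier, so treat this only as intuition for why smoothing the distribution helps low-bit quantization.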

Multimodal Perception and Generation

Towards Understanding Camera Motions in Any Video

Paper, Project

To advance the understanding of camera motion in video, the authors built CameraBench, a large-scale dataset and benchmark of roughly 3,000 videos, along with an expert-designed taxonomy of camera motions. Existing Structure-from-Motion (SfM) models struggled with semantic motions (for example, tracking a subject), while video-language models (VLMs) struggled with precise geometric trajectory estimation. To close the gap, the authors fine-tuned a VLM that combines the strengths of both, enabling more comprehensive analysis.

UniversalRAG: Retrieval-Augmented Generation over Multiple Corpora with Diverse Modalities and Granularities

Paper, Project

Whereas existing retrieval-augmented generation (RAG) is mostly limited to text, UniversalRAG is a framework that draws on multiple data sources of differing modality (text, images, video, and so on) and granularity. Its core modality-aware routing mechanism dynamically identifies the type of source best suited to a query and retrieves within it, which yields more accurate and relevant generated answers.
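The "route first, then retrieve inside the chosen corpus" idea can be sketched in a few lines. Everything here is a stand-in: the keyword-based router replaces whatever learned or prompted router the paper uses, token overlap replaces a real embedding retriever, and the corpora, hint words, and function names are hypothetical.

```python
import re

# Hypothetical per-modality corpora; real systems would store embeddings.
CORPORA = {
    "text": ["The Eiffel Tower is 330 metres tall.", "Paris is the capital of France."],
    "image": ["photo: Eiffel Tower at night", "diagram: parts of a bicycle"],
    "video": ["clip: how to tie a bowline knot", "clip: goal replay from the final"],
}

# Toy routing hints; an assumption standing in for a trained router.
ROUTER_HINTS = {
    "image": {"look", "picture", "photo", "color", "colour", "diagram"},
    "video": {"clip", "video", "scene", "how", "replay"},
}


def tokens(s):
    """Lowercased alphabetic tokens of a string, as a set."""
    return set(re.findall(r"[a-z]+", s.lower()))


def route(query):
    """Pick the modality whose hint words appear in the query; default to text."""
    words = tokens(query)
    for modality, hints in ROUTER_HINTS.items():
        if words & hints:
            return modality
    return "text"


def retrieve(query):
    """Route the query, then return the best token-overlap match in that corpus."""
    modality = route(query)
    words = tokens(query)
    best = max(CORPORA[modality], key=lambda doc: len(words & tokens(doc)))
    return modality, best
```

Calling `retrieve("What is the capital of France?")` stays in the text corpus, while `retrieve("Show me a video clip of the goal replay")` is routed to the video corpus before any matching happens. The point of the design is that a query never has to be compared against items of an unsuitable modality.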

TesserAct: Learning 4D Embodied World Models

Paper, Project

To understand how embodied agents such as robots interact with their surroundings, TesserAct presents a method for learning 4D (3D + time) embodied world models that predict how 3D space changes over time. By training on RGB video augmented with depth (D) and normal (N) information, it models the detailed shape of objects together with their dynamics, which in turn enables more effective learning of agent policies.

In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer

Paper, Project

In-Context Edit is a framework that aims to balance precision and efficiency in instruction-based image editing. It leverages the strong contextual understanding of a large-scale Diffusion Transformer (DiT) to perform zero-shot editing through in-context prompting alone, without heavy fine-tuning. A LoRA-MoE hybrid tuning strategy then trains only a very small number of parameters, reaching state-of-the-art editing quality with very few resources.

KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution

Paper, Project

KeySync is proposed to address two problems in video lip synchronization: expression leakage, where expressions from the original footage persist unnaturally, and facial occlusion, where the area around the mouth is covered. This two-stage system maintains temporal consistency and applies a specially designed masking strategy to handle unwanted expression information and occluded regions, producing more natural and accurate high-resolution lip sync.

Spatial Speech Translation: Translating Across Space With Binaural Hearables

Paper, Project

The authors introduce the concept of spatial speech translation and a system that implements it: in a complex setting where several people speak different languages at the same time, a hearable device translates them in real time while preserving each speaker's original position (spatial cues) and distinctive voice characteristics. The system integrates source separation, speaker localization, real-time translation, and binaural rendering, so that each speaker's translated voice is heard from that speaker's direction even amid ambient noise and interference from other speakers.

AI Evaluation, Reliability, and Specialized Applications

The Leaderboard Illusion

Paper

The paper argues that Chatbot Arena, a leaderboard for ranking AI models, suffers from fairness problems. Some companies benefit from systematic biases: private pre-release testing, selective disclosure of only their most favorable scores, and access to more data (battle opportunities) for their models. As a result, rankings are distorted and models overfit to the Arena itself rather than improving in real capability, a phenomenon the authors call the leaderboard illusion.
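The selective-disclosure effect is easy to reproduce in a toy simulation. The sketch below assumes every variant has the same true skill and that a measured score is that skill plus Gaussian noise; the noise model, the number of variants, and the function names are illustrative assumptions, not figures from the paper.

```python
import random


def published_score(true_skill, n_private_variants, noise_sd, rng):
    """Test several equally skilled variants privately, publish only the best score."""
    scores = [rng.gauss(true_skill, noise_sd) for _ in range(n_private_variants)]
    return max(scores)


def mean_published(n_private_variants, trials=2000, seed=0):
    """Average published score over many simulated submissions."""
    rng = random.Random(seed)
    total = sum(
        published_score(0.0, n_private_variants, 1.0, rng) for _ in range(trials)
    )
    return total / trials


single = mean_published(1)       # one variant tested, one score published
best_of_ten = mean_published(10)  # ten variants tested, best score published
```

Even though no variant is actually better than another, `best_of_ten` lands clearly above `single`. The gap is pure selection bias, which is why publishing only the most favorable of many private runs can lift a leaderboard position without any real improvement.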

Sadeed: Advancing Arabic Diacritization Through Small Language Model

Paper, Project

To tackle the difficulty of Arabic diacritization, the authors developed Sadeed, built on a small language model pre-trained on diverse Arabic corpora and fine-tuned on high-quality data. With limited computing resources, it achieves performance competitive with large proprietary models. To address the limitations of existing evaluation practices, they also introduce SadeedDiac-25, a new benchmark spanning diverse text types and difficulty levels, enabling fair comparison.
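For readers unfamiliar with how diacritization output is scored, here is a small sketch of a diacritic error rate (DER), a metric commonly used for this task. The summary does not say which metrics SadeedDiac-25 reports, so this is an assumption; the convention of counting every base character (rather than only Arabic letters) and the function names are also simplifications of my own.

```python
import unicodedata


def split_diacritics(text):
    """Split text into (base character, attached combining marks) pairs."""
    units = []
    for ch in text:
        if unicodedata.combining(ch) and units:
            units[-1][1] += ch
        else:
            units.append([ch, ""])
    return [(base, marks) for base, marks in units]


def diacritic_error_rate(reference, prediction):
    """Fraction of base characters whose diacritics differ from the reference."""
    ref = split_diacritics(reference)
    pred = split_diacritics(prediction)
    if [b for b, _ in ref] != [b for b, _ in pred]:
        raise ValueError("reference and prediction must share the same base text")
    if not ref:
        return 0.0
    errors = sum(rm != pm for (_, rm), (_, pm) in zip(ref, pred))
    return errors / len(ref)


# Three letters (kaf, ta, ba), each followed by a fatha (U+064E).
reference = "\u0643\u064e\u062a\u064e\u0628\u064e"
# Same letters, but the second vowel is predicted as a kasra (U+0650).
prediction = "\u0643\u064e\u062a\u0650\u0628\u064e"

der = diacritic_error_rate(reference, prediction)
```

Here one of the three letters carries the wrong mark, so `der` is one third. A benchmark covering several text types would report this kind of rate per genre, which is what makes comparisons between models fair.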
