[Daily report] 24-02-29

kiteday · February 29, 2024

issue

Nvidia Is a Must-Buy. Or Is It?
The table "The Global Race for Computer Chips" in the middle of the article is well organized and worth a look.
https://www.nytimes.com/2024/02/28/technology/openai-copyright-suit-media.html
AI copyright issues keep coming up. I don't know what the right answer is.

✨EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Research from Alibaba. Given a single image and an audio clip, the portrait speaks in sync with the audio. Alibaba has been doing strong video generation research lately.
✨ DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model
The main contribution seems to be a LoRA-like adapter.
✨Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models && Sora Generates Videos with Stunning Geometrical Consistency
A review of Sora and a paper on it have come out. The key concept is the Lower-Dimension Latent Space, which I need to dig into further.
Video as the New Language for Real-World Decision Making
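It generates the next video frame and even handles inpainting. What impressed me is the approach: it first formulates a variety of prediction tasks for producing the frame, such as the next frame's segmentation map, depth map, and joints, and then converts them.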

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
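The paper reportedly makes 1-bit LLMs workable. It replaces the usual multiply-add operations with additions only, which cuts the amount of computation drastically. Fractional weights like 0.xx are simply rounded to 0, -1, or 1, and according to the authors there is no big difference in performance.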
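
To make the "round to 0, -1, 1 and only add" idea concrete, here is a minimal NumPy sketch. It assumes a single per-tensor scale equal to the mean absolute weight (my reading of the paper's quantizer, not a verified reproduction), and the function names `absmean_ternarize` and `ternary_matvec` are hypothetical. Three possible values per weight correspond to log2(3) ≈ 1.58 bits, which is where the number in the title comes from.

```python
import numpy as np

def absmean_ternarize(w, eps=1e-8):
    """Quantize weights to {-1, 0, +1} using one per-tensor scale (assumed scheme)."""
    scale = np.abs(w).mean() + eps          # assumption: scale = mean |w|
    q = np.clip(np.rint(w / scale), -1, 1).astype(np.int8)
    return q, scale

def ternary_matvec(q, scale, x):
    """Compute (scale * q) @ x with additions and subtractions only, per output row."""
    out = np.empty(q.shape[0], dtype=x.dtype)
    for i, row in enumerate(q):
        # +1 weights add the input, -1 weights subtract it, 0 weights are skipped.
        out[i] = x[row == 1].sum() - x[row == -1].sum()
    return scale * out                      # one multiply per output for the scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(4, 8)).astype(np.float32)
x = rng.normal(size=8).astype(np.float32)

q, scale = absmean_ternarize(w)
y_add_only = ternary_matvec(q, scale, x)
y_reference = (scale * q) @ x               # ordinary multiply-add on the same weights
print(np.allclose(y_add_only, y_reference, atol=1e-5))
```

The check only compares the add-only path against a regular matrix product on the same ternary weights; it says nothing about how well a ternarized model matches the full-precision one, which is the claim the paper itself makes.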

OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web
I picked one extra paper outside the main topics. It trains a human-computer interaction agent, which learns to choose its actions through a Python GUI. Pretty impressive.

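Today's vision papers focus on video, while the LLM papers focus on making models lighter. As expected, now that image generation works well, the field is moving on to 3D and video. There are still parts that visibly need fixing.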