[Paper Review] MoAI: Mixture of All Intelligence for Large Language and Vision Models (2024)

함지율 · April 25, 2024

The paper introduces two modules: MoAI-Compressor and MoAI-Mixer.

MoAI-Compressor aligns and condenses the outputs of external CV models into auxiliary visual information suited to VL tasks.
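To make "condense" concrete, here is a minimal sketch of the general idea: a small, fixed set of query vectors attends over a longer sequence of auxiliary token embeddings, so N tokens become M summary vectors. It is a single attention step with no learned projections, written in plain Python. The function name `cross_attention_condense` and the shapes are assumptions for illustration, not the paper's implementation.

```python
import math


def cross_attention_condense(queries, tokens):
    """Condense `tokens` (N x d) into len(queries) vectors (M x d).

    Each query attends over all tokens with scaled dot-product attention
    and returns the attention-weighted average of the tokens.
    Hypothetical helper: no learned key/value projections are used.
    """
    d = len(queries[0])
    condensed = []
    for q in queries:
        # Scaled dot-product score between this query and every token.
        scores = [sum(a * b for a, b in zip(q, t)) / math.sqrt(d) for t in tokens]
        # Numerically stable softmax over the N tokens.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        attn = [e / total for e in exps]
        # Weighted sum of token vectors gives one condensed vector.
        condensed.append(
            [sum(w * t[i] for w, t in zip(attn, tokens)) for i in range(d)]
        )
    return condensed
```

With M much smaller than N, the downstream model sees a short, fixed-length summary regardless of how much text the external CV models produced.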

MoAI-Mixer blends three types of intelligence:
1) visual features
2) auxiliary features from the external CV models
3) language features
It combines them using the Mixture of Experts (MoE) concept.
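The MoE-style blending above can be sketched as a softmax gate over expert outputs: a linear gate scores each expert for a given token, and the outputs are summed with those weights. The names `mix_experts` and `gate_weights`, and the use of a single linear gate, are assumptions for illustration and not taken from the paper's code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_experts(token, expert_outputs, gate_weights):
    """Blend expert outputs for one token with a softmax gate.

    token:          d floats, the input to the (hypothetical) linear gate
    expert_outputs: E vectors of d floats, one per expert
    gate_weights:   E rows of d floats, one gate row per expert
    Returns (mixed_vector, gate_probabilities).
    """
    # One logit per expert from the linear gate.
    logits = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    weights = softmax(logits)
    d = len(expert_outputs[0])
    # Weighted sum of the expert outputs, dimension by dimension.
    mixed = [
        sum(weights[e] * expert_outputs[e][i] for e in range(len(expert_outputs)))
        for i in range(d)
    ]
    return mixed, weights
```

Here the three experts would correspond to the three feature sources listed above; a zero gate gives each expert an equal weight of 1/3.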

With this design, MoAI achieved strong performance on zero-shot VL tasks.

The paper shows, with concrete examples, how verbalization (converting the outputs of the external CV models into text) is performed.
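As a rough illustration of what verbalization means in practice, the sketch below turns detector-style outputs into a plain sentence that a language model can read. The function name `verbalize_detections`, the input format, and the sentence template are all assumptions made for this example; the paper's actual templates may differ.

```python
def verbalize_detections(detections):
    """Turn detector outputs into one plain-text sentence.

    `detections` is a list of (label, (x1, y1, x2, y2)) pairs with box
    coordinates normalized to [0, 1]. The wording of the template is a
    hypothetical choice, not the prompt used in the paper.
    """
    if not detections:
        return "The image includes no detected objects."
    parts = []
    for label, box in detections:
        # Two decimals keep the text short while preserving rough location.
        coords = ", ".join(f"{value:.2f}" for value in box)
        parts.append(f"{label} at ({coords})")
    return "The image includes objects: " + "; ".join(parts) + "."


sentence = verbalize_detections([("dog", (0.1, 0.2, 0.5, 0.9))])
print(sentence)  # The image includes objects: dog at (0.10, 0.20, 0.50, 0.90).
```

The resulting text can then be tokenized and embedded like any other language input.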

MoAI-Compressor reuses the architecture from the Flamingo model as is.

The MoAI-Mixer structure is built from cross-attention (CA) and self-attention (SA) modules.

Concretely, each expert module is a CA or SA block with Low-Rank Adaptation (LoRA) applied.
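For reference, the core of LoRA is a frozen weight matrix plus a trainable low-rank update: y = Wx + (alpha / r) · B(Ax). The sketch below shows that formula for a single linear layer in plain Python. How exactly the paper wires this into its CA and SA experts is not covered in these notes, so the function `lora_linear` and its arguments are illustrative assumptions.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def lora_linear(x, W, A, B, alpha=1.0):
    """Linear layer with a low-rank update.

    W: d_out x d_in frozen weight
    A: r x d_in down-projection (trainable)
    B: d_out x r up-projection (trainable)
    Hypothetical helper showing y = W x + (alpha / r) * B (A x).
    """
    rank = len(A)
    base = matvec(W, x)                # frozen path
    update = matvec(B, matvec(A, x))   # low-rank path through r dimensions
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]          # rank r = 1
B = [[1.0], [0.0]]
print(lora_linear([1.0, 2.0], W, A, B))  # [4.0, 2.0]
```

Only A and B carry trainable parameters, which is why applying this to attention projections keeps each expert small.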
