
This paper proposes a new concept, the task vector, for changing the behavior of a given pre-trained model, for example to improve performance on a new downstream task or to remove a bias.
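
As a rough illustration of the idea, a task vector is commonly defined as the element-wise difference between fine-tuned and pre-trained weights; adding it back steers the model toward the task, and subtracting it steers the model away. The sketch below assumes that definition and uses plain Python lists as stand-ins for weight tensors. The helper names `task_vector` and `apply_task_vector`, the `scale` parameter, and the toy numbers are hypothetical and chosen only for illustration.

```python
def task_vector(pretrained, finetuned):
    """Element-wise difference: fine-tuned weights minus pre-trained weights."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def apply_task_vector(pretrained, vector, scale=1.0):
    """Add a scaled task vector to the pre-trained weights.

    scale > 0 moves the model toward the task; scale < 0 moves it away
    (for example, to suppress an unwanted behavior).
    """
    return {
        name: [p + scale * v for p, v in zip(pretrained[name], vector[name])]
        for name in pretrained
    }


# Toy weights (hypothetical): a single parameter group "w" with three entries.
pretrained = {"w": [1.0, 2.0, 3.0]}
finetuned = {"w": [1.5, 2.0, 2.0]}

tau = task_vector(pretrained, finetuned)
restored = apply_task_vector(pretrained, tau, scale=1.0)   # recovers finetuned
negated = apply_task_vector(pretrained, tau, scale=-1.0)   # moves the opposite way
```

With real models the same arithmetic would be applied per tensor in the weight dictionary; the scaling coefficient is typically a hyperparameter to be tuned rather than fixed at 1.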

VideoTree is an adaptive, hierarchical, training-free framework for long-form video understanding that uses an LLM agent.
VideoAgent is a multimodal agentic video understanding model that utilizes a structured, unified memory capturing both temporal events and object-cent

Trimming redundant parameters and resolving sign conflicts during the model merging step helps address interference between parameters.
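
The two steps named above can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's reference implementation: each model is assumed to be represented by a flat list of weight deltas, trimming keeps only the largest-magnitude entries, the sign for each position is elected from the sum of the trimmed values, and only entries agreeing with that sign are averaged. The function names `trim` and `merge` and the `keep_fraction` parameter are hypothetical.

```python
def trim(vector, keep_fraction):
    """Zero out all but the largest-magnitude entries of one delta vector.

    Entries tied with the threshold are all kept, so slightly more than
    keep_fraction of the entries may survive.
    """
    k = max(1, round(len(vector) * keep_fraction))
    threshold = sorted((abs(v) for v in vector), reverse=True)[k - 1]
    return [v if abs(v) >= threshold else 0.0 for v in vector]


def merge(delta_vectors, keep_fraction=0.5):
    """Trim each vector, elect a sign per position, average agreeing entries."""
    trimmed = [trim(v, keep_fraction) for v in delta_vectors]
    merged = []
    for values in zip(*trimmed):
        # Elect the sign carried by the larger total magnitude at this position.
        sign = 1.0 if sum(values) >= 0 else -1.0
        # Keep only non-zero entries whose sign matches the elected sign.
        agreeing = [v for v in values if v * sign > 0]
        merged.append(sum(agreeing) / len(agreeing) if agreeing else 0.0)
    return merged


# Toy deltas (hypothetical) from two fine-tuned models.
a = [4.0, -1.0, 3.0, 0.5]
b = [-2.0, -3.0, 1.0, 0.25]
merged = merge([a, b], keep_fraction=0.5)
```

In the first position the two models disagree in sign; a plain average would shrink the update to 1.0, whereas the sign-aware merge keeps the dominant value 4.0, which is the kind of interference the trimming and sign-resolution steps are meant to avoid.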

DoraemonGPT is an LLM-based video agent that understands dynamic scenes.