dgss

우병주 · January 15, 2025

https://innate-ship-06e.notion.site/17c36f71c91e81279db4d4fb19a4248d?pvs=4

Topics:
We encourage submissions under one of the topics of interest below, but we also welcome other interesting and relevant research on pixel-level understanding with vision foundation models.
Vision foundation models in pixel-level image and video understanding tasks, including image segmentation, video segmentation, tracking, actor-action segmentation, depth estimation, motion estimation, etc.
Adaptation, generalization, and prompting of vision foundation models (see the sketch after this list).
Interpretation and benchmarking of vision foundation models and their training data.
Real-world applications, with a focus on the societal impact of vision foundation models.
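
To make the "prompting" topic above concrete, here is a minimal sketch of promptable segmentation with SAM, assuming the publicly released segment-anything package; the checkpoint path, image file, and click coordinates are placeholders, not part of the call.

```python
# Minimal sketch: promptable segmentation with SAM (segment-anything).
# Checkpoint path, image file, and point coordinates are placeholders.
import numpy as np
import cv2
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

# SAM expects an HxWx3 uint8 RGB array.
image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)  # computes the image embedding once

# A single foreground click prompt at (x, y); label 1 = foreground.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,  # return several candidate masks
)
best_mask = masks[scores.argmax()]  # boolean mask, shape (H, W)
```

The same predictor accepts box prompts or additional clicks, which is what makes these models "promptable" rather than tied to a fixed label set.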

In recent years, foundation models have gained significant traction and success, particularly in natural language processing, exemplified by the GPT series. These models are large-scale and trained on diverse datasets, primarily through self-supervised learning or vision-language modeling. Such foundation models have been shown to adapt effectively across various downstream tasks, with strong generalization capabilities, especially in zero-shot and few-shot scenarios. However, while language foundation models are well established, their counterparts in the vision domain, and their adoption across tasks, are still at an early-to-mid stage of development. Despite this, there is growing interest and progress in vision foundation models (VFMs). Recent examples include models trained with self-supervision, such as the DINO series, and models trained on image-text data, such as CLIP, Flamingo, and LLaVA. Various pixel-level vision foundation models have also emerged recently, such as OMG-LLaVA and the SAM series.

Our workshop aims to bring together researchers dedicated to developing and adapting vision foundation models for pixel-level understanding tasks, including image segmentation, video segmentation, tracking, actor-action segmentation, depth estimation, and motion estimation. We will explore major directions in pixel-level understanding with vision foundation models and discuss the opportunities they present, particularly in low-resource settings where they could have a positive societal impact; this is especially relevant for marginalized communities that lack access to large-scale labeled datasets tailored to their needs. We will also discuss the risks associated with these models and explore methods to mitigate them.

The workshop features seven invited talks, mixing emerging and established researchers, along with two poster sessions and selected spotlight presentations. We encourage submissions related to any research or application of pixel-level understanding with vision foundation models.
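
As an illustration of the zero-shot behavior described above, below is a minimal sketch of zero-shot image classification with CLIP, assuming the Hugging Face transformers implementation; the checkpoint name, image file, and candidate labels are illustrative only.

```python
# Minimal sketch: zero-shot classification with CLIP via Hugging Face
# transformers. Checkpoint name, image file, and labels are illustrative.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # any RGB image
labels = ["a photo of a cat", "a photo of a dog", "a photo of a street"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Image-text similarity scores, softmaxed over the candidate labels;
# no task-specific training is involved, hence "zero-shot".
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```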
