See https://github.com/Qinying-Liu/Awesome-Open-Vocabulary-Semantic-Segmentation for reference.
The model is trained on fully supervised semantic segmentation datasets with pixel-level annotations (e.g., COCO-Stuff).
[ODISE] | CVPR'23 | Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models | https://arxiv.org/pdf/2303.04803 | https://github.com/NVlabs/ODISE
[Li et al.] | ICCV'23 | Open-vocabulary Object Segmentation with Diffusion Models | https://arxiv.org/pdf/2301.05221 | https://github.com/Lipurple/Grounded-Diffusion
The model is adapted from off-the-shelf large models (e.g., CLIP, diffusion models) without an additional training phase. Note that these large models have themselves already been trained on some datasets (e.g., image-caption datasets).
[OVDiff] | Arxiv'23.06 | Diffusion Models for Zero-Shot Open-Vocabulary Segmentation | [pdf]
[DiffSegmenter] | Arxiv'23.09 | Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter | [pdf] | [code]
[EmerDiff] | ICLR'24 | EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models | [pdf] | [code]
[FreeSeg-Diff] | Arxiv'24.03 | FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models | [pdf] | [code]
[MaskDiffusion] | Arxiv'24.03 | MaskDiffusion: Exploiting Pre-trained Diffusion Models for Semantic Segmentation | [pdf] | [code]
[FreeDA] | CVPR'24 | Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation | [pdf] | [code]
[OVAM] | Arxiv'24.03 | Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models | [pdf]
[Diff2Scene] | ECCV'24 | Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models | [pdf]