Learning Prototype via Placeholder for Zero-shot Recognition, Part 2

이준석 · March 22, 2023


2.1 Visual-Semantic Gap

Zero-shot learning (ZSL) transfers knowledge from seen classes to unseen classes through class semantic embeddings. Visual and semantic embeddings come from two different modalities and lie on different manifold structures, so there is typically a gap between the visual and semantic domains. The crucial task of ZSL is therefore to learn a visual-semantic alignment. Embedding-based methods [Xian et al., 2017; Li et al., 2019] learn a common space to bridge this gap. Specifically, CVC-ZSL [Li et al., 2019] argues that the visual space has strong discriminative power and proposes classifying visual features against prototypes projected from semantic embeddings. Our work likewise learns class prototypes through a semantic→visual mapping and uses them for classification.
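As a rough illustration of "classifying visual features against prototypes projected from semantic embeddings", here is a minimal sketch. It assumes a single linear semantic→visual map `W` and cosine similarity as the score; CVC-ZSL's actual projection network and scoring function may differ. The names `project`, `cosine`, `classify` and all toy values are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape d x k

def project(W: Matrix, a: Vector) -> Vector:
    """Map a k-dim semantic vector to a d-dim visual prototype: p = W a (assumed linear)."""
    return [sum(w * x for w, x in zip(row, a)) for row in W]

def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; the small epsilon guards against zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v + 1e-12)

def classify(x: Vector, W: Matrix, class_semantics: List[Vector]) -> int:
    """Return the index of the class whose projected prototype is most similar to x."""
    prototypes = [project(W, a) for a in class_semantics]
    scores = [cosine(x, p) for p in prototypes]
    return max(range(len(scores)), key=lambda c: scores[c])

# Toy example: k = 2 semantic dims, d = 3 visual dims (hypothetical values).
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
seen_semantics = [[1.0, 0.0], [0.0, 1.0]]
print(classify([0.9, 0.1, 1.0], W, seen_semantics))  # → 0
```

At test time the same routine would be called with unseen-class semantic vectors, so the quality of the learned map `W` on classes it never saw decides the result.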

2.2 Projection Domain Shift

The domain shift problem in ZSL was identified by [Fu et al., 2015] and is known as the projection domain shift. The same attribute may have very different visual appearances in seen and unseen classes. Consequently, the visual-semantic alignment, i.e., the projection function learned from the seen classes, is often distorted when applied directly to the unseen classes. SAE [Kodirov et al., 2017] adopts an encoder-decoder paradigm to enforce a reconstruction constraint on the seen classes. However, because it projects visual features into the semantic space, it is less discriminative. CVC-ZSL [Li et al., 2019], the baseline of our work, projects semantic embeddings into the visual space and treats the projected results as class prototypes. Because no unseen classes are available during training, however, the domain shift is still not well handled. In contrast, LPL alleviates the domain shift by improving the separability of class-level prototypes: it learns dispersed prototypes by reserving placeholders for the unseen classes. The placeholders are implemented by class hallucination, which [Zhang and Wang, 2021] explored to mitigate the lack of samples in few-shot detection. In this work, with a different motivation and implementation, we leverage class hallucination to serve as a bridge between seen and unseen classes.
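The placeholder idea can be sketched as follows. This is only an illustration under a loudly stated assumption: each hallucinated class's semantic vector is taken to be a random convex combination of two distinct seen-class semantic vectors. The actual hallucination scheme used by LPL may differ, and `hallucinate_placeholders` and `lam_range` are hypothetical names.

```python
import random
from typing import List, Tuple

Vector = List[float]

def hallucinate_placeholders(
    seen: List[Vector],
    n: int,
    rng: random.Random,
    lam_range: Tuple[float, float] = (0.3, 0.7),  # hypothetical mixing bounds
) -> List[Vector]:
    """Create n fake-class semantic vectors by mixing two distinct seen classes.

    Assumption for illustration only: placeholder = lam * a_i + (1 - lam) * a_j.
    """
    if len(seen) < 2:
        raise ValueError("need at least two seen classes to mix")
    placeholders: List[Vector] = []
    for _ in range(n):
        i, j = rng.sample(range(len(seen)), 2)  # two different seen classes
        lam = rng.uniform(*lam_range)
        placeholders.append([lam * a + (1.0 - lam) * b for a, b in zip(seen[i], seen[j])])
    return placeholders

# Toy example: three seen classes with one-hot semantic vectors.
seen_semantics = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rng = random.Random(0)
fake = hallucinate_placeholders(seen_semantics, n=4, rng=rng)
# Training would then treat seen + placeholder classes as one label set,
# so seen prototypes are pushed away from the regions the placeholders occupy.
all_classes = seen_semantics + fake
print(len(all_classes))  # → 7
```

Keeping `lam` away from 0 and 1 is a design choice in this sketch: it stops a placeholder from collapsing onto one of its source classes, so it actually reserves space between seen classes.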

3 Method

3.1 Preliminaries
