Semantics Disentangling for Generalized Zero-Shot Learning 제2 부

이준석·2022년 8월 1일

Semantics Disentangling for Generalized Zero-Shot Learning

목록 보기

2/3

Introduction

Human beings have a remarkable ability to learn new notions based on prior experience without seeing them in advance.
notion 개념
인간은 사전에 보지 않고도 이전 경험을 바탕으로 새로운 개념을 배우는 놀라운 능력을 가지고 있습니다.

For example, given the clues that zebras appear like horses yet with black-and-white stripes, one can quickly recognize a zebra if he/she has seen horses before.
clue 단서
예를 들어 얼룩말이 말처럼 보이지만 흑백 줄무늬가 있다는 단서가 주어지면 전에 말을 본 적이 있다면 얼룩말을 빨리 알아볼 수 있습니다.

Nevertheless, unlike humans, supervised machine learning algorithms can only classify samples belonging to the classes that have already appeared during the training phase, and they are not able to handle samples from previously unseen categories.
그럼에도 불구하고 지도 머신 러닝 알고리즘은 인간과 달리 훈련 단계에서 이미 등장한 클래스에 속하는 샘플만 분류할 수 있으며 이전에 본 적이 없는 범주의 샘플을 처리할 수 없습니다.

This challenge motivates the study of generalizing models to the unseen classes by transferring knowledge from intermediate semantics (e.g., attributes), which typically refers to zero-shot learning (ZSL). intro
intro 소개 intermediate 중간의
이 과제는 일반적으로 제로샷 학습(ZSL)을 참조하는 중간 의미론(예: 속성)에서 지식을 전달하여 모델을 보이지 않는 클래스로 일반화하는 연구에 동기를 부여한다.

Particularly, the core idea of ZSL [19, 34, 1, 17] lies in learning to map features between the semantic space and visual space, thereby closing the gap between the seen and unseen classes.
특히 ZSL[19, 34, 1, 17]의 핵심 아이디어는 의미 공간과 시각적 공간 사이의 기능을 매핑하는 방법을 학습하여 보이는 클래스와 보이지 않는 클래스 사이의 간격을 좁히는 데 있습니다.

While effective, conventional ZSL techniques are built upon the assumption that the test set only contains samples from the unseen classes, which can be easily violated in practice.
violated 위반하다
효과적이기는 하지만, 기존의 ZSL 기법은 테스트 세트에 보이지 않는 클래스의 샘플만 포함되어 있다는 가정 하에 구축되며, 실제로 이를 위반하기 쉽다.

Hence, it is more reasonable to consider a new protocol called generalized zero-shot learning (GZSL), where seen and unseen images are both to be identified.
따라서 본 이미지와 보이지 않는 이미지를 모두 식별하는 GZSL(Generalized Zero-shot Learning)이라는 새로운 프로토콜을 고려하는 것이 더 합리적입니다.

Figure 1: An illustration of the visual features (red boxes) that are not associated with the annotated attributes. Learning from such visual features that are semantically unrelated may jeopardize the model generalization to unseen classes.
jeopardize 위태롭게 하다
그림 1: 주석이 달린 속성과 관련이 없는 시각적 특징(빨간색 상자)의 그림. 의미적으로 관련이 없는 이러한 시각적 기능에서 학습하면 보이지 않는 클래스에 대한 모델 일반화가 위험할 수 있다.

Existing GZSL techniques can be roughly grouped into two types: embedding-based [9, 34, 22, 21, 13] and generative-based [38, 35, 28, 24, 15] approaches.
기존 GZSL 기술은 크게 두 가지 유형으로 그룹화할 수 있습니다. 임베딩 기반[9, 34, 22, 21, 13]과 생성 기반[38, 35, 28, 24, 15] 접근 방식입니다.

The former group learns a projection or an embedding function to associate the visual features of seen classes with the respective semantic vectors, while the latter one learns a visual generator for the unseen classes based on the seen points and semantic representations of both classes.
respective 각각의, 각자의
전자는 투영 또는 임베딩 함수를 학습하여 보이는 클래스의 시각적 특징을 각각의 의미 벡터와 연관시키고, 후자는 두 클래스의 보이는 점 및 의미 표현을 기반으로 보이지 않는 클래스에 대한 시각적 생성기를 학습한다.

However, most GZSL approaches directly leverage the visual features extracted from the pre-trained deep models, such as ResNet101 [11] pre-trained on ImageNet, which are not tailored for ZSL tasks.
그러나 대부분의 GZSL 접근 방식은 ZSL 작업에 맞게 조정되지 않은 ImageNet에서 사전 훈련된 ResNet101[11]과 같은 사전 훈련된 심층 모델에서 추출한 시각적 기능을 직접 활용합니다.

In [31], it is observed that not all the dimensions of the extracted visual features are semantically related to the pre-defined attributes, which triggers the bias on learning semantic-visual alignment and causes negative transfer to unseen classes.
[31]에서 추출된 시각적 특징의 모든 차원이 사전 정의된 속성과 의미론적으로 관련되어 있지는 않은 것으로 관찰되었으며, 이는 의미론적 시각적 정렬 학습에 대한 편향을 유발하고 보이지 않는 클래스에 부정적인 전달을 유발합니다.

Given an example from the AWA dataset shown in Figure 1, despite the features of animals’ ears are visually salient for discriminating image samples, it is ignored in the manually annotated attributes.
salient 가장 중요한 annotated 주석이 달린
그림 1에 표시된 AWA 데이터 세트의 예를 보면, 동물의 귀의 특징이 이미지 샘플을 구별하기 위해 시각적으로 두드러지지만 수동으로 주석이 달린 속성에서는 무시된다.

When generalizing to unseen classes such as cats, it is easy for them to be misclassified as tigers because the visual features corresponding to the concepts “Big, Strong, Muscle” are not highlighted.
고양이와 같이 눈에 보이지 않는 클래스로 일반화하면 '크고 강하고 근육질'이라는 개념에 해당하는 시각적 특징이 부각되지 않아 호랑이로 오분류되기 쉽다.

From this case, we believe that GZSL will benefit from using the visual features that can consistently align with the respective semantic attributes.
이 경우 GZSL이 각각의 의미론적 속성과 일관되게 정렬할 수 있는 시각적 기능을 사용하여 이점을 얻을 수 있다고 믿습니다.

We define this type of visual features as the semantic-consistent features, which are agnostic to both seen and unseen classes.
우리는 이러한 유형의 시각적 특징을 의미론적으로 일관된 특징으로 정의하며, 이는 보이는 클래스와 보이지 않는 클래스 모두에 불가지론적입니다.

In contrast, those visual features that are irrelevant to manually annotated attributes are defined as semantic-unrelated.
manually 수동적으로,
대조적으로, 수동으로 주석 처리된 속성과 관련이 없는 시각적 기능은 의미론적 관련이 없는 것으로 정의됩니다.

이준석

인공지능 전문가가 될레요

이전 포스트

Semantics Disentangling for Generalized Zero-Shot Learning 제1 부

다음 포스트

Semantics Disentangling for Generalized Zero-Shot Learning 제2 부