[paper review] CLIP : Learning Transferable Visual Models From Natural Language Supervision

Jude's Sound Lab · January 22, 2023

Abstract

Introduction

Methodology

Finally, the temperature parameter which controls the range of the logits in the softmax, τ, is directly optimized during training as a log-parameterized multiplicative scalar to avoid tuning as a hyper-parameter.

In the sentence above, the temperature parameter τ is a value that controls the range of the logits fed into the softmax function. The softmax function is commonly used as the final activation in multi-class classification: it maps the raw outputs of the network to a probability distribution over the possible classes.

The temperature τ is a scalar that rescales the logits before the softmax is applied. In the usual convention the logits are divided by τ, which is the same as multiplying them by the scalar 1/τ. A higher temperature shrinks the range of the logits and gives a softer, less confident probability distribution over the classes. A lower temperature widens the range and gives a sharper, more confident distribution.

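The authors state that the temperature is optimized during training as a log-parameterized multiplicative scalar. In other words, the model does not learn the scalar directly. It learns a parameter t and multiplies the logits by exp(t), as in the paper's pseudocode. Learning the logarithm keeps the scalar positive for any value of t, and a gradient step on t changes the scale by a multiplicative factor rather than an additive one. The paper also notes that the temperature was initialized to the equivalent of 0.07 and that the scale was clipped so the logits are never multiplied by more than 100, which the authors found necessary to prevent training instability.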
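The log-parameterization can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the names `logit_scale` and `scaled_logit` are hypothetical, and the only values taken from the paper are the initial temperature of 0.07 and the cap of 100 on the scale.

```python
import math

# Values described in the paper: the temperature starts at the equivalent
# of 0.07, and the multiplicative scale is capped at 100.
INIT_TEMPERATURE = 0.07
MAX_LOGIT_SCALE = 100.0

# The trainable quantity is t = log(1 / tau), not tau itself.
t = math.log(1.0 / INIT_TEMPERATURE)

def logit_scale(t: float) -> float:
    """Return the multiplicative scalar exp(t) = 1 / tau, capped at MAX_LOGIT_SCALE."""
    return min(math.exp(t), MAX_LOGIT_SCALE)

def scaled_logit(cosine_similarity: float, t: float) -> float:
    """Scale a single image-text similarity score (hypothetical helper)."""
    return cosine_similarity * logit_scale(t)

print(logit_scale(t))          # about 14.29, i.e. 1 / 0.07
print(logit_scale(10.0))       # 100.0, because exp(10) exceeds the cap
print(logit_scale(-5.0) > 0)   # True: exp(t) is positive for any real t
```

In a real training loop `t` would be a trainable parameter updated by the optimizer together with the encoder weights. Here it is an ordinary float so the sketch stays self-contained.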

By optimizing the temperature parameter during training, the authors avoid having to tune it as a hyper-parameter, which can be time-consuming and resource-intensive. Instead, the temperature parameter is learned from the data and adapts to the specific task and dataset that the model is being trained on.

A worked example makes this more concrete.

In a multi-class classification problem, the output of the model is a set of logits, which are real-valued numbers representing the confidence of the model's predictions for each class. The softmax function is applied to these logits to convert them into a probability distribution over the possible classes. The probability of a certain class is computed as the exponential of the corresponding logit divided by the sum of exponentials of all the logits.

The temperature parameter τ controls the range of the logits before the softmax function is applied. Each logit is divided by τ, so a higher temperature shrinks the range of the logits and the gaps between them. Smaller gaps between the exponentiated values mean smaller differences between the class probabilities, so the distribution becomes softer and the model's predictions become less confident.
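Written out, the temperature-scaled softmax takes the following form. The notation is an assumption introduced here for illustration: z_i is the logit for class i and K is the number of classes.

```latex
p_i = \frac{\exp(z_i / \tau)}{\sum_{j=1}^{K} \exp(z_j / \tau)}, \qquad i = 1, \dots, K
```

As τ grows, every z_i / τ moves toward 0 and the distribution approaches uniform. As τ approaches 0, the probability mass concentrates on the largest logit.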

Let's say we have a model that has been trained to classify images of animals into three classes: "cat", "dog" and "bird". The output of the model is the following logits: [-1, 2, 1].

If we apply the softmax function to these logits with a temperature of 1, we get approximately the following probability distribution: [0.04, 0.71, 0.26]. The model is most confident that the image is a dog, with a probability of about 0.71.

If we apply the softmax function to these logits with a temperature of 2, the logits are first halved to [-0.5, 1, 0.5], and we get approximately the following probability distribution: [0.12, 0.55, 0.33]. The model is still most confident that the image is a dog, but the probability has decreased to about 0.55, and the probabilities of the other two classes have increased.

In this example, the model's predictions become less confident as the temperature increases, because dividing by a larger τ compresses the differences between the logits, which gives a softer probability distribution over the possible classes.
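The effect of the temperature on the logits [-1, 2, 1] can be checked with a few lines of Python. This is a minimal sketch using only the standard library. The function name `softmax_with_temperature` is hypothetical, and the logits are divided by τ, following the usual convention.

```python
import math

def softmax_with_temperature(logits, tau=1.0):
    """Softmax over logits divided by the temperature tau (tau > 0)."""
    scaled = [z / tau for z in logits]
    # Subtract the maximum for numerical stability; the result is unchanged.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [-1.0, 2.0, 1.0]  # scores for "cat", "dog", "bird"
for tau in (1.0, 2.0):
    probs = softmax_with_temperature(logits, tau)
    print(tau, [round(p, 2) for p in probs])
# prints:
# 1.0 [0.04, 0.71, 0.26]
# 2.0 [0.12, 0.55, 0.33]
```

The top-ranked class is "dog" at both temperatures. Only the amount of probability mass assigned to it changes.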

A softer distribution can be useful when the model's predictions should be less confident, for example in a safety-critical application where overconfident errors are costly, or when the data is highly uncertain and the model should be more conservative. Note that scaling by a positive temperature does not change which class ranks highest. It only changes how much probability that class receives.
