Basic: Sigmoid and Softmax


Why Sigmoid? Softmax?

In the deep-learning process, learning depends on the loss. The loss measures the difference between the prepared dataset and the output of the model.

Sigmoid and softmax are mainly used for image classification.

There are reasons why beginners should focus on image classification. One is that it is easier than other deep-learning tasks. Another is that there are plenty of references, because image classification was one of the first fields deep learning was applied to.


Odds

Odds are the ratio of the probability of one event to that of the alternative event. So, if the odds are greater than 1, the event is more probable than its alternative. In contrast, if the odds are less than 1, the event is less probable than its alternative.

$$o = \frac{p}{1-p}$$

This is just another way to express the probability of an event happening or not. As the probability approaches 1, the odds diverge to infinity.
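
Here is a minimal Python sketch (my own illustration, not from any library) of how the odds behave:

```python
def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1 - p)

# Odds > 1: the event is more likely than its alternative;
# odds < 1: less likely. The odds diverge as p approaches 1.
for p in [0.1, 0.5, 0.9, 0.99]:
    print(f"p = {p:.2f}  ->  odds = {odds(p):.2f}")
# p = 0.10  ->  odds = 0.11
# p = 0.50  ->  odds = 1.00
# p = 0.90  ->  odds = 9.00
# p = 0.99  ->  odds = 99.00
```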


Logit

However, the graph of the odds is not symmetric. So we take the logarithm of the odds. The result is called the logit.

$$l = \log\left(\frac{p}{1-p}\right) = \log(p) - \log(1-p)$$

The logit is symmetric with respect to $p = 0.5$, at which the odds are 1 and the logit is 0. And unlike the odds, the logit diverges not only positively but also negatively.
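
A quick sketch to see the symmetry and the two-sided divergence (the probe values are just illustrative):

```python
import math

def logit(p):
    """Log-odds: log(p / (1 - p)) = log(p) - log(1 - p)."""
    return math.log(p) - math.log(1 - p)

# Symmetric about p = 0.5 (odds = 1, logit = 0) and divergent
# in both directions as p approaches 0 or 1.
for p in [0.01, 0.1, 0.5, 0.9, 0.99]:
    print(f"p = {p:.2f}  ->  logit = {logit(p):+.3f}")
# logit(0.1) = -2.197 and logit(0.9) = +2.197: same magnitude, opposite sign.
```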


Logit and Sigmoid

The inverse function of the logit is the sigmoid function, which recovers the probability from the logit.

$$p = \frac{1}{1+e^{-l}} \quad (0 \le p \le 1)$$

The sigmoid is symmetric with respect to the point $(0, 0.5)$ and has horizontal asymptotes at 0 and at 1.
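
A minimal sketch showing that the sigmoid really inverts the logit (the numbers are just for illustration):

```python
import math

def sigmoid(l):
    """Inverse of the logit: maps any real number l to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-l))

p = 0.9
l = math.log(p / (1 - p))          # logit(0.9) ≈ +2.197
print(sigmoid(l))                  # ≈ 0.9 -- sigmoid undoes the logit

print(sigmoid(-10), sigmoid(10))   # ≈ 0.0000454, 0.9999546: asymptotes at 0 and 1
```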

As the logit diverges, the output of the affine transformation also diverges. Because of this, we can treat the output of the affine transformation as a kind of logit. In other words, if the output of the affine transformation passes through the sigmoid function, it is converted into a probability. Therefore, in this process, deep learning learns the probability.

Sigmoid is used for binary classification.
Softmax is used for multinomial classification.
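
For the multinomial case, softmax generalizes the sigmoid from 2 classes to $K$ classes. A minimal NumPy sketch (the 3-class logits are made up for illustration):

```python
import numpy as np

def softmax(z):
    """Softmax over a logit vector: exp(z_k) / sum_j exp(z_j)."""
    z = z - np.max(z)   # shift by the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # hypothetical logits for 3 classes
probs = softmax(logits)
print(probs)         # ≈ [0.659 0.242 0.099]
print(probs.sum())   # 1.0 -- a valid probability distribution over the classes
```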

Logistic Regression is based on this.

$$X^T \;\xrightarrow{\text{Affine-Function}}\; \overrightarrow{z}^{[1]} \;\xrightarrow{\text{Activation-Function}}\; \overrightarrow{p}$$

Affine-Function -> logit
Activation-Function -> probability
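
Putting the whole pipeline together, a minimal forward-pass sketch (the shapes and random data are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 4 samples, 3 features, binary classification.
X = rng.normal(size=(4, 3))    # input data (X^T in the notation above)
W = rng.normal(size=(3, 1))    # weights of the affine function
b = np.zeros(1)                # bias of the affine function

z = X @ W + b                  # Affine-Function -> logits z^[1]
p = 1 / (1 + np.exp(-z))       # Activation-Function (sigmoid) -> probabilities p
print(p.ravel())               # four values, each strictly between 0 and 1
```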
