[CS231n] 6. Training NN

회준 권 · March 19, 2025


〰️ Contents

Until now

  • Express functions in terms of computational graphs

  • Neural Networks: stacks of linear layers with non-linearities in between

  • CNNs: a type of NN that uses convolution layers to preserve spatial structure

  • Learn the weights/parameters through optimization with gradient descent (GD)

[1] Activation Functions

Where activation functions are applied in our system: right after each matrix multiplication (affine layer).

(1) Sigmoid

  • squashes the numbers to [0, 1] -> probability interpretation possible 😃

  • Saturation kills the gradients

  • Sigmoid outputs are NOT zero-centered

(2) Tanh

  • squashes the numbers to [-1, 1]
  • Saturation still kills the gradients
  • the outputs are zero-centered 😃

(3) ReLU

  • NO saturation (in the positive region) 😃
  • Easy to compute 😃
  • NOT zero-centered
  • Kills the gradient when x < 0 (dead ReLU)

(4) Leaky ReLU

  • A small negative slope for x < 0 keeps the gradient alive, so the dead ReLU problem is solved! 😃 (all four activations are sketched below)
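A minimal NumPy sketch of the four activations above, just for reference (not the lecture's code):

```python
import numpy as np

def sigmoid(x):
    # squashes to (0, 1); saturates (gradient ~ 0) for large |x|
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # squashes to (-1, 1); zero-centered, but still saturates
    return np.tanh(x)

def relu(x):
    # no saturation for x > 0, but the gradient is exactly 0 for x < 0
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # small negative slope keeps a (tiny) gradient flowing for x < 0
    return np.where(x > 0, x, alpha * x)

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
for f in (sigmoid, tanh, relu, leaky_relu):
    print(f.__name__, f(x))
```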

[2] Data Preprocessing

Why the need for zero-mean?

So that the activation functions do NOT fall into their saturation regions.

How zero-mean is used in CNNs: in practice, the mean image or the per-channel mean of the training set is subtracted from every input, as sketched below.
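A hedged sketch of the two common zero-centering schemes for image data (array shapes here are made up for illustration):

```python
import numpy as np

# toy training set: N images of shape (H, W, C) -- sizes chosen arbitrarily
X_train = np.random.rand(50, 32, 32, 3).astype(np.float32)

# Option A: subtract the mean image computed over the training set
mean_image = X_train.mean(axis=0)                # shape (32, 32, 3)
X_centered_a = X_train - mean_image

# Option B: subtract one mean per color channel
mean_per_channel = X_train.mean(axis=(0, 1, 2))  # shape (3,)
X_centered_b = X_train - mean_per_channel

print(X_centered_a.mean(), X_centered_b.mean())  # both roughly 0
```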

[3] Weight Initialization

  • Small random weights (e.g. W = 0.01 * randn) work fine for small networks
  • Problems arise in deeper networks

We can check that the std of the activations falls toward 0 layer by layer.
Why is this a problem? The gradients on the weights are proportional to these activations, so they vanish as well (see the sketch below).
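A small reconstruction of that experiment, assuming a 10-layer tanh network with 500 units per layer and weights drawn as 0.01 * randn (the exact sizes are assumptions, not the lecture's code):

```python
import numpy as np

np.random.seed(0)
D = 500                          # hidden width (assumed)
h = np.random.randn(1000, D)     # unit-Gaussian input batch

for layer in range(10):
    W = 0.01 * np.random.randn(D, D)   # small random weights
    h = np.tanh(h.dot(W))
    print(f"layer {layer + 1:2d}: activation std = {h.std():.6f}")

# The std shrinks toward 0. Since dL/dW for a layer involves the incoming
# activations, near-zero activations mean near-zero weight updates.
```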


Then how about just making the weights big?

The neurons become saturated (with tanh, the outputs pile up at -1 and 1), so the gradients are again nearly zero!

Xavier Initialization

Xavier initialization is the alternative: scale the weights by 1/sqrt(fan_in) so that the variance of each layer's output roughly matches the variance of its input.
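The same sketch as above with Xavier scaling (divide by sqrt(fan_in)); again just an illustrative setup:

```python
import numpy as np

np.random.seed(0)
D = 500
h = np.random.randn(1000, D)

for layer in range(10):
    W = np.random.randn(D, D) / np.sqrt(D)   # Xavier: Var(W) = 1 / fan_in
    h = np.tanh(h.dot(W))
    print(f"layer {layer + 1:2d}: activation std = {h.std():.6f}")

# The activation std now stays in a sensible range instead of collapsing to 0.
```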

[4] Batch Normalization

  • N : batch size
  • D : feature size

Normalize each of the D features across the N examples in the batch.

The learnable scale/shift parameters (γ, β) are then used to squash or stretch the normalized range as wanted.
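A minimal batch-norm forward pass over an (N, D) batch, with gamma/beta left as plain arguments (training-time statistics only; the running averages used at test time are omitted):

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    # x: (N, D). Mean/variance are computed per feature, over the batch.
    mu = x.mean(axis=0)                      # (D,)
    var = x.var(axis=0)                      # (D,)
    x_hat = (x - mu) / np.sqrt(var + eps)    # ~unit-Gaussian per feature
    # gamma/beta let the network re-scale and re-shift the range as it wants
    return gamma * x_hat + beta

N, D = 64, 100
x = 5.0 + 3.0 * np.random.randn(N, D)
out = batchnorm_forward(x, gamma=np.ones(D), beta=np.zeros(D))
print(out.mean(axis=0)[:3], out.std(axis=0)[:3])   # ~0 and ~1 per feature
```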

〰️ Questions

(Sigmoid gate: what happens at x = -10, 0, 10?)

  • x = -10 : the local gradient is nearly 0, so almost nothing flows back during backpropagation.
  • x = 0 : fine (the local gradient is at its maximum, 0.25)
  • x = 10 : the local gradient is nearly 0, so almost nothing flows back during backpropagation.
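A quick numerical check of those three cases (the sigmoid's local gradient is σ(x)(1 − σ(x))):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for x in (-10.0, 0.0, 10.0):
    s = sigmoid(x)
    print(f"x = {x:>5}: local gradient = {s * (1 - s):.6f}")

# x = +/-10 -> ~0.000045 (saturated, gradient effectively killed)
# x = 0     -> 0.25      (the maximum of the sigmoid's gradient)
```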

  • When all inputs to a neuron are positive, the gradients on W are all positive or all negative
  • So every parameter update either increases or decreases all the weights at once
    --> inefficient (zig-zag update paths).

(If all the weights are initialized to the same value)

  • every neuron produces the same output
  • and receives the same gradient
  • so they all update in the same way: the symmetry is never broken.

〰️ Additional Notes

Unit Gaussian

Both weight initialization and batch normalization are ways of keeping the activations roughly unit-Gaussian.
Why is this needed?

  • It mitigates the vanishing/exploding gradient problem (by preserving the variance across layers)

Xavier Initialization

Why is the argument developed in terms of Var(z)?

If Var(z) stays constant across layers, as with Xavier initialization,

  • the outputs never blow up or shrink toward zero
  • and the variance of the gradients is preserved as well
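A one-line derivation of why the 1/n choice works, assuming the $w_i$ and $x_i$ are independent and zero-mean:

$$
z=\sum_{i=1}^{n} w_i x_i
\;\Rightarrow\;
\mathrm{Var}(z)=\sum_{i=1}^{n}\mathrm{Var}(w_i)\,\mathrm{Var}(x_i)=n\,\mathrm{Var}(w)\,\mathrm{Var}(x)
$$

So choosing $\mathrm{Var}(w)=1/n$ with $n=$ fan_in (the Xavier choice) gives $\mathrm{Var}(z)=\mathrm{Var}(x)$: the signal neither blows up nor collapses as it passes through the layer.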
