03. Gradient Descent

Park Jong Hun · January 24, 2021

PytorchZeroToAll series · 3/5

This post is a study note based on Sung Kim's YouTube lecture series PytorchZeroToAll.

Linear Regression Error


In the loss graph above, the orange curve comes from the correct weight, so the weights that produce the green and blue losses should be moved toward the orange line.

What we need to do is find the weight that minimizes the loss function. This is written $\mathrm{argmin}_w\, loss(w)$, which means "the weight $w$ at which $loss(w)$ is smallest". (PyTorch's `torch.argmin` only returns the index of the smallest element of a tensor; the minimizing weight itself is what gradient descent will find iteratively.)
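To make the argmin notation concrete, here is a minimal brute-force sketch (illustration only, not how gradient descent works): evaluate the loss over a grid of candidate weights and pick the one with the smallest loss. The grid range and step are assumptions chosen for this example.

```python
# Brute-force argmin over a grid of candidate weights (illustration only).
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

def total_loss(w):
    # Sum of squared errors for the linear model y_hat = x * w
    return sum((x * w - y) ** 2 for x, y in zip(x_data, y_data))

candidates = [i * 0.1 for i in range(41)]   # w in [0.0, 4.0], step 0.1
best_w = min(candidates, key=total_loss)
print(best_w)                                # -> 2.0, the true weight for this data
```

This works here because there is a single weight and the grid happens to contain the answer; gradient descent removes both limitations.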

Gradient descent algorithm


This time, instead of computing the loss while changing the weight one value at a time as in 01. Linear Model, we will learn the gradient descent algorithm, which finds the weight automatically.

We start from an arbitrary weight, and the gradient (slope) of the loss tells us whether to increase or decrease it: if the gradient is positive, the weight moves in the decreasing direction, and if it is negative, the weight moves in the increasing direction.

$w = w - \alpha \frac{\partial loss}{\partial w}$

Repeating this update brings the weight closer and closer to the global loss minimum. Here $\alpha$, called the learning rate, determines how far the weight moves at each update.
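As a sanity check of the update rule, one step can be computed by hand. With the model $\hat{y} = x \cdot w$, the gradient of the squared loss is $2x(xw - y)$. The starting weight $w = 1.0$ and learning rate $\alpha = 0.01$ below mirror the code in the next section; the sample $(x, y) = (2, 4)$ is picked for illustration.

```python
w, alpha = 1.0, 0.01          # initial weight and learning rate
x, y = 2.0, 4.0               # one training sample

grad = 2 * x * (x * w - y)    # d(loss)/dw = 2x(xw - y)
w = w - alpha * grad          # gradient descent update

print(grad, w)                # grad = -8.0, updated w = 1.08
```

The gradient is negative (the prediction 2.0 is below the target 4.0), so the weight correctly moves up, toward the true value 2.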

Code


# Training Data
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

# a random guess for the initial weight
w = 1.0

# our model forward pass
def forward(x):
    return x * w
    
# Loss function
def loss(x, y):
    y_pred = forward(x)
    return (y_pred - y) * (y_pred - y)
    
# compute gradient
def gradient(x, y):  # d_loss/d_w
    return 2 * x * (x * w - y)
    
# Before training
print("Prediction (before training)",  4, forward(4))

# Training loop
for epoch in range(10):
    for x_val, y_val in zip(x_data, y_data):
        # Compute derivative w.r.t to the learned weights
        # Update the weights
        # Compute the loss and print progress
        grad = gradient(x_val, y_val)
        w = w - 0.01 * grad
        print("\tgrad: ", x_val, y_val, round(grad, 2))
        l = loss(x_val, y_val)
    print("progress:", epoch, "w=", round(w, 2), "loss=", round(l, 2))
    
# After training
print("Predicted score (after training)",  "4 hours of studying: ", forward(4))

Output

Prediction (before training) 4 4.0
grad: 1.0 2.0 -2.0
grad: 2.0 4.0 -7.84
grad: 3.0 6.0 -16.23
progress: 0 w= 1.26 loss= 4.92
grad: 1.0 2.0 -1.48
grad: 2.0 4.0 -5.8
grad: 3.0 6.0 -12.0
progress: 1 w= 1.45 loss= 2.69
grad: 1.0 2.0 -1.09
grad: 2.0 4.0 -4.29
grad: 3.0 6.0 -8.87
progress: 2 w= 1.6 loss= 1.47
grad: 1.0 2.0 -0.81
grad: 2.0 4.0 -3.17
grad: 3.0 6.0 -6.56
progress: 3 w= 1.7 loss= 0.8
grad: 1.0 2.0 -0.6
grad: 2.0 4.0 -2.34
grad: 3.0 6.0 -4.85
progress: 4 w= 1.78 loss= 0.44
grad: 1.0 2.0 -0.44
grad: 2.0 4.0 -1.73
grad: 3.0 6.0 -3.58
progress: 5 w= 1.84 loss= 0.24
grad: 1.0 2.0 -0.33
grad: 2.0 4.0 -1.28
grad: 3.0 6.0 -2.65
progress: 6 w= 1.88 loss= 0.13
grad: 1.0 2.0 -0.24
grad: 2.0 4.0 -0.95
grad: 3.0 6.0 -1.96
progress: 7 w= 1.91 loss= 0.07
grad: 1.0 2.0 -0.18
grad: 2.0 4.0 -0.7
grad: 3.0 6.0 -1.45
progress: 8 w= 1.93 loss= 0.04
grad: 1.0 2.0 -0.13
grad: 2.0 4.0 -0.52
grad: 3.0 6.0 -1.07
progress: 9 w= 1.95 loss= 0.02
Predicted score (after training) 4 hours of studying: 7.804863933862125
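One way to gain confidence in the hand-derived gradient above is a numerical finite-difference check: perturb $w$ slightly and compare $(loss(w+h) - loss(w-h)) / 2h$ against the analytic formula. This is a standalone sketch, independent of the training loop; the test point $w = 1.5$ and step $h$ are assumptions.

```python
def loss(x, y, w):
    # squared error for the linear model y_hat = x * w
    return (x * w - y) ** 2

def gradient(x, y, w):            # analytic: d(loss)/dw = 2x(xw - y)
    return 2 * x * (x * w - y)

x, y, w, h = 2.0, 4.0, 1.5, 1e-6
numeric = (loss(x, y, w + h) - loss(x, y, w - h)) / (2 * h)
analytic = gradient(x, y, w)
print(analytic, numeric)          # both close to -4.0
```

The two values agree to many decimal places, which is a quick way to catch sign or factor-of-two mistakes in hand-derived gradients.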

Exercise 3-1: compute gradient


$\hat{y} = x^2 \cdot w_2 + x \cdot w_1 + b$
$loss = (\hat{y} - y)^2$

$\frac{\partial loss}{\partial w_1} = ?$

$\frac{\partial loss}{\partial w_2} = ?$
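Applying the chain rule to $loss = (\hat{y} - y)^2$ with $\hat{y} = x^2 \cdot w_2 + x \cdot w_1 + b$ gives the answers, which match the gradient functions in the code below:

$\frac{\partial loss}{\partial w_1} = \frac{\partial loss}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w_1} = 2(\hat{y} - y) \cdot x$

$\frac{\partial loss}{\partial w_2} = 2(\hat{y} - y) \cdot x^2$

$\frac{\partial loss}{\partial b} = 2(\hat{y} - y)$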

# Training Data
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

# initial guesses for the parameters (here all zeros)
w1 = 0
w2 = 0
b = 0

# our model forward pass
def forward(x):
    return pow(x,2)*w2 + x*w1 + b
    
# Loss function
def loss(x, y):
    y_pred = forward(x)
    return (y_pred - y) * (y_pred - y)
    
# compute gradient
def w1_gradient(x, y, y_pred):  # d_loss/d_w1
    return 2 * x * (y_pred - y)

def w2_gradient(x, y, y_pred):  # d_loss/d_w2
    return 2 * pow(x, 2) * (y_pred - y)

def b_gradient(x, y, y_pred):  # d_loss/d_b
    return 2 * (y_pred - y)

# Update the weights and the bias
def optimize(x, y, learning_rate = 0.02):
    global w1, w2, b
    y_pred = forward(x)
    
    w1_grad = w1_gradient(x, y, y_pred)
    w1 = w1 - learning_rate * w1_grad

    w2_grad = w2_gradient(x, y, y_pred)
    w2 = w2 - learning_rate * w2_grad

    b_grad = b_gradient(x, y, y_pred)
    b = b - learning_rate * b_grad

    print('\tgrad(w1, w2, b): ', x, y, round(w1_grad, 2), round(w2_grad, 2), round(b_grad, 2))
    return w1, w2, b
    
# Before training
print("Prediction (before training)",  4, forward(4))

# Training loop
for epoch in range(20):
    for x_val, y_val in zip(x_data, y_data):
        # Compute derivative w.r.t to the learned weights
        # Compute the loss and print progress
        w1, w2, b = optimize(x_val, y_val)
        l = loss(x_val, y_val)
    print("progress : ", epoch, "loss = ", round(l, 2))
    
# After training
print("Predicted score (after training)",  "4 hours of studying: ", forward(4))

Output

Prediction (before training) 4 0
grad(w1, w2, b): 1.0 2.0 -4.0 -4.0 -4.0
grad(w1, w2, b): 2.0 4.0 -13.76 -27.52 -6.88
grad(w1, w2, b): 3.0 6.0 5.74 17.22 1.91
progress : 0 loss = 6.38
grad(w1, w2, b): 1.0 2.0 -2.59 -2.59 -2.59
grad(w1, w2, b): 2.0 4.0 -7.33 -14.67 -3.67
grad(w1, w2, b): 3.0 6.0 7.81 23.42 2.6
progress : 1 loss = 11.8
grad(w1, w2, b): 1.0 2.0 -2.6 -2.6 -2.6
grad(w1, w2, b): 2.0 4.0 -8.67 -17.33 -4.33
grad(w1, w2, b): 3.0 6.0 5.81 17.44 1.94
progress : 2 loss = 6.54
grad(w1, w2, b): 1.0 2.0 -2.09 -2.09 -2.09
grad(w1, w2, b): 2.0 4.0 -6.88 -13.77 -3.44
grad(w1, w2, b): 3.0 6.0 5.67 17.01 1.89
progress : 3 loss = 6.22
grad(w1, w2, b): 1.0 2.0 -1.85 -1.85 -1.85
grad(w1, w2, b): 2.0 4.0 -6.56 -13.13 -3.28
grad(w1, w2, b): 3.0 6.0 4.86 14.58 1.62
progress : 4 loss = 4.57
grad(w1, w2, b): 1.0 2.0 -1.56 -1.56 -1.56
grad(w1, w2, b): 2.0 4.0 -5.75 -11.5 -2.88
grad(w1, w2, b): 3.0 6.0 4.44 13.31 1.48
progress : 5 loss = 3.81
grad(w1, w2, b): 1.0 2.0 -1.33 -1.33 -1.33
grad(w1, w2, b): 2.0 4.0 -5.26 -10.52 -2.63
grad(w1, w2, b): 3.0 6.0 3.94 11.82 1.31
progress : 6 loss = 3.01
grad(w1, w2, b): 1.0 2.0 -1.12 -1.12 -1.12
grad(w1, w2, b): 2.0 4.0 -4.73 -9.47 -2.37
grad(w1, w2, b): 3.0 6.0 3.56 10.67 1.19
progress : 7 loss = 2.45
grad(w1, w2, b): 1.0 2.0 -0.94 -0.94 -0.94
grad(w1, w2, b): 2.0 4.0 -4.31 -8.62 -2.15
grad(w1, w2, b): 3.0 6.0 3.19 9.58 1.06
progress : 8 loss = 1.98
grad(w1, w2, b): 1.0 2.0 -0.78 -0.78 -0.78
grad(w1, w2, b): 2.0 4.0 -3.92 -7.83 -1.96
grad(w1, w2, b): 3.0 6.0 2.88 8.65 0.96
progress : 9 loss = 1.61
grad(w1, w2, b): 1.0 2.0 -0.63 -0.63 -0.63
grad(w1, w2, b): 2.0 4.0 -3.58 -7.16 -1.79
grad(w1, w2, b): 3.0 6.0 2.61 7.83 0.87
progress : 10 loss = 1.32
grad(w1, w2, b): 1.0 2.0 -0.51 -0.51 -0.51
grad(w1, w2, b): 2.0 4.0 -3.28 -6.56 -1.64
grad(w1, w2, b): 3.0 6.0 2.37 7.1 0.79
progress : 11 loss = 1.08
grad(w1, w2, b): 1.0 2.0 -0.4 -0.4 -0.4
grad(w1, w2, b): 2.0 4.0 -3.01 -6.03 -1.51
grad(w1, w2, b): 3.0 6.0 2.15 6.46 0.72
progress : 12 loss = 0.9
grad(w1, w2, b): 1.0 2.0 -0.3 -0.3 -0.3
grad(w1, w2, b): 2.0 4.0 -2.78 -5.56 -1.39
grad(w1, w2, b): 3.0 6.0 1.96 5.89 0.65
progress : 13 loss = 0.75
grad(w1, w2, b): 1.0 2.0 -0.22 -0.22 -0.22
grad(w1, w2, b): 2.0 4.0 -2.58 -5.15 -1.29
grad(w1, w2, b): 3.0 6.0 1.8 5.4 0.6
progress : 14 loss = 0.63
grad(w1, w2, b): 1.0 2.0 -0.14 -0.14 -0.14
grad(w1, w2, b): 2.0 4.0 -2.4 -4.79 -1.2
grad(w1, w2, b): 3.0 6.0 1.65 4.96 0.55
progress : 15 loss = 0.53
grad(w1, w2, b): 1.0 2.0 -0.08 -0.08 -0.08
grad(w1, w2, b): 2.0 4.0 -2.24 -4.47 -1.12
grad(w1, w2, b): 3.0 6.0 1.52 4.57 0.51
progress : 16 loss = 0.45
grad(w1, w2, b): 1.0 2.0 -0.02 -0.02 -0.02
grad(w1, w2, b): 2.0 4.0 -2.09 -4.19 -1.05
grad(w1, w2, b): 3.0 6.0 1.41 4.23 0.47
progress : 17 loss = 0.38
grad(w1, w2, b): 1.0 2.0 0.03 0.03 0.03
grad(w1, w2, b): 2.0 4.0 -1.97 -3.94 -0.99
grad(w1, w2, b): 3.0 6.0 1.31 3.93 0.44
progress : 18 loss = 0.33
grad(w1, w2, b): 1.0 2.0 0.08 0.08 0.08
grad(w1, w2, b): 2.0 4.0 -1.86 -3.72 -0.93
grad(w1, w2, b): 3.0 6.0 1.22 3.66 0.41
progress : 19 loss = 0.29
Predicted score (after training) 4 hours of studying: 7.71975178478036
