[PyTorch] Lab09.1 - ReLU

Yun Geonil · February 27, 2021

📌 Learning Goals

Problem of Sigmoid
ReLU
Optimizer in PyTorch
Code: mnist_softmax
Code: mnist_nn

Problem of Sigmoid

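Consider a network that uses the sigmoid function as its activation function.

The network is trained with backpropagation, which multiplies local gradients layer by layer through the chain rule. A plot of the sigmoid shows that it flattens out at both ends, so its gradient there is close to 0 (even its maximum, at x = 0, is only 0.25). When many of these small factors are multiplied across stacked layers, the gradient that reaches the early layers shrinks toward 0. This is the vanishing gradient problem.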
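To see the effect in numbers, here is a minimal plain-Python sketch (standard library only, not PyTorch; the helper names are hypothetical). It multiplies the sigmoid derivative σ'(x) = σ(x)(1 − σ(x)) across layers, as the chain rule does, and deliberately ignores the weight factors that also appear in a real backward pass.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    """Derivative s(x) * (1 - s(x)); its maximum is 0.25, reached at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

def chained_sigmoid_grad(x: float, depth: int) -> float:
    """Product of sigmoid derivatives over `depth` layers, assuming every
    layer sees the same pre-activation x (weights are ignored)."""
    g = 1.0
    for _ in range(depth):
        g *= sigmoid_grad(x)
    return g

for x in (0.0, 5.0):
    print(f"x={x}: one layer {sigmoid_grad(x):.4f}, "
          f"ten layers {chained_sigmoid_grad(x, 10):.2e}")
```

Even in the best case (x = 0), ten layers scale the gradient by 0.25**10, roughly 9.5e-7. In the saturated region (x = 5) each layer contributes a factor below 0.01, so the product is far smaller still.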

ReLU

ReLU is defined as:

$f(x) = \max(0, x)$

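For x > 0 the gradient of ReLU is exactly 1, so it does not shrink the gradient the way the flat tails of the sigmoid do, which mitigates the vanishing gradient problem. However, for x ≤ 0 both the output and the gradient are 0, so a unit whose inputs stay negative stops updating. This needs to be kept in mind.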
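For comparison, here is the same kind of plain-Python sketch for ReLU (hypothetical helper names, weights again ignored). The derivative at exactly x = 0 is a matter of convention; this sketch uses 0 there.

```python
def relu(x: float) -> float:
    """ReLU: max(0, x)."""
    return max(0.0, x)

def relu_grad(x: float) -> float:
    """Slope of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def chained_relu_grad(x: float, depth: int) -> float:
    """Product of ReLU derivatives over `depth` layers at the same input x."""
    g = 1.0
    for _ in range(depth):
        g *= relu_grad(x)
    return g

print(chained_relu_grad(2.0, 10))   # → 1.0 (active unit: gradient passes unchanged)
print(chained_relu_grad(-2.0, 10))  # → 0.0 (inactive unit: gradient is blocked)
```

A positive input keeps a factor of exactly 1 through any number of layers, while a non-positive input contributes a factor of 0, which is the caveat described above.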

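PyTorch provides several activation functions, for example:

torch.sigmoid(x)
torch.tanh(x)
torch.relu(x)
torch.nn.functional.leaky_relu(x, 0.01)

Module versions (torch.nn.Sigmoid, torch.nn.Tanh, torch.nn.ReLU, torch.nn.LeakyReLU) also exist; the model later in this post uses torch.nn.ReLU().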
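The second argument of leaky_relu is the negative slope. As a rough plain-Python sketch of the element-wise formulas behind these functions (it mirrors the math only, not PyTorch's tensor implementation; the function names are local stand-ins):

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))   # squashes to (0, 1)

def tanh(x: float) -> float:
    return math.tanh(x)                  # squashes to (-1, 1), zero-centered

def relu(x: float) -> float:
    return max(0.0, x)                   # 0 for x <= 0, identity otherwise

def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    # like ReLU, but negative inputs keep a small slope instead of being zeroed
    return x if x >= 0 else negative_slope * x

for f in (sigmoid, tanh, relu, leaky_relu):
    print(f.__name__, [round(f(x), 4) for x in (-2.0, 0.0, 2.0)])
```

Because leaky_relu keeps a nonzero slope for x < 0, a unit with negative inputs still receives some gradient, which addresses the ReLU caveat mentioned above.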

Optimizer in PyTorch

torch.optim implements many optimizers, including the ones below. They are worth exploring when you get the chance.

torch.optim.SGD
torch.optim.Adadelta
torch.optim.Adagrad
torch.optim.Adam
torch.optim.Rprop
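To get a feel for how two of these differ, here is a plain-Python sketch of the SGD and Adam update rules on a hypothetical toy objective f(x) = x². The Adam part follows the published update rule with the same default hyperparameters as torch.optim.Adam (betas 0.9 and 0.999, eps 1e-8); it is an illustration, not PyTorch's code.

```python
import math

def grad(x: float) -> float:
    """Gradient of the toy objective f(x) = x**2 (minimum at x = 0)."""
    return 2.0 * x

def sgd(x: float, lr: float, steps: int) -> float:
    """Plain gradient descent: x <- x - lr * grad."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def adam(x: float, lr: float, steps: int,
         beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8) -> float:
    """Adam update rule with bias correction."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # correct the bias from starting at 0
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

print(sgd(5.0, 0.1, 100), adam(5.0, 0.1, 500))
```

For example, adam(5.0, 0.1, 1) returns about 4.9: at t = 1, m_hat / sqrt(v_hat) is ±1 (up to eps), so Adam's first step has magnitude close to lr no matter how large the gradient is, whereas the SGD step is lr times the gradient.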

Code: mnist_softmax

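We train on the MNIST dataset used in the previous post.

The weights are initialized from a normal distribution (torch.nn.init.normal_ uses mean 0 and standard deviation 1 by default).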

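import torch
import torch.nn as nn
import torch.optim as optim

# device, learning_rate, training_epochs, data_loader and mnist_test are
# assumed to be set up as in the previous post.
class SoftmaxClassifierModel(nn.Module):
    def __init__(self):
        super().__init__()
        # a single linear layer: 784 pixel inputs -> 10 class scores (logits)
        self.linear = nn.Linear(28*28, 10, bias=True).to(device)
        # overwrite the default weight init with samples from N(0, 1)
        torch.nn.init.normal_(self.linear.weight)
    
    def forward(self, x):
        return self.linear(x)  # raw logits; CrossEntropyLoss applies softmax itself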
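What this model computes can be sketched in plain Python (names hypothetical; this is not nn.Linear's implementation). With independent N(0, 1) weights, each logit has variance equal to the sum of the squared inputs, so for 784 inputs all equal to 0.5 the logits have a standard deviation of sqrt(784 × 0.25) = 14. Logits that large are one plausible reason the cost starts out high in the first epochs.

```python
import random

def init_linear(in_features: int, out_features: int, seed: int = 0):
    """Weights of shape (out_features, in_features), as in nn.Linear, drawn
    from N(0, 1) like torch.nn.init.normal_ with its defaults. The bias is set
    to zero here for simplicity (nn.Linear's own default bias init differs)."""
    rng = random.Random(seed)
    weight = [[rng.gauss(0.0, 1.0) for _ in range(in_features)]
              for _ in range(out_features)]
    bias = [0.0] * out_features
    return weight, bias

def linear(x, weight, bias):
    """y[j] = sum_i weight[j][i] * x[i] + bias[j]: one logit per class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

weight, bias = init_linear(28 * 28, 10)
logits = linear([0.5] * (28 * 28), weight, bias)  # a hypothetical flat gray image
print(len(logits), [round(z, 1) for z in logits])
```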

This time we use the Adam optimizer.

model = SoftmaxClassifierModel()

criterion = torch.nn.CrossEntropyLoss().to(device)
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
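CrossEntropyLoss expects raw logits, which is why forward returns self.linear(x) without a softmax: it combines log-softmax and negative log-likelihood and, by default, averages over the batch. A plain-Python sketch of that computation (helper names hypothetical):

```python
import math

def cross_entropy(logits, target):
    """-log(softmax(logits)[target]), computed via a stable log-sum-exp."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def mean_cross_entropy(batch_logits, targets):
    """Average loss over a batch (the default 'mean' reduction)."""
    losses = [cross_entropy(z, t) for z, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)

print(cross_entropy([0.0] * 10, 3))        # uniform logits: log(10) ≈ 2.3026
print(cross_entropy([10.0, 0.0, 0.0], 0))  # confident and correct: close to 0
```

A uniform guess over 10 classes costs log 10 ≈ 2.30, so the first-epoch average cost of about 5.2 reported below is higher than what uniform guessing would give.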

Train the model, printing the average cost of each epoch.

for epoch in range(training_epochs):
    avg_cost = 0
    total_batch = len(data_loader)
    
    for X, Y in data_loader:
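        # flatten each 28x28 image into a 784-dim vector
        X = X.view(-1, 28*28).to(device)
        Y = Y.to(device)
        
        optimizer.zero_grad()  # clear gradients left over from the previous step
        
        hypothesis = model(X)  # logits, shape (batch, 10)
        cost = criterion(hypothesis, Y)
        cost.backward()        # backpropagation
        optimizer.step()       # Adam update
        
        # .item() gives a plain float, so the running sum builds no autograd graph
        avg_cost += cost.item() / total_batch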
        
    print('Epoch: {:4d}/{} Cost: {:.9f}'.format(
        epoch+1, training_epochs, avg_cost
    ))
print('Learning finished')
'''
Epoch:    1/15 Cost: 5.219546795
Epoch:    2/15 Cost: 1.520482898
Epoch:    3/15 Cost: 1.000591159
Epoch:    4/15 Cost: 0.803559899
Epoch:    5/15 Cost: 0.692900479
Epoch:    6/15 Cost: 0.621076047
Epoch:    7/15 Cost: 0.569595575
Epoch:    8/15 Cost: 0.530656099
Epoch:    9/15 Cost: 0.499562770
Epoch:   10/15 Cost: 0.475612253
Epoch:   11/15 Cost: 0.454856724
Epoch:   12/15 Cost: 0.437183768
Epoch:   13/15 Cost: 0.422301054
Epoch:   14/15 Cost: 0.409386396
Epoch:   15/15 Cost: 0.397539407
Learning finished
'''

# Test model using test data
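with torch.no_grad():  # gradients are not needed for evaluation
    # mnist_test.data holds raw uint8 pixels in 0-255. If the training loader used
    # transforms.ToTensor() (which scales pixels to [0, 1]), dividing X_test by 255
    # would match the training scale; the accuracy below comes from the code as written.
    X_test = mnist_test.data.view(-1, 28*28).float().to(device)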
    Y_test = mnist_test.targets.to(device)
    
    prediction = model(X_test)
    correct_prediction = torch.argmax(prediction, 1) == Y_test
    accuracy = correct_prediction.float().mean()
    print('Accuracy', accuracy.item())
'''
Accuracy 0.8863999843597412
'''
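The evaluation above takes the index of the largest logit as the predicted class and averages how often it matches the label. The same computation in plain Python (hypothetical helpers):

```python
def argmax(values):
    """Index of the largest value; the first index wins on ties."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(batch_logits, targets):
    """Fraction of rows whose argmax equals the target label."""
    correct = sum(argmax(z) == t for z, t in zip(batch_logits, targets))
    return correct / len(targets)

logits = [[0.1, 2.0, -1.0],
          [3.0, 0.5, 0.2],
          [0.0, 0.3, 0.1]]
print(accuracy(logits, [1, 0, 2]))  # 2 of 3 predictions are correct
```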

The test accuracy is about 0.886. Next, we try the multi-layer approach from the previous post.

Code: mnist_nn

We again train on the MNIST dataset from the previous post, this time with three linear layers and ReLU activations between them.

# nn
class SoftmaxClassifierModel(nn.Module):
    def __init__(self):
        super().__init__()
        # three fully connected layers: 784 -> 256 -> 256 -> 10
        self.linear1 = nn.Linear(28*28, 256, bias=True).to(device)
        self.linear2 = nn.Linear(256, 256, bias=True).to(device)
        self.linear3 = nn.Linear(256, 10, bias=True).to(device)
        # ReLU has no parameters, so one instance can be reused between layers
        self.relu = torch.nn.ReLU()
        
        # initialize every weight matrix from N(0, 1)
        torch.nn.init.normal_(self.linear1.weight)
        torch.nn.init.normal_(self.linear2.weight)
        torch.nn.init.normal_(self.linear3.weight)
        
        # linear -> ReLU -> linear -> ReLU -> linear (no activation on the logits)
        self.model = nn.Sequential(self.linear1, self.relu, self.linear2, self.relu, self.linear3).to(device)
    
    def forward(self, x):
        return self.model(x)
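A quick count of trainable parameters shows how much larger this model is than the single-layer one. The sketch below is plain Python with a hypothetical helper name; in PyTorch the same number can be obtained with sum(p.numel() for p in model.parameters()).

```python
def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Parameters of one fully connected layer: an (out x in) weight matrix
    plus one bias per output unit."""
    return in_features * out_features + (out_features if bias else 0)

# single-layer softmax model: 784 -> 10
softmax_params = linear_params(28 * 28, 10)
# three-layer model: 784 -> 256 -> 256 -> 10 (ReLU adds no parameters)
mlp_params = (linear_params(28 * 28, 256)
              + linear_params(256, 256)
              + linear_params(256, 10))
print(softmax_params, mlp_params)  # 7850 269322
```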

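The training loop itself is the same as before. The model, loss, and optimizer are created again so that Adam updates the new model's parameters.

model = SoftmaxClassifierModel()
criterion = torch.nn.CrossEntropyLoss().to(device)
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

for epoch in range(training_epochs):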
    avg_cost = 0
    total_batch = len(data_loader)
    
    for X, Y in data_loader:
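        # flatten each 28x28 image into a 784-dim vector
        X = X.view(-1, 28*28).to(device)
        Y = Y.to(device)
        
        optimizer.zero_grad()  # clear gradients left over from the previous step
        
        hypothesis = model(X)  # logits, shape (batch, 10)
        cost = criterion(hypothesis, Y)
        cost.backward()        # backpropagation
        optimizer.step()       # Adam update
        
        # .item() gives a plain float, so the running sum builds no autograd graph
        avg_cost += cost.item() / total_batch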
        
    print('Epoch: {:4d}/{} Cost: {:.9f}'.format(
        epoch+1, training_epochs, avg_cost
    ))
print('Learning finished')
'''
Epoch:    1/15 Cost: 148.219757080
Epoch:    2/15 Cost: 40.473247528
Epoch:    3/15 Cost: 25.463268280
Epoch:    4/15 Cost: 17.595787048
Epoch:    5/15 Cost: 12.650253296
Epoch:    6/15 Cost: 9.272251129
Epoch:    7/15 Cost: 6.785788536
Epoch:    8/15 Cost: 5.091068268
Epoch:    9/15 Cost: 3.838643789
Epoch:   10/15 Cost: 2.877854824
Epoch:   11/15 Cost: 2.104069948
Epoch:   12/15 Cost: 1.571815252
Epoch:   13/15 Cost: 1.211519003
Epoch:   14/15 Cost: 1.071796656
Epoch:   15/15 Cost: 0.873725891
Learning finished
'''

# Test model using test data
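with torch.no_grad():  # gradients are not needed for evaluation
    # mnist_test.data holds raw uint8 pixels in 0-255. If the training loader used
    # transforms.ToTensor() (which scales pixels to [0, 1]), dividing X_test by 255
    # would match the training scale; the accuracy below comes from the code as written.
    X_test = mnist_test.data.view(-1, 28*28).float().to(device)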
    Y_test = mnist_test.targets.to(device)
    
    prediction = model(X_test)
    correct_prediction = torch.argmax(prediction, 1) == Y_test
    accuracy = correct_prediction.float().mean()
    print('Accuracy', accuracy.item())
'''
Accuracy 0.9485999941825867
'''

The accuracy rises to about 0.949, a clear improvement over the 0.886 of the single-layer softmax model.
