[AI504] Practice 03: Neural Networks & Backprop

이채연·2023년 3월 6일

AI504

목록 보기

6/28

Week 3: PyTorch, Logistic Regression and MLP

We will cover basic concepts of PyTorch Framework (tensor operations, GPU utilizing and autograd)
We will implement simple logistic regression and multinomial logistic regression (softmax) with PyTorch
We will use simple linear model and multi-layer perceptron (MLP) in this class

If you have any questions, feel free to ask

For additional questions, post questions in classum or send emails to jiho283@kaist.ac.kr

Why PyTorch?

Intuitive and concise code
Define by Run method (Tensorflow is Define and Run method)
High compatibility with Numpy (almost one-to-one mapping)

0. Prelim: Load packages & GPU setup

# visualize current GPU usages in your server
!nvidia-smi

Mon Mar  6 13:40:40 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   44C    P0    27W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

# set gpu by number 
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # setting gpu number

# load packages
!pip install torch
!pip install numpy
import torch
import numpy as np

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Requirement already satisfied: torch in /usr/local/lib/python3.8/dist-packages (1.13.1+cu116)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.8/dist-packages (from torch) (4.5.0)
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Requirement already satisfied: numpy in /usr/local/lib/python3.8/dist-packages (1.22.4)

# print the version of PyTorch
print(torch.__version__)

1.13.1+cu116

1. PyTorch and Numpy

PyTorch use tensor: the basic data structure in PyTorch.
Tensor: n-dimensional array + GPU calculation is supported
Almost the same with Numpy array

PyTorch and Numpy shares almost identical grammer

We will show some examples of:

Same operation with identical grammer
Same operation with different grammer
Different operation with same grammer

We will not handle all examples in this class :(

For more examples, see the following reference: https://github.com/wkentaro/pytorch-for-numpy-users

First! Define Numpy array and PyTorch tensor

np_array_1 = np.array([1, 2, 3, 4])
np_array_2 = np.array([5, 6, 7, 8])
torch_tensor_1 = torch.tensor([1, 2, 3, 4])
torch_tensor_2 = torch.tensor([5 ,6 ,7, 8])

print (np_array_1)
print (np_array_2)
print (torch_tensor_1)
print (torch_tensor_2)

[1 2 3 4]
[5 6 7 8]
tensor([1, 2, 3, 4])
tensor([5, 6, 7, 8])

1) Same operations with identical grammer

Example) Get the shape of the tensor

# numpy
print (np_array_1.shape)

# torch
print (torch_tensor_1.shape)
print (torch_tensor_1.size()) # size() and shape operation is identical in torch

(4,)
torch.Size([4])
torch.Size([4])

2) Same operations with different grammer

Example 1) Concatenate two tensors

numpy use np.concatenate
torch use torch.cat
IMPORTANT: axis (numpy) and dim (torch) is identical

# numpy
np_concate = np.concatenate([np_array_1, np_array_2], axis=0)
print ('----numpy----')
print (np_concate)

# torch
torch_concate= torch.cat([torch_tensor_1, torch_tensor_2], dim=0)
print ('----torch----')
print (torch_concate)

----numpy----
[1 2 3 4 5 6 7 8]
----torch----
tensor([1, 2, 3, 4, 5, 6, 7, 8])

torch_concate2= torch.cat([torch_tensor_1[:,None], torch_tensor_2[:,None]], dim=1)
print ('----torch----')
print (torch_concate2)

----torch----
tensor([[1, 5],
        [2, 6],
        [3, 7],
        [4, 8]])

torch_tensor_1_ = torch_tensor_1.reshape(1,4)
torch_tensor_2_ = torch_tensor_2.reshape(1,4)
torch_concate3= torch.cat([torch_tensor_1_, torch_tensor_2_], dim=1)
print ('----torch----')
print (torch_concate3)

----torch----
tensor([[1, 2, 3, 4, 5, 6, 7, 8]])

Example 2) reshape the tensor shape

numpy use X.reshape
torch use X.view
IMPORTANT: axis (numpy) and dim (torch) is identical

# numpy
np_reshaped = np_concate.reshape(4, 2)
print ('----numpy----')
print (np_reshaped)
print (np_reshaped.shape)

# torch
torch_reshaped = torch_concate.view(4, 2)
print ('----torch----')
print (torch_reshaped)
print (torch_reshaped.shape)

----numpy----
[[1 2]
 [3 4]
 [5 6]
 [7 8]]
(4, 2)
----torch----
tensor([[1, 2],
        [3, 4],
        [5, 6],
        [7, 8]])
torch.Size([4, 2])

3) Different operations with same grammer (Confusing operations)

Example) manipulation tensors

Same grammer repeat has different operations

x = np.array([1, 2, 3])
x_repeat = x.repeat(2)

print ('----numpy----')
print (x)
print (x_repeat)

x = torch.tensor([1, 2, 3])
x_repeat = x.repeat(2)

print ('----torch----')
print (x)
print (x_repeat)

# To obtain the same result with np.repeat (will skip explanation: you should be proficient with reshaping operations)
print('----obtain the same result-----')
x_repeat = x.view(3, 1)
print (x_repeat)

x_repeat = x_repeat.repeat(1, 2)
print (x_repeat)

x_repeat = x_repeat.view(-1)
print (x_repeat)

----numpy----
[1 2 3]
[1 1 2 2 3 3]
----torch----
tensor([1, 2, 3])
tensor([1, 2, 3, 1, 2, 3])
----obtain the same result-----
tensor([[1],
        [2],
        [3]])
tensor([[1, 1],
        [2, 2],
        [3, 3]])
tensor([1, 1, 2, 2, 3, 3])

# similar manipulation operation: stack & repeat
x = torch.tensor([1, 2, 3])
x_repeat = x.repeat(4)
x_stack = torch.stack([x, x, x, x])

print (x_repeat)
print (x_stack)
print (x_repeat.view(4, 3)) # reshape x

tensor([1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])
tensor([[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]])
tensor([[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]])

2. Tensor operations under GPU utilization

Deep learning frameworks utilize GPUs to accelarate computations.

In this section, we will learn how to utilize GPU in PyTorch

print(torch.cuda.is_available())  # Is GPU accessible?

True

a = torch.ones(3)
b = torch.randn(100, 50, 3)

# Tensors are located in cpu.
print(a.device)
print(b.device)

cpu
cpu

c = a + b

print(c.device)

cpu

# upload a and b to GPU
# Ex. model parameter를 GPU로 옮길 때 쓴다.
a = a.to('cuda')
b = b.to('cuda')

# 특정 cuda로 옮기고 싶으면 a = a.to('cuda:1') 이런 식으로 쓰기

print(a.device)
print(b.device)

cuda:0
cuda:0

c = a + b

# 추가) cpu와 cuda 끼리는 연산하면 error가 난다.

print(c.device)

cuda:0

c = c.to('cpu')

print(c.device)

cpu

3. Autograd

Central to all neural networks in PyTorch is the autograd package.

The autograd package provides automatic differentiation for all operations on Tensors.

torch.Tensor is the central class of the package. If you set its attribute .requires_grad as True, it starts to track all operations on it. When you finish your computation you can call .backward() and have all the gradients computed automatically. The gradient for this tensor will be accumulated into .grad attribute.

To stop a tensor from tracking history, you can call .detach() to detach it from the computation history, and to prevent future computation from being tracked.

Example

# gradient를 구하고 싶으면 requires_grad=True를 해야한다.
x = torch.ones(2, 2, requires_grad=True)
print(x)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

y = x + 2
print(y)

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)

z = y * y * 3
print(z)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>)

out = z.mean()
print(out)

tensor(27., grad_fn=<MeanBackward0>)

y.retain_grad() # Operation graph를 얻는다.
z.retain_grad() # retain_grad()를 하면 memory efficient를 위해 마지막 variable을 제외한 나머지 variable의 gradient는 자동적으로 memory에서 제거한다.
# 실제 모델을 훈련할 때는 memory에서 자동적으로 제거되어야 한다.
out.backward() # Differentation을 수행한다.

print(z.grad)

tensor([[0.2500, 0.2500],
        [0.2500, 0.2500]])

print(y.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])

print(x.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])

Efficient inference (testing) with torch.no_grad()

To prevent tracking history (and using memory), you can also wrap the code block in with torch.no_grad()

Situation: when gradient calculation is not required e.g., inference\
Solution: use torch.no_grad(), then torch doesn't generate computational graph for back propagation, therefore it is much faster

Tensor와 gradient는 GPU의 memory인 vram에 저장된다. Inference를 할 때에는 gradient가 필요 없으므로 torch.no_grad()를 취한다.

어느 gradient function도 계산하지 않으므로 속도가 빠르다. Backpropagation을 위한 computational graph를 만들지 않는다.

with torch.no_grad(): # You have same value for forward but backward is not available.
    x = torch.ones(2, 2, requires_grad=True)
    y = x + 2
    z = y * y * 3
    out = z.mean()

out

tensor(27.)

out.backward() ## ERROR!!!!: we used torch.no_grad()!!

---------------------------------------------------------------------------

RuntimeError                              Traceback (most recent call last)

<ipython-input-35-bf3332dd1f01> in <module>
----> 1 out.backward() ## ERROR!!!!: we used torch.no_grad()!!


/usr/local/lib/python3.8/dist-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    486                 inputs=inputs,
    487             )
--> 488         torch.autograd.backward(
    489             self, gradient, retain_graph, create_graph, inputs=inputs
    490         )


/usr/local/lib/python3.8/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    195     # some Python versions print out the first line of a multi-line function
    196     # calls in the traceback and some print out the last line
--> 197     Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    198         tensors, grad_tensors_, retain_graph, create_graph, inputs,
    199         allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass


RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

4. nn.Module

Using pre-defined modules (subset of models) in PyTorch

import torch.nn as nn

X = torch.tensor([[1., 2., 3.], [4., 5., 6.]])

print (X)
print (X.shape)

tensor([[1., 2., 3.],
        [4., 5., 6.]])
torch.Size([2, 3])

# input dim 3, output dim 1
linear_fn = nn.Linear(3, 1) # fully-connected layer

# Pytorch library가 Linear instance를 제공한다.
linear_fn  # WX + b

Linear(in_features=3, out_features=1, bias=True)

Y = linear_fn(X)
print(Y)
print(Y.shape)

tensor([[1.0339],
        [1.9508]], grad_fn=<AddmmBackward0>)
torch.Size([2, 1])

Y = Y.sum()
print(Y)

tensor(2.9847, grad_fn=<SumBackward0>)

You can use other types of nn.Module in PyTorch

nn.Conv2d
nn.RNNCell
nn.LSTMCell
nn.GRUCell
nn.Transformer;

How can we design a customized model (neural network)?

# First, we define Model using nn.Module.
class Model(nn.Module):
    def __init__(self, input_dim, output_dim, hidden_dim):
      # Parameters are initialized in here.
        super(Model, self).__init__()
        self.linear_1 = nn.Linear(input_dim, hidden_dim) # input_dim -> hidden_dim FC_layer
        self.linear_2 = nn.Linear(hidden_dim, output_dim) # hidden_dim -> output_dim FC_layer
        self.relu = nn.ReLU() # Activation function
    def forward(self, x):
      # Function의 sequence를 결정할 수 있다.
        x = self.linear_1(x)
        x = self.relu(x) # Activation function
        x = self.linear_2(x)
        return x

What is activation function?

They make non-linearity for deep neural networks
Therefore, deep neural networks can approximate complex functions

nn.Sigmoid
nn.ReLU
nn.LeakyReLU
nn.Tanh;

5. MNIST classification with PyTorch (Logistic regression & MLP)

What is MNIST & How to do multi-class classification?

The MNIST database of handwritten digits from 0 to 9, has a training set of 60,000 examples, and a test set of 10,000 examples.

Since we have 10 classes (0~9), current problem can be interpreted as multinomial logistic regression (multi-class classification).

Therefore, we use softmax function to handle multiple class output with cross-entropy loss function.

Load packages

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from torch.utils.data import DataLoader

import torchvision # Handle image conveniently
import torchvision.transforms as transforms # for function like image augmentation

Load datasets for training & testing

# MNIST dataset 
train_dataset = torchvision.datasets.MNIST(root='./', train=True, transform=transforms.ToTensor(), download=True)
test_dataset = torchvision.datasets.MNIST(root='./', train=False, transform=transforms.ToTensor())

# Data loader
# mini batch size
# DataLoader will make batch size and shuffle them.
train_loader = DataLoader(dataset=train_dataset, batch_size=128, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=100, shuffle=False)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./MNIST/raw/train-images-idx3-ubyte.gz



  0%|          | 0/9912422 [00:00<?, ?it/s]


Extracting ./MNIST/raw/train-images-idx3-ubyte.gz to ./MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./MNIST/raw/train-labels-idx1-ubyte.gz



  0%|          | 0/28881 [00:00<?, ?it/s]


Extracting ./MNIST/raw/train-labels-idx1-ubyte.gz to ./MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./MNIST/raw/t10k-images-idx3-ubyte.gz



  0%|          | 0/1648877 [00:00<?, ?it/s]


Extracting ./MNIST/raw/t10k-images-idx3-ubyte.gz to ./MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./MNIST/raw/t10k-labels-idx1-ubyte.gz



  0%|          | 0/4542 [00:00<?, ?it/s]


Extracting ./MNIST/raw/t10k-labels-idx1-ubyte.gz to ./MNIST/raw

Define model (we will use one layer classifier first)

# Define model class
# This model has one hidden layer
class Multinomial_logistic_regression(nn.Module):
    def __init__(self, input_size, output_size):
        super(Multinomial_logistic_regression, self).__init__()
        # Just one linear layer is same as logistic regression.
        # If output_size is not the one, then it is same as multinomial logistic regression.
        self.fc = nn.Linear(input_size, output_size) 
        
    def forward(self, x):
        out = self.fc(x)
        return out

# Generate model
model = Multinomial_logistic_regression(784, 10)  # init(784, 10)
# input dim: 784  / output dim: 10

model

Multinomial_logistic_regression(
  (fc): Linear(in_features=784, out_features=10, bias=True)
)

# Upload model to GPU
model = model.to('cuda')

Define optimizer

Optimization is about finding the best solution (model parameter) that fits the given dataset!

PyTorch optimizer is about which optimization methods to use for training

We will not handle the details in this class. (take "Optimization for AI (AI505)" course)

# Optimizer define
# optimizer = torch.optim.SGD(model.parameters(), lr=0.05) 
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)
# toptimizer = orch.optim.Adam(model.parameters(), lr=0.05)

Train the model

# Loss function define (we use cross-entropy)
loss_fn = nn.CrossEntropyLoss()

#Train the model
total_step = len(train_loader)

# 10 epoch means you train total dataset 10 times to the model.
for epoch in range(10):
    for i, (images, labels) in enumerate(train_loader):  # mini batch for loop
        # model이 이미 gpu에 있으므로 input data도 gpu에 upload해야한다.
        # upload to gpu
        images = images.reshape(-1, 28*28).to('cuda')
        labels = labels.to('cuda')
        # Input data, label, model 모두 gpu에 있으면 forward를 수행할 수 있다. 
        # Forward
        outputs = model(images)  # forwardI(images): get prediction
        # Forward를 하면, model parameters tensor는 backward gradient function을 가지게 된다.
        loss = loss_fn(outputs, labels)  # calculate the loss (crossentropy loss) with ground truth & prediction value
        
        # Backward and optimize
        optimizer.zero_grad() # backward하기 전에 모든 opimizer의 gradient를 제거해야한다.
        loss.backward()  # automatic gradient calculation (autograd)
        optimizer.step()  # update model parameter which has requires_grad=True following the optimizer.
        
        if (i+1) % 100 == 0:
            print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
                   .format(epoch+1, 10, i+1, total_step, loss.item()))

Epoch [1/10], Step [100/469], Loss: 0.3738
Epoch [1/10], Step [200/469], Loss: 0.4604
Epoch [1/10], Step [300/469], Loss: 0.3036
Epoch [1/10], Step [400/469], Loss: 0.3576
Epoch [2/10], Step [100/469], Loss: 0.2903
Epoch [2/10], Step [200/469], Loss: 0.4076
Epoch [2/10], Step [300/469], Loss: 0.3294
Epoch [2/10], Step [400/469], Loss: 0.2468
Epoch [3/10], Step [100/469], Loss: 0.4098
Epoch [3/10], Step [200/469], Loss: 0.2105
Epoch [3/10], Step [300/469], Loss: 0.2060
Epoch [3/10], Step [400/469], Loss: 0.2683
Epoch [4/10], Step [100/469], Loss: 0.4881
Epoch [4/10], Step [200/469], Loss: 0.1585
Epoch [4/10], Step [300/469], Loss: 0.2492
Epoch [4/10], Step [400/469], Loss: 0.1930
Epoch [5/10], Step [100/469], Loss: 0.3474
Epoch [5/10], Step [200/469], Loss: 0.2161
Epoch [5/10], Step [300/469], Loss: 0.2542
Epoch [5/10], Step [400/469], Loss: 0.2496
Epoch [6/10], Step [100/469], Loss: 0.2382
Epoch [6/10], Step [200/469], Loss: 0.2426
Epoch [6/10], Step [300/469], Loss: 0.3540
Epoch [6/10], Step [400/469], Loss: 0.2872
Epoch [7/10], Step [100/469], Loss: 0.2420
Epoch [7/10], Step [200/469], Loss: 0.2859
Epoch [7/10], Step [300/469], Loss: 0.3094
Epoch [7/10], Step [400/469], Loss: 0.2397
Epoch [8/10], Step [100/469], Loss: 0.2120
Epoch [8/10], Step [200/469], Loss: 0.3133
Epoch [8/10], Step [300/469], Loss: 0.2710
Epoch [8/10], Step [400/469], Loss: 0.2138
Epoch [9/10], Step [100/469], Loss: 0.3754
Epoch [9/10], Step [200/469], Loss: 0.3442
Epoch [9/10], Step [300/469], Loss: 0.2198
Epoch [9/10], Step [400/469], Loss: 0.3325
Epoch [10/10], Step [100/469], Loss: 0.3301
Epoch [10/10], Step [200/469], Loss: 0.2388
Epoch [10/10], Step [300/469], Loss: 0.2035
Epoch [10/10], Step [400/469], Loss: 0.1956

Test the model

# Test the model
# Inference process
# In test phase, we don't need to compute gradients (for memory efficiency)
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, 28*28).to('cuda')
        labels = labels.to('cuda')
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)  # classificatoin model -> get the label prediction of top 1 
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Accuracy of the network on the 10000 test images: {} %'.format(100 * correct / total))

Accuracy of the network on the 10000 test images: 92.51 %

New model: MLP (multi-layer-perceptron)

Previous model used multinomial logistic regression (one linear layer)\
What if we use MLP (multi-layer-perceptron)? A neural network with hidden layers?

# New model with multi layer
class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(NeuralNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size) 
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, output_size)
        self.sigmoid = nn.Sigmoid()  # sigmoid activation function (you can customize)
    
    def forward(self, x):
        out = self.fc1(x)
        out = self.sigmoid(out)
        out = self.fc2(out)
        out = self.sigmoid(out)
        out = self.fc3(out)
        return out

# Generate model
model = NeuralNet(784, 20, 10)  # init(784, 20, 10)
# input dim: 784  / hidden dim: 20  / output dim: 10

# Upload model to GPU
model = model.to('cuda')

# Loss function define (we use cross-entropy)
loss_fn = nn.CrossEntropyLoss()

# Define optimizer
# optimizer = torch.optim.SGD(model.parameters(), lr=0.05) 
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)
# optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

# Train the model
total_step = len(train_loader)

for epoch in range(10):
    for i, (images, labels) in enumerate(train_loader):  # mini batch for loop
        # upload to gpu
        images = images.reshape(-1, 28*28).to('cuda')
        labels = labels.to('cuda')
        
        # Forward
        outputs = model(images)  # forwardI(images): get prediction
        loss = loss_fn(outputs, labels)  # calculate the loss (crossentropy loss) with ground truth & prediction value
        
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()  # automatic gradient calculation (autograd)
        optimizer.step()  # update model parameter with requires_grad=True 
        
        if (i+1) % 100 == 0:
            print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
                   .format(epoch+1, 10, i+1, total_step, loss.item()))

Epoch [1/10], Step [100/469], Loss: 2.2893
Epoch [1/10], Step [200/469], Loss: 1.9973
Epoch [1/10], Step [300/469], Loss: 1.3428
Epoch [1/10], Step [400/469], Loss: 0.8957
Epoch [2/10], Step [100/469], Loss: 0.5629
Epoch [2/10], Step [200/469], Loss: 0.4676
Epoch [2/10], Step [300/469], Loss: 0.3804
Epoch [2/10], Step [400/469], Loss: 0.3832
Epoch [3/10], Step [100/469], Loss: 0.3103
Epoch [3/10], Step [200/469], Loss: 0.2726
Epoch [3/10], Step [300/469], Loss: 0.3784
Epoch [3/10], Step [400/469], Loss: 0.2654
Epoch [4/10], Step [100/469], Loss: 0.3592
Epoch [4/10], Step [200/469], Loss: 0.3668
Epoch [4/10], Step [300/469], Loss: 0.2052
Epoch [4/10], Step [400/469], Loss: 0.3435
Epoch [5/10], Step [100/469], Loss: 0.2059
Epoch [5/10], Step [200/469], Loss: 0.3557
Epoch [5/10], Step [300/469], Loss: 0.3626
Epoch [5/10], Step [400/469], Loss: 0.2820
Epoch [6/10], Step [100/469], Loss: 0.4171
Epoch [6/10], Step [200/469], Loss: 0.2445
Epoch [6/10], Step [300/469], Loss: 0.1477
Epoch [6/10], Step [400/469], Loss: 0.2845
Epoch [7/10], Step [100/469], Loss: 0.0930
Epoch [7/10], Step [200/469], Loss: 0.2193
Epoch [7/10], Step [300/469], Loss: 0.3161
Epoch [7/10], Step [400/469], Loss: 0.1741
Epoch [8/10], Step [100/469], Loss: 0.1585
Epoch [8/10], Step [200/469], Loss: 0.2300
Epoch [8/10], Step [300/469], Loss: 0.1631
Epoch [8/10], Step [400/469], Loss: 0.2610
Epoch [9/10], Step [100/469], Loss: 0.1597
Epoch [9/10], Step [200/469], Loss: 0.2408
Epoch [9/10], Step [300/469], Loss: 0.2199
Epoch [9/10], Step [400/469], Loss: 0.1732
Epoch [10/10], Step [100/469], Loss: 0.0823
Epoch [10/10], Step [200/469], Loss: 0.1927
Epoch [10/10], Step [300/469], Loss: 0.1641
Epoch [10/10], Step [400/469], Loss: 0.2774

# Test the model
# In test phase, we don't need to compute gradients (for memory efficiency)
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, 28*28).to('cuda')
        labels = labels.to('cuda')
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)  # classificatoin model -> get the label prediction of top 1 
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Accuracy of the network on the 10000 test images: {} %'.format(100 * correct / total))

Accuracy of the network on the 10000 test images: 94.66 %

Change the following options to obtain better accuracy!! (try it by your-self)

(1) Model configurations

size of hidden layer units
number of layers
type of activation function (e.g., relu, tanh, softplus etc.)

(2) Optimization configurations

learning rate
epoch
type of optimizer
momentem hyperparameter

class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(NeuralNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size) 
        self.fc3 = nn.Linear(hidden_size, output_size)
        self.ReLU = nn.ReLU()  # sigmoid activation function (you can customize)
    
    def forward(self, x):
        out = self.fc1(x)
        out = self.ReLU(out)
        out = self.fc2(out)
        out = self.ReLU(out)
        out = self.fc2(out)
        out = self.ReLU(out)
        out = self.fc3(out)
        return out

# Generate model
model = NeuralNet(784, 20, 10)  # init(784, 20, 10)
# input dim: 784  / hidden_dim: 20/ output dim: 10

# Upload model to GPU
model = model.to('cuda')

# Loss function define (we use cross-entropy)
loss_fn = nn.CrossEntropyLoss()

# Define optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)

# Train the model
total_step = len(train_loader)

for epoch in range(10):
    for i, (images, labels) in enumerate(train_loader):  # mini batch for loop
        # upload to gpu
        images = images.reshape(-1, 28*28).to('cuda')
        labels = labels.to('cuda')
        
        # Forward
        outputs = model(images)  # forwardI(images): get prediction
        loss = loss_fn(outputs, labels)  # calculate the loss (crossentropy loss) with ground truth & prediction value
        
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()  # automatic gradient calculation (autograd)
        optimizer.step()  # update model parameter with requires_grad=True 
        
        if (i+1) % 100 == 0:
            print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
                   .format(epoch+1, 10, i+1, total_step, loss.item()))

Epoch [1/10], Step [100/469], Loss: 0.6402
Epoch [1/10], Step [200/469], Loss: 0.4866
Epoch [1/10], Step [300/469], Loss: 0.2448
Epoch [1/10], Step [400/469], Loss: 0.3097
Epoch [2/10], Step [100/469], Loss: 0.2207
Epoch [2/10], Step [200/469], Loss: 0.1231
Epoch [2/10], Step [300/469], Loss: 0.2906
Epoch [2/10], Step [400/469], Loss: 0.1964
Epoch [3/10], Step [100/469], Loss: 0.3075
Epoch [3/10], Step [200/469], Loss: 0.1527
Epoch [3/10], Step [300/469], Loss: 0.3046
Epoch [3/10], Step [400/469], Loss: 0.2457
Epoch [4/10], Step [100/469], Loss: 0.1330
Epoch [4/10], Step [200/469], Loss: 0.1601
Epoch [4/10], Step [300/469], Loss: 0.2634
Epoch [4/10], Step [400/469], Loss: 0.0997
Epoch [5/10], Step [100/469], Loss: 0.2139
Epoch [5/10], Step [200/469], Loss: 0.1465
Epoch [5/10], Step [300/469], Loss: 0.1549
Epoch [5/10], Step [400/469], Loss: 0.1493
Epoch [6/10], Step [100/469], Loss: 0.1863
Epoch [6/10], Step [200/469], Loss: 0.2241
Epoch [6/10], Step [300/469], Loss: 0.0877
Epoch [6/10], Step [400/469], Loss: 0.1304
Epoch [7/10], Step [100/469], Loss: 0.1734
Epoch [7/10], Step [200/469], Loss: 0.2090
Epoch [7/10], Step [300/469], Loss: 0.2542
Epoch [7/10], Step [400/469], Loss: 0.1492
Epoch [8/10], Step [100/469], Loss: 0.1260
Epoch [8/10], Step [200/469], Loss: 0.0807
Epoch [8/10], Step [300/469], Loss: 0.0383
Epoch [8/10], Step [400/469], Loss: 0.2181
Epoch [9/10], Step [100/469], Loss: 0.2047
Epoch [9/10], Step [200/469], Loss: 0.1175
Epoch [9/10], Step [300/469], Loss: 0.0380
Epoch [9/10], Step [400/469], Loss: 0.1478
Epoch [10/10], Step [100/469], Loss: 0.0786
Epoch [10/10], Step [200/469], Loss: 0.2313
Epoch [10/10], Step [300/469], Loss: 0.0742
Epoch [10/10], Step [400/469], Loss: 0.1136

# Test the model
# In test phase, we don't need to compute gradients (for memory efficiency)
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, 28*28).to('cuda')
        labels = labels.to('cuda')
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)  # classificatoin model -> get the label prediction of top 1 
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Accuracy of the network on the 10000 test images: {} %'.format(100 * correct / total))

Accuracy of the network on the 10000 test images: 95.17 %

Reference

AI504: Programming for AI Lecture at KAIST AI

이채연

AI researcher

이전 포스트

[AI504] Class 03: Neural Networks & Backpropagation

다음 포스트