If you have any questions, feel free to ask
# visualize current GPU usage on your server
!nvidia-smi
Mon Mar 6 13:40:40 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12 Driver Version: 525.85.12 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |
| N/A 44C P0 27W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
# set gpu by number
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0' # setting gpu number
# load packages
!pip install torch
!pip install numpy
import torch
import numpy as np
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Requirement already satisfied: torch in /usr/local/lib/python3.8/dist-packages (1.13.1+cu116)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.8/dist-packages (from torch) (4.5.0)
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Requirement already satisfied: numpy in /usr/local/lib/python3.8/dist-packages (1.22.4)
# print the version of PyTorch
print(torch.__version__)
1.13.1+cu116
PyTorch uses tensors: the basic data structure in PyTorch.
Tensor: an n-dimensional array with GPU computation support
Almost the same as a NumPy array
We will show some examples of: 1) same operations with identical grammar, 2) same operations with different grammar, and 3) different operations with the same grammar.
We will not cover every example in this class :(
First, define a NumPy array and a PyTorch tensor
np_array_1 = np.array([1, 2, 3, 4])
np_array_2 = np.array([5, 6, 7, 8])
torch_tensor_1 = torch.tensor([1, 2, 3, 4])
torch_tensor_2 = torch.tensor([5 ,6 ,7, 8])
print (np_array_1)
print (np_array_2)
print (torch_tensor_1)
print (torch_tensor_2)
[1 2 3 4]
[5 6 7 8]
tensor([1, 2, 3, 4])
tensor([5, 6, 7, 8])
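Since a tensor is almost the same as a NumPy array, you can also convert between the two directly. A minimal sketch (my addition, not part of the original examples):
# NumPy array -> PyTorch tensor (torch.from_numpy shares memory with the source array)
t_from_np = torch.from_numpy(np_array_1)
print (t_from_np)
# PyTorch tensor -> NumPy array (CPU tensors only; also shares memory)
np_from_t = torch_tensor_1.numpy()
print (np_from_t)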
1) Same operations with identical grammar
Example) Get the shape of the tensor
# numpy
print (np_array_1.shape)
# torch
print (torch_tensor_1.shape)
print (torch_tensor_1.size()) # size() and .shape are identical in torch
(4,)
torch.Size([4])
torch.Size([4])
2) Same operations with different grammar
Example 1) Concatenate two tensors
np.concatenate
torch.cat
# numpy
np_concate = np.concatenate([np_array_1, np_array_2], axis=0)
print ('----numpy----')
print (np_concate)
# torch
torch_concate= torch.cat([torch_tensor_1, torch_tensor_2], dim=0)
print ('----torch----')
print (torch_concate)
----numpy----
[1 2 3 4 5 6 7 8]
----torch----
tensor([1, 2, 3, 4, 5, 6, 7, 8])
torch_concate2= torch.cat([torch_tensor_1[:,None], torch_tensor_2[:,None]], dim=1)
print ('----torch----')
print (torch_concate2)
----torch----
tensor([[1, 5],
[2, 6],
[3, 7],
[4, 8]])
torch_tensor_1_ = torch_tensor_1.reshape(1,4)
torch_tensor_2_ = torch_tensor_2.reshape(1,4)
torch_concate3= torch.cat([torch_tensor_1_, torch_tensor_2_], dim=1)
print ('----torch----')
print (torch_concate3)
----torch----
tensor([[1, 2, 3, 4, 5, 6, 7, 8]])
Example 2) Reshape the tensor
X.reshape
X.view
# numpy
np_reshaped = np_concate.reshape(4, 2)
print ('----numpy----')
print (np_reshaped)
print (np_reshaped.shape)
# torch
torch_reshaped = torch_concate.view(4, 2)
print ('----torch----')
print (torch_reshaped)
print (torch_reshaped.shape)
----numpy----
[[1 2]
[3 4]
[5 6]
[7 8]]
(4, 2)
----torch----
tensor([[1, 2],
[3, 4],
[5, 6],
[7, 8]])
torch.Size([4, 2])
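A side note on view vs. reshape (my addition): view requires the tensor's memory layout to be contiguous, while reshape copies the data when necessary. A small sketch using the tensors defined above:
t = torch_concate.view(4, 2)
t_T = t.t()                       # transpose -> non-contiguous memory layout
print (t_T.is_contiguous())       # False
# t_T.view(8) would raise a RuntimeError because view needs contiguous memory
print (t_T.reshape(8))            # reshape copies when needed, so this works
print (t_T.contiguous().view(8))  # or make the tensor contiguous first, then view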
3) Different operations with the same grammar (confusing operations)
Example) Manipulating tensors
repeat has different behavior in NumPy and PyTorch
x = np.array([1, 2, 3])
x_repeat = x.repeat(2)
print ('----numpy----')
print (x)
print (x_repeat)
x = torch.tensor([1, 2, 3])
x_repeat = x.repeat(2)
print ('----torch----')
print (x)
print (x_repeat)
# To obtain the same result as np.repeat (explanation skipped: you should become proficient with reshaping operations)
print('----obtain the same result-----')
x_repeat = x.view(3, 1)
print (x_repeat)
x_repeat = x_repeat.repeat(1, 2)
print (x_repeat)
x_repeat = x_repeat.view(-1)
print (x_repeat)
----numpy----
[1 2 3]
[1 1 2 2 3 3]
----torch----
tensor([1, 2, 3])
tensor([1, 2, 3, 1, 2, 3])
----obtain the same result-----
tensor([[1],
[2],
[3]])
tensor([[1, 1],
[2, 2],
[3, 3]])
tensor([1, 1, 2, 2, 3, 3])
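As a shortcut for the reshaping trick above, PyTorch also provides torch.repeat_interleave, which behaves like np.repeat (my addition):
print (torch.repeat_interleave(x, 2))  # element-wise repetition: tensor([1, 1, 2, 2, 3, 3])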
# similar manipulation operation: stack & repeat
x = torch.tensor([1, 2, 3])
x_repeat = x.repeat(4)
x_stack = torch.stack([x, x, x, x])
print (x_repeat)
print (x_stack)
print (x_repeat.view(4, 3)) # reshape x_repeat into the same shape as x_stack
tensor([1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])
tensor([[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
tensor([[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
Deep learning frameworks utilize GPUs to accelerate computations.
In this section, we will learn how to utilize GPUs in PyTorch.
print(torch.cuda.is_available()) # Is GPU accessible?
True
a = torch.ones(3)
b = torch.randn(100, 50, 3)
# Tensors are located on the CPU by default.
print(a.device)
print(b.device)
cpu
cpu
c = a + b
print(c.device)
cpu
# upload a and b to GPU
# Ex. this is how model parameters are moved to the GPU.
a = a.to('cuda')
b = b.to('cuda')
# to move to a specific device, write it like a = a.to('cuda:1')
print(a.device)
print(b.device)
cuda:0
cuda:0
c = a + b
# Note: an operation that mixes a CPU tensor and a CUDA tensor raises an error (a short demonstration follows below).
print(c.device)
cuda:0
c = c.to('cpu')
print(c.device)
cpu
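As noted above, an operation that mixes a CPU tensor and a CUDA tensor fails. A quick illustration (my addition; the exact error message may differ by PyTorch version):
d = torch.ones(3)  # a new tensor on the CPU
try:
    _ = d + a      # a is on cuda:0, d is on the CPU
except RuntimeError as e:
    print(e)       # e.g. "Expected all tensors to be on the same device ..."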
Central to all neural networks in PyTorch is the autograd package. The autograd package provides automatic differentiation for all operations on Tensors.
torch.Tensor is the central class of the package. If you set its attribute .requires_grad to True, it starts to track all operations on it. When you finish your computation you can call .backward() and have all the gradients computed automatically. The gradient for this tensor will be accumulated into the .grad attribute.
To stop a tensor from tracking history, you can call .detach() to detach it from the computation history and to prevent future computation from being tracked.
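A minimal .detach() sketch (my addition): the detached tensor shares its values but is cut off from the computation graph, so no gradients flow through it.
w = torch.ones(2, 2, requires_grad=True)
v = (w * 3).detach()     # same values as w * 3, but no gradient history
print (v.requires_grad)  # False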
# set requires_grad=True if you want the gradient to be computed for this tensor
x = torch.ones(2, 2, requires_grad=True)
print(x)
tensor([[1., 1.],
[1., 1.]], requires_grad=True)
y = x + 2
print(y)
tensor([[3., 3.],
[3., 3.]], grad_fn=<AddBackward0>)
z = y * y * 3
print(z)
tensor([[27., 27.],
[27., 27.]], grad_fn=<MulBackward0>)
out = z.mean()
print(out)
tensor(27., grad_fn=<MeanBackward0>)
y.retain_grad() # keep the gradient of this intermediate (non-leaf) tensor
z.retain_grad() # by default, gradients of intermediate tensors are freed automatically for memory efficiency; retain_grad() keeps them
# when actually training a model, you normally let them be freed automatically
out.backward() # perform the differentiation
print(z.grad)
tensor([[0.2500, 0.2500],
[0.2500, 0.2500]])
print(y.grad)
tensor([[4.5000, 4.5000],
[4.5000, 4.5000]])
print(x.grad)
tensor([[4.5000, 4.5000],
[4.5000, 4.5000]])
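A quick check of where these numbers come from: with $x_i = 1$, $y_i = x_i + 2$, $z_i = 3y_i^2$, and $\text{out} = \frac{1}{4}\sum_i z_i$, the chain rule gives
$$\frac{\partial\,\text{out}}{\partial z_i} = \frac{1}{4} = 0.25, \qquad \frac{\partial\,\text{out}}{\partial y_i} = \frac{1}{4}\cdot 6y_i = \frac{3}{2}\cdot 3 = 4.5, \qquad \frac{\partial\,\text{out}}{\partial x_i} = \frac{\partial\,\text{out}}{\partial y_i}\cdot 1 = 4.5,$$
which matches the printed z.grad, y.grad, and x.grad.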
To prevent tracking history (and using memory), you can also wrap the code block in with torch.no_grad():
Situation: gradient calculation is not required, e.g., inference
Solution: use torch.no_grad(); torch then does not build the computational graph for backpropagation, so it is much faster
Tensors and gradients are stored in GPU memory (VRAM). During inference, gradients are not needed, so we wrap the code in torch.no_grad().
Since no gradient functions are computed and no computational graph for backpropagation is built, it runs faster.
with torch.no_grad(): # the forward values are the same, but backward is not available
    x = torch.ones(2, 2, requires_grad=True)
    y = x + 2
    z = y * y * 3
    out = z.mean()
out
tensor(27.)
out.backward() ## ERROR!!!!: we used torch.no_grad()!!
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-35-bf3332dd1f01> in <module>
----> 1 out.backward() ## ERROR!!!!: we used torch.no_grad()!!
/usr/local/lib/python3.8/dist-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
486 inputs=inputs,
487 )
--> 488 torch.autograd.backward(
489 self, gradient, retain_graph, create_graph, inputs=inputs
490 )
/usr/local/lib/python3.8/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
195 # some Python versions print out the first line of a multi-line function
196 # calls in the traceback and some print out the last line
--> 197 Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
198 tensors, grad_tensors_, retain_graph, create_graph, inputs,
199 allow_unreachable=True, accumulate_grad=True) # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
import torch.nn as nn
X = torch.tensor([[1., 2., 3.], [4., 5., 6.]])
print (X)
print (X.shape)
tensor([[1., 2., 3.],
[4., 5., 6.]])
torch.Size([2, 3])
# input dim 3, output dim 1
linear_fn = nn.Linear(3, 1) # fully-connected layer
# the PyTorch library provides the Linear layer as a ready-made module
linear_fn # WX + b
Linear(in_features=3, out_features=1, bias=True)
Y = linear_fn(X)
print(Y)
print(Y.shape)
tensor([[1.0339],
[1.9508]], grad_fn=<AddmmBackward0>)
torch.Size([2, 1])
Y = Y.sum()
print(Y)
tensor(2.9847, grad_fn=<SumBackward0>)
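The layer's parameters W and b are tensors with requires_grad=True, so you can inspect them directly (my addition; the actual values are random at initialization):
print (linear_fn.weight.shape)  # torch.Size([1, 3]) -> W
print (linear_fn.bias.shape)    # torch.Size([1])    -> b
print (linear_fn.weight.requires_grad)  # True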
You can use other types of nn.Module in PyTorch (a short nn.Conv2d sketch is shown after this list):
nn.Conv2d
nn.RNNCell
nn.LSTMCell
nn.GRUCell
nn.Transformer
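A minimal nn.Conv2d sketch (my addition; the channel and image sizes are chosen only for illustration). A convolutional layer is used the same way as nn.Linear: construct it once, then call it on an input batch.
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
img_batch = torch.randn(16, 1, 28, 28)  # (batch, channels, height, width)
feat = conv(img_batch)
print (feat.shape)  # torch.Size([16, 8, 28, 28]); padding=1 keeps the spatial size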
# First, we define Model using nn.Module.
class Model(nn.Module):
    def __init__(self, input_dim, output_dim, hidden_dim):
        # Parameters are initialized here.
        super(Model, self).__init__()
        self.linear_1 = nn.Linear(input_dim, hidden_dim)   # input_dim -> hidden_dim FC layer
        self.linear_2 = nn.Linear(hidden_dim, output_dim)  # hidden_dim -> output_dim FC layer
        self.relu = nn.ReLU()                              # activation function

    def forward(self, x):
        # Here you decide the sequence of operations (the forward pass).
        x = self.linear_1(x)
        x = self.relu(x)  # activation function
        x = self.linear_2(x)
        return x
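A quick usage sketch of the Model class above (my addition; the dimensions are arbitrary). Calling the instance runs forward() under the hood.
toy_model = Model(input_dim=4, output_dim=2, hidden_dim=8)
dummy = torch.randn(5, 4)       # a batch of 5 samples with 4 features each
print (toy_model(dummy).shape)  # torch.Size([5, 2])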
What is an activation function? (a small comparison is shown after this list)
nn.Sigmoid
nn.ReLU
nn.LeakyReLU
nn.Tanh
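Each activation applies an element-wise nonlinearity. A small comparison on the same input (my addition):
t = torch.tensor([-2.0, -0.5, 0.0, 1.0])
print (nn.Sigmoid()(t))    # squashes values into (0, 1)
print (nn.ReLU()(t))       # zeroes out negative values
print (nn.LeakyReLU()(t))  # small slope (0.01 by default) for negative values
print (nn.Tanh()(t))       # squashes values into (-1, 1)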
The MNIST database of handwritten digits from 0 to 9 has a training set of 60,000 examples and a test set of 10,000 examples.
Since we have 10 classes (0-9), the current problem can be interpreted as multinomial logistic regression (multi-class classification).
Therefore, we use the softmax function to handle the multi-class output, together with the cross-entropy loss function.
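Note that nn.CrossEntropyLoss (used below) applies the softmax internally: it combines log-softmax with the negative log-likelihood loss, so the model only needs to output raw scores (logits). A small sanity-check sketch (my addition):
logits = torch.randn(4, 10)           # 4 samples, 10 class scores each
targets = torch.tensor([3, 7, 0, 9])  # ground-truth class indices
ce = nn.CrossEntropyLoss()(logits, targets)
manual = nn.NLLLoss()(torch.log_softmax(logits, dim=1), targets)
print (ce, manual)  # the two values match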
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
import torchvision # handles image datasets conveniently
import torchvision.transforms as transforms # for functions like image augmentation
# MNIST dataset
train_dataset = torchvision.datasets.MNIST(root='./', train=True, transform=transforms.ToTensor(), download=True)
test_dataset = torchvision.datasets.MNIST(root='./', train=False, transform=transforms.ToTensor())
# Data loader
# DataLoader groups the dataset into mini-batches of the given batch size and (optionally) shuffles them.
train_loader = DataLoader(dataset=train_dataset, batch_size=128, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=100, shuffle=False)
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./MNIST/raw/train-images-idx3-ubyte.gz
Extracting ./MNIST/raw/train-images-idx3-ubyte.gz to ./MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./MNIST/raw/train-labels-idx1-ubyte.gz
Extracting ./MNIST/raw/train-labels-idx1-ubyte.gz to ./MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./MNIST/raw/t10k-images-idx3-ubyte.gz
Extracting ./MNIST/raw/t10k-images-idx3-ubyte.gz to ./MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./MNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting ./MNIST/raw/t10k-labels-idx1-ubyte.gz to ./MNIST/raw
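To see what the DataLoader yields, you can pull out a single batch and check its shapes (my addition; the contents depend on shuffling):
images, labels = next(iter(train_loader))
print (images.shape)  # torch.Size([128, 1, 28, 28]) -> one mini-batch of images
print (labels.shape)  # torch.Size([128])            -> the matching digit labels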
# Define model class
# This model has one hidden layer
class Multinomial_logistic_regression(nn.Module):
    def __init__(self, input_size, output_size):
        super(Multinomial_logistic_regression, self).__init__()
        # A single linear layer is the same as logistic regression.
        # If output_size is larger than one, it becomes multinomial logistic regression.
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, x):
        out = self.fc(x)
        return out
# Generate model
model = Multinomial_logistic_regression(784, 10) # init(784, 10)
# input dim: 784 / output dim: 10
model
Multinomial_logistic_regression(
(fc): Linear(in_features=784, out_features=10, bias=True)
)
# Upload model to GPU
model = model.to('cuda')
Optimization is about finding the best solution (model parameters) that fits the given dataset!
A PyTorch optimizer specifies which optimization method is used for training.
We will not cover the details in this class (take the "Optimization for AI (AI505)" course).
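Conceptually, optimizer.step() updates every registered parameter using its gradient; plain SGD (without momentum) does roughly p <- p - lr * p.grad. A rough illustration only (my addition), not how training code is normally written:
with torch.no_grad():
    for p in model.parameters():
        if p.grad is not None:  # gradients exist only after loss.backward()
            p -= 0.05 * p.grad  # lr = 0.05, matching the optimizer defined below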
# Define optimizer
# optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)
# optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
# Loss function define (we use cross-entropy)
loss_fn = nn.CrossEntropyLoss()
# Train the model
total_step = len(train_loader)
# 10 epochs means the model sees the entire training dataset 10 times.
for epoch in range(10):
    for i, (images, labels) in enumerate(train_loader):  # mini-batch for loop
        # The model is already on the GPU, so the input data must be uploaded to the GPU as well.
        images = images.reshape(-1, 28*28).to('cuda')
        labels = labels.to('cuda')

        # Forward (input data, labels, and model are all on the GPU, so the forward pass can run)
        outputs = model(images)          # forward(images): get prediction
        # The forward pass records the operations (grad_fn) needed for backward.
        loss = loss_fn(outputs, labels)  # calculate the loss (cross-entropy) between ground truth & prediction

        # Backward and optimize
        optimizer.zero_grad()  # clear the gradients of all optimized parameters before calling backward
        loss.backward()        # automatic gradient calculation (autograd)
        optimizer.step()       # update model parameters with requires_grad=True according to the optimizer

        if (i+1) % 100 == 0:
            print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                   .format(epoch+1, 10, i+1, total_step, loss.item()))
Epoch [1/10], Step [100/469], Loss: 0.3738
Epoch [1/10], Step [200/469], Loss: 0.4604
Epoch [1/10], Step [300/469], Loss: 0.3036
Epoch [1/10], Step [400/469], Loss: 0.3576
Epoch [2/10], Step [100/469], Loss: 0.2903
Epoch [2/10], Step [200/469], Loss: 0.4076
Epoch [2/10], Step [300/469], Loss: 0.3294
Epoch [2/10], Step [400/469], Loss: 0.2468
Epoch [3/10], Step [100/469], Loss: 0.4098
Epoch [3/10], Step [200/469], Loss: 0.2105
Epoch [3/10], Step [300/469], Loss: 0.2060
Epoch [3/10], Step [400/469], Loss: 0.2683
Epoch [4/10], Step [100/469], Loss: 0.4881
Epoch [4/10], Step [200/469], Loss: 0.1585
Epoch [4/10], Step [300/469], Loss: 0.2492
Epoch [4/10], Step [400/469], Loss: 0.1930
Epoch [5/10], Step [100/469], Loss: 0.3474
Epoch [5/10], Step [200/469], Loss: 0.2161
Epoch [5/10], Step [300/469], Loss: 0.2542
Epoch [5/10], Step [400/469], Loss: 0.2496
Epoch [6/10], Step [100/469], Loss: 0.2382
Epoch [6/10], Step [200/469], Loss: 0.2426
Epoch [6/10], Step [300/469], Loss: 0.3540
Epoch [6/10], Step [400/469], Loss: 0.2872
Epoch [7/10], Step [100/469], Loss: 0.2420
Epoch [7/10], Step [200/469], Loss: 0.2859
Epoch [7/10], Step [300/469], Loss: 0.3094
Epoch [7/10], Step [400/469], Loss: 0.2397
Epoch [8/10], Step [100/469], Loss: 0.2120
Epoch [8/10], Step [200/469], Loss: 0.3133
Epoch [8/10], Step [300/469], Loss: 0.2710
Epoch [8/10], Step [400/469], Loss: 0.2138
Epoch [9/10], Step [100/469], Loss: 0.3754
Epoch [9/10], Step [200/469], Loss: 0.3442
Epoch [9/10], Step [300/469], Loss: 0.2198
Epoch [9/10], Step [400/469], Loss: 0.3325
Epoch [10/10], Step [100/469], Loss: 0.3301
Epoch [10/10], Step [200/469], Loss: 0.2388
Epoch [10/10], Step [300/469], Loss: 0.2035
Epoch [10/10], Step [400/469], Loss: 0.1956
# Test the model
# Inference process
# In test phase, we don't need to compute gradients (for memory efficiency)
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, 28*28).to('cuda')
        labels = labels.to('cuda')
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)  # classification model -> get the top-1 label prediction
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print('Accuracy of the network on the 10000 test images: {} %'.format(100 * correct / total))
Accuracy of the network on the 10000 test images: 92.51 %
The previous model used multinomial logistic regression (one linear layer).
What if we use an MLP (multi-layer perceptron), i.e., a neural network with hidden layers?
# New model with multi layer
class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(NeuralNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, output_size)
        self.sigmoid = nn.Sigmoid()  # sigmoid activation function (you can customize)

    def forward(self, x):
        out = self.fc1(x)
        out = self.sigmoid(out)
        out = self.fc2(out)
        out = self.sigmoid(out)
        out = self.fc3(out)
        return out
# Generate model
model = NeuralNet(784, 20, 10) # init(784, 20, 10)
# input dim: 784 / hidden dim: 20 / output dim: 10
# Upload model to GPU
model = model.to('cuda')
# Loss function define (we use cross-entropy)
loss_fn = nn.CrossEntropyLoss()
# Define optimizer
# optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)
# optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
# Train the model
total_step = len(train_loader)
for epoch in range(10):
    for i, (images, labels) in enumerate(train_loader):  # mini-batch for loop
        # upload to gpu
        images = images.reshape(-1, 28*28).to('cuda')
        labels = labels.to('cuda')

        # Forward
        outputs = model(images)          # forward(images): get prediction
        loss = loss_fn(outputs, labels)  # calculate the loss (cross-entropy) between ground truth & prediction

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()   # automatic gradient calculation (autograd)
        optimizer.step()  # update model parameters with requires_grad=True

        if (i+1) % 100 == 0:
            print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                   .format(epoch+1, 10, i+1, total_step, loss.item()))
Epoch [1/10], Step [100/469], Loss: 2.2893
Epoch [1/10], Step [200/469], Loss: 1.9973
Epoch [1/10], Step [300/469], Loss: 1.3428
Epoch [1/10], Step [400/469], Loss: 0.8957
Epoch [2/10], Step [100/469], Loss: 0.5629
Epoch [2/10], Step [200/469], Loss: 0.4676
Epoch [2/10], Step [300/469], Loss: 0.3804
Epoch [2/10], Step [400/469], Loss: 0.3832
Epoch [3/10], Step [100/469], Loss: 0.3103
Epoch [3/10], Step [200/469], Loss: 0.2726
Epoch [3/10], Step [300/469], Loss: 0.3784
Epoch [3/10], Step [400/469], Loss: 0.2654
Epoch [4/10], Step [100/469], Loss: 0.3592
Epoch [4/10], Step [200/469], Loss: 0.3668
Epoch [4/10], Step [300/469], Loss: 0.2052
Epoch [4/10], Step [400/469], Loss: 0.3435
Epoch [5/10], Step [100/469], Loss: 0.2059
Epoch [5/10], Step [200/469], Loss: 0.3557
Epoch [5/10], Step [300/469], Loss: 0.3626
Epoch [5/10], Step [400/469], Loss: 0.2820
Epoch [6/10], Step [100/469], Loss: 0.4171
Epoch [6/10], Step [200/469], Loss: 0.2445
Epoch [6/10], Step [300/469], Loss: 0.1477
Epoch [6/10], Step [400/469], Loss: 0.2845
Epoch [7/10], Step [100/469], Loss: 0.0930
Epoch [7/10], Step [200/469], Loss: 0.2193
Epoch [7/10], Step [300/469], Loss: 0.3161
Epoch [7/10], Step [400/469], Loss: 0.1741
Epoch [8/10], Step [100/469], Loss: 0.1585
Epoch [8/10], Step [200/469], Loss: 0.2300
Epoch [8/10], Step [300/469], Loss: 0.1631
Epoch [8/10], Step [400/469], Loss: 0.2610
Epoch [9/10], Step [100/469], Loss: 0.1597
Epoch [9/10], Step [200/469], Loss: 0.2408
Epoch [9/10], Step [300/469], Loss: 0.2199
Epoch [9/10], Step [400/469], Loss: 0.1732
Epoch [10/10], Step [100/469], Loss: 0.0823
Epoch [10/10], Step [200/469], Loss: 0.1927
Epoch [10/10], Step [300/469], Loss: 0.1641
Epoch [10/10], Step [400/469], Loss: 0.2774
# Test the model
# In test phase, we don't need to compute gradients (for memory efficiency)
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, 28*28).to('cuda')
        labels = labels.to('cuda')
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)  # classification model -> get the top-1 label prediction
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print('Accuracy of the network on the 10000 test images: {} %'.format(100 * correct / total))
Accuracy of the network on the 10000 test images: 94.66 %
class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(NeuralNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, output_size)
        self.ReLU = nn.ReLU()  # ReLU activation function (you can customize)

    def forward(self, x):
        out = self.fc1(x)
        out = self.ReLU(out)
        out = self.fc2(out)
        out = self.ReLU(out)
        out = self.fc2(out)  # note: fc2 is applied a second time, so its weights are reused
        out = self.ReLU(out)
        out = self.fc3(out)
        return out
# Generate model
model = NeuralNet(784, 20, 10) # init(784, 20, 10)
# input dim: 784 / hidden_dim: 20/ output dim: 10
# Upload model to GPU
model = model.to('cuda')
# Loss function define (we use cross-entropy)
loss_fn = nn.CrossEntropyLoss()
# Define optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)
# Train the model
total_step = len(train_loader)
for epoch in range(10):
    for i, (images, labels) in enumerate(train_loader):  # mini-batch for loop
        # upload to gpu
        images = images.reshape(-1, 28*28).to('cuda')
        labels = labels.to('cuda')

        # Forward
        outputs = model(images)          # forward(images): get prediction
        loss = loss_fn(outputs, labels)  # calculate the loss (cross-entropy) between ground truth & prediction

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()   # automatic gradient calculation (autograd)
        optimizer.step()  # update model parameters with requires_grad=True

        if (i+1) % 100 == 0:
            print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                   .format(epoch+1, 10, i+1, total_step, loss.item()))
Epoch [1/10], Step [100/469], Loss: 0.6402
Epoch [1/10], Step [200/469], Loss: 0.4866
Epoch [1/10], Step [300/469], Loss: 0.2448
Epoch [1/10], Step [400/469], Loss: 0.3097
Epoch [2/10], Step [100/469], Loss: 0.2207
Epoch [2/10], Step [200/469], Loss: 0.1231
Epoch [2/10], Step [300/469], Loss: 0.2906
Epoch [2/10], Step [400/469], Loss: 0.1964
Epoch [3/10], Step [100/469], Loss: 0.3075
Epoch [3/10], Step [200/469], Loss: 0.1527
Epoch [3/10], Step [300/469], Loss: 0.3046
Epoch [3/10], Step [400/469], Loss: 0.2457
Epoch [4/10], Step [100/469], Loss: 0.1330
Epoch [4/10], Step [200/469], Loss: 0.1601
Epoch [4/10], Step [300/469], Loss: 0.2634
Epoch [4/10], Step [400/469], Loss: 0.0997
Epoch [5/10], Step [100/469], Loss: 0.2139
Epoch [5/10], Step [200/469], Loss: 0.1465
Epoch [5/10], Step [300/469], Loss: 0.1549
Epoch [5/10], Step [400/469], Loss: 0.1493
Epoch [6/10], Step [100/469], Loss: 0.1863
Epoch [6/10], Step [200/469], Loss: 0.2241
Epoch [6/10], Step [300/469], Loss: 0.0877
Epoch [6/10], Step [400/469], Loss: 0.1304
Epoch [7/10], Step [100/469], Loss: 0.1734
Epoch [7/10], Step [200/469], Loss: 0.2090
Epoch [7/10], Step [300/469], Loss: 0.2542
Epoch [7/10], Step [400/469], Loss: 0.1492
Epoch [8/10], Step [100/469], Loss: 0.1260
Epoch [8/10], Step [200/469], Loss: 0.0807
Epoch [8/10], Step [300/469], Loss: 0.0383
Epoch [8/10], Step [400/469], Loss: 0.2181
Epoch [9/10], Step [100/469], Loss: 0.2047
Epoch [9/10], Step [200/469], Loss: 0.1175
Epoch [9/10], Step [300/469], Loss: 0.0380
Epoch [9/10], Step [400/469], Loss: 0.1478
Epoch [10/10], Step [100/469], Loss: 0.0786
Epoch [10/10], Step [200/469], Loss: 0.2313
Epoch [10/10], Step [300/469], Loss: 0.0742
Epoch [10/10], Step [400/469], Loss: 0.1136
# Test the model
# In test phase, we don't need to compute gradients (for memory efficiency)
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, 28*28).to('cuda')
        labels = labels.to('cuda')
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)  # classification model -> get the top-1 label prediction
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print('Accuracy of the network on the 10000 test images: {} %'.format(100 * correct / total))
Accuracy of the network on the 10000 test images: 95.17 %
Reference
- AI504: Programming for AI Lecture at KAIST AI