Today's class consists of three parts.
1-1. We will build a graph using the networkx library.
1-2. Using the graph's adjacency matrix, node indices, and node embedding vectors, we will follow the aggregation and combination steps of the graph convolution equation.
1-3. Finally, we will build a GCN layer.
2-1. Cora dataset information
2-2. Implement a GCN model on the Cora dataset
2-3. Visualize the node embeddings
3-1. I will briefly introduce the code and PyTorch Geometric.
If you have any questions, feel free to ask.
!pip install networkx
Requirement already satisfied: networkx in /usr/local/lib/python3.9/dist-packages (3.0)
import ipdb
import torch
import networkx as nx
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
from scipy.linalg import fractional_matrix_power
import warnings
warnings.filterwarnings("ignore", category=UserWarning)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
The networkx library makes it easy to build, manipulate, and analyze graphs and networks.
So, to walk through the graph convolution equation, I'll use networkx.
#1. Initialize the graph
G = nx.Graph(name='G')
#2. Create nodes
# In this class, we will build a graph that consists of 6 nodes.
# Each node is assigned a node feature equal to its node name.
for i in range(1, 7):
    G.add_node(i, name=i)
# Define the edges and add them to the graph
edges = [(1,2), (1,3), (2,4), (2,5), (3,4), (3,6)]
G.add_edges_from(edges)
# Inspect the node features
print('\nGraph Nodes: ', G.nodes.data())
Graph Nodes: [(1, {'name': 1}), (2, {'name': 2}), (3, {'name': 3}), (4, {'name': 4}), (5, {'name': 5}), (6, {'name': 6})]
#Plot the graph
nx.draw(G, with_labels=True, font_weight='bold')
# Adjacency Matrix
nx.attr_matrix(G, node_attr='name')
(array([[0., 1., 1., 0., 0., 0.],
[1., 0., 0., 1., 1., 0.],
[1., 0., 0., 1., 0., 1.],
[0., 1., 1., 0., 0., 0.],
[0., 1., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0.]]), [1, 2, 3, 4, 5, 6])
# Get the adjacency matrix (A) and node features matrix (X) as numpy arrays
A = np.array(nx.attr_matrix(G, node_attr='name')[0]) # adjacency matrix (A) as a numpy array
X = np.array(nx.attr_matrix(G, node_attr='name')[1]) # node ordering [1, ..., 6], used as the node features (X)
X = np.expand_dims(X, axis=1) # reshape to a [6, 1] node features matrix (X)
print('Shape of A: ', A.shape) # [6, 6] matrix
Shape of A: (6, 6)
print('\nShape of X: ', X.shape) # [6, 1] matrix
Shape of X: (6, 1)
print('\nAdjacency Matrix (A):\n', A)
Adjacency Matrix (A):
[[0. 1. 1. 0. 0. 0.]
[1. 0. 0. 1. 1. 0.]
[1. 0. 0. 1. 0. 1.]
[0. 1. 1. 0. 0. 0.]
[0. 1. 0. 0. 0. 0.]
[0. 0. 1. 0. 0. 0.]]
print('\nNode Features Matrix (X):\n', X)
Node Features Matrix (X):
# Dot product of the adjacency matrix (A) and node features (X)
AX = np.dot(A, X) # AX
print("Dot product of A and X (AX):\n", AX)
Dot product of A and X (AX):
[[ 5.]
[10.]
[11.]
[ 5.]
[ 2.]
[ 3.]]
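Each row of AX is the sum of the features of that node's neighbors. This is the aggregation step. The following minimal check rebuilds the same six-node graph with plain NumPy. The helper `adjacency_from_edges` is a hypothetical name used only for this sketch.

```python
import numpy as np

# Same toy graph as above: 6 nodes (1..6), undirected edges
edges = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 4), (3, 6)]
n = 6

def adjacency_from_edges(edges, n):
    """Hypothetical helper: dense symmetric adjacency matrix for 1-indexed nodes."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u - 1, v - 1] = 1.0
        A[v - 1, u - 1] = 1.0
    return A

A = adjacency_from_edges(edges, n)
X = np.arange(1, n + 1, dtype=float).reshape(n, 1)  # feature of node i is i

AX = A @ X

# Aggregation check: row i of AX equals the sum of node i's neighbor features
for i in range(n):
    neighbors = np.nonzero(A[i])[0]
    assert AX[i, 0] == X[neighbors, 0].sum()

print(AX.ravel())  # [ 5. 10. 11.  5.  2.  3.]
```

For example, node 2 has neighbors 1, 4 and 5, so its aggregated feature is 1 + 4 + 5 = 10.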
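Adding self-loops: A' = A + I, where I is the identity matrix. With self-loops, each node also aggregates its own features.
# Add self-loops
G_self_loops = G.copy() # A'
self_loops = []
for i in range(1, 1 + G.number_of_nodes()):
    self_loops.append((i, i))
G_self_loops.add_edges_from(self_loops) # A' = A + I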
#Check the edges of G_self_loops after adding the self loops
print('Edges of G with self-loops:\n', G_self_loops.edges)
Edges of G with self-loops:
[(1, 2), (1, 3), (1, 1), (2, 4), (2, 5), (2, 2), (3, 4), (3, 6), (3, 3), (4, 4), (5, 5), (6, 6)]
# Get the adjacency matrix of the graph with self-loops (A_hat)
A_hat = np.array(nx.attr_matrix(G_self_loops, node_attr='name')[0]) # A' numpy Matrix
print('Adjacency Matrix of added self-loops G (A_hat):\n', A_hat)
Adjacency Matrix of added self-loops G (A_hat):
[[1. 1. 1. 0. 0. 0.]
[1. 1. 0. 1. 1. 0.]
[1. 0. 1. 1. 0. 1.]
[0. 1. 1. 1. 0. 0.]
[0. 1. 0. 0. 1. 0.]
[0. 0. 1. 0. 0. 1.]]
# Calculate the dot product of A_hat and X
A_hatX = np.dot(A_hat, X)
print('A_hatX:\n', A_hatX)
A_hatX:
[[ 6.]
[12.]
[14.]
[ 9.]
[ 7.]
[ 9.]]
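Adding the identity matrix means every node now counts itself as a neighbor. So A'X is simply AX plus each node's own feature. The short sketch below confirms this on the toy graph, using plain NumPy and the same edges as above.

```python
import numpy as np

edges = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 4), (3, 6)]
n = 6
A = np.zeros((n, n))
for u, v in edges:
    A[u - 1, v - 1] = A[v - 1, u - 1] = 1.0
X = np.arange(1, n + 1, dtype=float).reshape(n, 1)  # feature of node i is i

A_hat = A + np.eye(n)  # A' = A + I

# With self-loops each node also aggregates its own feature: A'X = AX + X
assert np.allclose(A_hat @ X, A @ X + X)
print((A_hat @ X).ravel())  # [ 6. 12. 14.  9.  7.  9.]
```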
# Get the degree matrix of the graph with self-loops
edge_List = G_self_loops.edges()
Deg_Mat = [[i, 0] for i in G_self_loops.nodes()]
for element in edge_List:
    if element[0] != element[1]:
        # an ordinary edge adds 1 to the degree of both endpoints
        Deg_Mat[element[0] - 1][1] += 1
        Deg_Mat[element[1] - 1][1] += 1
    else:
        # a self-loop adds 1 to its node's degree (matching the 1 on the diagonal of A_hat)
        Deg_Mat[element[0] - 1][1] += 1
print(Deg_Mat)
[[1, 3], [2, 4], [3, 4], [4, 3], [5, 2], [6, 2]]
#Convert the Degree Matrix to a N x N matrix where N is the number of nodes
D = np.diag([deg for [n,deg] in Deg_Mat]) # Get degree matrix
print('Degree Matrix of added self-loops G as numpy array (D):\n', D)
Degree Matrix of added self-loops G as numpy array (D):
[[3 0 0 0 0 0]
[0 4 0 0 0 0]
[0 0 4 0 0 0]
[0 0 0 3 0 0]
[0 0 0 0 2 0]
[0 0 0 0 0 2]]
#Get the inverse of Degree Matrix (D)
D_inv = np.linalg.inv(D)
print('Inverse of D:\n', D_inv)
Inverse of D:
[[0.33333333 0. 0. 0. 0. 0. ]
[0. 0.25 0. 0. 0. 0. ]
[0. 0. 0.25 0. 0. 0. ]
[0. 0. 0. 0.33333333 0. 0. ]
[0. 0. 0. 0. 0.5 0. ]
[0. 0. 0. 0. 0. 0.5 ]]
# Row-normalize A_hat: D^-1 A_hat
D_invA = np.dot(D_inv, A_hat)
print(D_invA)
[[0.33333333 0.33333333 0.33333333 0. 0. 0. ]
[0.25 0.25 0. 0.25 0.25 0. ]
[0.25 0. 0.25 0.25 0. 0.25 ]
[0. 0.33333333 0.33333333 0.33333333 0. 0. ]
[0. 0.5 0. 0. 0.5 0. ]
[0. 0. 0.5 0. 0. 0.5 ]]
# Multiply D^-1 A_hat by X for the normalized aggregation
DAX = np.dot(D_invA, X)
print('DAX:\n', DAX)
DAX:
[[2. ]
[3. ]
[3.5]
[3. ]
[3.5]
[4.5]]
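Multiplying by D^-1 divides each row of A'X by the node's degree, with the self-loop included. This turns the neighborhood sum into a neighborhood mean. The sketch below checks this row by row on the toy graph.

```python
import numpy as np

edges = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 4), (3, 6)]
n = 6
A = np.zeros((n, n))
for u, v in edges:
    A[u - 1, v - 1] = A[v - 1, u - 1] = 1.0
X = np.arange(1, n + 1, dtype=float).reshape(n, 1)

A_hat = A + np.eye(n)
D = np.diag(A_hat.sum(axis=1))  # degree counted with the self-loop
DAX = np.linalg.inv(D) @ A_hat @ X

# Row-normalization turns the sum into a mean over the node and its neighbors
for i in range(n):
    closed_nbhd = np.nonzero(A_hat[i])[0]
    assert np.isclose(DAX[i, 0], X[closed_nbhd, 0].mean())

print(DAX.ravel())  # [2.  3.  3.5 3.  3.5 4.5]
```

For instance, node 3's value of 3.5 is the mean over nodes 1, 3, 4 and 6: (1 + 3 + 4 + 6) / 4.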
#Initialize the weights
n_h = 4 #number of neurons in the hidden layer
n_y = 2 #number of neurons in the output layer
W0 = np.random.randn(X.shape[1],n_h) * 0.01
W1 = np.random.randn(n_h,n_y) * 0.01
print('W0 weight:\n', W0)
print('W1 weight:\n', W1)
W0 weight:
[[-0.00204708 0.00478943 -0.00519439 -0.0055573 ]]
W1 weight:
[[ 0.01965781 0.01393406]
[ 0.00092908 0.00281746]
[ 0.00769023 0.01246435]
[ 0.01007189 -0.01296221]]
# Implement ReLU as the activation function
# A GCN layer needs a non-linear activation; ReLU is the most common choice.
def relu(x):
    return np.maximum(0, x)
# Build a GCN layer
# In this function, we use numpy for simplicity
def gcn(A, H, W):
    # A GCN layer built from the graph convolution steps so far: relu(D^-1 (A + I) H W)
    # You can use np.identity, np.diag, np.sum, np.linalg.inv
    I = np.identity(A.shape[0]) # identity matrix with the same size as A
    A_hat = A + I # add self-loops to A
    D = np.diag(np.sum(A_hat, axis=0)) # degree matrix of A_hat
    D_inv = np.linalg.inv(D)
    D_invA = np.dot(D_inv, A_hat) # row-normalized adjacency matrix
    DAXW = np.dot(D_invA, H).dot(W) # aggregate neighbor features, then combine with W
    return relu(DAXW)
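This layer uses the row normalization D^-1 A'. The GCN formulation by Kipf and Welling uses the symmetric normalization D^-1/2 A' D^-1/2 instead. That version scales each edge (i, j) by 1/sqrt(d_i d_j) and keeps the normalized matrix symmetric. Below is a minimal sketch of that variant. The names `sym_norm_adj` and `gcn_sym` are hypothetical, and the random weights are for illustration only.

```python
import numpy as np

def sym_norm_adj(A):
    """Return D^-1/2 (A + I) D^-1/2 for a dense adjacency matrix A."""
    A_hat = A + np.identity(A.shape[0])       # add self-loops
    deg = A_hat.sum(axis=1)                   # degrees including self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))  # D^-1/2 (D is diagonal)
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_sym(A, H, W):
    """GCN layer with symmetric normalization: relu(D^-1/2 (A+I) D^-1/2 H W)."""
    return np.maximum(0, sym_norm_adj(A) @ H @ W)

# Usage on the toy graph
edges = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 4), (3, 6)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u - 1, v - 1] = A[v - 1, u - 1] = 1.0
X = np.arange(1, 7, dtype=float).reshape(6, 1)
rng = np.random.default_rng(0)
W0 = rng.standard_normal((1, 4)) * 0.01
H1 = gcn_sym(A, X, W0)
A_norm = sym_norm_adj(A)
print(H1.shape)  # (6, 4)
```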
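# Do forward propagation through two GCN layers
H1 = gcn(A, X, W0) # [6, 4] hidden embedding
H2 = gcn(A, H1, W1) # [6, 2] output embedding
print('Node Embedding from GCN output:\n', H2)
Node Embedding from GCN output: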
[[1.26076558e-05 3.82331255e-05]
[1.27930625e-05 3.87953773e-05]
[1.44617228e-05 4.38556439e-05]
[1.40909094e-05 4.27311403e-05]
[1.44617228e-05 4.38556439e-05]
[1.77990434e-05 5.39761772e-05]]
Node representation
def visualize(h, color):
    plt.figure(figsize=(8, 8))
    plt.xlim([np.min(h[:,0])*0.9, np.max(h[:,0])*1.1])
    plt.xlabel('Dimension 0')
    plt.ylabel('Dimension 1')
    plt.scatter(h[:, 0], h[:, 1], s=140, c=color, cmap="Set2")
    plt.show()
visualize(H2, color=range(6)) # Nodes 3 and 5 have the same embedding, so their points overlap in the plot.
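Why do nodes 3 and 5 overlap? Here X is one-dimensional and every aggregated value is positive. After ReLU, each first-layer row is therefore a positive multiple of the same vector. The second-layer averages for node 3 (over nodes 1, 3, 4, 6) and node 5 (over nodes 2, 5) both come out to 3.25 times that vector. The sketch below checks that this coincidence holds for several random weight draws. The seed and weight scale are arbitrary choices, not the run above.

```python
import numpy as np

def gcn(A, H, W):
    # Same row-normalized layer as above: relu(D^-1 (A+I) H W)
    A_hat = A + np.identity(A.shape[0])
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))
    return np.maximum(0, D_inv @ A_hat @ H @ W)

edges = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 4), (3, 6)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u - 1, v - 1] = A[v - 1, u - 1] = 1.0
X = np.arange(1, 7, dtype=float).reshape(6, 1)

rng = np.random.default_rng(123)
for _ in range(5):  # several random weight draws
    W0 = rng.standard_normal((1, 4))
    W1 = rng.standard_normal((4, 2))
    H2 = gcn(A, gcn(A, X, W0), W1)
    # Row index 2 is node 3, row index 4 is node 5
    assert np.allclose(H2[2], H2[4])
```

With richer (multi-dimensional) node features, this symmetry would generally disappear.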
import math
import numpy as np
import scipy.sparse as sp
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.parameter import Parameter
from torch.nn.modules.module import Module
import torch.optim as optim
import time
Dataset link:
The Cora dataset consists of 2708 scientific publications classified into one of seven classes. The citation network consists of 5429 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 1433 unique words.
import pandas as pd
import os
edgelist = pd.read_csv(os.path.join("./", "cora.cites"), sep='\t', header=None, names=["target", "source"]) # edge list of the citation graph
edgelist["label"] = "cites"
edgelist.sample(frac=1).head(5) # each row: <ID of cited paper> <ID of citing paper>; shows a random sample of the edges
index | target | source | label |
4294 | 152731 | 1109392 | cites |
2625 | 28385 | 118558 | cites |
4255 | 144408 | 219446 | cites |
3696 | 78555 | 78557 | cites |
437 | 1365 | 188318 | cites |
Gnx = nx.from_pandas_edgelist(edgelist, edge_attr="label") # build the graph; the node list is extracted from the edge list
nx.set_node_attributes(Gnx, "paper", "label") # give every node the attribute label='paper'
print(Gnx.nodes)
Gnx.nodes[12210] # shows the attributes of node 12210
{'label': 'paper'}
feature_names = ["word_{}".format(ii) for ii in range(1433)]
column_names = feature_names + ["subject"]
node_data = pd.read_csv(os.path.join("/content/", "cora.content"), sep='\t', header=None, names=column_names)
node_data.head(5) # <paper node id> <word_attributes>+ <node label>
paper_id | word_0 | word_1 | word_2 | word_3 | word_4 | word_5 | word_6 | word_7 | word_8 | word_9 | ... | word_1424 | word_1425 | word_1426 | word_1427 | word_1428 | word_1429 | word_1430 | word_1431 | word_1432 | subject |
31336 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | Neural_Networks |
1061127 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Rule_Learning |
1106406 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Reinforcement_Learning |
13195 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Reinforcement_Learning |
37879 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Probabilistic_Methods |
5 rows × 1434 columns
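The paper ID appears as the row index because each line of `cora.content` has one more field than the 1434 names passed to `read_csv`. In that case pandas uses the first field as the index. The sketch below reproduces this on a tiny synthetic sample in the same tab-separated layout. The sample rows and the three-word vocabulary are made up for illustration.

```python
import io
import pandas as pd

# Tiny synthetic sample in the cora.content layout (tab-separated):
# <paper_id> <word_0> <word_1> <word_2> <subject>
sample = (
    "31336\t0\t1\t0\tNeural_Networks\n"
    "1061127\t1\t0\t0\tRule_Learning\n"
)
column_names = ["word_0", "word_1", "word_2", "subject"]  # one fewer name than fields

df = pd.read_csv(io.StringIO(sample), sep="\t", header=None, names=column_names)

# With one more field per row than names, pandas uses the first field as the index
print(df.index.tolist())  # [31336, 1061127]
print(df.shape)           # (2, 4)
```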
set(node_data["subject"]) # node class type
In this class, we will predict the subject of each paper (node) from the data of its surrounding nodes and the structure of the graph.
EPOCH = 200
SEED = 42
dropout_rate = 0.5
learning_rate = 0.01
weight_decay = 5e-4
def encode_onehot(labels): # convert every class (subject) into a one-hot vector for training
    classes = set(labels) # {'Case_Based', 'Genetic_Algorithms', 'Neural_Networks', 'Probabilistic_Methods', 'Reinforcement_Learning', 'Rule_Learning', 'Theory'}
    classes_dict = {c: np.identity(len(classes))[i, :] for i, c in enumerate(classes)}
    labels_onehot = np.array(list(map(classes_dict.get, labels)), dtype=np.int32)
    return labels_onehot
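`set(labels)` has no guaranteed iteration order, and Python randomizes string hashing per process. As a result, the class-to-index mapping produced by `encode_onehot` can change between runs. Below is a sketch of a deterministic variant that sorts the classes first. `encode_onehot_sorted` is a hypothetical name, not part of the original code.

```python
import numpy as np

def encode_onehot_sorted(labels):
    """One-hot encode labels with a fixed, alphabetical class order."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    onehot = np.zeros((len(labels), len(classes)), dtype=np.int32)
    for row, label in enumerate(labels):
        onehot[row, index[label]] = 1
    return onehot, classes

onehot, classes = encode_onehot_sorted(["Theory", "Case_Based", "Theory", "Rule_Learning"])
print(classes)                # ['Case_Based', 'Rule_Learning', 'Theory']
print(onehot.argmax(axis=1))  # [2 0 2 1]
```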
def normalize(mx): # row-normalize a sparse matrix (D^-1 M), like the normalization implemented earlier
    rowsum = np.array(mx.sum(1))
    r_inv = np.power(rowsum, -1).flatten()
    r_inv[np.isinf(r_inv)] = 0. # rows that sum to zero stay zero
    r_mat_inv = sp.diags(r_inv)
    mx = r_mat_inv.dot(mx)
    return mx
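`normalize` is the sparse counterpart of the D^-1 step from Part 1. Each row is divided by its sum, and rows that sum to zero are left at zero instead of producing inf. The sketch below applies this to a small 3 x 3 SciPy matrix. `row_normalize` is a hypothetical re-implementation for illustration.

```python
import numpy as np
import scipy.sparse as sp

def row_normalize(mx):
    """Scale each row of a sparse matrix to sum to 1 (all-zero rows stay zero)."""
    rowsum = np.asarray(mx.sum(axis=1)).flatten()
    with np.errstate(divide="ignore"):
        r_inv = np.power(rowsum, -1.0)
    r_inv[np.isinf(r_inv)] = 0.0
    return sp.diags(r_inv).dot(mx)

M = sp.csr_matrix(np.array([[1.0, 1.0, 0.0],
                            [0.0, 0.0, 0.0],
                            [1.0, 2.0, 1.0]]))
N = row_normalize(M)
print(N.toarray())
```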
def sparse_mx_to_torch_sparse_tensor(sparse_mx): # convert a scipy sparse matrix to a torch sparse tensor
    sparse_mx = sparse_mx.tocoo().astype(np.float32)
    indices = torch.from_numpy(np.vstack((sparse_mx.row, sparse_mx.col)).astype(np.int64))
    values = torch.from_numpy(sparse_mx.data)
    shape = torch.Size(sparse_mx.shape)
    return torch.sparse_coo_tensor(indices, values, shape)
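Below is a quick round-trip check of the SciPy-to-PyTorch conversion. It assumes a PyTorch version that provides `torch.sparse_coo_tensor` and `torch.sparse.mm`. `scipy_to_torch_sparse` is a hypothetical copy of the helper above. The check also multiplies by a dense tensor, which is the operation the GCN layer uses.

```python
import numpy as np
import scipy.sparse as sp
import torch

def scipy_to_torch_sparse(sparse_mx):
    """Convert a SciPy sparse matrix to a torch sparse COO tensor."""
    coo = sparse_mx.tocoo().astype(np.float32)
    indices = torch.from_numpy(np.vstack((coo.row, coo.col)).astype(np.int64))
    values = torch.from_numpy(coo.data)
    return torch.sparse_coo_tensor(indices, values, torch.Size(coo.shape))

dense = np.array([[0.0, 0.5, 0.0],
                  [0.5, 0.0, 0.5],
                  [0.0, 1.0, 0.0]], dtype=np.float32)
t = scipy_to_torch_sparse(sp.csr_matrix(dense))

# Sparse @ dense: multiplying by a column of ones gives the row sums
row_sums = torch.sparse.mm(t, torch.ones(3, 1))
print(row_sums.ravel())
```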
def load_data(path="./", dataset="cora"):
    # Build the graph tensors using the three helper functions above
    print('Loading {} dataset...'.format(dataset))
    idx_features_labels = np.genfromtxt("{}{}.content".format(path, dataset), dtype=np.dtype(str)) # load the whole table
    features = sp.csr_matrix(idx_features_labels[:, 1:-1], dtype=np.float32) # word features as a compressed sparse matrix
    labels = encode_onehot(idx_features_labels[:, -1]) # one-hot encode the labels
    # build graph
    idx = np.array(idx_features_labels[:, 0], dtype=np.int32) # paper IDs; size: total number of publications
    idx_map = {j: i for i, j in enumerate(idx)} # paper ID -> row index
    edges_unordered = np.genfromtxt("{}{}.cites".format(path, dataset), dtype=np.int32)
    edges = np.array(list(map(idx_map.get, edges_unordered.flatten())), dtype=np.int32).reshape(edges_unordered.shape)
    adj = sp.coo_matrix((np.ones(edges.shape[0]), (edges[:, 0], edges[:, 1])), shape=(labels.shape[0], labels.shape[0]), dtype=np.float32)
    # build a symmetric adjacency matrix (citations are directed; the GCN treats them as undirected)
    adj = adj + adj.T.multiply(adj.T > adj) - adj.multiply(adj.T > adj)
    features = normalize(features)
    adj = normalize(adj + sp.eye(adj.shape[0])) # D^-1 (A + I)
    # split nodes into train/validation/test sets for node classification
    idx_train = range(140)
    idx_val = range(200, 500)
    idx_test = range(500, 1500)
    features = torch.FloatTensor(np.array(features.todense()))
    labels = torch.LongTensor(np.where(labels)[1])
    adj = sparse_mx_to_torch_sparse_tensor(adj)
    idx_train = torch.LongTensor(idx_train)
    idx_val = torch.LongTensor(idx_val)
    idx_test = torch.LongTensor(idx_test)
    return adj, features, labels, idx_train, idx_val, idx_test
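Cora's citation edges are directed, so `load_data` symmetrizes the adjacency matrix. For 0/1 entries, the expression `adj + adj.T.multiply(adj.T > adj) - adj.multiply(adj.T > adj)` equals the element-wise maximum of `adj` and `adj.T`. The sketch below checks this on a 3-node example against NumPy and against SciPy's sparse `maximum` method.

```python
import numpy as np
import scipy.sparse as sp

# Directed edges 0 -> 1 and 2 -> 1, plus 1 -> 2 (so 1 <-> 2 already goes both ways)
rows = np.array([0, 2, 1])
cols = np.array([1, 1, 2])
adj = sp.coo_matrix((np.ones(3), (rows, cols)), shape=(3, 3), dtype=np.float32)

# The expression used in load_data()
sym = adj + adj.T.multiply(adj.T > adj) - adj.multiply(adj.T > adj)

expected = np.maximum(adj.toarray(), adj.toarray().T)
assert np.array_equal(sym.toarray(), expected)
assert np.array_equal(adj.maximum(adj.T).toarray(), expected)
print(sym.toarray())
```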
def accuracy(output, labels):
    preds = output.max(1)[1].type_as(labels)
    correct = preds.eq(labels).double()
    correct = correct.sum()
    return correct / len(labels)
class GraphConvolution(Module):
    # Simple GCN layer, similar to
    def __init__(self, in_features, out_features):
        super(GraphConvolution, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        # Allocate the weight matrix W (its values are set in reset_parameters)
        self.weight = Parameter(torch.FloatTensor(in_features, out_features))
        # initialize the weight by calling reset_parameters()
        self.reset_parameters()
    # Initialize the weight from a uniform distribution U(-stdv, stdv)
    def reset_parameters(self):
        stdv = 1. / math.sqrt(self.weight.size(1))
        self.weight.data.uniform_(-stdv, stdv)
    def forward(self, input, adj):
        # torch.mm for the dense product, torch.sparse.mm for the sparse adjacency matrix
        support = torch.mm(input, self.weight) # XW (weight = W)
        output = torch.sparse.mm(adj, support) # A(XW) (adj = A)
        return output
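The layer computes A(XW) rather than (AX)W. Both give the same result by associativity. Multiplying the dense features by W first, however, shrinks them from `in_features` to `out_features` columns before the sparse product. The sketch below uses small random tensors; the sizes and sparsity are arbitrary.

```python
import torch

torch.manual_seed(0)
n, f_in, f_out = 5, 8, 3

# Random sparse "adjacency" (COO) plus dense features and weights
dense_adj = (torch.rand(n, n) > 0.6).float()
adj = dense_adj.to_sparse()
X = torch.rand(n, f_in)
W = torch.rand(f_in, f_out)

support = torch.mm(X, W)            # XW: (n, f_in) @ (f_in, f_out)
out = torch.sparse.mm(adj, support)  # A(XW): sparse @ dense

# Same result as (AX)W
assert torch.allclose(out, (dense_adj @ X) @ W, atol=1e-5)
print(out.shape)  # torch.Size([5, 3])
```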
class GCN(nn.Module):
    def __init__(self, nfeat, nhid, nclass, dropout):
        super(GCN, self).__init__()
        self.gc1 = GraphConvolution(nfeat, nhid)
        self.gc2 = GraphConvolution(nhid, nclass)
        self.dropout = dropout
    def forward(self, x, adj):
        # Obtain node embeddings
        # The forward propagation follows Section 1 (the graph convolution equation's forward propagation).
        x = self.gc1(x, adj) # first graph convolution layer
        x = F.relu(x) # ReLU
        x = F.dropout(x, self.dropout, training=self.training) # dropout (active only in training mode)
        x = self.gc2(x, adj) # second graph convolution layer
        x = F.log_softmax(x, dim=1) # log(softmax(x))
        return x
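# Fix the random seeds for reproducibility
np.random.seed(SEED)
torch.manual_seed(SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed(SEED)
# Load data
adj, features, labels, idx_train, idx_val, idx_test = load_data() # adj -> adjacency matrix, same as A; features -> node feature matrix, same as X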
Loading cora dataset...
CPU times: user 3.73 s, sys: 338 ms, total: 4.07 s
Wall time: 4.09 s
tensor(indices=tensor([[ 0, 8, 14, ..., 1389, 2344, 2707],
[ 0, 0, 0, ..., 2707, 2707, 2707]]),
values=tensor([0.1667, 0.1667, 0.0500, ..., 0.2000, 0.5000, 0.2500]),
size=(2708, 2708), nnz=13264, layout=torch.sparse_coo)
features.shape # (number of publications) x (vocabulary size: one entry per dictionary word)
torch.Size([2708, 1433])
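# Non-zero entries mark the words present in each paper, row-normalized so each row sums to 1 (the subject label is not part of features)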
features[:, -10:]
tensor([[0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0588, ..., 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[0.0526, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000]])
tensor([0, 6, 3, ..., 4, 1, 0])
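# Model and optimizer
model = GCN(nfeat=features.shape[1], # [2708, 1433] -> 1433 input features, for the matrix multiplication of X and W
            nhid=16, # hidden size is not shown in the original; 16 is assumed here (the pygcn default)
            nclass=labels.max().item() + 1,
            dropout=dropout_rate)
# Move the model and data to the device chosen at the top (GPU if available)
model = model.to(device)
optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
features = features.to(device)
adj = adj.to(device)
labels = labels.to(device)
idx_train = idx_train.to(device)
idx_val = idx_val.to(device)
idx_test = idx_test.to(device)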
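In the train() function, we train the GCN with the negative log-likelihood loss (nll_loss) and the Adam optimizer.
The model produces outputs for every node, but the training loss and accuracy use only the nodes in idx_train, and the validation metrics use only the nodes in idx_val.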
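The model produces log-probabilities for all 2708 nodes, but only the nodes in `idx_train` contribute to `loss_train`. This is what makes the setup semi-supervised. The sketch below uses random stand-in tensors; the sizes and the `idx_train` subset are hypothetical. It shows that `F.nll_loss` on the indexed rows equals the mean negative log-probability of each node's true class.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_nodes, num_classes = 10, 7
logits = torch.randn(num_nodes, num_classes)
output = F.log_softmax(logits, dim=1)             # what GCN.forward returns
labels = torch.randint(0, num_classes, (num_nodes,))
idx_train = torch.tensor([0, 3, 5])               # hypothetical labeled subset

loss = F.nll_loss(output[idx_train], labels[idx_train])
manual = -output[idx_train, labels[idx_train]].mean()  # pick each node's true-class log-prob
assert torch.allclose(loss, manual)
```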
def train(epoch):
    model.train() # enable dropout
    optimizer.zero_grad()
    output = model(features, adj)
    loss_train = F.nll_loss(output[idx_train], labels[idx_train])
    acc_train = accuracy(output[idx_train], labels[idx_train])
    loss_train.backward()
    optimizer.step()
    # Evaluate validation set performance separately;
    # model.eval() deactivates dropout during the validation run.
    model.eval()
    with torch.no_grad():
        output = model(features, adj)
        loss_val = F.nll_loss(output[idx_val], labels[idx_val])
        acc_val = accuracy(output[idx_val], labels[idx_val])
    print('Epoch: {:04d}'.format(epoch+1),
          'loss_train: {:.4f}'.format(loss_train.item()),
          'acc_train: {:.4f}'.format(acc_train.item()),
          'loss_val: {:.4f}'.format(loss_val.item()),
          'acc_val: {:.4f}'.format(acc_val.item()))
In the test() function, we evaluate the trained model on the test nodes and visualize the node embeddings with t-SNE.
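t-SNE maps each test node's 7-dimensional output to a 2-D point for plotting. The sketch below runs it on random stand-in data, not the model output. `perplexity` must be smaller than the number of samples, and fixing `random_state` makes the layout reproducible across runs.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-in for model output: 60 nodes x 7 class scores (random, for illustration)
h = rng.standard_normal((60, 7)).astype(np.float32)

z = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(h)
print(z.shape)  # (60, 2)
```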
# Visualize
def visualize(h, label, idx):
    plt.figure(figsize=(8, 8))
    plt.xlabel('Dimension 0')
    plt.ylabel('Dimension 1')
    h_ = h[idx] # embeddings of the selected nodes
    color = label[idx.cpu()].numpy() # class of each selected node (label is a CPU tensor)
    print(f'Embedding shape: {list(h_.shape)}')
    z = TSNE(n_components=2).fit_transform(h_.detach().cpu().numpy()) # project to 2-D
    plt.scatter(z[:, 0], z[:, 1], s=70, c=color, cmap="Set2")
    plt.show()
def test(): # compute test loss and accuracy, and visualize the node embeddings
    model.eval() # disable dropout
    output = model(features, adj)
    visualize(output, labels.detach().cpu(), idx_test)
    loss_test = F.nll_loss(output[idx_test], labels[idx_test])
    acc_test = accuracy(output[idx_test], labels[idx_test])
    print("Test set results:",
          "loss= {:.4f}".format(loss_test.item()),
          "accuracy= {:.4f}".format(acc_test.item()))
When I measured the training time, it took about 1.35 s (the run logged below took 4.23 s of wall time).
# Train model
t_total = time.time()
for epoch in range(EPOCH):
    train(epoch)
print("Optimization Finished!")
print("Total time elapsed: {:.4f}s".format(time.time() - t_total))
Epoch: 0001 loss_train: 1.9490 acc_train: 0.2429 loss_val: 1.9475 acc_val: 0.2067
Epoch: 0002 loss_train: 1.9440 acc_train: 0.2500 loss_val: 1.9430 acc_val: 0.2433
Epoch: 0003 loss_train: 1.9367 acc_train: 0.3357 loss_val: 1.9388 acc_val: 0.2300
Epoch: 0004 loss_train: 1.9317 acc_train: 0.3071 loss_val: 1.9347 acc_val: 0.2200
Epoch: 0005 loss_train: 1.9264 acc_train: 0.3214 loss_val: 1.9304 acc_val: 0.2267
Epoch: 0006 loss_train: 1.9215 acc_train: 0.2929 loss_val: 1.9259 acc_val: 0.2367
Epoch: 0007 loss_train: 1.9153 acc_train: 0.3214 loss_val: 1.9210 acc_val: 0.2500
Epoch: 0008 loss_train: 1.9058 acc_train: 0.3214 loss_val: 1.9155 acc_val: 0.2500
Epoch: 0009 loss_train: 1.9011 acc_train: 0.3357 loss_val: 1.9097 acc_val: 0.2600
Epoch: 0010 loss_train: 1.8911 acc_train: 0.3071 loss_val: 1.9034 acc_val: 0.2600
Epoch: 0011 loss_train: 1.8827 acc_train: 0.3286 loss_val: 1.8966 acc_val: 0.2600
Epoch: 0012 loss_train: 1.8713 acc_train: 0.3143 loss_val: 1.8892 acc_val: 0.2600
Epoch: 0013 loss_train: 1.8622 acc_train: 0.2786 loss_val: 1.8813 acc_val: 0.2633
Epoch: 0014 loss_train: 1.8533 acc_train: 0.3500 loss_val: 1.8730 acc_val: 0.2700
Epoch: 0015 loss_train: 1.8385 acc_train: 0.3500 loss_val: 1.8641 acc_val: 0.2767
Epoch: 0016 loss_train: 1.8272 acc_train: 0.3714 loss_val: 1.8546 acc_val: 0.2967
Epoch: 0017 loss_train: 1.8142 acc_train: 0.3429 loss_val: 1.8446 acc_val: 0.3233
Epoch: 0018 loss_train: 1.7981 acc_train: 0.3857 loss_val: 1.8340 acc_val: 0.3567
Epoch: 0019 loss_train: 1.7861 acc_train: 0.4000 loss_val: 1.8231 acc_val: 0.3767
Epoch: 0020 loss_train: 1.7784 acc_train: 0.3857 loss_val: 1.8117 acc_val: 0.3867
Epoch: 0021 loss_train: 1.7477 acc_train: 0.4357 loss_val: 1.7999 acc_val: 0.4133
Epoch: 0022 loss_train: 1.7493 acc_train: 0.4071 loss_val: 1.7878 acc_val: 0.4267
Epoch: 0023 loss_train: 1.7209 acc_train: 0.4286 loss_val: 1.7751 acc_val: 0.4333
Epoch: 0024 loss_train: 1.7052 acc_train: 0.4643 loss_val: 1.7620 acc_val: 0.4533
Epoch: 0025 loss_train: 1.6924 acc_train: 0.4643 loss_val: 1.7485 acc_val: 0.4600
Epoch: 0026 loss_train: 1.6710 acc_train: 0.4857 loss_val: 1.7347 acc_val: 0.4700
Epoch: 0027 loss_train: 1.6416 acc_train: 0.4714 loss_val: 1.7205 acc_val: 0.4733
Epoch: 0028 loss_train: 1.6360 acc_train: 0.4929 loss_val: 1.7060 acc_val: 0.4800
Epoch: 0029 loss_train: 1.6070 acc_train: 0.5000 loss_val: 1.6913 acc_val: 0.4900
Epoch: 0030 loss_train: 1.5961 acc_train: 0.5357 loss_val: 1.6762 acc_val: 0.5000
Epoch: 0031 loss_train: 1.5766 acc_train: 0.5071 loss_val: 1.6610 acc_val: 0.5067
Epoch: 0032 loss_train: 1.5607 acc_train: 0.5429 loss_val: 1.6457 acc_val: 0.5100
Epoch: 0033 loss_train: 1.5345 acc_train: 0.5571 loss_val: 1.6302 acc_val: 0.5233
Epoch: 0034 loss_train: 1.5062 acc_train: 0.6143 loss_val: 1.6146 acc_val: 0.5267
Epoch: 0035 loss_train: 1.4915 acc_train: 0.5643 loss_val: 1.5989 acc_val: 0.5267
Epoch: 0036 loss_train: 1.5025 acc_train: 0.6143 loss_val: 1.5832 acc_val: 0.5367
Epoch: 0037 loss_train: 1.4599 acc_train: 0.6214 loss_val: 1.5675 acc_val: 0.5433
Epoch: 0038 loss_train: 1.4581 acc_train: 0.5929 loss_val: 1.5519 acc_val: 0.5533
Epoch: 0039 loss_train: 1.4309 acc_train: 0.6429 loss_val: 1.5363 acc_val: 0.5667
Epoch: 0040 loss_train: 1.3725 acc_train: 0.6429 loss_val: 1.5206 acc_val: 0.5700
Epoch: 0041 loss_train: 1.3793 acc_train: 0.6357 loss_val: 1.5049 acc_val: 0.5767
Epoch: 0042 loss_train: 1.3352 acc_train: 0.6500 loss_val: 1.4891 acc_val: 0.5800
Epoch: 0043 loss_train: 1.3562 acc_train: 0.6786 loss_val: 1.4733 acc_val: 0.5833
Epoch: 0044 loss_train: 1.3076 acc_train: 0.6929 loss_val: 1.4576 acc_val: 0.5933
Epoch: 0045 loss_train: 1.2951 acc_train: 0.6786 loss_val: 1.4419 acc_val: 0.6000
Epoch: 0046 loss_train: 1.2654 acc_train: 0.6857 loss_val: 1.4263 acc_val: 0.6067
Epoch: 0047 loss_train: 1.2657 acc_train: 0.7071 loss_val: 1.4108 acc_val: 0.6100
Epoch: 0048 loss_train: 1.2517 acc_train: 0.7500 loss_val: 1.3954 acc_val: 0.6100
Epoch: 0049 loss_train: 1.2049 acc_train: 0.7071 loss_val: 1.3802 acc_val: 0.6267
Epoch: 0050 loss_train: 1.2129 acc_train: 0.7000 loss_val: 1.3651 acc_val: 0.6367
Epoch: 0051 loss_train: 1.1661 acc_train: 0.7357 loss_val: 1.3500 acc_val: 0.6500
Epoch: 0052 loss_train: 1.2001 acc_train: 0.7071 loss_val: 1.3351 acc_val: 0.6533
Epoch: 0053 loss_train: 1.1581 acc_train: 0.7714 loss_val: 1.3204 acc_val: 0.6567
Epoch: 0054 loss_train: 1.1501 acc_train: 0.7714 loss_val: 1.3059 acc_val: 0.6600
Epoch: 0055 loss_train: 1.1119 acc_train: 0.7714 loss_val: 1.2915 acc_val: 0.6633
Epoch: 0056 loss_train: 1.1154 acc_train: 0.8000 loss_val: 1.2774 acc_val: 0.6733
Epoch: 0057 loss_train: 1.0678 acc_train: 0.8143 loss_val: 1.2634 acc_val: 0.6800
Epoch: 0058 loss_train: 1.0512 acc_train: 0.7857 loss_val: 1.2496 acc_val: 0.6833
Epoch: 0059 loss_train: 1.0376 acc_train: 0.8214 loss_val: 1.2359 acc_val: 0.6900
Epoch: 0060 loss_train: 1.0373 acc_train: 0.8214 loss_val: 1.2225 acc_val: 0.7033
Epoch: 0061 loss_train: 1.0335 acc_train: 0.8071 loss_val: 1.2094 acc_val: 0.7167
Epoch: 0062 loss_train: 1.0095 acc_train: 0.8000 loss_val: 1.1965 acc_val: 0.7200
Epoch: 0063 loss_train: 0.9977 acc_train: 0.8000 loss_val: 1.1840 acc_val: 0.7267
Epoch: 0064 loss_train: 0.9484 acc_train: 0.8357 loss_val: 1.1717 acc_val: 0.7267
Epoch: 0065 loss_train: 0.9430 acc_train: 0.8000 loss_val: 1.1596 acc_val: 0.7300
Epoch: 0066 loss_train: 0.9460 acc_train: 0.8214 loss_val: 1.1478 acc_val: 0.7367
Epoch: 0067 loss_train: 0.9307 acc_train: 0.8286 loss_val: 1.1366 acc_val: 0.7333
Epoch: 0068 loss_train: 0.8884 acc_train: 0.8286 loss_val: 1.1257 acc_val: 0.7400
Epoch: 0069 loss_train: 0.9236 acc_train: 0.8357 loss_val: 1.1149 acc_val: 0.7400
Epoch: 0070 loss_train: 0.8896 acc_train: 0.8357 loss_val: 1.1045 acc_val: 0.7467
Epoch: 0071 loss_train: 0.8333 acc_train: 0.8643 loss_val: 1.0943 acc_val: 0.7600
Epoch: 0072 loss_train: 0.8907 acc_train: 0.8643 loss_val: 1.0844 acc_val: 0.7600
Epoch: 0073 loss_train: 0.8249 acc_train: 0.8643 loss_val: 1.0748 acc_val: 0.7633
Epoch: 0074 loss_train: 0.8501 acc_train: 0.8500 loss_val: 1.0654 acc_val: 0.7633
Epoch: 0075 loss_train: 0.8271 acc_train: 0.8571 loss_val: 1.0563 acc_val: 0.7633
Epoch: 0076 loss_train: 0.8333 acc_train: 0.8786 loss_val: 1.0474 acc_val: 0.7633
Epoch: 0077 loss_train: 0.7798 acc_train: 0.9000 loss_val: 1.0386 acc_val: 0.7667
Epoch: 0078 loss_train: 0.7881 acc_train: 0.8643 loss_val: 1.0303 acc_val: 0.7667
Epoch: 0079 loss_train: 0.7975 acc_train: 0.8571 loss_val: 1.0223 acc_val: 0.7667
Epoch: 0080 loss_train: 0.7892 acc_train: 0.8786 loss_val: 1.0146 acc_val: 0.7733
Epoch: 0081 loss_train: 0.7624 acc_train: 0.9000 loss_val: 1.0071 acc_val: 0.7767
Epoch: 0082 loss_train: 0.7459 acc_train: 0.8929 loss_val: 0.9996 acc_val: 0.7767
Epoch: 0083 loss_train: 0.7435 acc_train: 0.8786 loss_val: 0.9925 acc_val: 0.7800
Epoch: 0084 loss_train: 0.7274 acc_train: 0.8857 loss_val: 0.9856 acc_val: 0.7833
Epoch: 0085 loss_train: 0.6996 acc_train: 0.8857 loss_val: 0.9791 acc_val: 0.7833
Epoch: 0086 loss_train: 0.7249 acc_train: 0.8857 loss_val: 0.9729 acc_val: 0.7833
Epoch: 0087 loss_train: 0.7449 acc_train: 0.8929 loss_val: 0.9669 acc_val: 0.7833
Epoch: 0088 loss_train: 0.7044 acc_train: 0.9071 loss_val: 0.9610 acc_val: 0.7833
Epoch: 0089 loss_train: 0.7135 acc_train: 0.8929 loss_val: 0.9551 acc_val: 0.7800
Epoch: 0090 loss_train: 0.6792 acc_train: 0.9071 loss_val: 0.9494 acc_val: 0.7800
Epoch: 0091 loss_train: 0.7334 acc_train: 0.8500 loss_val: 0.9438 acc_val: 0.7800
Epoch: 0092 loss_train: 0.6932 acc_train: 0.9000 loss_val: 0.9386 acc_val: 0.7800
Epoch: 0093 loss_train: 0.6891 acc_train: 0.9000 loss_val: 0.9337 acc_val: 0.7800
Epoch: 0094 loss_train: 0.6501 acc_train: 0.9000 loss_val: 0.9289 acc_val: 0.7833
Epoch: 0095 loss_train: 0.6511 acc_train: 0.8786 loss_val: 0.9241 acc_val: 0.7867
Epoch: 0096 loss_train: 0.6786 acc_train: 0.8929 loss_val: 0.9195 acc_val: 0.7867
Epoch: 0097 loss_train: 0.6553 acc_train: 0.8714 loss_val: 0.9149 acc_val: 0.7833
Epoch: 0098 loss_train: 0.6299 acc_train: 0.8929 loss_val: 0.9108 acc_val: 0.7833
Epoch: 0099 loss_train: 0.6283 acc_train: 0.9000 loss_val: 0.9068 acc_val: 0.7833
Epoch: 0100 loss_train: 0.6411 acc_train: 0.8929 loss_val: 0.9028 acc_val: 0.7833
Epoch: 0101 loss_train: 0.6216 acc_train: 0.9000 loss_val: 0.8986 acc_val: 0.7833
Epoch: 0102 loss_train: 0.6309 acc_train: 0.9071 loss_val: 0.8946 acc_val: 0.7833
Epoch: 0103 loss_train: 0.6211 acc_train: 0.8786 loss_val: 0.8908 acc_val: 0.7800
Epoch: 0104 loss_train: 0.5940 acc_train: 0.9071 loss_val: 0.8870 acc_val: 0.7833
Epoch: 0105 loss_train: 0.6268 acc_train: 0.9000 loss_val: 0.8831 acc_val: 0.7833
Epoch: 0106 loss_train: 0.5906 acc_train: 0.9071 loss_val: 0.8793 acc_val: 0.7833
Epoch: 0107 loss_train: 0.5937 acc_train: 0.8929 loss_val: 0.8754 acc_val: 0.7833
Epoch: 0108 loss_train: 0.5637 acc_train: 0.9143 loss_val: 0.8717 acc_val: 0.7833
Epoch: 0109 loss_train: 0.5805 acc_train: 0.9000 loss_val: 0.8681 acc_val: 0.7800
Epoch: 0110 loss_train: 0.5983 acc_train: 0.8786 loss_val: 0.8647 acc_val: 0.7800
Epoch: 0111 loss_train: 0.5719 acc_train: 0.9143 loss_val: 0.8613 acc_val: 0.7800
Epoch: 0112 loss_train: 0.5894 acc_train: 0.8929 loss_val: 0.8579 acc_val: 0.7800
Epoch: 0113 loss_train: 0.5635 acc_train: 0.8929 loss_val: 0.8547 acc_val: 0.7800
Epoch: 0114 loss_train: 0.6131 acc_train: 0.8929 loss_val: 0.8516 acc_val: 0.7800
Epoch: 0115 loss_train: 0.5426 acc_train: 0.9143 loss_val: 0.8483 acc_val: 0.7800
Epoch: 0116 loss_train: 0.5330 acc_train: 0.9000 loss_val: 0.8449 acc_val: 0.7800
Epoch: 0117 loss_train: 0.5570 acc_train: 0.9143 loss_val: 0.8416 acc_val: 0.7800
Epoch: 0118 loss_train: 0.5509 acc_train: 0.9357 loss_val: 0.8386 acc_val: 0.7800
Epoch: 0119 loss_train: 0.5752 acc_train: 0.8929 loss_val: 0.8356 acc_val: 0.7800
Epoch: 0120 loss_train: 0.5703 acc_train: 0.9143 loss_val: 0.8328 acc_val: 0.7800
Epoch: 0121 loss_train: 0.5391 acc_train: 0.9214 loss_val: 0.8299 acc_val: 0.7800
Epoch: 0122 loss_train: 0.5385 acc_train: 0.9071 loss_val: 0.8274 acc_val: 0.7800
Epoch: 0123 loss_train: 0.5392 acc_train: 0.9000 loss_val: 0.8250 acc_val: 0.7800
Epoch: 0124 loss_train: 0.5267 acc_train: 0.9071 loss_val: 0.8230 acc_val: 0.7800
Epoch: 0125 loss_train: 0.5205 acc_train: 0.9143 loss_val: 0.8210 acc_val: 0.7833
Epoch: 0126 loss_train: 0.5583 acc_train: 0.9000 loss_val: 0.8189 acc_val: 0.7833
Epoch: 0127 loss_train: 0.5233 acc_train: 0.9286 loss_val: 0.8168 acc_val: 0.7833
Epoch: 0128 loss_train: 0.5294 acc_train: 0.9143 loss_val: 0.8145 acc_val: 0.7833
Epoch: 0129 loss_train: 0.5298 acc_train: 0.8929 loss_val: 0.8116 acc_val: 0.7833
Epoch: 0130 loss_train: 0.5261 acc_train: 0.9143 loss_val: 0.8086 acc_val: 0.7833
Epoch: 0131 loss_train: 0.5282 acc_train: 0.9071 loss_val: 0.8056 acc_val: 0.7833
Epoch: 0132 loss_train: 0.5312 acc_train: 0.9286 loss_val: 0.8029 acc_val: 0.7800
Epoch: 0133 loss_train: 0.5154 acc_train: 0.9000 loss_val: 0.8004 acc_val: 0.7833
Epoch: 0134 loss_train: 0.5126 acc_train: 0.9143 loss_val: 0.7979 acc_val: 0.7833
Epoch: 0135 loss_train: 0.5036 acc_train: 0.9000 loss_val: 0.7957 acc_val: 0.7833
Epoch: 0136 loss_train: 0.4925 acc_train: 0.9143 loss_val: 0.7935 acc_val: 0.7867
Epoch: 0137 loss_train: 0.5123 acc_train: 0.8786 loss_val: 0.7915 acc_val: 0.7833
Epoch: 0138 loss_train: 0.5016 acc_train: 0.9143 loss_val: 0.7894 acc_val: 0.7867
Epoch: 0139 loss_train: 0.5007 acc_train: 0.9143 loss_val: 0.7875 acc_val: 0.7867
Epoch: 0140 loss_train: 0.5032 acc_train: 0.9143 loss_val: 0.7855 acc_val: 0.7800
Epoch: 0141 loss_train: 0.4719 acc_train: 0.9357 loss_val: 0.7838 acc_val: 0.7833
Epoch: 0142 loss_train: 0.4737 acc_train: 0.9286 loss_val: 0.7822 acc_val: 0.7800
Epoch: 0143 loss_train: 0.4898 acc_train: 0.9143 loss_val: 0.7809 acc_val: 0.7800
Epoch: 0144 loss_train: 0.4710 acc_train: 0.9214 loss_val: 0.7797 acc_val: 0.7767
Epoch: 0145 loss_train: 0.4852 acc_train: 0.9214 loss_val: 0.7782 acc_val: 0.7767
Epoch: 0146 loss_train: 0.4303 acc_train: 0.9286 loss_val: 0.7767 acc_val: 0.7767
Epoch: 0147 loss_train: 0.4668 acc_train: 0.9429 loss_val: 0.7752 acc_val: 0.7767
Epoch: 0148 loss_train: 0.4971 acc_train: 0.8929 loss_val: 0.7736 acc_val: 0.7767
Epoch: 0149 loss_train: 0.4710 acc_train: 0.9071 loss_val: 0.7721 acc_val: 0.7800
Epoch: 0150 loss_train: 0.4713 acc_train: 0.9143 loss_val: 0.7706 acc_val: 0.7767
Epoch: 0151 loss_train: 0.4826 acc_train: 0.9286 loss_val: 0.7692 acc_val: 0.7767
Epoch: 0152 loss_train: 0.4402 acc_train: 0.9214 loss_val: 0.7677 acc_val: 0.7767
Epoch: 0153 loss_train: 0.4601 acc_train: 0.9357 loss_val: 0.7663 acc_val: 0.7767
Epoch: 0154 loss_train: 0.4625 acc_train: 0.9286 loss_val: 0.7645 acc_val: 0.7767
Epoch: 0155 loss_train: 0.4578 acc_train: 0.9286 loss_val: 0.7629 acc_val: 0.7767
Epoch: 0156 loss_train: 0.4636 acc_train: 0.9071 loss_val: 0.7613 acc_val: 0.7767
Epoch: 0157 loss_train: 0.4710 acc_train: 0.9286 loss_val: 0.7597 acc_val: 0.7767
Epoch: 0158 loss_train: 0.4791 acc_train: 0.9429 loss_val: 0.7581 acc_val: 0.7767
Epoch: 0159 loss_train: 0.4814 acc_train: 0.9214 loss_val: 0.7564 acc_val: 0.7767
Epoch: 0160 loss_train: 0.4818 acc_train: 0.8929 loss_val: 0.7547 acc_val: 0.7767
Epoch: 0161 loss_train: 0.4525 acc_train: 0.9214 loss_val: 0.7535 acc_val: 0.7800
Epoch: 0162 loss_train: 0.4120 acc_train: 0.9286 loss_val: 0.7521 acc_val: 0.7867
Epoch: 0163 loss_train: 0.4675 acc_train: 0.9429 loss_val: 0.7505 acc_val: 0.7867
Epoch: 0164 loss_train: 0.4444 acc_train: 0.9143 loss_val: 0.7487 acc_val: 0.7900
Epoch: 0165 loss_train: 0.4293 acc_train: 0.9286 loss_val: 0.7469 acc_val: 0.7867
Epoch: 0166 loss_train: 0.4124 acc_train: 0.9214 loss_val: 0.7456 acc_val: 0.7867
Epoch: 0167 loss_train: 0.4526 acc_train: 0.9143 loss_val: 0.7442 acc_val: 0.7900
Epoch: 0168 loss_train: 0.4110 acc_train: 0.9500 loss_val: 0.7427 acc_val: 0.7900
Epoch: 0169 loss_train: 0.4323 acc_train: 0.9429 loss_val: 0.7411 acc_val: 0.7900
Epoch: 0170 loss_train: 0.4613 acc_train: 0.9143 loss_val: 0.7394 acc_val: 0.7933
Epoch: 0171 loss_train: 0.3700 acc_train: 0.9429 loss_val: 0.7381 acc_val: 0.7900
Epoch: 0172 loss_train: 0.4179 acc_train: 0.9214 loss_val: 0.7370 acc_val: 0.7933
Epoch: 0173 loss_train: 0.4309 acc_train: 0.9214 loss_val: 0.7356 acc_val: 0.7967
Epoch: 0174 loss_train: 0.4136 acc_train: 0.8929 loss_val: 0.7343 acc_val: 0.7967
Epoch: 0175 loss_train: 0.3838 acc_train: 0.9429 loss_val: 0.7331 acc_val: 0.7967
Epoch: 0176 loss_train: 0.4168 acc_train: 0.9214 loss_val: 0.7324 acc_val: 0.7967
Epoch: 0177 loss_train: 0.4039 acc_train: 0.9286 loss_val: 0.7320 acc_val: 0.7967
Epoch: 0178 loss_train: 0.4021 acc_train: 0.9214 loss_val: 0.7312 acc_val: 0.7933
Epoch: 0179 loss_train: 0.4318 acc_train: 0.9500 loss_val: 0.7302 acc_val: 0.7933
Epoch: 0180 loss_train: 0.3904 acc_train: 0.9500 loss_val: 0.7293 acc_val: 0.7967
Epoch: 0181 loss_train: 0.4072 acc_train: 0.9357 loss_val: 0.7286 acc_val: 0.7933
Epoch: 0182 loss_train: 0.3995 acc_train: 0.9286 loss_val: 0.7276 acc_val: 0.7967
Epoch: 0183 loss_train: 0.4138 acc_train: 0.9214 loss_val: 0.7268 acc_val: 0.7967
Epoch: 0184 loss_train: 0.4128 acc_train: 0.9214 loss_val: 0.7257 acc_val: 0.7967
Epoch: 0185 loss_train: 0.4114 acc_train: 0.9286 loss_val: 0.7244 acc_val: 0.8033
Epoch: 0186 loss_train: 0.4140 acc_train: 0.9286 loss_val: 0.7236 acc_val: 0.7967
Epoch: 0187 loss_train: 0.4249 acc_train: 0.9357 loss_val: 0.7223 acc_val: 0.7967
Epoch: 0188 loss_train: 0.4085 acc_train: 0.9429 loss_val: 0.7212 acc_val: 0.7967
Epoch: 0189 loss_train: 0.3959 acc_train: 0.9500 loss_val: 0.7201 acc_val: 0.8033
Epoch: 0190 loss_train: 0.3834 acc_train: 0.9429 loss_val: 0.7192 acc_val: 0.8033
Epoch: 0191 loss_train: 0.3958 acc_train: 0.9714 loss_val: 0.7182 acc_val: 0.8033
Epoch: 0192 loss_train: 0.3717 acc_train: 0.9357 loss_val: 0.7171 acc_val: 0.8000
Epoch: 0193 loss_train: 0.4009 acc_train: 0.9500 loss_val: 0.7163 acc_val: 0.8000
Epoch: 0194 loss_train: 0.3830 acc_train: 0.9357 loss_val: 0.7156 acc_val: 0.8033
Epoch: 0195 loss_train: 0.3970 acc_train: 0.9214 loss_val: 0.7148 acc_val: 0.8067
Epoch: 0196 loss_train: 0.3925 acc_train: 0.9500 loss_val: 0.7140 acc_val: 0.8067
Epoch: 0197 loss_train: 0.3775 acc_train: 0.9643 loss_val: 0.7128 acc_val: 0.8100
Epoch: 0198 loss_train: 0.4060 acc_train: 0.9429 loss_val: 0.7114 acc_val: 0.8133
Epoch: 0199 loss_train: 0.3697 acc_train: 0.9429 loss_val: 0.7106 acc_val: 0.8133
Epoch: 0200 loss_train: 0.3809 acc_train: 0.9500 loss_val: 0.7094 acc_val: 0.8167
Optimization Finished!
Total time elapsed: 4.2290s
CPU times: user 1.64 s, sys: 369 ms, total: 2.01 s
Wall time: 4.23 s
When I measured the test time, it took about 6.79 s.
# Testing
Embedding shape: [1000, 7]
Test set results: loss= 0.7287 accuracy= 0.8240
CPU times: user 8.25 s, sys: 402 ms, total: 8.65 s
Wall time: 4.21 s
COLLAB Dataset
It is a large dataset containing many graphs, each with a single graph-level label.
This dataset is mainly used for graph classification.
COLLAB is a scientific collaboration dataset. Each graph corresponds to a researcher's ego network:
the researcher and their collaborators are nodes, and an edge indicates a collaboration between two researchers.
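To make "ego network" concrete, here is a minimal pure-Python sketch. The helper `ego_network` and the toy collaboration list are hypothetical illustrations, not part of the COLLAB loader. It assumes the usual definition: the ego, its direct collaborators, and every collaboration among those people.

```python
def ego_network(ego, collaborations):
    """Return the ego network of `ego` as (nodes, edges).

    nodes: the ego plus everyone who collaborated directly with the ego.
    edges: every collaboration whose two endpoints are both in `nodes`.
    """
    # Direct collaborators of the ego are its neighbours.
    neighbours = {b if a == ego else a for a, b in collaborations if ego in (a, b)}
    nodes = {ego} | neighbours
    # Keep collaborations among the ego and its collaborators only,
    # stored as sorted tuples so each undirected edge appears once.
    edges = {tuple(sorted(pair)) for pair in collaborations if set(pair) <= nodes}
    return nodes, edges


# Hypothetical toy data: "D" only worked with "C", so "D" is outside A's ego network.
collaborations = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
nodes, edges = ego_network("A", collaborations)
print(sorted(nodes))  # ['A', 'B', 'C']
print(sorted(edges))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

Note that the edge B-C is kept even though it does not involve the ego; ties among collaborators are what give each ego network its shape.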
The code below is built on the pytorch_geometric library.
Why do I use it?
pytorch_geometric is very fast, even though it works on sparse data.
Compared to the Deep Graph Library (DGL) 0.1.3, pytorch_geometric trains models up to 15 times faster.
So I recommend running the code and studying the library.
Reference :
from torch_geometric.datasets import TUDataset
from torch_geometric.utils import to_networkx
import torch_geometric.transforms as T
from torch_geometric.utils import degree
def create_one_hot_transform(dataset):
    # COLLAB graphs have no node features, so we build one-hot node features from node degrees.
    # The one-hot width is set by the maximum node degree over the whole dataset.
    max_degree = 0  # I referenced this approach from:
    degs = []
    for data in dataset:
        degs += [degree(data.edge_index[0], dtype=torch.long)]
        max_degree = max(max_degree, degs[-1].max().item())
    return T.OneHotDegree(max_degree)
def load_dataset():
    dataset = TUDataset(root='/tmp/COLLAB', name="COLLAB")
    dataset.transform = create_one_hot_transform(dataset)
    return dataset
dataset = load_dataset()
Extracting /tmp/COLLAB/COLLAB/
CPU times: user 1min 33s, sys: 11.4 s, total: 1min 44s
Wall time: 1min 44s
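The degree-based features can be sketched in NumPy as below. `one_hot_degree` is a hypothetical helper that approximates what we understand `T.OneHotDegree(max_degree)` to do by default (count each node's degree from `edge_index[0]`, then one-hot encode it into `max_degree + 1` columns); it is an illustration, not PyG's source. Under that reading, the `Number of features: 492` reported for COLLAB means the largest node degree in the dataset is 491.

```python
import numpy as np

def one_hot_degree(edge_index, num_nodes, max_degree):
    """One-hot encode each node's degree (hypothetical sketch of OneHotDegree).

    Degree is counted from edge_index[0] (source nodes), like the
    create_one_hot_transform cell. Every degree must be <= max_degree.
    """
    deg = np.bincount(edge_index[0], minlength=num_nodes)
    x = np.zeros((num_nodes, max_degree + 1), dtype=np.float32)
    x[np.arange(num_nodes), deg] = 1.0  # column index = node degree
    return x

# Toy undirected path 0-1-2, stored in both directions as PyG does.
edge_index = np.array([[0, 1, 1, 2],
                       [1, 0, 2, 1]])
x = one_hot_degree(edge_index, num_nodes=3, max_degree=2)
print(x)  # rows: degree 1, degree 2, degree 1
```

Because the width depends on the dataset-wide maximum degree, every graph ends up with the same feature dimension, which `GCNConv(dataset.num_node_features, ...)` requires.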
print(f'Dataset: {dataset}:')
print(f'Number of graphs: {len(dataset)}')
print(f'Number of classes: {dataset.num_classes}')
print(f'Number of features: {dataset.num_features}')
###### One graph #####
data = dataset[0]  # Get the first graph object.
print(data)
# Gather some statistics about the first graph.
print(f'Number of nodes: {data.num_nodes}')
print(f'Number of edges: {data.num_edges}')
print(f'Average node degree: {data.num_edges / data.num_nodes:.2f}')
print(f'Contains isolated nodes: {data.has_isolated_nodes()}')
print(f'Contains self-loops: {data.has_self_loops()}')
print(f'Is undirected: {data.is_undirected()}')
Dataset: COLLAB(5000):
Number of graphs: 5000
Number of classes: 3
Number of features: 492
Data(edge_index=[2, 1980], y=[1], num_nodes=45, x=[45, 492])
Number of nodes: 45
Number of edges: 1980
Average node degree: 44.00
Contains isolated nodes: False
Contains self-loops: False
Is undirected: True
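These statistics can be cross-checked by hand. PyG stores each undirected edge twice in `edge_index` (once per direction), which is why `num_edges / num_nodes` gives the average degree directly. The arithmetic below uses only the numbers printed above and assumes there are no duplicate edges.

```python
n = 45              # data.num_nodes for the first COLLAB graph
num_edges = 1980    # data.num_edges; each undirected edge is counted twice

undirected_edges = num_edges // 2   # distinct collaborations
max_possible = n * (n - 1) // 2     # edges in a complete graph on n nodes
avg_degree = num_edges / n          # what the cell above prints

print(undirected_edges, max_possible, avg_degree)      # 990 990 44.0
print("complete graph:", undirected_edges == max_possible)  # True
```

So with no self-loops, an average degree of 44 on 45 nodes means every node is connected to every other one: this first ego network is a complete graph.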
# Edges of the first graph (row 0: source nodes, row 1: target nodes)
data.edge_index
tensor([[ 0, 0, 0, ..., 44, 44, 44],
[ 1, 2, 3, ..., 41, 42, 43]])
from torch_geometric.utils import to_networkx
G = to_networkx(data, to_undirected=True)
# Draw the first graph of the COLLAB dataset.
plt.figure(figsize=(10, 10))
nx.draw_networkx(G, pos=nx.spring_layout(G, seed=42), with_labels=True, cmap="Set2", width=0.5, node_size=500, node_color='yellow')
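`to_networkx(data, to_undirected=True)` returns a networkx graph in which the two directed entries of each edge in `edge_index` become one undirected edge. The plain-Python sketch below illustrates that idea only; `edge_index_to_undirected` is a hypothetical helper, not PyG's implementation.

```python
def edge_index_to_undirected(edge_index):
    """Collapse a 2 x E edge_index (sources, targets) into a set of undirected
    edges, keeping one (min, max) tuple per connected pair of nodes."""
    sources, targets = edge_index
    return {(min(u, v), max(u, v)) for u, v in zip(sources, targets)}

# Toy triangle stored in both directions, like the COLLAB edge_index above.
edge_index = [[0, 1, 0, 2, 1, 2],
              [1, 0, 2, 0, 2, 1]]
edges = edge_index_to_undirected(edge_index)
print(sorted(edges))  # [(0, 1), (0, 2), (1, 2)]
```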
dataset = dataset.shuffle()  # Graphs are stored sorted by label (0, 1, 2), so shuffle before splitting.
# train / valid
train_dataset = dataset[:4000]
valid_dataset = dataset[4000:]
print(f'Number of training graphs: {len(train_dataset)}')
print(f'Number of test graphs: {len(valid_dataset)}')
Number of training graphs: 4000
Number of test graphs: 1000
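Why the shuffle matters: if the graphs are stored grouped by label, taking the last 1000 as the validation set without shuffling can leave whole classes out of it. The class counts below are hypothetical (the real COLLAB class sizes may differ); only the sorted layout mirrors the comment in the code.

```python
import random

# Hypothetical label counts, stored sorted by label like the raw dataset.
labels = [0] * 1000 + [1] * 2500 + [2] * 1500
print(sorted(set(labels[4000:])))   # unshuffled split: validation holds only [2]

random.seed(0)
shuffled = labels[:]
random.shuffle(shuffled)
valid_classes = sorted(set(shuffled[4000:]))
print(valid_classes)                # all three classes appear after shuffling
```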
from torch_geometric.loader import DataLoader
# Unlike in CV and NLP, a graph DataLoader merges the node features, edge_index (and any edge weights) of several graphs into one batch.
# The GNN model therefore needs the `batch` vector to know which nodes belong to which graph within a batch.
# Reference :
# Reference :
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
valid_loader = DataLoader(valid_dataset, batch_size=64, shuffle=False)
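Conceptually, a mini-batch is one large disconnected graph: node features are stacked, each graph's edge indices are shifted by the number of nodes that came before it, and a `batch` vector records which graph every node belongs to. The NumPy sketch below shows that idea; `collate_graphs` is a hypothetical helper, not PyG's `Batch` implementation.

```python
import numpy as np

def collate_graphs(graphs):
    """Merge (x, edge_index) pairs into one big disconnected graph.

    x:          [num_nodes_i, num_features] node features
    edge_index: [2, num_edges_i] local node indices
    Returns stacked features, shifted edge_index and the `batch` vector.
    """
    xs, edges, batch = [], [], []
    offset = 0
    for graph_id, (x, edge_index) in enumerate(graphs):
        xs.append(x)
        edges.append(edge_index + offset)             # shift to global node ids
        batch.append(np.full(x.shape[0], graph_id))   # graph id of each node
        offset += x.shape[0]
    return np.concatenate(xs), np.concatenate(edges, axis=1), np.concatenate(batch)

# Two toy graphs: a single edge (2 nodes) and a path of 3 nodes.
g1 = (np.ones((2, 4)), np.array([[0, 1], [1, 0]]))
g2 = (np.zeros((3, 4)), np.array([[0, 1, 1, 2], [1, 0, 2, 1]]))
x, edge_index, batch = collate_graphs([g1, g2])
print(x.shape)      # (5, 4)
print(edge_index)   # second graph's node ids are shifted by 2
print(batch)        # [0 0 1 1 1]
```

Because no edge crosses between graphs, message passing in `GCNConv` stays inside each graph, and the `batch` vector is what the readout step uses to pool per graph.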
from torch.nn import Linear
import torch.nn.functional as F
from torch_geometric.nn import GCNConv
from torch_geometric.nn import global_mean_pool
class GCN(torch.nn.Module):
    def __init__(self, hidden_channels):
        super(GCN, self).__init__()
        self.conv1 = GCNConv(dataset.num_node_features, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, hidden_channels)
        self.conv3 = GCNConv(hidden_channels, hidden_channels)  # Adding a third GCNConv layer gave better performance.
        self.lin = Linear(hidden_channels, dataset.num_classes)
    def forward(self, x, edge_index, batch):
        # 1. Obtain node embeddings
        x = self.conv1(x, edge_index)
        x = x.relu()
        x = self.conv2(x, edge_index)
        x = x.relu()
        x = self.conv3(x, edge_index)
        # 2. Readout layer: average the node embeddings of each graph
        x = global_mean_pool(x, batch)  # [batch_size, hidden_channels], for graph classification
        h = x.clone().detach()  # graph embedding, kept for visualization
        # 3. Apply a final classifier (dropout is active only in training mode)
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.lin(x)
        return x, h
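The readout step turns a variable number of node embeddings into one vector per graph, which is why `h` has shape `[batch_size, hidden_channels]` (the `[64, 64]` shapes printed during visualization). The NumPy sketch below shows the averaging idea behind `global_mean_pool`; `mean_pool` is a hypothetical re-implementation for illustration and assumes every graph has at least one node.

```python
import numpy as np

def mean_pool(x, batch, num_graphs):
    """Average node embeddings per graph (hypothetical sketch of mean pooling).

    x:     [num_nodes, hidden] node embeddings of the whole mini-batch
    batch: [num_nodes] graph id of each node
    """
    sums = np.zeros((num_graphs, x.shape[1]))
    np.add.at(sums, batch, x)                              # add each row to its graph
    counts = np.bincount(batch, minlength=num_graphs)[:, None]
    return sums / counts                                   # [num_graphs, hidden]

x = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [10.0, 0.0]])
batch = np.array([0, 0, 1])
h = mean_pool(x, batch, num_graphs=2)
print(h)  # graph 0 -> [2, 3], graph 1 -> [10, 0]
```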
model = GCN(hidden_channels=64)
model = model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()
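`GCN.forward` returns the raw output of `self.lin` (logits) without a softmax, which is what `torch.nn.CrossEntropyLoss` expects: it applies log-softmax internally and, by default, averages the negative log-probability of the true class over the batch. The NumPy sketch below reproduces that computation for illustration; `cross_entropy` is a hypothetical helper, not the PyTorch source.

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean cross-entropy from raw logits (log-softmax + negative log-likelihood)."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

logits = np.array([[2.0, 0.0, 0.0],    # fairly confident, correct (class 0)
                   [0.0, 0.0, 0.0]])   # uniform over the 3 COLLAB classes
targets = np.array([0, 2])
print(cross_entropy(logits, targets))
```

A model that predicts all three classes uniformly gets a loss of ln 3 ≈ 1.0986, a useful reference point for the Train loss values printed during training.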
def train(epoch=None):
    model.train()  # Enable dropout.
    for data in train_loader:  # Iterate in batches over the training dataset.
        data = data.to(device)
        out, _ = model(data.x, data.edge_index, data.batch)  # Perform a single forward pass.
        loss = criterion(out, data.y)  # Compute the loss.
        loss.backward()  # Derive gradients.
        optimizer.step()  # Update parameters based on gradients.
        optimizer.zero_grad()  # Clear gradients.
    print(f'Epoch: {epoch:03d}, Train loss: {loss.item():.4f}')
def test(loader, visual=False):
    model.eval()  # Disable dropout for evaluation.
    correct = 0
    with torch.no_grad():
        for data in loader:  # Iterate in batches over the training/test dataset.
            data = data.to(device)
            out, h = model(data.x, data.edge_index, data.batch)
            pred = out.argmax(dim=1)  # Use the class with highest probability.
            correct += int((pred == data.y).sum())  # Check against ground-truth labels.
            if visual:
                colors = ['#3A3120', '#535D8E', '#BD3430']
                color = [colors[i] for i in data.y.cpu().tolist()]
                z = TSNE(n_components=2).fit_transform(h.cpu().numpy())
                print(f'Embedding shape: {list(h.shape)}')
                plt.scatter(z[:, 0], z[:, 1], s=70, c=color)
    return correct / len(loader.dataset)  # Derive ratio of correct predictions.
for epoch in range(1, 31):
    train(epoch)
    test_acc = test(valid_loader)
    if epoch % 5 == 0:
        print(f'Epoch: {epoch:03d}, Test Acc: {test_acc:.4f}')
Epoch: 001, Train loss: 0.7195
Epoch: 002, Train loss: 0.4161
Epoch: 003, Train loss: 0.4312
Epoch: 004, Train loss: 0.4805
Epoch: 005, Train loss: 0.3761
Epoch: 005, Test Acc: 0.7810
Epoch: 006, Train loss: 0.6026
Epoch: 007, Train loss: 0.3423
Epoch: 008, Train loss: 0.4011
Epoch: 009, Train loss: 0.3502
Epoch: 010, Train loss: 0.3265
Epoch: 010, Test Acc: 0.7960
Epoch: 011, Train loss: 0.3840
Epoch: 012, Train loss: 0.5217
Epoch: 013, Train loss: 0.4022
Epoch: 014, Train loss: 0.3394
Epoch: 015, Train loss: 0.3952
Epoch: 015, Test Acc: 0.8010
Epoch: 016, Train loss: 0.3260
Epoch: 017, Train loss: 0.5382
Epoch: 018, Train loss: 0.4368
Epoch: 019, Train loss: 0.2666
Epoch: 020, Train loss: 0.2966
Epoch: 020, Test Acc: 0.7890
Epoch: 021, Train loss: 0.3952
Epoch: 022, Train loss: 0.2110
Epoch: 023, Train loss: 0.2415
Epoch: 024, Train loss: 0.1759
Epoch: 025, Train loss: 0.2747
Epoch: 025, Test Acc: 0.8190
Epoch: 026, Train loss: 0.2576
Epoch: 027, Train loss: 0.3284
Epoch: 028, Train loss: 0.4078
Epoch: 029, Train loss: 0.2570
Epoch: 030, Train loss: 0.2867
Epoch: 030, Test Acc: 0.8110
CPU times: user 2min, sys: 345 ms, total: 2min
Wall time: 1min 3s
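The Train loss printed each epoch is the loss of the last mini-batch only, because the print runs after the batch loop. With 4000 training graphs and `batch_size=64`, that last batch holds just 32 graphs, which partly explains the jumps (e.g. 0.1759 at epoch 24 vs 0.4078 at epoch 28). A per-graph epoch average is steadier; the sketch below assumes per-batch losses and batch sizes have been collected (for example `loss.item()` and `data.num_graphs` inside the loop). `epoch_mean_loss` and the loss values are hypothetical.

```python
def epoch_mean_loss(batch_losses, batch_sizes):
    """Combine per-batch mean losses into one per-graph epoch loss.

    Each batch loss is already a mean over its graphs, so it is weighted by
    the number of graphs in that batch (the last batch is usually smaller).
    """
    total = sum(loss * size for loss, size in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)

# 4000 training graphs with batch_size=64 -> 62 full batches + one of 32.
sizes = [64] * 62 + [32]
losses = [0.5] * 62 + [1.1]          # hypothetical per-batch losses
print(len(sizes), sum(sizes))        # 63 4000
print(round(epoch_mean_loss(losses, sizes), 4))  # 0.5048, not the last batch's 1.1
```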
test(valid_loader, visual=True) # t-SNE
Embedding shape: [64, 64]
Embedding shape: [64, 64]
Embedding shape: [64, 64]
Embedding shape: [64, 64]
Embedding shape: [64, 64]
Embedding shape: [64, 64]
Embedding shape: [64, 64]
Embedding shape: [64, 64]
Embedding shape: [64, 64]
Embedding shape: [64, 64]
Embedding shape: [64, 64]
Embedding shape: [64, 64]
Embedding shape: [64, 64]
Embedding shape: [64, 64]
Embedding shape: [64, 64]
Embedding shape: [40, 64]
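The printed shapes show that `test(valid_loader, visual=True)` fits t-SNE once per mini-batch: 15 batches of 64 graphs plus one of 40, i.e. the 1000 validation graphs. Each fit produces its own arbitrary 2-D coordinates, so overlaying the 16 scatters on one plot mixes coordinate systems that are not comparable. A more faithful picture comes from collecting all graph embeddings first and projecting them once. The sketch below shows that collection step on random stand-in data; to stay dependency-free it swaps in PCA (via SVD) for t-SNE, and in the notebook one would instead call `TSNE(n_components=2).fit_transform(h_all)` on the concatenated array. `collect_embeddings` and `pca_2d` are hypothetical helpers.

```python
import numpy as np

def collect_embeddings(batches):
    """Concatenate per-batch (embeddings, labels) pairs into single arrays."""
    hs, ys = zip(*batches)
    return np.concatenate(hs), np.concatenate(ys)

def pca_2d(h):
    """Project embeddings to 2-D with PCA (stand-in for t-SNE in this sketch)."""
    centered = h - h.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

# Random stand-in batches shaped like the validation loader output:
# 15 batches of 64 graph embeddings and one of 40, hidden size 64.
rng = np.random.default_rng(0)
batches = [(rng.normal(size=(n, 64)), rng.integers(0, 3, size=n))
           for n in [64] * 15 + [40]]
h_all, y_all = collect_embeddings(batches)
z = pca_2d(h_all)   # one projection fitted on all 1000 graphs at once
print(h_all.shape, y_all.shape, z.shape)  # (1000, 64) (1000,) (1000, 2)
```

With `z` and `y_all` in hand, a single `plt.scatter(z[:, 0], z[:, 1], c=...)` call colours all 1000 graphs in one shared coordinate system.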
- Thomas N. Kipf and Max Welling, Semi-Supervised Classification with Graph Convolutional Networks (ICLR 2017)
- AI504: Programming for AI, lecture at KAIST AI