Traditional methods for machine learning in graphs
The goal of lecture: Creating additional features that will describe how this particular node is positioned in the rest of the network and what is its local network structure
ML in Graphs
Given
Learn a function
The way we can think of this is that we are given a graph as a set of verticies, as a set of edges and we wanna learn a function
2.1 Node-level tasks and features
Node classification: Where we are given a network, we are given a couple of nodes that are labeld with different colors, and the goal is to predict the colors of uncolored nodes
Goal: Characterize the structure and position of a node in the network
1. Node degree
2. Node centrality
Node degree counts the neighboring nodes without capturing their importance
Node centrality takes the node importance in a graph into account
Different ways to model importance
Eigen-vector centrality: A node is important if surrounded by important neighboring nodes
Betweenness centrality: A node is important if it lies on many shortest paths b/t other nodes
Closeness centrality: A node is important if it has small shortest path lengths to all other nodes
3. Clustering coefficient
4. Graphlets
Observation: Clustering coeffieicnt counts the #(triangles) in the ego-network
We can generalize the by #(pre-specified subgraphs, i.e. graphlets)
Graphlets: Rooted connected non-isomorphic subgraphs
GDV(Graphlet Degree Vector): Graphlet-base features for nodes
GDV(Graphlet Degree Vector): A count vector of graphlets rooted at a given node
Considering graphlets on 2 to 5 nodes we get:
Graphlet degree vector provides a measure of a node's local network topology:
Summary
2.2 Link prediction task and features
The link prediction task is to predict new links based on existing links
At test time, all node pairs(no existing links) are ranked, and top node pairs are predicted
The key is to design features for a pair of nodes
Two formulations of the link prediction task
Methodology
Link-level features:
1. Distance-based features
2. Local neighborhood overlap
3. Global neighborhood overlap
Summary
2.3 Graph-level features and Graph kernels
Features that characterize the structure of an entire graph
Kernel methods are widely-used for traditional ML for graph-level prediction
Graph Kernels: Measure similarity b/t two graphs
[1] Graphlet kernel
[2] WL(Weisfeiler-Lehman) kernel
[3] Other kernels are also proposed in the literature
Both Graphlet kernel and WL kernel use Bag-of-* representation of graph, where * is more sophisticated than node degress
1. Graphlet features
Count the number of different graphlets in a graph
Definition of graphlets here is slightly different from node-level features
Two two differences are
Limitations: Counting graphlet is expensive
Counting size- graphlets for a graph with size by enumeration takes
This is unavoidable in the worst-case since subgraph isomorphism test (judding whether a graph is a subgraph of another graph) is NP-hard
If a graph's node degree is bounded by , an algorithm exists to count all the graphlets of size
2. WL(Weisfeiler-Lehman) Kernel
Goal: Design an efficient graph feature descriptor of
IDEA: Use neighborhood structure to iteratively enrich node vocabulary
Color refienment
WL kernel is very popular/strong graph feature descriptor to gives strong performance and computationally efficient
Sumamry
Lecture Summary