[CS224W]02.Traditional Methods for Machine Learning in Graphs
- Youtube, Lecture Notes
- Node features(Node classification)
- 1) Node Degree: The number of edges a node has. Treats all neighbors equally; it does not account for how important those neighbors are
- 2) Node Centrality: takes the node importance in a graph into account
- Eigenvector centrality: important if surrounded by important neighboring nodes.
- Betweenness centrality: important if it lies on many shortest paths between other nodes.
- Closeness centrality: important if it has small shortest path lengths to all other nodes.
- 3) Clustering Coefficient: Measures how connected a node’s neighbors are to each other (the fraction of possible edges among them that actually exist)
- 4) Graphlets: Small connected non-isomorphic subgraphs; counting the graphlets a node participates in (the Graphlet Degree Vector) describes its local topology
- Importance based: Node degree, node centrality (useful for predicting influential nodes in a graph)
- Structure based: Node degree, clustering coefficient, graphlets (useful for predicting the particular role a node plays in a graph)
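The node-level features above can be sketched in plain Python on a toy graph. This is a minimal illustration, not the lecture's code: the graph, helper names, and the power-iteration count are my own assumptions.

```python
# Toy undirected graph as an adjacency list (dict of neighbor sets).
graph = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2, 4}, 4: {3}}

def degree(g, v):
    """Node degree: number of edges incident to v."""
    return len(g[v])

def clustering_coefficient(g, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    nbrs = g[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u in nbrs for w in nbrs if u < w and w in g[u])
    return 2 * links / (k * (k - 1))

def eigenvector_centrality(g, iters=100):
    """Power iteration: repeatedly set each node's score to the sum of
    its neighbors' scores, then renormalize; converges to the dominant
    eigenvector of the adjacency matrix."""
    c = {v: 1.0 for v in g}
    for _ in range(iters):
        new = {v: sum(c[u] for u in g[v]) for v in g}
        norm = sum(x * x for x in new.values()) ** 0.5
        c = {v: x / norm for v, x in new.items()}
    return c

# degree(graph, 1) == 3
# clustering_coefficient(graph, 0) == 1.0 (its two neighbors are linked)
```

Note how the two feature families differ: node 1 and node 3 have different clustering coefficients even where degrees agree, which is exactly the structure-based signal the degree alone misses.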
- Edge features
- 1) Distance-based feature: Shortest path distance between two nodes; Cons: Does not capture the degree of neighborhood overlap (ignores how many distinct routes connect the pair)
- 2) Local neighborhood overlap: Captures the number of neighboring nodes shared between two nodes; Cons: Assigns zero to any pair with no common neighbors, even if the two nodes are indirectly connected
- Common neighbors
- Jaccard’s coefficient (common neighbors normalized by the size of the neighborhood union)
- Adamic-Adar index
- 3) Global neighborhood overlap: Katz index, a weighted sum of the number of walks of all lengths between a pair of nodes, computed from powers of the adjacency matrix: S = Σ_{l=1..∞} β^l A^l = (I − βA)^(−1) − I, valid for 0 < β < 1/λ_max
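The edge-level features above admit a short pure-Python sketch. The toy graph, helper names, decay factor β = 0.1, and walk-length cutoff are illustrative assumptions; the Katz sum is truncated here rather than computed via the closed-form matrix inverse.

```python
import math

# Toy undirected graph as an adjacency list (dict of neighbor sets).
graph = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2, 4}, 4: {3}}

def common_neighbors(g, u, v):
    return g[u] & g[v]

def jaccard(g, u, v):
    """Common neighbors normalized by the neighborhood union."""
    union = g[u] | g[v]
    return len(g[u] & g[v]) / len(union) if union else 0.0

def adamic_adar(g, u, v):
    """Common neighbors weighted by 1/log(degree): a rare shared
    neighbor counts more than a high-degree hub."""
    return sum(1.0 / math.log(len(g[w])) for w in common_neighbors(g, u, v))

def katz(g, u, v, beta=0.1, max_len=10):
    """Truncated Katz index: sum over walk lengths l of beta^l times
    the number of length-l walks from u to v (the (u, v) entry of A^l)."""
    n = len(g)
    A = [[1 if j in g[i] else 0 for j in range(n)] for i in range(n)]
    P = [row[:] for row in A]          # P holds A^l, starting at l = 1
    score = beta * P[u][v]
    for l in range(2, max_len + 1):
        P = [[sum(P[i][k] * A[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        score += beta ** l * P[u][v]
    return score
```

Nodes 0 and 4 have no common neighbors, so every local overlap score for them is zero, yet their Katz index is positive via length-3 walks: this is the global-overlap fix for the "too many zeros" problem.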
- Graph features
- Kernel methods: Measure similarity between two graphs; design a graph feature vector Φ(G), and the kernel is the inner product K(G, G′) = Φ(G)ᵀΦ(G′) (like an unnormalized cosine similarity)
- Graphlet Kernel: Count the occurrences of each graphlet; very expensive, since counting requires subgraph isomorphism checks
- Weisfeiler-Lehman Kernel: Iteratively refine node colors (hash each node’s color together with its neighbors’ colors, then hash again); time complexity linear in the number of edges, so it is efficient; closely related to GNN message passing
- Others(Random-walk Kernel, Shortest-path graph kernel etc.)
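The Weisfeiler-Lehman kernel above can be sketched briefly in plain Python. This is a simplified illustration under my own assumptions (uniform initial colors, two refinement rounds, a shared compression table standing in for the hash function), not the lecture's implementation.

```python
from collections import Counter

def wl_features(g, table, iters=2):
    """WL color refinement: each round, a node's new color is looked up
    from (its color, sorted multiset of neighbor colors) in a table
    shared across graphs. Phi(G) is the histogram of all colors seen."""
    colors = {v: 0 for v in g}          # start with one uniform color
    hist = Counter(colors.values())
    for _ in range(iters):
        new = {}
        for v in g:
            sig = (colors[v], tuple(sorted(colors[u] for u in g[v])))
            if sig not in table:
                table[sig] = len(table) + 1   # assign a fresh color
            new[v] = table[sig]
        colors = new
        hist.update(colors.values())
    return hist

def wl_kernel(h1, h2):
    """Kernel value: inner product of the two color histograms."""
    return sum(h1[c] * h2[c] for c in h1.keys() & h2.keys())

table = {}                               # shared so colors are comparable
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path = {0: {1}, 1: {0, 2}, 2: {1}}
h_tri = wl_features(triangle, table)
h_path = wl_features(path, table)
```

Both graphs have three nodes of matching size, but after one round of refinement the path’s endpoint and midpoint colors diverge from the triangle’s, so wl_kernel(h_tri, h_tri) exceeds wl_kernel(h_tri, h_path): refinement is what makes the kernel discriminative.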