3.1 Node Embeddings
Encoder and Decoder
- Assume we have a graph G:
- V is the vertex set
- A is the adjacency matrix (assume binary)
- For simplicity: no node features or extra information is used
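Under this setup, the simplest ("shallow") encoder is just an embedding lookup, and the decoder scores node pairs by similarity of their embeddings. A minimal illustrative sketch (matrix shape and names are assumptions, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, d = 34, 16                    # illustrative sizes
Z = rng.normal(size=(num_nodes, d))      # one learnable d-dim embedding per node

def encode(u):
    """Shallow encoder: ENC(u) is just a row lookup in the embedding matrix Z."""
    return Z[u]

def decode(u, v):
    """Decoder: similarity of a node pair as the dot product z_u^T z_v."""
    return encode(u) @ encode(v)
```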
Summary
3.2 Random walk approaches for node embeddings
- Notation
- Vector $z_u$: The embedding of node $u$ (what we aim to find)
- Probability $P(v \mid z_u)$: The (predicted) probability of visiting node $v$ on random walks starting from node $u$ (our model's prediction based on $z_u$)
- Non-linear functions used to produce the predicted probabilities $P(v \mid z_u)$
- Softmax function: Turns a vector of $K$ real values (model predictions) into $K$ probabilities that sum to 1
- Sigmoid function: S-shaped function that maps real values into the range $(0, 1)$
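A minimal NumPy sketch of both functions (the max-subtraction for numerical stability is a standard trick, not something from the notes):

```python
import numpy as np

def softmax(scores):
    """Turn a vector of K real-valued scores into K probabilities summing to 1."""
    shifted = scores - scores.max()   # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def sigmoid(x):
    """S-shaped function mapping any real value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))
```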
- Random walk
- Given a graph and a starting point (node), we select one of its neighbors at random and move to this neighbor
- Then we select a neighbor of this new point at random, and move to it
- The (random) sequence of points visited this way is a random walk on the graph
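A minimal sketch of this procedure (assuming, for illustration, that the graph is stored as an adjacency list: a dict mapping each node to a list of its neighbors):

```python
import random

def random_walk(adj, start, length):
    """Unbiased random walk: repeatedly hop to a uniformly random neighbor."""
    walk = [start]
    for _ in range(length):
        neighbors = adj[walk[-1]]
        if not neighbors:   # dead end: stop the walk early
            break
        walk.append(random.choice(neighbors))
    return walk
```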
- Random walk embeddings: $z_u^\top z_v \approx$ probability that $u$ and $v$ co-occur on a random walk over the graph
- Estimate the probability of visiting node $v$ on a random walk starting from node $u$, using some random walk strategy $R$
- Optimize embeddings to encode these random walk statistics
- Similarity in embedding space (here: dot product, i.e., $\cos(\theta)$ for normalized embeddings) encodes random walk "similarity"
- Why random walks?
- Expressivity: Flexible stochastic definition of node similarity that incorporates both local and higher-order neighborhood information
- IDEA: If a random walk starting from node $u$ visits $v$ with high probability, then $u$ and $v$ are similar (higher-order, multi-hop information)
- Efficiency: Do not need to consider all node pairs when training; only need to consider pairs that co-occur on random walks
- Unsupervised feature learning
- Intuition: Find an embedding of nodes in $d$-dimensional space that preserves similarity
- IDEA: Learn node embeddings such that nodes that are nearby in the network are close together in the embedding space
- Given a node $u$, how do we define nearby nodes?
- $N_R(u)$: the neighborhood of $u$ obtained by some random walk strategy $R$
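As a concrete (illustrative) sketch, $N_R(u)$ can be collected as the multiset of nodes that co-occur with $u$ within a fixed window on simulated walks. This reuses the `random_walk` helper above; the window-based co-occurrence scheme is an assumption in the spirit of skip-gram training, not something stated in the notes:

```python
from collections import defaultdict

def build_neighborhoods(adj, num_walks, walk_length, window):
    """N_R(u): multiset of nodes co-occurring with u within `window` steps
    on walks started from u (strategy R = unbiased fixed-length walks)."""
    neighborhoods = defaultdict(list)
    for u in adj:
        for _ in range(num_walks):
            walk = random_walk(adj, u, walk_length)
            for i, a in enumerate(walk):
                for b in walk[i + 1 : i + 1 + window]:
                    neighborhoods[a].append(b)
                    neighborhoods[b].append(a)
    return neighborhoods
```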
- Feature learning as Optimization
- Given $G = (V, E)$
- Our goal is to learn a mapping $f: u \rightarrow \mathbb{R}^d$ with $f(u) = z_u$
- Log-likelihood objective: $\max_f \sum_{u \in V} \log P(N_R(u) \mid z_u)$
- $N_R(u)$ is the neighborhood of node $u$ under strategy $R$
- Given node $u$, we want to learn feature representations that are predictive of the nodes in its random walk neighborhood $N_R(u)$
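To make the objective concrete, here is a naive NumPy sketch that evaluates the negated objective for an embedding matrix `Z` (one row per node; integer node ids are assumed), with $P(v \mid z_u)$ parameterized as a softmax over dot products. The full softmax sums over every node, which is exactly the cost that efficient approximations such as negative sampling avoid in practice:

```python
import numpy as np

def walk_nll(Z, neighborhoods):
    """Negative log-likelihood: -sum_u sum_{v in N_R(u)} log P(v | z_u),
    with P(v | z_u) = exp(z_u . z_v) / sum_w exp(z_u . z_w)."""
    loss = 0.0
    for u, neighbors in neighborhoods.items():
        scores = Z @ Z[u]                 # z_u . z_w for every node w
        m = scores.max()                  # stabilize the log-sum-exp
        log_probs = scores - (m + np.log(np.exp(scores - m).sum()))
        loss -= log_probs[list(neighbors)].sum()
    return loss
```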
Random walks summary
Node2Vec
- How should we randomly walk?
- So far, we have described how to optimize embeddings given a random walk strategy R
- What strategies should we use to run these random walks?
- Simplest idea: Just run fixed-length, unbiased random walks starting from each node (DeepWalk)
- The issue is that such a notion of similarity is too constrained
- Overview of Node2Vec
- Goal: Embed nodes with similar network neighborhoods close in the feature space
- We frame this goal as a maximum likelihood optimization problem, independent of the downstream prediction task
- Key observation: A flexible notion of the network neighborhood $N_R(u)$ of node $u$ leads to rich node embeddings
- Develop a biased 2nd-order random walk strategy $R$ to generate the network neighborhood $N_R(u)$ of node $u$
- Node2Vec: Biased walks
- IDEA: Use flexible, biased random walks that can trade off between local and global views of the network
- Node2Vec algorithm
- Compute random walk probabilities (the biased transition rule is sketched after this list)
- Simulate $r$ random walks of length $l$ starting from each node $u$
- Optimize the Node2Vec objective using Stochastic Gradient Descent
- Linear-time complexity
- All 3 steps are individually parallelizable
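The core of the biased walk is the 2nd-order transition rule. A minimal sketch, using the return parameter `p` and in-out parameter `q` from the node2vec paper (this is a straightforward weighted-sampling version, not the paper's precomputed alias-table implementation; `adj` maps each node to a set of its neighbors):

```python
import random

def node2vec_walk(adj, start, length, p=1.0, q=1.0):
    """Biased 2nd-order walk: the next step depends on the previous node.
    p (return parameter): small p encourages backtracking (local, BFS-like).
    q (in-out parameter): small q encourages moving outward (global, DFS-like)."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = list(adj[cur])
        if not neighbors:
            break
        if len(walk) == 1:                 # first step: no previous node yet
            walk.append(random.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for x in neighbors:
            if x == prev:                  # distance 0 from prev: return
                weights.append(1.0 / p)
            elif x in adj[prev]:           # distance 1 from prev: stay local
                weights.append(1.0)
            else:                          # distance 2 from prev: move outward
                weights.append(1.0 / q)
        walk.append(random.choices(neighbors, weights=weights)[0])
    return walk
```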
- Other random walk ideas
- Different kinds of biased random walks
- Alternative optimization schemes
- Network pre-processing techniques
Summary
3.3 Embedding entire graphs
- Graph embedding $z_G$: Want to embed a subgraph or an entire graph $G$
- Tasks
- Classifying toxic vs. non-toxic molecules
- Identifying anomalous graphs
- Approach 1
- Run a standard node embedding technique on the (sub)graph $G$
- Then just sum (or average) the node embeddings in the (sub)graph $G$ (a minimal sketch follows): $z_G = \sum_{v \in G} z_v$
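A minimal sketch of Approach 1, assuming `Z` holds precomputed node embeddings (a dict or array indexed by node) and `nodes` lists the (sub)graph's vertices:

```python
import numpy as np

def graph_embedding(Z, nodes, mode="sum"):
    """Approach 1: aggregate precomputed node embeddings into z_G."""
    stacked = np.stack([Z[v] for v in nodes])
    return stacked.sum(axis=0) if mode == "sum" else stacked.mean(axis=0)
```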
- Approach 2
- Introduce a "virtual node" to represent the (sub)graph and run a standard node embedding technique
Summary
- How to use embeddings
- Clustering/community detection: Cluster points $z_i$
- Node classification: Predict the label of node $i$ based on $z_i$
- Link prediction: Predict edge $(i, j)$ based on $(z_i, z_j)$
- Where we can: concatenate, average, take the element-wise product, or take a difference between the embeddings (see the sketch below)
- Graph classification: Get a graph embedding $z_G$ via aggregating node embeddings or anonymous random walks → Predict the label based on the graph embedding $z_G$
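A sketch of the four edge-feature constructions for link prediction (using the absolute difference as one common way to "take a difference"; the function name is illustrative):

```python
import numpy as np

def edge_features(z_i, z_j):
    """Combine two node embeddings into one feature vector for edge (i, j)."""
    return {
        "concat": np.concatenate([z_i, z_j]),
        "average": (z_i + z_j) / 2.0,
        "product": z_i * z_j,             # element-wise (Hadamard) product
        "difference": np.abs(z_i - z_j),  # absolute difference
    }
```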
Summary