[Stanford CS224W] Lecture 3

Jaehee Kim · March 31, 2024


Node Embeddings

Input Graph → Structured Features → Learning Algorithm → Prediction
via "Representation Learning" (so that it replaces manual feature engineering)

Graph Representation Learning

  • Goal: Efficient task-independent feature learning for machine learning with graphs!
  • Make the mapping from nodes to embeddings happen automatically, so that all of the information contained in the graph is captured in the representation

Why Embedding?

  • Task: map nodes into an embedding space
    ❍ Similarity of embeddings between nodes indicates their similarity in the network.
    ❍ Encode network information
    ❍ Potentially used for many downstream predictions

Encoder and Decoder

Graph → Adjacency matrix

  • Goal is to encode nodes so that similarity in the embedding space (e.g., dot product) approximates similarity in the graph
  • Need to define: a similarity function and an objective function

Learning Node Embeddings

  1. Encoder maps from nodes to embeddings
  2. Define a node similarity function (a measure of similarity in the original network)
  3. Decoder DEC maps from embeddings to the similarity score
  4. Optimize the parameters of the encoder so that similarity(u, v) ≈ z_v^T z_u
  • Encoder: maps each node to a low-dimensional vector
  • Similarity function: specifies how the relationships in the embedding space map to the relationships in the original network
    "Shallow Encoding": the encoder is just an embedding lookup into a matrix Z with one column per node, where each column has the dimension/size of the embeddings
    Each node is assigned a unique embedding vector
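A minimal sketch of a shallow encoder with a dot-product decoder, assuming numpy; the sizes and initialization here are illustrative, not the lecture's:

```python
import numpy as np

num_nodes, d = 5, 8                      # illustrative sizes
Z = 0.1 * np.random.randn(d, num_nodes)  # one column per node, d = dimension/size of embeddings

def encode(v):
    """Shallow encoder: ENC(v) is just a lookup of column v of Z."""
    return Z[:, v]

def decode(u, v):
    """Decoder based on node similarity: dot product in the embedding space."""
    return encode(u) @ encode(v)
```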

Framework Summary

  • Encoder + Decoder Framework
    ❍ Shallow encoder: embedding lookup
    ❍ Parameters to optimize: Z which contains node embeddings z_u for all nodes u in V
    ❍ We will cover deep encoders (GNNs) in Lecture 6
    ❍ Decoder: based on node similarity
    ❍ Objective: maximize z_v^T z_u for node pairs (u, v) that are similar

How to Define Node Similarity?

  • Key choice of methods is how they define node similarity
  • Should two nodes have a similar embedding if they...
    ❍ are linked?
    ❍ share neighbors?
    ❍ have similar "structural roles"?
  • We will now learn a node similarity definition that uses random walks, and how to optimize embeddings for such a similarity measure.

Note on Node Embeddings

  • This is unsupervised/self-supervised way of learning node embeddings
    ❍ We are not utilizing node labels
    ❍ We are not utilizing node features
    ❍ The goal is to directly estimate a set of coordinates of a node so that some aspect of the network structure is preserved
  • These embeddings are task independent
    ❍ They are not trained for a specific task but can be used for any task.

Random Walk Approaches for Node Embeddings

Notation

  • Vector z_u: the embedding of node u (what we aim to find)
  • Probability P(v | z_u): the (predicted) probability of visiting node v on random walks starting from node u
  • Softmax and sigmoid: the non-linear functions used to produce these predicted probabilities

Random Walk

  • Given a graph and a starting point, we select a neighbor of it at random and move to this neighbor; then we select a neighbor of this point at random and move to it, etc. The (random) sequence of points visited this way is a random walk on the graph.
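A small sketch of a uniform random walk, assuming the graph is stored as an adjacency list (a dict of neighbor lists); names are illustrative:

```python
import random

def random_walk(adj, start, length):
    """Repeatedly move to a uniformly chosen neighbor; return the visited sequence."""
    walk = [start]
    for _ in range(length - 1):
        neighbors = adj[walk[-1]]
        if not neighbors:          # dead end: nothing to move to
            break
        walk.append(random.choice(neighbors))
    return walk

# e.g. adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}; random_walk(adj, 0, 5)
```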

Random-Walk Embeddings


1. Estimate the probability of visiting node v on a random walk starting from node u, using some random walk strategy R
2. Optimize embeddings to encode these random walk statistics
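From the lecture, the statistic the dot-product decoder is fit to is this visit probability:

$$
z_u^\top z_v \;\approx\; P_R(v \mid u) \;=\; \text{probability that } u \text{ and } v \text{ co-occur on a random walk over the graph}
$$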

Why Random Walks?

  1. Expressivity: Flexible stochastic definition of node similarity that incorporates both local and higher-order neighborhood information
  • Idea: if a random walk starting from node u visits v with high probability, u and v are similar (high-order multi-hop information)
  2. Efficiency: Do not need to consider all node pairs when training; only need to consider pairs that co-occur on random walks

Unsupervised Feature Learning

  • Intuition: Find embedding of nodes in d-dimensional space that preserves similarity
  • Idea: Learn node embeddings such that nodes that are nearby in the network end up close together in the embedding space
  • Given a node u, how do we define nearby nodes?

Feature Learning as Optimization

  • Given node u, we want to learn feature representations that are predictive of the nodes in its random walk neighborhood N_R(u)
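Formally this is a maximum-likelihood problem over the mapping u ↦ z_u:

$$
\max_{f:\,u \to z_u} \;\sum_{u \in V} \log P\big(N_R(u) \mid z_u\big)
$$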

Random Walk Optimization

  1. Run short fixed-length random walks starting from each node u in the graph using some random walk strategy R
  2. For each node u collect N_R(u), the multiset of nodes visited on random walks starting from u
  3. Optimize embeddings according to: Given node u, predict its neighbors N_R(u)


    Optimizing random walk embeddings = Finding embeddings z_u that minimize L
  • But doing this naively is too expensive!
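For reference, the loss L being minimized, with P(v | z_u) parameterized as a softmax over dot products:

$$
\mathcal{L} \;=\; \sum_{u \in V} \sum_{v \in N_R(u)} -\log \frac{\exp(z_u^\top z_v)}{\sum_{n \in V} \exp(z_u^\top z_n)}
$$

The expense comes from the softmax denominator: it sums over all |V| nodes, nested inside a sum over all co-occurring pairs, giving O(|V|^2) complexity.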

Negative Sampling


Instead of normalizing w.r.t. all nodes, just normalize against k random "negative samples" n_i

  • Sample k negative nodes, each with probability proportional to its degree
  • Two considerations for k (#negative samples):
  1. Higher k gives more robust estimates
  2. Higher k corresponds to higher bias on negative events
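The resulting approximation (σ is the sigmoid, and P_V is the degree-proportional distribution the negatives n_i are drawn from):

$$
\log \frac{\exp(z_u^\top z_v)}{\sum_{n \in V} \exp(z_u^\top z_n)} \;\approx\; \log \sigma(z_u^\top z_v) \;-\; \sum_{i=1}^{k} \log \sigma(z_u^\top z_{n_i}), \qquad n_i \sim P_V
$$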

Stochastic Gradient Descent

  • After we have obtained the objective function, how do we optimize (minimize) it?
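A minimal sketch of one SGD update for a single co-occurring pair (u, v), assuming numpy and the word2vec-style sigmoid objective log σ(z_u·z_v) + Σ_i log σ(−z_u·z_{n_i}) (a close variant of the negative-sampling formula above); all names and the learning rate are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgd_step(Z, u, v, neg_nodes, lr=0.025):
    """One stochastic gradient ascent step on the negative-sampled objective.

    Z: (d, num_nodes) embedding matrix (one column per node); u, v: a co-occurring pair;
    neg_nodes: k negative node indices sampled with degree-proportional probability.
    """
    zu, zv = Z[:, u].copy(), Z[:, v].copy()
    g_pos = 1.0 - sigmoid(zu @ zv)       # gradient scale for the positive pair
    grad_u = g_pos * zv
    Z[:, v] += lr * g_pos * zu
    for n in neg_nodes:                  # push z_u away from each negative sample
        zn = Z[:, n].copy()
        g_neg = sigmoid(zu @ zn)
        grad_u -= g_neg * zn
        Z[:, n] -= lr * g_neg * zu
    Z[:, u] += lr * grad_u
```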

Random Walks: Summary

  1. Run short fixed-length random walks starting from each node on the graph
  2. For each node u collect N_R(u), the multiset of nodes visited on random walks starting from u
  3. Optimize embeddings using Stochastic Gradient Descent (We can efficiently approximate this using negative sampling!)

How should we randomly walk?

  • So far we have described how to optimize embeddings given a random walk strategy R
  • What strategies should we use to run these random walks?

Overview of node2vec

  • Goal: Embed nodes with similar network neighborhoods close together in the feature space
  • We frame this goal as a maximum likelihood optimization problem, independent of the downstream prediction task
  • Idea: use flexible, biased random walks that can trade off between local and global views of the network
    BFS is used when we want local information (micro-view of the neighbourhood); DFS is used when we want deeper, longer-range relationships (macro-view of the neighbourhood)

Interpolating BFS and DFS

Biased fixed-length random walk R that, given a node u, generates neighborhood N_R(u)
Two parameters:

  • Return parameter p: return back to the previous node
  • In-out parameter q: moving outwards (DFS) vs. inwards (BFS); q is the "ratio" of BFS vs. DFS

Biased Random Walks

Biased 2nd order random walks explore network neighborhoods

  • The random walk just traversed edge (s_1, w) and is now at w
  • Insight: from w, the walk can only return to the previous node, stay at the same distance from it, or move one level farther away
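A sketch of one step of the biased 2nd-order walk under these rules, with adj assumed to be a dict mapping each node to its neighbors (names illustrative):

```python
import random

def node2vec_step(adj, prev, cur, p, q):
    """Choose the next node after traversing edge (prev, cur), i.e. the walk is now at cur."""
    neighbors = list(adj[cur])
    weights = []
    for x in neighbors:
        if x == prev:              # return back to the previous node
            weights.append(1.0 / p)
        elif x in adj[prev]:       # stays at the same distance from prev
            weights.append(1.0)
        else:                      # moves one level farther out (DFS-like)
            weights.append(1.0 / q)
    return random.choices(neighbors, weights=weights, k=1)[0]
```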

node2vec algorithm

  1. Compute random walk probabilities
  2. Simulate r random walks of length l starting from each node u
  3. Optimize the node2vec objective using Stochastic Gradient Descent
  • Linear-time complexity
  • All 3 steps are individually parallelizable
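A hedged end-to-end sketch of the three steps: the walk simulation reuses the node2vec_step helper sketched above, and step 3 is delegated to gensim's Word2Vec (skip-gram with negative sampling) as a stand-in for the SGD loop; all parameter values are illustrative.

```python
from gensim.models import Word2Vec  # assumes gensim is installed

def simulate_walks(adj, r=10, l=80, p=1.0, q=1.0):
    """Step 2: simulate r biased walks of length l starting from each node.
    Step 1 (the transition probabilities) is computed on the fly inside node2vec_step."""
    walks = []
    for _ in range(r):
        for start in adj:
            walk = [start]
            while len(walk) < l and adj[walk[-1]]:
                prev = walk[-2] if len(walk) > 1 else walk[-1]
                walk.append(node2vec_step(adj, prev, walk[-1], p, q))
            walks.append([str(n) for n in walk])
    return walks

# Step 3: optimize the skip-gram objective over the walks.
# model = Word2Vec(simulate_walks(adj), vector_size=128, window=10, min_count=0, sg=1, negative=5)
# z_u = model.wv[str(u)]
```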

Other random walk ideas

  • Different kinds of biased random walks: based on node attributes, based on learned weights
  • Alternative optimization schemes: directly optimize based on 1-hop and 2-hop random walk probabilities
  • Network preprocessing techniques: run random walks on modified versions of the original network

Embedding Entire Graphs

  • Goal: Want to embed a subgraph or an entire graph G. Graph embedding: z_G

Approach 1

Simple Idea 1

  • Run a standard graph embedding technique on the (sub)graph G
  • Then just sum (or average) the node embeddings in the (sub)graph G
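A one-liner sketch of Approach 1, assuming the node embeddings of the (sub)graph are stacked in a numpy array:

```python
import numpy as np

def graph_embedding(node_embeddings, mode="sum"):
    """z_G = sum (or average) of the z_v of all nodes v in the (sub)graph G.
    node_embeddings: array of shape (num_nodes_in_G, d)."""
    return node_embeddings.sum(axis=0) if mode == "sum" else node_embeddings.mean(axis=0)
```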

Approach 2

  • Introduce a "virtual node" to represent the (sub)graph and run a standard graph embedding technique
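A small sketch of Approach 2 on an adjacency-list graph; the virtual node id and the helper name are made up for illustration. After adding it, one would run any standard node embedding method and take z_G to be the virtual node's embedding.

```python
def add_virtual_node(adj, subgraph_nodes, virtual="G_virtual"):
    """Connect a new virtual node to every node of the (sub)graph."""
    adj = {u: list(vs) for u, vs in adj.items()}   # copy so the original graph is untouched
    adj[virtual] = list(subgraph_nodes)
    for v in subgraph_nodes:
        adj[v].append(virtual)
    return adj
```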

Approach 3

Anonymous Walk Embeddings

  • States in anonymous walks correspond to the index of the first time we visited the node in a random walk (each node in the walk is replaced by the index of the position at which the sequence first visited it)
  • Agnostic to the identity of the nodes visited (hence anonymous)
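A small sketch of the anonymization step: node identities are replaced by first-visit indices, so e.g. the walks A-B-C-B-C and C-D-B-D-B both map to 1-2-3-2-3.

```python
def anonymize(walk):
    """Replace each node by the index (starting at 1) of its first visit in the walk."""
    first_visit = {}
    return [first_visit.setdefault(node, len(first_visit) + 1) for node in walk]
```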

Number of Walks Grows

  • The number of anonymous walks grows exponentially with the walk length (e.g., there are 5 anonymous walks of length 3: 111, 112, 121, 122, 123)

Simple Use of Anonymous Walks

  • Simulate anonymous walks w_i of l steps and record their counts
  • Represent the graph as a probability distribution over these walks
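A sketch combining the random_walk and anonymize helpers from the earlier sketches (an assumption carried over from those blocks) to build the graph's empirical distribution over anonymous walks; the sample size is illustrative:

```python
import random
from collections import Counter

def anonymous_walk_distribution(adj, l=7, m=10000):
    """Sample m walks of length l from random start nodes, anonymize them,
    and return the empirical distribution over the resulting anonymous walks."""
    nodes = list(adj)
    counts = Counter(tuple(anonymize(random_walk(adj, random.choice(nodes), l)))
                     for _ in range(m))
    return {w: c / m for w, c in counts.items()}
```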

Sampling Anonymous Walks

  • Sampling anonymous walks: Generate independently a set of m random walks
  • Represent the graph as a probability distribution over these walks
  • How many random walks m do we need?

New idea: Learn Walk Embeddings

Rather than simply representing each walk by the fraction of times it occurs, we learn an embedding z_i of each anonymous walk w_i

  • Learn a graph embedding Z_G together with all the anonymous walk embeddings z_i
  • How to embed walks? Learn to predict walks that co-occur in a Δ-size window (e.g., predict w_2 given w_1 and w_3 if Δ = 1)

  • We obtain the graph embedding Z_G (a learnable parameter) after optimization
  • Use Z_G to make predictions (e.g. graph classification)
    Option 1. Inner product kernel
    Option 2. Use a neural network that takes Z_G as input to classify

How to use embeddings

  • How to use embeddings z_i of nodes:
    ❍ Clustering/community detection: cluster points z_i
    ❍ Node classification: predict the label of node i based on z_i
    ❍ Link prediction: predict edge (i, j) based on (z_i, z_j), e.g. by concatenating, averaging, taking the Hadamard product, or taking the distance of the two embeddings
    ❍ Graph classification: aggregate node embeddings or use anonymous walk embeddings to get z_G, then predict the label based on z_G
