[Stanford CS224W] Lecture 3

Jaehee Kim · March 31, 2024


Node Embeddings

Input Graph → Structured Features → Learning Algorithm → Prediction
via "Representation Learning" (so that it replaces manual feature engineering)

Graph Representation Learning

  • Goal: Efficient task-independent feature learning for machine learning with graphs!
  • Make the mapping from nodes to embeddings happen automatically, so that all of the information contained in the graph is captured in the representation

Why Embedding?

  • Task: map nodes into an embedding space
    ❍ Similarity of embeddings between nodes indicates their similarity in the network.
    ❍ Encode network information
    ❍ Potentially used for many downstream predictions

Encoder and Decoder

Graph → Adjacency matrix

  • Goal is to encode nodes so that similarity in the embedding space (e.g., dot product) approximates similarity in the graph
  • Need to define: a similarity function and an objective function

Learning Node Embeddings

  1. Encoder maps from nodes to embeddings
  2. Define a node similarity function (a measure of similarity in the original network)
  3. Decoder DEC maps from embeddings to the similarity score
  4. Optimize the parameters of the encoder so that similarity(u, v) ≈ z_v^T z_u
  • Encoder: maps each node to a low-dimensional vector
  • Similarity function: specifies how the relationships in the embedding space map to the relationships in the original network
    "Shallow Encoding": the encoder is just an embedding lookup into a matrix Z with one column per node, where each column has the dimension/size of the embeddings
    Each node is assigned a unique embedding vector
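A minimal sketch of a shallow encoder with a dot-product decoder, assuming numpy; the sizes and initialization here are illustrative, not the lecture's:

```python
import numpy as np

num_nodes, d = 5, 8                      # illustrative sizes
Z = 0.1 * np.random.randn(d, num_nodes)  # one column per node, d = dimension/size of embeddings

def encode(v):
    """Shallow encoder: ENC(v) is just a lookup of column v of Z."""
    return Z[:, v]

def decode(u, v):
    """Decoder based on node similarity: dot product in the embedding space."""
    return encode(u) @ encode(v)
```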

Framework Summary

  • Encoder + Decoder Framework
    ❍ Shallow encoder: embedding lookup
    ❍ Parameters to optimize: Z which contains node embeddings z_u for all nodes u in V
    ❍ We will cover deep encoders (GNNs) in Lecture 6
    ❍ Decoder: based on node similarity
    ❍ Objective: maximize z_v^T z_u for node pairs (u, v) that are similar

How to Define Node Similarity?

  • Key choice of methods is how they define node similarity
  • Should two nodes have a similar embedding if they...
    ❍ are linked?
    ❍ share neighbors?
    ❍ have similar "structural roles"?
  • We will now learn a node similarity definition that uses random walks, and how to optimize embeddings for such a similarity measure.

Note on Node Embeddings

  • This is unsupervised/self-supervised way of learning node embeddings
    ❍ We are not utilizing node labels
    ❍ We are not utilizing node features
    ❍ The goal is to directly estimate a set of coordinates of a node so that some aspect of the network structure is preserved
  • These embeddings are task independent
    ❍ They are not trained for a specific task but can be used for any task.

Random Walk Approaches for Node Embeddings

Notation

  • Vector z_u: the embedding of node u (what we aim to find)
  • Probability P(v | z_u): the (predicted) probability of visiting node v on random walks starting from node u
  • Softmax and sigmoid: the non-linear functions used to produce these predicted probabilities

Random Walk

  • Given a graph and a starting point, we select a neighbor of it at random and move to this neighbor; then we select a neighbor of this point at random and move to it, etc. The (random) sequence of points visited this way is a random walk on the graph.
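A small sketch of a uniform random walk, assuming the graph is stored as an adjacency list (a dict of neighbor lists); names are illustrative:

```python
import random

def random_walk(adj, start, length):
    """Repeatedly move to a uniformly chosen neighbor; return the visited sequence."""
    walk = [start]
    for _ in range(length - 1):
        neighbors = adj[walk[-1]]
        if not neighbors:          # dead end: nothing to move to
            break
        walk.append(random.choice(neighbors))
    return walk

# e.g. adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}; random_walk(adj, 0, 5)
```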

Random-Walk Embeddings


1. Estimate the probability of visiting node v on a random walk starting from node u, using some random walk strategy R
2. Optimize embeddings to encode these random walk statistics
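From the lecture, the statistic the dot-product decoder is fit to is this visit probability:

$$
z_u^\top z_v \;\approx\; P_R(v \mid u) \;=\; \text{probability that } u \text{ and } v \text{ co-occur on a random walk over the graph}
$$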

Why Random Walks?

  1. Expressivity: Flexible stochastic definition of node similarity that incorporates both local and higher-order neighborhood information
  • Idea: if a random walk starting from node u visits v with high probability, u and v are similar (high-order multi-hop information)
  2. Efficiency: Do not need to consider all node pairs when training; only need to consider pairs that co-occur on random walks

Unsupervised Feature Learning

  • Intuition: Find embedding of nodes in d-dimensional space that preserves similarity
  • Idea: Learn node embeddings such that nodes that are nearby in the network end up close together in the embedding space
  • Given a node u, how do we define nearby nodes?

Feature Learning as Optimization

  • Given node u, we want to learn feature representations that are predictive of the nodes in its random walk neighborhood N_R(u)
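Formally this is a maximum-likelihood problem over the mapping u ↦ z_u:

$$
\max_{f:\,u \to z_u} \;\sum_{u \in V} \log P\big(N_R(u) \mid z_u\big)
$$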

Random Walk Optimization

  1. Run short fixed-length random walks starting from each node u in the graph using some random walk strategy R
  2. For each node u collect N_R(u), the multiset of nodes visited on random walks starting from u
  3. Optimize embeddings according to: Given node u, predict its neighbors N_R(u)


    Optimizing random walk embeddings = Finding embeddings z_u that minimize L
  • But doing this naively is too expensive!
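For reference, the loss L being minimized, with P(v | z_u) parameterized as a softmax over dot products:

$$
\mathcal{L} \;=\; \sum_{u \in V} \sum_{v \in N_R(u)} -\log \frac{\exp(z_u^\top z_v)}{\sum_{n \in V} \exp(z_u^\top z_n)}
$$

The expense comes from the softmax denominator: it sums over all |V| nodes, nested inside a sum over all co-occurring pairs, giving O(|V|^2) complexity.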

Negative Sampling


Instead of normalizing w.r.t. all nodes, just normalize against k random "negative samples" n_i

  • Sample k negative nodes, each with probability proportional to its degree
  • Two considerations for k (#negative samples):
  1. Higher k gives more robust estimates
  2. Higher k corresponds to higher bias on negative events
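The resulting approximation (σ is the sigmoid, and P_V is the degree-proportional distribution the negatives n_i are drawn from):

$$
\log \frac{\exp(z_u^\top z_v)}{\sum_{n \in V} \exp(z_u^\top z_n)} \;\approx\; \log \sigma(z_u^\top z_v) \;-\; \sum_{i=1}^{k} \log \sigma(z_u^\top z_{n_i}), \qquad n_i \sim P_V
$$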

Stochastic Gradient Descent

  • After we have obtained the objective function, how do we optimize (minimize) it?
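A minimal sketch of one SGD update for a single co-occurring pair (u, v), assuming numpy and the word2vec-style sigmoid objective log σ(z_u·z_v) + Σ_i log σ(−z_u·z_{n_i}) (a close variant of the negative-sampling formula above); all names and the learning rate are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgd_step(Z, u, v, neg_nodes, lr=0.025):
    """One stochastic gradient ascent step on the negative-sampled objective.

    Z: (d, num_nodes) embedding matrix (one column per node); u, v: a co-occurring pair;
    neg_nodes: k negative node indices sampled with degree-proportional probability.
    """
    zu, zv = Z[:, u].copy(), Z[:, v].copy()
    g_pos = 1.0 - sigmoid(zu @ zv)       # gradient scale for the positive pair
    grad_u = g_pos * zv
    Z[:, v] += lr * g_pos * zu
    for n in neg_nodes:                  # push z_u away from each negative sample
        zn = Z[:, n].copy()
        g_neg = sigmoid(zu @ zn)
        grad_u -= g_neg * zn
        Z[:, n] -= lr * g_neg * zu
    Z[:, u] += lr * grad_u
```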

Random Walks: Summary

  1. Run short fixed-length random walks starting from each node on the graph
  2. For each node u collect N_R(u), the multiset of nodes visited on random walks starting from u
  3. Optimize embeddings using Stochastic Gradient Descent (We can efficiently approximate this using negative sampling!)

How should we randomly walk?

  • So far we have described how to optimize embeddings given a random walk strategy R
  • What strategies should we use to run these random walks?

Overview of node2vec

  • Goal: Embed nodes with similar network neighborhoods close together in the feature space
  • We frame this goal as a maximum likelihood optimization problem, independent of the downstream prediction task
  • Idea: use flexible, biased random walks that can trade off between local and global views of the network
    BFS is used when we want local information (micro-view of the neighbourhood); DFS is used when we want deeper, longer-range relationships (macro-view of the neighbourhood)

Interpolating BFS and DFS

Biased fixed-length random walk R that, given a node u, generates neighborhood N_R(u)
Two parameters:

  • Return parameter p: return back to the previous node
  • In-out parameter q: moving outwards (DFS) vs. inwards (BFS); q is the "ratio" of BFS vs. DFS

Biased Random Walks

Biased 2nd order random walks explore network neighborhoods

  • The random walk just traversed edge (s_1, w) and is now at w
  • Insight: from w, the walk can only return to the previous node, stay at the same distance from it, or move one level farther away
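A sketch of one step of the biased 2nd-order walk under these rules, with adj assumed to be a dict mapping each node to its neighbors (names illustrative):

```python
import random

def node2vec_step(adj, prev, cur, p, q):
    """Choose the next node after traversing edge (prev, cur), i.e. the walk is now at cur."""
    neighbors = list(adj[cur])
    weights = []
    for x in neighbors:
        if x == prev:              # return back to the previous node
            weights.append(1.0 / p)
        elif x in adj[prev]:       # stays at the same distance from prev
            weights.append(1.0)
        else:                      # moves one level farther out (DFS-like)
            weights.append(1.0 / q)
    return random.choices(neighbors, weights=weights, k=1)[0]
```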

node2vec algorithm

  1. Compute random walk probabilities
  2. Simulate r random walks of length l starting from each node u
  3. Optimize the node2vec objective using Stochastic Gradient Descent
  • Linear-time complexity
  • All 3 steps are individually parallelizable
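A hedged end-to-end sketch of the three steps: the walk simulation reuses the node2vec_step helper sketched above, and step 3 is delegated to gensim's Word2Vec (skip-gram with negative sampling) as a stand-in for the SGD loop; all parameter values are illustrative.

```python
from gensim.models import Word2Vec  # assumes gensim is installed

def simulate_walks(adj, r=10, l=80, p=1.0, q=1.0):
    """Step 2: simulate r biased walks of length l starting from each node.
    Step 1 (the transition probabilities) is computed on the fly inside node2vec_step."""
    walks = []
    for _ in range(r):
        for start in adj:
            walk = [start]
            while len(walk) < l and adj[walk[-1]]:
                prev = walk[-2] if len(walk) > 1 else walk[-1]
                walk.append(node2vec_step(adj, prev, walk[-1], p, q))
            walks.append([str(n) for n in walk])
    return walks

# Step 3: optimize the skip-gram objective over the walks.
# model = Word2Vec(simulate_walks(adj), vector_size=128, window=10, min_count=0, sg=1, negative=5)
# z_u = model.wv[str(u)]
```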

Other random walk ideas

  • Different kinds of biased random walks: based on node attributes, based on learned weights
  • Alternative optimization schemes: directly optimize based on 1-hop and 2-hop random walk probabilities
  • Network preprocessing techniques: run random walks on modified versions of the original network

Embedding Entire Graphs

  • Goal: Want to embed a subgraph or an entire graph G. Graph embedding: z_G

Approach 1

Simple Idea 1

  • Run a standard graph embedding technique on the (sub)graph G
  • Then just sum (or average) the node embeddings in the (sub)graph G
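A one-liner sketch of Approach 1, assuming the node embeddings of the (sub)graph are stacked in a numpy array:

```python
import numpy as np

def graph_embedding(node_embeddings, mode="sum"):
    """z_G = sum (or average) of the z_v of all nodes v in the (sub)graph G.
    node_embeddings: array of shape (num_nodes_in_G, d)."""
    return node_embeddings.sum(axis=0) if mode == "sum" else node_embeddings.mean(axis=0)
```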

Approach 2

  • Introduce a "virtual node" to represent the (sub)graph and run a standard graph embedding technique
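A small sketch of Approach 2 on an adjacency-list graph; the virtual node id and the helper name are made up for illustration. After adding it, one would run any standard node embedding method and take z_G to be the virtual node's embedding.

```python
def add_virtual_node(adj, subgraph_nodes, virtual="G_virtual"):
    """Connect a new virtual node to every node of the (sub)graph."""
    adj = {u: list(vs) for u, vs in adj.items()}   # copy so the original graph is untouched
    adj[virtual] = list(subgraph_nodes)
    for v in subgraph_nodes:
        adj[v].append(virtual)
    return adj
```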

Approach 3

Anonymous Walk Embeddings

  • States in anonymous walks correspond to the index of the first time we visited the node in a random walk (each node in the walk is replaced by the index of the position at which the sequence first visited it)
  • Agnostic to the identity of the nodes visited (hence anonymous)
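A small sketch of the anonymization step: node identities are replaced by first-visit indices, so e.g. the walks A-B-C-B-C and C-D-B-D-B both map to 1-2-3-2-3.

```python
def anonymize(walk):
    """Replace each node by the index (starting at 1) of its first visit in the walk."""
    first_visit = {}
    return [first_visit.setdefault(node, len(first_visit) + 1) for node in walk]
```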

Number of Walks Grows

  • The number of anonymous walks grows exponentially with the walk length (e.g., there are 5 anonymous walks of length 3: 111, 112, 121, 122, 123)

Simple Use of Anonymous Walks

  • Simulate anonymous walks w_i of l steps and record their counts
  • Represent the graph as a probability distribution over these walks
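A sketch combining the random_walk and anonymize helpers from the earlier sketches (an assumption carried over from those blocks) to build the graph's empirical distribution over anonymous walks; the sample size is illustrative:

```python
import random
from collections import Counter

def anonymous_walk_distribution(adj, l=7, m=10000):
    """Sample m walks of length l from random start nodes, anonymize them,
    and return the empirical distribution over the resulting anonymous walks."""
    nodes = list(adj)
    counts = Counter(tuple(anonymize(random_walk(adj, random.choice(nodes), l)))
                     for _ in range(m))
    return {w: c / m for w, c in counts.items()}
```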

Sampling Anonymous Walks

  • Sampling anonymous walks: Generate independently a set of m random walks
  • Represent the graph as a probability distribution over these walks
  • How many random walks m do we need?

New idea: Learn Walk Embeddings

Rather than simply representing each walk by the fraction of times it occurs, we learn an embedding z_i of each anonymous walk w_i

  • Learn a graph embedding Z_G together with all the anonymous walk embeddings z_i
  • How to embed walks? Learn to predict walks that co-occur in a Δ-size window (e.g., predict w_2 given w_1 and w_3 if Δ = 1)

  • We obtain the graph embedding Z_G (a learnable parameter) after optimization
  • Use Z_G to make predictions (e.g. graph classification)
    Option 1. Inner product kernel
    Option 2. Use a neural network that takes Z_G as input to classify

How to use embeddings

  • How to use embeddings z_i of nodes:
    ❍ Clustering/community detection: cluster points z_i
    ❍ Node classification: predict the label of node i based on z_i
    ❍ Link prediction: predict edge (i, j) based on (z_i, z_j), e.g. by concatenating, averaging, taking the Hadamard product, or taking the distance of the two embeddings
    ❍ Graph classification: aggregate node embeddings or use anonymous walk embeddings to get z_G, then predict the label based on z_G
