Author: 장아연
0. Recap
Deep Graph Encoder
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2F77b603a6-01f5-425a-85f4-7b156734987e%2F1.png)
Deep Graph Encoders map an arbitrary graph into an embedding space by passing it through a deep neural network.
A General GNN Framework
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2Fc0ef5e11-a76a-4c60-a4ca-e43707da50cd%2Fimage.png)
(4) Graph augmentation
: what kinds of graph and feature augmentation can we create to shape the structure of this neural network?
(5) Learning objective
: how to define the objectives and make training work
1. Graph Augmentation for GNNs
Why Graph Augmentation
Assumption so far: the raw input graph and the computational graph are identical.
- Features: the input graph may lack features
- Graph structure:
too sparse -> inefficient message passing
too dense -> message passing is too costly
too large -> the computational graph does not fit into GPU memory
In short, the raw input graph is unlikely to be the optimal computation graph for computing embeddings.
Idea: decouple the raw input graph from the computational graph.
Graph Augmentation Approaches
Graph Feature Augmentation
- the input graph lacks features: create them via feature augmentation
Standard Approach
Why it is needed
: it is common for the input graph to have no node features, e.g. when we only have the adjacency matrix
1. Constant Node Feature
Assign constant values to nodes
basically, all nodes get the same feature value of 1
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2Fc63d19aa-833c-410d-91d7-f59d97a90c3d%2Fimage.png)
2. One-hot node feature
Assign a unique ID to each node
IDs are converted to one-hot vectors
with a 1 flagged at the position of that node's ID
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2F21fb85be-dc89-4099-aa13-5e70c0890a7d%2Fimage.png)
Constant Node Feature vs. One-hot Node Feature
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2F9adcb954-ffcf-4485-bf38-f93089f83e99%2Fimage.png)
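As a minimal sketch of the two schemes (plain NumPy; the 5-node graph size is an assumption for illustration):

```python
import numpy as np

n_nodes = 5  # hypothetical toy graph size

# Constant node features: every node gets the same 1-dim feature.
# Low expressive power, but generalizes to unseen nodes.
x_constant = np.ones((n_nodes, 1))

# One-hot node features: node i gets the i-th standard basis vector.
# Maximally expressive, but O(n) dims and no transfer to new nodes.
x_onehot = np.eye(n_nodes)

print(x_constant.shape)  # (5, 1)
print(x_onehot.shape)    # (5, 5)
```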
Cycle Count
Why it is needed
: certain structures are hard for a GNN to learn
- a GNN cannot learn the length of the cycle that v1 belongs to
- it cannot distinguish which graph v1 is in, because
- every node has degree 2, so
- both computational graphs are the same binary tree
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2F6dad6ddb-1615-46b0-8285-a57cb4ed0628%2Fimage.png)
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2Fcc2a0e52-035d-4e76-887b-02abd2320173%2Fimage.png)
- besides cycle count, features such as node degree, clustering coefficient, PageRank, and centrality can also be used
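A hedged sketch of this kind of structural feature augmentation with NetworkX, on a hypothetical toy cycle graph; the feature set (degree, clustering coefficient, PageRank) follows the list above:

```python
import networkx as nx
import numpy as np

# Hypothetical toy graph: a cycle of length 3.
G = nx.cycle_graph(3)

# Structural properties that message passing alone may not capture
# can be precomputed and attached as input node features:
deg = dict(G.degree())
cc = nx.clustering(G)
pr = nx.pagerank(G)

# Stack them into a node feature matrix (one row per node).
x = np.array([[deg[v], cc[v], pr[v]] for v in G.nodes()])
print(x.shape)  # (3, 3)
```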
Graph Structure Augmentation
- too sparse: add virtual nodes / edges
- too dense: sample neighbors during message passing
- too large: sample subgraphs to compute embeddings
Add Virtual Edge
Concept
- connect 2-hop neighbors via virtual edges
- use A + A^2 instead of the adjacency matrix A
Example
- connect Author(A) and Author(B) in an author-paper bipartite graph
: Author(A) -> paper -> Author(B)
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2Ff7634dbd-702a-4b89-a788-7fde5b950f72%2Fimage.png)
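A minimal sketch of the A + A^2 trick with NumPy, assuming a hypothetical bipartite graph with two authors (nodes 0, 1) and one shared paper (node 2):

```python
import numpy as np

# Hypothetical author-paper graph: authors {0, 1}, paper {2}.
A = np.array([[0, 0, 1],
              [0, 0, 1],
              [1, 1, 0]])

# A^2 counts 2-hop walks; (A + A^2) > 0 adds virtual edges between
# 2-hop neighbors, e.g. two authors of the same paper.
A_aug = ((A + A @ A) > 0).astype(int)
np.fill_diagonal(A_aug, 0)  # drop self-loops introduced by A^2
print(A_aug)  # authors 0 and 1 are now directly connected
```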
Add Virtual node
Concept
- connect every node of the graph to a single virtual node
- every pair of nodes is then at distance 2
: node(A) -> virtual node -> node(B)
Example
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2Fafb73d51-14d0-4816-b920-215fce08d2a2%2Fimage.png)
Result
: adding virtual nodes/edges improves message passing in sparse graphs
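A small NetworkX sketch of the virtual-node idea, on a hypothetical path graph:

```python
import networkx as nx

# Hypothetical sparse graph: a path 0-1-2-3 (diameter 3).
G = nx.path_graph(4)

# Add one virtual node connected to every real node: afterwards any
# two real nodes are at most 2 hops apart via the virtual node.
virtual = "virtual"
G.add_node(virtual)
G.add_edges_from((virtual, v) for v in list(G.nodes()) if v != virtual)

print(nx.shortest_path_length(G, 0, 3))  # 2 instead of 3
```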
Node Neighborhood Sampling
Before
- every neighbor is used in message passing
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2F90aab3b8-9bbd-4dd4-8af0-5776b4c4adbd%2Fimage.png)
Concept
- randomly sample a node's neighborhood and run message passing on the sample
Example
1. randomly sample 2 neighbors and pass messages only from them
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2F26f11de1-2e5c-42f0-a824-40c5560f55d4%2Fimage.png)
2. in the next layer, resample different neighbors when computing the embeddings
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2Fc344423b-eaa5-4b7c-baa8-cb79b56ea655%2Fimage.png)
Result
- embeddings similar to those computed with all neighbors
- reduced computational cost
(this is what lets GNNs scale to large graphs)
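A minimal sketch of per-layer neighbor sampling; the adjacency list and the sample size k = 2 are hypothetical:

```python
import random

# Hypothetical adjacency list: node 0 has five neighbors.
neighbors = {0: [1, 2, 3, 4, 5]}

def sample_neighbors(v, k):
    """Uniformly sample up to k neighbors of v for one message-passing layer."""
    nbrs = neighbors[v]
    return nbrs if len(nbrs) <= k else random.sample(nbrs, k)

# Resample independently per layer, as in the example above:
layer1 = sample_neighbors(0, 2)
layer2 = sample_neighbors(0, 2)  # likely a different set
print(layer1, layer2)
```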
2. Prediction with GNNs
Prediction head: how do we train a GNN?
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2F2d169eef-5698-4712-97da-9a572b887d80%2Fimage.png)
- $h_v^{(L)}$ : the embedding of node v from the final layer L of the GNN
- Prediction head: turns embeddings into the final model output
- Labels: where do the labels come from?
- Loss function: what do we optimize?
Different Prediction Head
Different task levels require different prediction heads.
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2F4249de11-3726-4e44-a025-8edd47019365%2Fimage.png)
Node-level task
- make predictions using node embeddings
- after GNN computation we have d-dimensional node embeddings: $\{h_v^{(L)} \in \mathbb{R}^d, \forall v \in G\}$
- for k-way prediction
classification: classify among k categories
regression: regress on k targets
- $\hat{y}_v = \text{Head}_{\text{node}}(h_v^{(L)}) = W^{(H)} h_v^{(L)}$
the prediction for a node is a matrix times the node's final embedding
- $W^{(H)} \in \mathbb{R}^{k \times d}$ maps the node embedding $h_v^{(L)} \in \mathbb{R}^d$ (embedding space) to $\hat{y}_v \in \mathbb{R}^k$ (prediction space)
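A minimal PyTorch sketch of this head; the dimensions d = 64 and k = 7 are assumed values for illustration:

```python
import torch
import torch.nn as nn

d, k = 64, 7  # hypothetical embedding dim and number of classes

# Head_node is a learned linear map W^(H) in R^{k x d}.
head_node = nn.Linear(d, k, bias=False)

h_v = torch.randn(d)      # final-layer node embedding h_v^(L)
y_hat_v = head_node(h_v)  # k-way prediction scores
print(y_hat_v.shape)      # torch.Size([7])
```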
Edge-level task
- make predictions using pairs of node embeddings
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2Fede7c61e-9d06-4512-a028-1391b8ea642e%2Fimage.png)
- for k-way prediction: $\hat{y}_{uv} = \text{Head}_{\text{edge}}(h_u^{(L)}, h_v^{(L)})$
Options for the edge-level prediction head
1. Concatenation + linear transformation
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2F723dd496-cf3d-46e5-9374-87fb2b48383c%2Fimage.png)
- $\hat{y}_{uv} = \text{Linear}(\text{Concat}(h_u^{(L)}, h_v^{(L)}))$
- Linear: maps the 2d-dimensional concatenation to a k-dimensional prediction
2. Dot product
- $\hat{y}_{uv} = (h_u^{(L)})^T h_v^{(L)}$
- only applies to 1-way prediction (binary prediction)
- for k-way prediction: similar to multi-head attention
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2F357b56c2-3539-4d94-ae95-1234b194e789%2Fimage.png)
- use k different trainable matrices $W^{(1)}, W^{(2)}, \dots, W^{(k)}$
=> every class gets to learn its own transformation: $\hat{y}_{uv}^{(i)} = (h_u^{(L)})^T W^{(i)} h_v^{(L)}$
- concatenate the per-class predictions $\hat{y}_{uv}^{(1)}, \dots, \hat{y}_{uv}^{(k)}$ to obtain the final prediction $\hat{y}_{uv}$
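A hedged PyTorch sketch of both options; all dimensions are hypothetical, and the per-class matrices $W^{(i)}$ implement the k-way dot-product variant above:

```python
import torch
import torch.nn as nn

d, k = 64, 5  # hypothetical dims
h_u, h_v = torch.randn(d), torch.randn(d)

# Option 1: concatenation + linear transformation, 2d -> k.
linear = nn.Linear(2 * d, k)
y_concat = linear(torch.cat([h_u, h_v]))

# Option 2: dot product (1-way / binary prediction only).
y_dot = h_u @ h_v

# Dot product extended to k-way prediction: one trainable
# matrix W^(i) per class, concatenate the k scalar scores.
W = nn.Parameter(torch.randn(k, d, d))
y_kway = torch.stack([h_u @ W[i] @ h_v for i in range(k)])
print(y_concat.shape, y_dot.shape, y_kway.shape)
```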
Graph-level task
- make predictions using all the node embeddings in the graph
- $\hat{y}_G = \text{Head}_{\text{graph}}(\{h_v^{(L)} \in \mathbb{R}^d, \forall v \in G\})$
take the individual embedding of every node and aggregate them into an embedding of the whole graph
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2F5082cfd7-37ef-4a97-9e0f-85204a21481e%2Fimage.png)
- $\text{Head}_{\text{graph}}(\cdot)$ is similar to $\text{AGG}(\cdot)$ in a GNN layer
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2F119b3205-deb1-4efc-afd3-00ad9f686e5c%2Fimage.png)
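A minimal PyTorch sketch of this point, assuming the three standard aggregators (mean / max / sum) as graph-level heads; the embedding sizes are hypothetical:

```python
import torch

# Hypothetical node embeddings for a 4-node graph, d = 3.
H = torch.randn(4, 3)

# The graph-level head can reuse the same aggregators used
# inside a GNN layer, applied over all nodes at once:
y_mean = H.mean(dim=0)        # global mean pooling
y_max = H.max(dim=0).values   # global max pooling
y_sum = H.sum(dim=0)          # global sum pooling
```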
Options for graph-level prediction
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2Fed1800ed-b1d1-4900-9462-50103ced9f9b%2Fimage.png)
the three options above are suitable for small graphs
globally pooling a large graph loses information
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2Fc4b180f3-d838-47c7-a758-5f5d97e4171f%2Fimage.png)
- solution: hierarchical global pooling
Hierarchical Global Pooling
- aggregate all node embeddings hierarchically
Conceptual example
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2Ff460f2dd-849b-4dfb-b09e-2b06caaeb00e%2Fimage.png)
DiffPool idea
: hierarchically pool node embeddings
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2F79908383-6b2d-4c2b-8de3-40d4b5417f27%2Fimage.png)
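A minimal sketch of one DiffPool level. In DiffPool, two GNNs run in parallel, one producing node embeddings Z and one producing cluster assignment scores S; here both are stand-in random tensors, and all sizes are hypothetical:

```python
import torch
import torch.nn.functional as F

n, d, c = 6, 8, 2  # nodes, embedding dim, clusters (all hypothetical)
A = torch.eye(n)              # stand-in adjacency matrix
Z = torch.randn(n, d)         # node embeddings (from GNN A)
S_logits = torch.randn(n, c)  # cluster scores (from GNN B)

# Soft cluster assignment, then pool both features and structure:
S = F.softmax(S_logits, dim=-1)  # n x c assignment matrix
Z_pooled = S.T @ Z               # c x d: coarsened node embeddings
A_pooled = S.T @ A @ S           # c x c: coarsened adjacency
```

Repeating this coarsening over several levels yields the hierarchical pooling pictured above.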
3. Training Graph Neural Networks
Predictions & labels: where does the ground truth come from?
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2F7b77d004-fa23-4afe-9fb4-8cab6d94ff1d%2Fimage.png)
Supervised Labels vs. Unsupervised Signals
1. supervised learning
- labels come from an external source
node label $y_v$: e.g. in a citation network, the subject area a node belongs to
edge label $y_{uv}$: e.g. in a transaction network, whether an edge is fraudulent
graph label $y_G$: e.g. the drug-likeness of a molecular graph
2. unsupervised learning
- only the graph itself, no external labels
- find supervision signals within the graph
node label $y_v$: node statistics, e.g. clustering coefficient, PageRank
edge label $y_{uv}$: link prediction, i.e. hide the edge between two nodes and predict it
graph label $y_G$: graph statistics, e.g. predict whether two graphs are isomorphic
Loss function : compute final loss
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2F32665049-f111-4b9c-bba5-4a8c624ba96d%2Fimage.png)
Settings for GNN training
- N data points (nodes / edges / graphs)
- $\hat{y}^{(i)}$: the prediction (at any task level) for label $y^{(i)}$
![](https://velog.velcdn.com/images%2Ftobigsgnn1415%2Fpost%2F8843acea-ed8a-4acf-8768-27d2451a2049%2Fimage.png)
A GNN can be used for both classification and regression, but each requires a different loss function and evaluation metric.
Classification Loss
- Classification: the labels $y^{(i)}$ are discrete values
- node classification: predict the category a node belongs to
Cross entropy (CE)
![](https://velog.velcdn.com/images%2Ftobigsgnn1415%2Fpost%2F2c01c03c-9195-4480-b15e-4f3c6a32efff%2Fimage.png)
Regression Loss
- Regression: the labels $y^{(i)}$ are continuous values
- e.g. predicting the drug-likeness of a molecular graph
Mean Squared Error (MSE) : L2 loss
![](https://velog.velcdn.com/images%2Ftobigsgnn1415%2Fpost%2F7cd1d56e-9527-450b-89dd-6d5aa2982d25%2Fimage.png)
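A minimal PyTorch sketch of both losses; all tensors are hypothetical toy data:

```python
import torch
import torch.nn.functional as F

# Classification: cross entropy over k = 3 classes (hypothetical).
logits = torch.randn(4, 3)           # predictions for 4 nodes
labels = torch.tensor([0, 2, 1, 0])  # discrete ground-truth classes
ce = F.cross_entropy(logits, labels)

# Regression: MSE (L2 loss) against continuous targets,
# e.g. drug-likeness scores of molecular graphs.
pred = torch.randn(4)
target = torch.randn(4)
mse = F.mse_loss(pred, target)
```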
Evaluation metrics: measuring the success of a GNN
![](https://velog.velcdn.com/images%2Ftobigsgnn1415%2Fpost%2Fac795673-19b7-4b40-b56e-a54211e8d71e%2Fimage.png)
Evaluate Regression
![](https://velog.velcdn.com/images%2Ftobigsgnn1415%2Fpost%2F390f15ea-981e-4637-967b-818976832c88%2Fimage.png)
Evaluate Classification
- multi-class classification : Accuracy
![](https://velog.velcdn.com/images%2Ftobigsgnn1415%2Fpost%2F61bfaecf-1a04-4b22-bd2f-7d5f8a96dc59%2Fimage.png)
- binary classification : Confusion matrix, ROC Curve
![](https://velog.velcdn.com/images%2Ftobigsgnn1415%2Fpost%2F76a1de7a-9860-4cbe-a4a3-282fc470d5d9%2Fimage.png)
- Accuracy: $\frac{TP+TN}{TP+TN+FP+FN} = \frac{TP+TN}{|\text{Dataset}|}$
- Precision: $\frac{TP}{TP+FP}$
- Recall: $\frac{TP}{TP+FN}$
- F1-score: $\frac{2 \cdot P \cdot R}{P+R}$
![](https://velog.velcdn.com/images%2Ftobigsgnn1415%2Fpost%2F188af0dd-dbae-4cc0-8029-d05abea5b566%2Fimage.png)
- TPR = recall = $\frac{TP}{TP+FN}$
- FPR = $\frac{FP}{FP+TN}$
- dashed line: performance of a random classifier
- threshold: the curve is traced out by varying the binary classifier's decision threshold
- ROC AUC: the area under the ROC curve
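These metrics can all be computed with scikit-learn; a small sketch on hypothetical labels and scores (the 0.5 threshold is an arbitrary choice):

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, roc_auc_score)

y_true = [0, 1, 1, 0, 1]                  # hypothetical binary labels
y_score = [0.2, 0.9, 0.6, 0.4, 0.3]       # classifier scores
y_pred = [int(s > 0.5) for s in y_score]  # threshold at 0.5

print(accuracy_score(y_true, y_pred))
print(precision_score(y_true, y_pred))
print(recall_score(y_true, y_pred))
print(f1_score(y_true, y_pred))
print(roc_auc_score(y_true, y_score))  # threshold-free
```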
4. Setting up GNN Prediction Tasks
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2Ff0e427f7-3d76-42a8-81d9-91d05ffe5e5b%2Fimage.png)
Fixed split vs Random split
Fixed split
- Train: used to optimize the GNN parameters
- Validation: used to develop the model and tune hyperparameters
- Test: used to report the final performance
-> performance on a single fixed test split is not guaranteed to be representative
Random split
- randomly split into training / validation / test sets
-> report performance averaged over different random seeds
Why Splitting Graphs is Special
When splitting image data
- in image classification every data point is an image, and the data points are independent of each other
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2F7041b49a-8758-44db-afe8-b69a7e0a9458%2Fimage.png)
When splitting graph data
- in node classification every data point is a node, and the data points depend on each other
- nodes 1 and 2 influence the prediction for node 5
: if nodes 1 and 2 are training data and node 5 is test data, information leakage can occur
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2Fd34fc5cf-94a4-4228-8064-541a47685ea6%2Fimage.png)
solution
1. Transductive setting
- the entire graph is visible in every dataset split
- only the labels are split
- the dataset consists of one graph
- node / edge prediction only
- train: use the entire graph, train on the labels of nodes 1 & 2
- validation: use the entire graph, evaluate on the labels of nodes 3 & 4
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2F9fd4c200-3fef-469a-a3da-e445e574498a%2Fimage.png)
2. Inductive setting
- each split contains a different graph
- the dataset consists of multiple graphs
- generalizes to unseen graphs
- node / edge / graph tasks
- train: compute embeddings on the graph over nodes 1 & 2, train on the labels of nodes 1 & 2
- validation: compute embeddings on the graph over nodes 3 & 4, evaluate on the labels of nodes 3 & 4
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2Fd6d057e6-083a-4bb6-b932-b29f9c971ce8%2Fimage.png)
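A minimal sketch of the two settings. The mask-based transductive split follows common practice (e.g. in PyTorch Geometric datasets), and `split_into_subgraphs` is a hypothetical helper, not a library function:

```python
import torch

n = 6  # hypothetical graph with 6 nodes

# Transductive: one graph, only the *labels* are split via masks;
# message passing always sees the entire graph.
train_mask = torch.tensor([1, 1, 0, 0, 0, 0], dtype=torch.bool)
val_mask = torch.tensor([0, 0, 1, 1, 0, 0], dtype=torch.bool)
test_mask = torch.tensor([0, 0, 0, 0, 1, 1], dtype=torch.bool)
# loss = criterion(out[train_mask], y[train_mask])

# Inductive: break the edges between splits so each split is an
# independent subgraph; the model never sees val/test graphs in training.
# train_graph, val_graph, test_graph = split_into_subgraphs(G)  # hypothetical
```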
Example
1. Node Classification
- transductive node classification:
the model can observe the entire graph, but only the labels of each split's nodes
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2F52a097d2-a69b-4c7a-9894-673f8bf2ea35%2Fimage.png)
- inductive node classification:
each split contains an independent graph
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2F30bb4b4d-baee-4da9-9171-45a99d1206a6%2Fimage.png)
2. Graph Classification
- only possible in the inductive setting
- test on unseen graphs
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2F85b8d4d8-66ce-4b92-932a-7a680be4ea39%2Fimage.png)
3. Link Prediction
- predict missing edges
- an unsupervised / self-supervised task
-> we create the labels and the dataset splits ourselves
= we hide edges and let the GNN predict whether an edge exists
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2Fd2684322-2a1c-4cf8-8356-0e30b9ef608f%2Fimage.png)
Setting up Link Prediction
1-0. Assign each edge as a message edge or a supervision edge
- Message edges: used for GNN message passing
- Supervision edges: used for computing the objective
1-1. After the assignment, only the message edges remain in the graph; the supervision edges are used to supervise the edge predictions made by the model
2-0. Split edges into train / validation / test
2-1. Inductive link prediction split
- each dataset split contains an independent graph
- each split contains both message edges and supervision edges
- the supervision edges are not fed into the GNN
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2F8ef6521c-e4aa-4d2d-bed0-af0b4efe2724%2Fimage.png)
2-2. Transductive link prediction split
- after training, the supervision edges become known to the GNN
- so at validation time, the training supervision edges are also used for message passing
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2F9023203a-7ff0-4cff-9a4d-807bdcab9f7d%2Fimage.png)
- in sum, there are 4 types of edges: training message edges, training supervision edges, validation edges, and test edges
![](https://velog.velcdn.com/images%2Fhixkix59%2Fpost%2F51025b82-76d4-4d68-8028-85b112a01abe%2Fimage.png)
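A minimal sketch of a transductive link split over a hypothetical edge list, carving out the 4 edge roles from the figure above:

```python
import random

# Hypothetical edge list of one graph.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
random.shuffle(edges)

# Carve the edges of the SAME graph into the 4 roles:
train_message = edges[:2]       # fed to the GNN for message passing
train_supervision = edges[2:3]  # predicted during training, not fed in
val_edges = edges[3:4]          # predicted at validation time
test_edges = edges[4:]          # predicted at test time

# At validation, message passing may use train_message + train_supervision;
# at test, it may additionally use val_edges.
```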