[CS224W] 2. Traditional Methods for Machine Learning in Graphs

Cherishยท2023๋…„ 2์›” 4์ผ
0

CS224W

๋ชฉ๋ก ๋ณด๊ธฐ
2/7

๐Ÿ”จ Machine Learning Tasks

  • Node-level prediction
  • Link-level prediction
  • Graph-level prediction

๐Ÿ”จ Traditional ML Pipleline

  • Design features for nodes/links/graphs
  • Obtain features for all training data

1. Train an ML model
Data point, node, link, graph ๋“ฑ์„ feature Vector๋กœ ๋ณ€ํ™˜ํ•ด ML model์„ ํ•™์Šต์‹œํ‚จ๋‹ค
2. Apply the model
์ƒˆ๋กœ์šด ์ž…๋ ฅ์ด ๋“ค์–ด์˜ค๋ฉด feature Vector๋ฅผ ์–ป๊ณ  ๋ชจ๋ธ์„ ํ†ตํ•ด ์˜ˆ์ธกํ•œ๋‹ค.

๐Ÿ”จ Feature Design

Using effective features over graph !!

โœ”๏ธ Goal : ์ž…๋ ฅ Set์ด ์ฃผ์–ด์กŒ์„ ๋•Œ ์˜ˆ์ธก ๊ฐ’์„ ๋งŒ๋“ค์–ด๋‚ด๋Š” ๊ฒƒ
โœ”๏ธ Design choices
[] Features : d ์ฐจ์›์˜ ๋ฒกํ„ฐ
[] Objects : Nodes, Links, Sets of Nodes, Entire Graph
[] Objective : ์šฐ๋ฆฌ๊ฐ€ ํ’€๋ ค๊ณ  ํ•˜๋Š” task

๐Ÿ“ Node-Level Features

Goal : Characterize the structure and position of a node in the network

โœ”๏ธ 1. Node degree (kv)

  • kv = Node v์˜ degree
  • kv = v๊ฐ€ ๊ฐ€์ง€๊ณ  ์žˆ๋Š” Edge(Link)์˜ ์ˆ˜

โœ”๏ธ 2. Node centrality (cv)

  • Node Degree๋Š” ์ด์›ƒํ•œ Node์˜ ๊ฐœ์ˆ˜๋งŒ ์…€ ๋ฟ, ์ค‘์š”๋„(importance)๋ฅผ caputureํ•  ์ˆ˜ ์—†๋‹ค.
  • Node Centrality๋Š” Graph์—์„œ ํ•ด๋‹น Node v์˜ importance๋ฅผ ํฌํ•จํ•œ๋‹ค

    2-1. Engienvector centrality

  • Node v๊ฐ€ Important ์ด์›ƒ๋…ธ๋“œ u์— ๋‘˜๋Ÿฌ์‹ธ์—ฌ์žˆ์„ ๋•Œ v๋Š” Important ํ•˜๋‹ค.

    2-2. Betweenness centrality

  • v๊ฐ€ ๋‹ค๋ฅธ ๋…ธ๋“œ๋“ค์„ ์—ฐ๊ฒฐํ•˜๋Š” ์ตœ๋‹จ ๊ฒฝ๋กœ์— ์žˆ์„ ๋•Œ Importantํ•˜๋‹ค๊ณ  ํ•œ๋‹ค.

    2-3. Closeness centrality

  • v๊ฐ€ ๋‹ค๋ฅธ ๋ชจ๋“  ๋…ธ๋“œ์— ๋Œ€ํ•œ ์ตœ๋‹จ ๊ฒฝ๋กœ์˜ ๊ธธ์ด๊ฐ€ ์งง์„ ๋•Œ Important

โœ”๏ธ Clustering coefficient

  • Node v์˜ ์ด์›ƒ๋“ค์ด ์–ผ๋งˆ๋‚˜ ์—ฐ๊ฒฐ๋˜์–ด ์žˆ๋Š”์ง€๋ฅผ ์ธก์ •ํ•˜๋Š” ๊ฐœ๋…

โœ”๏ธ Graphlets

Connected(์„œ๋กœ ์—ฐ๊ฒฐ๋˜์–ด ์žˆ๋Š”) induced non-isomorphic subgraphs
Goal : Node u์˜ ์ด์›ƒ๊ตฌ์กฐ๋ฅผ ๊ธฐ์ˆ 

  • induced subgraph : ์–ด๋–ค ๋…ธ๋“œ๋“ค์„ ์„ ํƒํ–ˆ์„ ๋•Œ, ํ•ด๋‹น ๋…ธ๋“œ๋“ค ์‚ฌ์ด์— ์—ฐ๊ฒฐ๋œ ๋ชจ๋“  edge๋“ค์„ ํฌํ•จํ•˜๋Š” subgraph
  • graph isomorphism : ๋‘ ๊ทธ๋ž˜ํ”„๊ฐ€ ๊ฐ™์€ ๊ฐœ์ˆ˜์˜ ๋…ธ๋“œ๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ๊ณ , ๋…ธ๋“œ๋“ค์ด ๋˜‘๊ฐ™์ด ์—ฐ๊ฒฐ๋˜์–ด ์žˆ๋Š” ๊ฒƒ
    ์–ธ๋œป ๋ณด๊ธฐ์— ๋‹ฌ๋ผ๋ณด์ด์ง€๋งŒ ๋…ธ๋“œ๋ฅผ ์ž˜ ๋งคํ•‘ํ•˜๋ฉด edge๊ด€๊ณ„๊ฐ€ ๊ฐ™๊ธฐ ๋•Œ๋ฌธ์— isomorphism grph์ด๋‹ค.

โœ”๏ธ Graphlet Degree Vector(GDV)

= Node์˜ Graphelt-Based Feature

  • Degree of Graphlet : ํŠน์ • Node๊ฐ€ ํฌํ•จ๋œ Graphlet์˜ ๊ฐœ์ˆ˜ ๋ฒกํ„ฐ
    1. Graph G์—์„œ Node u๋ฅผ ํƒ€๊นƒ์œผ๋กœ ์ •ํ•œ ์ƒํ™ฉ
      2. Graph G์—์„œ ์ตœ๋Œ€ ๋…ธ๋“œ๊ฐ€ 3๊ฐœ์ธ Graphlet 3๊ฐœ๋ฅผ ๋„์ถœํ•  ์ˆ˜ ์žˆ๋‹ค.
      3. ๊ฐ Graphlet์ด G์—์„œ u๋ฅผ ํฌํ•จํ•œ ์ฑ„๋กœ ๋ช‡๋ฒˆ ๋‚˜ํƒ€๋‚˜๋Š”์ง€ count 4. Node u์˜ GDV๋Š” [2,1,0,2] ๊ฐ€ ๋œ๋‹ค.

Node-Level Feature Summary

โœ”๏ธ Importance Based

a. Node Degree : ์ด์›ƒ์˜ ๊ฐœ์ˆ˜ count
b. Node Centrality : Graph์—์„œ์˜ ์ด์›ƒ ๋…ธ๋“œ ์ค‘์š”๋„๋ฅผ ๋ชจ๋ธ๋ง

โœ”๏ธ Structure Based

a. Node Degree : ์ด์›ƒ์˜ ๊ฐœ์ˆ˜ count
b. Clustering Coefficient : ์ด์›ƒ์ด ์–ด๋–ป๊ฒŒ ์—ฐ๊ฒฐ๋˜์–ด ์žˆ๋Š”์ง€ ์ธก์ •
c. Graphlet Count Vector : ์—ฌ๋Ÿฌ Graphlet๋“ค์ด ์ถœํ˜„ํ•˜๋Š” ๋นˆ๋„๋ฅผ count

์กด์žฌํ•˜๋Š” Link๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ์ƒˆ๋กœ์šด Link๋ฅผ ์˜ˆ์ธกํ•˜๋Š” ๊ฒƒ

  • ๋žœ๋คํ•˜๊ฒŒ ์‚ฌ๋ผ์ง„ Link ์ฐพ๊ธฐ -> Staticํ•œ Graph์— ์ ์ ˆ
  • ์‹œ๊ฐ„์ด ํ๋ฆ„์— ๋”ฐ๋ผ ์ƒ๊ฒจ๋‚˜๋Š” Link ์ฐพ๊ธฐ -> Dynamicํ•œ Graph์— ์ ์ ˆ ex) SNS

โœ”๏ธ Features

  • Distance-Based : ๋‘ Node๊ฐ„ ์ตœ๋‹จ ๊ฒฝ๋กœ์˜ ๊ฑฐ๋ฆฌ

    ๋‹จ์  : neighborhood overlap์˜ ์ •๋ณด๋Š” ์‚ฌ์šฉ๋˜์ง€ ์•Š๋Š”๋‹ค.
  • Local Neighborhood Overlap : ๋‘ Node๊ฐ€ ๊ณต์œ ํ•˜๋Š” ์ด์›ƒ์˜ ์ˆ˜

  • Global Neighborhood Overlap : ๋‘ Node๋ฅผ ์ž‡๋Š” ๋ชจ๋“  ๊ธธ์˜ ๊ฒฝ๋กœ ๊ฐ€์ค‘ํ•ฉ

    -> ์ „์ฒด ๊ทธ๋ž˜ํ”„๋ฅผ ๊ณ ๋ คํ•˜์—ฌ local neighborhood overlap์˜ ๋‹จ์ ์„ ๊ทน๋ณตํ•œ๋‹ค

๐Ÿ“ Graph-Level Features

Goal : Feature Vector ๋Œ€์‹  ์ „์ฒด Graph ๊ตฌ์กฐ๋ฅผ ํŠน์ •ํ•  ์ˆ˜ ์žˆ๋Š” Feature์„ ๋งŒ๋“ค์ž

โœ”๏ธ Features

  • Graphlet Kernel

    Bag-of-Graphlets๋กœ feature vector์„ ๋งŒ๋“  ํ›„ kernel์„ ๊ณ„์‚ฐํ•œ๋‹ค
    ๋‹จ์  : graphlet์„ countํ•˜๋Š” ๊ฒƒ์ด ๋น„์‹ธ๊ณ , ๋น„ํšจ์œจ์ ์ด๋‹ค.

  • Weisfeiler-Lehman(WL) Kernel

    ์žฅ์ : efficient, ์ „์ฒด ์‹œ๊ฐ„ ๋ณต์žก๋„ O(# of edges) / GNN๊ณผ ๊ด€๋ จ์ด ํฌ๋‹ค
  • [ Color Refienment ] : After K steps of color refinement, the structure of K-hop neighborhood is summarized
    ๋น„์Šทํ•œ Graph ๋‘๊ฐœ (G1,G2)๊ฐ€ ์ฃผ์–ด์ ธ์žˆ๋‹ค.
  1. ๋™์ผํ•œ Initial Color๋ฅผ ๋ชจ๋“  Node์— ํ• ๋‹น
  2. ์ด์›ƒ๋…ธ๋“œ์˜ ์ƒ‰์„ Aggregate ํ•ด์ค€๋‹ค
  3. Aggregate๋œ color๋ฅผ Hash ํ•œ๋‹ค
  4. ์ด์›ƒ๋…ธ๋“œ ์ƒ‰์„ Aggregate ํ•œ๋‹ค
  5. Aggregate๋œ color๋ฅผ Hash ํ•œ๋‹ค
  6. Color Fefinement๊ฐ€ ๋๋‚˜๋ฉด WL Kernel์ด ๊ฐ Color๊ฐ€ ๋“ฑ์žฅํ•œ ํšŸ์ˆ˜๋ฅผ countํ•œ๋‹ค
  7. Color Count Vector๋ฅผ ๋‚ด์ ํ•˜์—ฌ WL Kernel์˜ ๊ฒฐ๊ณผ๊ฐ’์„ ๊ตฌํ•œ๋‹ค

Reference

https://www.youtube.com/watch?v=3IS7UhNMQ3U&list=PLoROMvodv4rPLKxIpqhjhPgdQy7imNkDn&index=4

0๊ฐœ์˜ ๋Œ“๊ธ€