Uncovering High-Order Cohesive Structures: Efficient (π‘˜,𝑔)-Core Computation and Decomposition for Large Hypergraphs

SouthgiriΒ·2025λ…„ 7μ›” 15일

Graph Paper Reivew

λͺ©λ‘ 보기
2/2
post-thumbnail

Abstract

  • Cohesive subgraph discovery
  • Selecting appropriate parameters is an open question
  • Aim to design an efficient indexing structure to retrieve cohesive subgraphs

1. Introduction

Limitation of existing approaches

  • Existing approaches primarily handle pairwise relationships, relying on indirect representations of higher-order structures such as motif and cliques
  • To better capture higher-order structures beyond pairwise interactions, hypergraphs have emerged as a more flexible structure where a single hyperedge can connect multiple nodes simultaneously
  • kk-hypercore only consider neighbor constraint
  • (k,d)(k,d)-core incorporates a degree constraint
    Assumes that if any node in a hyperedge is removed, the hyperedge must also be discarded

Co-occurrence pattern

  • Accounts for the frequency of repeated interactions between nodes in hypergraphs

2. Problem Statement

Notations and Definition

  • deg(v)deg(v): the count of hyperedges containing vv
  • ∣e∣|e|: the number of nodes in ee

Def 1. Support

  • Support value s(u,v)s(u,v) with two nodes u,v∈Vu,v \in V is the number of hyperedges in which both nodes co-occur

Def 2. 𝑔-Neighbour

  • Given a node u∈Vu \in V, and a support threshold gg, a node v∈Vv \in V is called a gg-neighbour of uu if the support value between uu and vv is greater than or equal to gg,
    i.e., s(u,v)β‰₯gs(u,v) \geq g
  • Set of gg-neighbours of a node uu as Ng(u)N_g(u)

Def 3. (π’Œ, π’ˆ)-core

  • Given a neighbour size threshold kk and a support threshold gg, Ck,gC_{k,g} is the maximal set of nodes where each node has at least kk neighbour nodes as its gg-neighbours within G[Ck,g]G[C_{k,g}]

Key properties of the (π’Œ, π’ˆ)-core

1. Uniqueness of (k,g)(k,g)-core

  • The maximal subset of nodes satisfying the given kk and gg constraint within the hypergraph is unique

2. Containment of (k,g)(k,g)-core

  • A hierarchical structure, meaning that both the (k+1,g)(k+1,g)-core and the (k,g+1)(k,g+1)-core are contained within the (k,g)(k,g)-core

3. (π’Œ, π’ˆ)-core Computation

Efficient Peeling Algorithm (EPA)

  • The naive approach maintains an explicit gg-neighbour set for every node and continuously updates the pairwise co-occurrence counts
  • EPA only tracks the number of gg-neighbours for each node

Pseudo Code

  • Intially for each node, the size of its gg-neighbour is computed and recorded (4-6)
  • Nodes that do not satisfy the (k,g)(k,g)-core criteria are identified and marked for removal (7-8)
  • The number of gg-neighbours for each affected node is updated (13-15)
  • The final set of nodes constitutes the (k,g)(k,g)-core

Time Complexity

  • O(∣eβˆ—βˆ£β‹…D)O(|e^*|\cdot D)
  • DD : the total sum of the degree of all nodes in the hypergraph
  • ∣eβˆ—βˆ£|e^*| : the maximum cardinality among all hyperedges
  • To compute the gg-neighbours of a single node vv, examines each hyperedge containing vv (deg(v)deg(v) such hyperedges)
  • Iterates through up to ∣eβˆ—βˆ£|e^*| nodes to identify those that co-occur with vv
  • Resulting in a time complexity of O(∣eβˆ—βˆ£β‹…deg(v))O(|e^*| \cdot deg(v))
  • Considering all nodes, the total becomes O(∣eβˆ—βˆ£β‹…D)O(|e^*| \cdot D)

Space Complexity

  • O(∣V∣)O(|V|)
  • Store only the size of gg-neighbours

4. (π’Œ, π’ˆ)-core Decomposition

4.1. Coreness of the (π’Œ, π’ˆ)-core

  • The coreness of a node refelcts the maximum level of (k,g)(k,g)-core that the node can belong to,
    indicating the intensity of engagement of each node within the network

Def 4. kk-coreness

  • Given a positive integer kk, the maximum integer gβ€²g' for which vv belong s to the (k,gβ€²)(k,g')-core but not to the (k,gβ€²+1)(k,g'+1)-core
  • The kk-coreness indicates maximal co-occurrence frequency that a node can have

Def 5. gg-coreness

  • Given a positive integer gg, the maximum integer kβ€²k' for which vv belong s to the (kβ€²,g)(k',g)-core but not to the (kβ€²+1,g)(k'+1,g)-core
  • The gg-coreness represents the highest levels of neighbour connectivity

Def 6. (k,g)(k,g)-coreness

  • Given a node v∈Vv \in V, the (k,g)(k,g)-coreness of a node vv is the maximal (k,g)(k,g) pairs such that vv is in the (k,g)(k,g)-core
    but not in the (kβ€²,g)(k',g)-core or the (k,gβ€²)(k,g')-core, where kβ€²>kk'>k and gβ€²>gg'>g
  • A node can have multiple (k,g)(k,g)-coreness values as long as each is maximal

4.2. Bucket-based decomposition algorithm

Bucket-based Coreness Algorithm (BCA)

  1. Enumerate all possible (k,g)(k,g)-cores
  2. Remove duplicates by leveraging the hierarchical properties of the (k,g)(k,g)-core

Pseudo Code

  • Initially, kk is set to 0 and gg to 1
  • For each node, the number of gg-neighbours is computed and nodes are assigned to buckets (5-10)
    Can focus on only those nodes whose gg-neighbour count falls below a kk threshold rather than checking all nodes
  • Nodes that do not satisfy the kk constraint are marked for removal (14-20)
  • After reducing their gg-neighbour count by one, the gg-neighbours of the removed nodes are reassigned to the buckets (21-27, 32)
  • Any node whose updated gg-neighbour count falls below kk is also marked for removal and deleted in the next iteration (28)

DedDuplication

  • If the node appears in a higher core value, it is removed from the current core

Time Complexity

  • O(gβˆ—β‹…βˆ£eβˆ—βˆ£β‹…D)O(g^* \cdot |e^*| \cdot D)
    • gβˆ—g^* : the maximum gg value
  • The process iterates up to gβˆ—g^*
  • Each iteration involves at most ∣eβˆ—βˆ£β‹…D|e^*| \cdot D computations

Space Complexity

Three factors

  • Storing the (k,g)(k,g)-coreness of each node
  • Maintaining the count of gg-neighbours
  • Grouping nodes based on their gg-neighbour count
  • Storage requirement of the (k,g)(k,g)-coreness for every nodes is O(min(kβˆ—,gβˆ—)∣V∣)O(min(k^*,g^*)|V|)
  • Storing the number of gg-neighbours for each node requires O(∣V∣)O(|V|) space
  • Additionally, grouping can store up to O(∣V∣)O(|V|) nodes
  • ∴O(min(kβˆ—,gβˆ—)∣V∣)\therefore O(min(k^*,g^*)|V|)

5. Experiments

5.2 Experimental Setting

  • kk-hypercore, nbr-kk-core and (k,d)(k,d)-core identify strongly induced subgraphs, which means that any node removed must also discard all hyperedges containing it

5.3 Experimental Results

EQ1. Impact of Parameter Variations on Subhypergraph Co-hesion

  • To observe the impact of changing gg while keeping kk constant, fix kk and vary gg over the values 3~7
    Similarly, fix gg and vary kk

  • In the Contact dataset, the average degree drops when gg reaches 7
    This occurs due to a substantial reduction in hyperedges resulting from stricter co-occurence constraints


EQ2. Running Time with Varying Parameters

EQ5. Comparison of (k,g)(k,g)-core with other models

  • By incorporating both degree and co-occurrence constraints, resultant groups are dense subhypergraphs

EQ7. Distribution of (k,g)(k,g)-coreness and (k,d)(k,d)-coreness

  • (k,g)(k,g)-core enables more granular hierarchies, capturing a broader spectrum of internal cohesion levels

0개의 λŒ“κΈ€