Graph (1): Disjoint Set

YJ·2025년 6월 17일

algorithm

목록 보기

11/16

용어 정리

Graph: vertex 와 edge 로 구성된 비선형 자료 구조
Vertex: 정점
Edge: 두 vertex 사이의 연결
Path: 한 vertex에서 다른 vertex로 이동하는 vertex의 순서
Path Length: 경로에 있는 간선의 수
Cycle: 시작점과 끝점이 같은 정점인 경로
Connectivity: 두 정점 사이에 경로가 하나 이상 존재하는 경우
Degree of a Vertex: 가중치 없는 그래프에서 정점에 연결되어 있는 간선의 수
In-Degree: Directed Graph 에서 정점으로 들어오는 간선의 수
Out-Degree: Directed Graph 에서 정점에서 나가는 간선의 수

유형

Undirected Graph: 양방향 관계
Directed Graph: 방향성을 갖는 관계
Weighted Graph: 가중치가 있는 간선을 가진 그래프
(Negative Weight Cycle: 모든 간선 가중치의 합이 음수인 Weigted Cycle)

Disjoint Set (aka union-find)

주요 용도

Vertex 간의 Connectivity 파악 (특정 원소들이 같은 그룹에 속해 있는지 판단)

컨셉

정점들끼리 이어져 있다면 같은 set에 집어넣는다. 그리고 set 마다는 대표 정점인 head 를 하나씩 갖는다. 두 정점이 연결되었는지를 확인하려면, 두 정점이 속한 set의 head 정점이 같은지 보면 된다.

(parent node: 정점에 바로 이어진 정점. 무엇을 parent 로 정할지는 알아서.. / root node: parent node 를 가지고 있지 않은 정점. head node 와 같다.)

구현

find 함수는 주어진 정점의 root 정점을 찾는 연산, union 함수는 두 정점을 잇는 연산을 한다. 즉, 두 정점의 root 를 같게 만드는 연산을 한다.

배열을 하나 둔다. 배열의 요소 하나하나가 각 정점이고, 이 배열의 값은 해당 정점의 parent (혹은 root) 정점이다. 무엇을 배열의 값으로 두느냐에 따라 복잡도가 달라진다.
(1) Quick Find: 여기서 root node 를 두면 find 연산이 빠르고(O(1)) union 연산은 느리고(O(N)),
(2) Quick Union: parent node 를 두면 find 연산은 느리고(최악 O(N)) union 연산은 빠르다(최악 O(N)).

N 개의 정점을 잇는다고 생각했을 때 Quick Find 는 O(N^2) 가 걸리고, Quick Union 은 최악의 경우(모든 정점이 일자로 이어졌을 때) O(N^2)가 걸리므로 Quick Union 으로 구현하도록 한다.

// Quick Union
class UnionFind {
    private int[] root;

    public UnionFind(int size) {
        root = new int[size];
        for (int i = 0; i < size; i++) {
            root[i] = i;
        }
    }

    public int find(int x) {
        while (x != root[x]) {
            x = root[x];
        }
        return x;
    }

    public void union(int x, int y) {
        int rootX = find(x);
        int rootY = find(y);
        if (rootX != rootY) {
            root[rootY] = rootX;
        }
    }

    public boolean connected(int x, int y) {
        return find(x) == find(y);
    }
}

다음은 Quick Union 을 최적화하는 방법이다.

Union 최적화: Union by Rank

Quick Union 의 경우에서 모든 정점이 일자로 이어지도록 Union 하는 것을 방지하기 위해 union by rank 로 최적화한다. 트리를 밸런스 있게 유지하기 위해서 짧은 트리를 긴 트리 아래 병합하는 것이다. 이 경우 Find 연산과 Union 연산 모두 O(logN) 의 복잡도를 갖는다.

// Disjoint Set with Union by Rank
class UnionFind {
    private int[] root;
    private int[] rank;  // ****

    public UnionFind(int size) {
        root = new int[size];
        rank = new int[size];
        for (int i = 0; i < size; i++) {
            root[i] = i;
            rank[i] = 1; 
        }
    }

    public int find(int x) {
        while (x != root[x]) {
            x = root[x];
        }
        return x;
    }

    public void union(int x, int y) {
        int rootX = find(x);
        int rootY = find(y);
        if (rootX != rootY) {
            if (rank[rootX] > rank[rootY]) {
                root[rootY] = rootX;
            } else if (rank[rootX] < rank[rootY]) {
                root[rootX] = rootY;
            } else {
                root[rootY] = rootX;
                rank[rootX] += 1;
            }
        }
    }

    public boolean connected(int x, int y) {
        return find(x) == find(y);
    }
}

Find 최적화: Path Compression

또한, Find 연산 최적화를 위해서 배열의 값인 parent node 를 root node 로 업데이트하는 Path Compression 기법을 쓸 수도 있다. 재귀를 사용한다. root node 가 root node 인 정점들을 모두 직접 이어버리는 것이다. 이로써 Find 와 Union 최고 O(1), 최악 O(N), 평균 O(logN)의 복잡도를 가진다.

class UnionFind {
    private int[] root;

    public UnionFind(int size) {
        root = new int[size];
        for (int i = 0; i < size; i++) {
            root[i] = i;
        }
    }

    public int find(int x) {
        if (x == root[x]) {
            return x;
        }
        return root[x] = find(root[x]); // *** root 업데이트
    }

    public void union(int x, int y) {
        int rootX = find(x);
        int rootY = find(y);
        if (rootX != rootY) {
            root[rootY] = rootX;
        }
    }

    public boolean connected(int x, int y) {
        return find(x) == find(y);
    }
}

Union by Rank와 Path Compression 모두 적용

모두 적용하면 Find 와 Union 모두 평균 O(N) 복잡도를 가지게 할 수 있다. (정확히는 O(a*N), a는 Inverse Ackermann function)

class UnionFind {
    private int[] root;
    // Use a rank array to record the height of each vertex, i.e., the "rank" of each vertex.
    private int[] rank;

    public UnionFind(int size) {
        root = new int[size];
        rank = new int[size];
        for (int i = 0; i < size; i++) {
            root[i] = i;
            rank[i] = 1; // The initial "rank" of each vertex is 1, because each of them is
                         // a standalone vertex with no connection to other vertices.
        }
    }

	// The find function here is the same as that in the disjoint set with path compression.
    public int find(int x) {
        if (x == root[x]) {
            return x;
        }
	// Some ranks may become obsolete so they are not updated
        return root[x] = find(root[x]);
    }

	// The union function with union by rank
    public void union(int x, int y) {
        int rootX = find(x);
        int rootY = find(y);
        if (rootX != rootY) {
            if (rank[rootX] > rank[rootY]) {
                root[rootY] = rootX;
            } else if (rank[rootX] < rank[rootY]) {
                root[rootX] = rootY;
            } else {
                root[rootY] = rootX;
                rank[rootX] += 1;
            }
        }
    }

    public boolean connected(int x, int y) {
        return find(x) == find(y);
    }
}

Hello

이전 포스트

Height-Balanced BST

다음 포스트