BOJ 7432) Disk tree + 문자열 리스트 정렬 기준

Wonjun Lee·2024년 6월 12일

문제 링크)

https://www.acmicpc.net/problem/7432

입력

The first line of the input file contains single integer number N (1 ≤ N ≤ 500) that denotes a total number of distinct directory paths. Then N lines with directory paths follow. Each directory path occupies a single line and does not contain any spaces, including leading or trailing ones. No path exceeds 80 characters. Each path is listed once and consists of a number of directory names separated by a back slash ("\").

Each directory name consists of 1 to 8 uppercase letters, numbers, or the special characters from the following list: exclamation mark, number sign, dollar sign, percent sign, ampersand, apostrophe, opening and closing parenthesis, hyphen sign, commercial at, circumflex accent, underscore, grave accent, opening and closing curly bracket, and tilde ("!#$%&'()-@^_`{}~").

출력

Write to the output file the formatted directory tree. Each directory name shall be listed on its own line preceded by a number of spaces that indicate its depth in the directory hierarchy. The subdirectories shall be listed in lexicographic order immediately after their parent directories preceded by one more space than their parent directory. Top level directories shall have no spaces printed before their names and shall be listed in lexicographic order. See sample below for clarification of the output format.

풀이과정

문제이해

이 문제는 N개의 문자열이 들어왔을 때, 문자열을 백 슬래쉬로 분할한 토큰 단위로 디렉토리 구조를 만드는 것이다.

파일 시스템이 트리 구조라는 점에서 이 문제또한 특정한 모양의 트리를 구성하라는 것을 요구하고 있다고 볼 수 있다.

입력의 범위가 최대 500 문장이라는 점, 각 문장이 최대 80개의 문자로 이뤄진다는 점에서 \를 감안했을 때, 40개의 문자 토큰이 생성될 수 있다. 최악의 경우를 상정한다면, 2000개의 토큰을 다루게 되는데, 이것은 트리 구성을 위한 시간이고, 트리 출력을 위한 시간까지 더하더라도 O(N)의 시간복잡도를 요구할 것이다.

출력할 때는 동일레벨 디렉토리 요소들을 사전순(Lexicographic order)으로 출력한다.

접근방법

어떤 방법으로 트리를 구성할지를 생각해보자. 이 문제는 사실 트리의 구성만 끝나면 다 푼 문제라고 할 수 있다.

파일이나 디렉토리 이름 문자열을 사용하기 때문에, 쉽게 딕셔너리를 트리로 이용해보고자 할 수 있다.

토큰이 몇 번째냐에 따라 노드의 레벨을 알 수 있으니 제일 처음 등장하는 토큰들을 루트로 하여 재귀적으로 저장한다면 쉽게 트리를 구성할 수 있다. 그리고 딕셔너리에 없는 토큰이라면 새로운 key와 빈 리스트를 추가하면 된다.

하지만, 딕셔너리는 key값이 중복될 수 없다는 단점이 있다.

다음과 같은 입력에 대해 딕셔너리 기반 트리는 오류를 만들어 낸다.

A/AA/AAA/AAAA
AAAA/AAAA/AA/A

첫 번째 경로에서 AAAA는 최하위 디렉토리이다. 반면 두 번째 경로에서 AAAA는 루트 디렉토리에 존재한다.

딕셔너리를 이용할 경우 레벨에 따라 디렉토리 명을 분류할 수 없다.

그러므로 나는 Node 클래스를 생성하여 name, child dirs를 저장하게 만들어 해결하였다.

코드 설계

각 디렉토리 별로 자식 디렉토리들을 사전순으로 정렬해야 하며 자식의 개수가 정해지지 않고, 문자열 이름으로 노드 객체에 접근해야 하므로 directory를 이용한다.

디렉토리 자체의 이름을 저장해야 하므로 Node 객체에 이름을 저장한다.

디렉토리의 레벨에 따라 이름 앞에 공백이 출력되어야 한다. 따라서 depth를 Node 객체에 저장한다.

모든 디렉토리의 조상격인 root 노드를 먼저 생성하고 depth는 -1로 한다. 자식을 추가할 때마다 부모 depth + 1을 한다.

최종적으로 출력시엔 사전순으로 깊이 우선 탐색해야한다.

사전순 정렬은 sort() 함수를 호출하면 된다.

다음은 이를 구현한 코드이다.

import sys
sys.setrecursionlimit(40000)
class Node :
    def __init__(self, name, depth) :
        self.name = name
        self.childs = {}
        self.depth = depth

def solve() :
    n = int(sys.stdin.readline())
    root = Node('root',-1)
    root.depth = -1
    
    for i in range(n) :
        parent = root
        path = input().split('\\')
        for token in path :
            if token not in parent.childs :
                parent.childs[token] = Node(token, parent.depth + 1)
            parent = parent.childs[token]
    
    def dfs(root, keys) :
        if keys == None : 
            return
        for c in keys :
            print(f'{" "*root.childs[c].depth}{root.childs[c].name}')
            dfs(root=root.childs[c], keys= sorted(list(root.childs[c].childs.keys())))
    dfs(root=root, keys = sorted(list(root.childs.keys())))
solve()