[Algorithm] String-searching: KMP algorithm

William JO·2021년 9월 18일
1

Algorithm

목록 보기
1/2

KMP(Knuth-Morris-Pratt) algorithm

  • 문자열 중 특정 패턴을 찾아내는 문자열 검색 알고리즘 중 한가지
  • 연산량을 최대한 줄이기 위해 접두사와 접미사의 개념을 활용하여 jump하는 기법

    string length : N, pattern length : M

    • Brute Force(완전 탐색): O(N*M)
    • KMP : O(N+M)

KMP table

  • 접두사(prefix)와 접미사(suffix)의 최대 일치 길이를 저장한 table을 생성 및 반환
  • Failure function이라고도 불림

출처 : https://m.blog.naver.com/PostView.nhn?blogId=ndb796&logNo=221240660061&referrerCode=0&searchKeyword=KMP

코드

def KMP_table(pattern : str) -> list:
    """
        Failure Function.
        
        args:
            pattern(str) : string pattern for making table.
        
        returns:
            kmp_tbl(list) : kmp table contains max lengths of matched prefixes and suffixes
    """
    p_len = len(pattern) # pattern length
    kmp_tbl = [0 for _ in range(p_len)] # kmp table

    p_idx = 0 # pattern index

    # exclude idx 0, [1, p_len)
    for idx in range(1, p_len): 
        # update p_idx with p_idx-1's matched length until pattern matches.
        while p_idx > 0 and pattern[idx] != pattern[p_idx]:
            p_idx = kmp_tbl[p_idx-1] 
        
        # p_idx increases and save in kmp_table[idx] when pattern matches.
        if pattern[idx] == pattern[p_idx]: 
            p_idx += 1
            kmp_tbl[idx] = p_idx

    return kmp_tbl

KMP

  • KMP table을 활용하여 KMP 알고리즘 수행

코드

def KMP(string : str, pattern : str) -> None:
    """
        KMP algorithm.
        
        args:
            string(str) : string to be searched
            pattern(str) : pattern for searching
    """
    kmp_tbl = KMP_table(pattern) # KMP table for pattern
    s_len = len(string) # string length
    p_len = len(pattern) # pattern length

    p_idx = 0
    for s_idx in range(s_len):
        # update p_idx with [:p_idx-1]'s matched length until string and pattern matches.
        while p_idx > 0 and string[s_idx] != pattern[p_idx]:
            p_idx = kmp_tbl[p_idx-1]

        if string[s_idx] == pattern[p_idx]:
            # print start index of matched pattern 
            # and update p_idx with [:p_idx]'s string and pattern matched length 
            # when (string and pattern matched length == pattern length).
            if p_idx == (p_len - 1):
                print(f'{s_idx - p_len + 1}번째 index부터 같은 pattern 입니다.')
                p_idx = kmp_tbl[p_idx]
            # p_idx increases with s_idx.
            else:
                p_idx += 1

Reference

0개의 댓글