가장 많이 쓰이는 문자열 매칭 알고리즘 (Exact String Matching)들은
이중 Boyer-Moore Algorithm을 implement 해 보았다.
Boyer-Moore은 두 가지 heuristic의 조합으로 이루어져 있다.
Bad Character Heuristic works under two simple rules.
Once the text and the pattern we desire to match are aligned, each letter are compared from right most towards the left.
At the instance of a mismatch, the pattern may,
Shift right to match the mismatched character of the text with a character located further left
Or, move pass the mismatching character (if the pattern does not contain the mismatched character)
Good Suffix Heuristic, unlike bad character heuristic, works with the concept of prefix and suffix for determining the pattern's shift.
Again comparing pattern and text from right most location,
To use bad character and good suffix heuristics, both algorithm requires pre-defined arrays which marks the range of jumping for each instances of the mismatch.
The pre-processed array for bad character defines a numerical value for the pattern to jump
(if there are more than one occation of a character, the right most occation of the character is the standard)
The preprocessing of good suffix requires two arrays (border and shift)
The each value of shift[i] indicates the value of which the pattern must shift when a mismatch happens at i-1
The border array contains the starting index of the widest border of each position of the pattern.
Both the good suffix heuristic and bad character heuristic was used for the algorithm
(Uses both heuristic and apply the biggest shift possible)
Input
Mode 1
returns boolean (Yes/No) value if each pattern exists within the given string
Mode 2
returns the first index of each pattern's appearance within each given string
Mode 3
returns all index of each pattern's appearance within each given string
[1] https://www.geeksforgeeks.org/boyer-moore-algorithm-for-pattern-searching/
[2] https://www.geeksforgeeks.org/boyer-moore-algorithm-good-suffix-heuristic/