data:image/s3,"s3://crabby-images/9a5a0/9a5a0d638eb37f18236327862db918cd58f1a288" alt=""
Notation and terminology
- T[1..n]: the text
- P[1..m]: the pattern
- P occurs with shift s in T if T[s+j] = p[j] for 1≤j≤m.
- If P occurs with shift s in T, then we call s a valid shift; otherwise, an invalid shift
[1] The naïve string-matching algorithmdata:image/s3,"s3://crabby-images/09cfa/09cfaf315a12e34b37a8cc24fb281cfc097412ef" alt=""
data:image/s3,"s3://crabby-images/86e9e/86e9e3b55fa056278510e5d811780b123d1c59c3" alt=""
Problem in the naïve algorithm
- O((n-m+1)m) time → if m = n / 2, O(n2)
- The naïve string matching is inefficient because information gained about the text for one value of s is entirely ignored in considering other values of s.
- If P = aaab and we find s = 0 is valid, then none of the shifts 1, 2, or 3 are valid → 1, 2, 3 are invalid
[2] Rabin-Karp algorithm
- Main Idea
: length m string is regarded as m digits radix-d number.
- P[1..m] : Convert it into m-digit number p
- Substring T[s+1..s+m] : Convert it into m-digit number ts
Ex) ∑={0,1,2,...,9}, P[1..m] = 31425,
➔ p = 31,425
- If p = ts, then string matching!
- String matching problem
➔ is converted into number comparison problem
Exampledata:image/s3,"s3://crabby-images/97f7e/97f7efc127dde15a324c2c9abb3ccdb06f83bf67" alt=""
Convert string into number
- Use Horner’s rule
p = P[m] + 10(P[m-1]+10(P[m-2]+....+10(P[2]+10P[1])...))
- Ex) When P[1..m] = 31425, → Ө(m)
p= 5+10(2+10(4+10*(1+10*3))) = 31,425
→ What if Σ = {a, b ... z} : *10 → *26
- Is there faster way to calculate ts?
- Calculate t0 similarly.
Then, we can calculate ts+1 from ts
- ts+1 = 10(ts-10m-1T[s+1])+T[s+m+1]
- i.e., remove high order digit T[s+1] and bring low order
digit.
- Ex) When ts = 31415 and T[s+5+1]= 2, → Text = ... 314152 ...
ts+1 = 10(31415 – 10000*3) +2 = 14152 → Ө(1)
Computing p & t0 : Θ(m)
Computing t1, ...tn−m : Θ(n-m) or Θ(n)
→ Ө(1) n-m개 ( n - m + 1(t0) )
Comparing p with ts
- How long will it take to compare p with ts?
- Constant time if m is very small
- Otherwise ... → constant time이라고 할 수 없다
- Cure for the problem : Use ‘modulo’ operation.
When comparing two numbers, we do not compare the numbers directly. Instead, take ‘modulo q’ operation and compare.
- However,the solution of working ‘modulo q’ is not perfect, since ts ≡ p (mod q) does not imply ts = p.
- Valid : ts ≡ p (mod q) and ts = p
- Spurious hit : ts ≡ p (mod q) but ts ≠ p
- However, if ts ≠ p (mod q) , there is no chance that ts = p.
- Ex) 67399 != 31415 but, 67399 ≡ 31415 (mod 13)
Algorithmdata:image/s3,"s3://crabby-images/d6248/d62484ae60313977bfbb56701ac557b75b5ca086" alt=""
data:image/s3,"s3://crabby-images/a9f28/a9f289ce3c886aed985cc8928e4ba59c09730bb4" alt=""
Modulo operation
- q : The modulus q is typically chosen as a prime number such that d*q just fits within one computer word in d-ary alphabet.
- Recalculation of p and t
- p = original p (mod q)
- ts+1=(d(ts - T[s+1]h) + T[s+m+1]) (mod q) → Ө(1)
→ Constant time
Analysis
- Θ(m) preprocessing time --- calculation of p and t0
- Θ((n-m+1)m) worst-case running time → 모두 mod q 값이 동일
- Θ(n-m+1) times to find all s such that p = ts
- Θ(m) time to check if each s is really valid
- However we expect few valid shifts
→ 그러나 유효한 변화가 거의 없을 것으로 예상, 실제로는 worst보다 빠를 것이다
[3] String matching with finite automatadata:image/s3,"s3://crabby-images/fabd6/fabd6afeed2cdb9d10644a02e78b677557e99c66" alt=""
[4] Knuth-Morris-Pratt Algorithm
- Consider the operation of the naïve string matcher
When 6th pattern character fails to match the corresponding text character, where can we resume the match again?
- We don’t have to resume the match from the character right next to it!
data:image/s3,"s3://crabby-images/e5859/e58593009ea4d372b750e592532410f741ba4827" alt=""
- In general, it is useful to know the answer to the following question :
Given that pattern characters P[1..q] match text characters T[s+1..s+q], what is the least shift s’ > s such that
P[1..k] = T[s’+1..s’+k], where s’ + k = s + q?data:image/s3,"s3://crabby-images/2353c/2353c2b2adbbafb1e70972bad2f1b9a7372ca21c" alt=""
- The necessary information can be precomputed by comparing the pattern against itself
→ Text 필요 x, Pattern으로 비교 가능data:image/s3,"s3://crabby-images/d29c3/d29c3c8246e3add6a6a069279f4780c6dcf18c29" alt=""
Prefix function Πdata:image/s3,"s3://crabby-images/fdfef/fdfefe489906fe35dddf540f4161b4c8c8f5b278" alt=""
data:image/s3,"s3://crabby-images/e42be/e42bea6123e0a2c708c7e008824e20586e3e8389" alt=""
data:image/s3,"s3://crabby-images/e195c/e195cb1821487ec2dc56009d8132bcbfb920727f" alt=""
data:image/s3,"s3://crabby-images/75c7c/75c7cc45885dd001daaa6cf7193bdf652785fb50" alt=""
data:image/s3,"s3://crabby-images/3db9b/3db9be374753603029ab1de5e63088d473f4cbfa" alt=""
Exampledata:image/s3,"s3://crabby-images/bc4d2/bc4d22e7e93d15bae76dfae9e9ad51126810d122" alt=""
Case1
First character of the pattern does not match character of the text. (P[1] ≠ T[i])
➔ Shift pattern by 1
== increment text index by 1
(no change of pattern index)data:image/s3,"s3://crabby-images/ce219/ce219307a5af3c0901ad11766b1ec622f186fce3" alt=""
Case2
Character (other than first) of the pattern does not match character of the text. (P[q+1] ≠ T[i])
➔ Shift pattern by value Π value
(Prefix of P has already been compared)data:image/s3,"s3://crabby-images/2e450/2e45076427c8fcd8c3551f7a920e38b3084666c8" alt=""
Case3
Character of the pattern matches character of the text. (P[q+1] = T[i])
➔ Increment text index and pattern index by 1
And if all patterns are matched
➔ match is found
Algorithmdata:image/s3,"s3://crabby-images/a0d75/a0d75a37eb0fb6a75c933e525b3103b1ee53a90f" alt=""
Exampledata:image/s3,"s3://crabby-images/ad649/ad6499b556821ed327b27f6d864265b7766808dd" alt=""
Running-time
- Using the amortized analysis (Sec 17.3)
COMPUTE-PREFIX-FUNCTION : Θ(m)
KMP-MATCHER : Θ(n)
Exercisedata:image/s3,"s3://crabby-images/19104/1910447a278cce1b365c0d2a5cf8b81a24ff9eaf" alt=""
Exercise in Class
1. How many spurious hits does the Rabin-karp matcher (modulo q = 11)
When looking for the pattern p = 26?
→ 26 % 11 = 4, %11 = 4인 결과는 4개 그 중 3개가 spurious hits
2. Compute the prefix function Π for the patterndata:image/s3,"s3://crabby-images/65daa/65daa4259cebd1c2dd5a92d6208807ffcc4a3520" alt=""
3. Run the KMPdata:image/s3,"s3://crabby-images/aa22e/aa22e0b70b1bcacef10e3bfab1314a84c389c4a4" alt=""
HGU 전산전자공학부 용환기 교수님의 23-1 알고리듬 분석 수업을 듣고 작성한 포스트이며, 첨부한 모든 사진은 교수님 수업 PPT의 사진 원본에 필기를 한 수정본입니다.