TIL 002 Suffix array (SA)

์กฐ์„ฑํ˜„ยท2020๋…„ 12์›” 20์ผ
0

์ •๊ธ€

๋ชฉ๋ก ๋ณด๊ธฐ
3/21

๐Ÿ“• Suffix array (SA)

๐Ÿ‘‰ ์‚ฌ์šฉ๋ชฉ์ 
๋ฌธ์ž์—ด์„ ์‚ฌ์ „ ์ˆœ์œผ๋กœ ๋‚˜์—ดํ•œ ๋ฐฐ์—ด๋กœ, ๋ฌธ์ž์—ด ๋ฌธ์ œ๋ฅผ ํ’€ ๋•Œ ์‚ฌ์šฉ.
๋ฌธ์ž์—ด์„ ๋‹ด๋Š”๊ฒŒ ์•„๋‹ˆ๋ผ, ๋ฌธ์ž์˜ ์‹œ์ž‘ index๋ฅผ ๋‹ด๋Š”๋‹ค.

์˜ˆ๋ฅผ ๋“ค๋ฉด camel์˜ ๋ถ€๋ถ„ ๋ฌธ์ž์—ด์€ camel, amel, mel, el, l ์ด๊ณ , ์ด๋ฅผ ์‚ฌ์ „ ์ˆœ์œผ๋กœ ๋‚˜์—ดํ•˜๋ฉด amel, camel, l, mel,el ์ด๋‹ค.

๐Ÿ‘‰ ๊ตฌํ˜„
์—ฌ๋Ÿฌ๋ฐฉ๋ฒ•์ด ์žˆ์ง€๋งŒ, 9249๋ฒˆ์„ ํ†ต๊ณผํ•˜๋ ค๋ฉด O(N(logN)^2)์˜ ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด ํ•„์š”ํ–ˆ๋‹ค.(๋Œ€ํšŒ์—์„œ ํ†ต์šฉ๋˜๊ณ  ์žˆ๋‹ค๊ณ ํ•จ)

๊ฐ„๋žตํ•˜๊ฒŒ ๋งํ•˜๋ฉด, ์ผ๋‹จ ์ฒซ ์งธ ๊ธ€์ž๋กœ ์ •๋ ฌ์„ ํ•ด ๊ทธ๋ฃน์„ ๋‚˜๋ˆˆ๋’ค, ๊ทธ ํ›„ ๊ฐ™์€ ๊ทธ๋ฃน์— ์ ‘๋ฏธ์‚ฌ๋ฅผ d=1 ๋ถ€ํ„ฐ ์‹œ์ž‘ํ•ด์„œ, d*2 ๋ฒˆ ์งธ ๊ธ€์งœ๋งŒ ๋น„๊ต๋ฅผ ํ•ด์„œ ๊ทธ ๊ทธ๋ฃน์„ ์•ˆ์„ ์ •๋ ฌ์„ ํ•˜๋Š” ๊ฒƒ์ด๋‹ค.

์ด ๋•Œ d*2๋ฒˆ์งธ ๋‹จ์–ด๋„ ๋‹จ์–ด์˜ ์ ‘๋ฏธ์‚ฌ๋กœ ์–ด๋–ค ๊ทธ๋ฃน์— ์†ํ•ด์žˆ์Œ์œผ๋กœ, ๋‹จ์–ด๋ฅผ ๋น„๊ตํ•˜๋Š”๊ฒŒ ์•„๋‹ˆ๋ผ ๊ทธ๋ฃน์„ ๋น„๊ตํ•˜๋Š” ๊ฒƒ๋งŒ์œผ๋กœ๋„ ์ˆœ์„œ๋ฅผ ์ •ํ•  ์ˆ˜ ์žˆ๋‹ค.

ex) banana

  • ์ฒซ ๋ฒˆ์งธ ๊ธ€์ž๋กœ ์ •๋ ฌ
  • d=1 ๋ฒˆ์งธ ๊ธ€์งœ๋กœ ์ •๋ ฌ, ๊ธ€์ž ์ˆ˜๊ฐ€ ์ดˆ๊ณผํ•œ๋‹ค๋ฉด ์งง์€ ๋‹จ์–ด๋ฅผ ์•ž์œผ๋กœ(a,anana)
  • d*2 ๋ฒˆ์งธ ๊ธ€์ž๋กœ ์ •๋ ฌ, ์ด ๋•Œ ๊ธ€์ž๋ฅผ ๋น„๊ต ํ•˜๋Š”๊ฒŒ ์•„๋‹ˆ๋ผ ana์˜ a๊ฐ€ anana์˜ ana ๋ณด๋‹ค ์•ž์„  ๊ทธ๋ฃน์— ์†ํ•ด์žˆ์Œ์œผ๋กœ ์•ž์œผ๋กœ
  • ๊ทธ๋ฃน ์ˆ˜๊ฐ€ ์ ‘๋ฏธ์‚ฌ ์ˆ˜๊ฐ€ ๋˜๋ฉด ์ข…๋ฃŒ

์•„๋ž˜ ๋ธ”๋กœ๊ทธ์— ์„ค๋ช…์ด ์•„์ฃผ ์ž์„ธํžˆ ๋˜์–ด์žˆ๋‹ค.

๐Ÿ‘‰ ๋งํฌ
์ ‘๋ฏธ์‚ฌ๋ฐฐ์—ด
https://blog.naver.com/kks227/221028710658

๐Ÿ‘‰ ํŒ
ํŒŒ์ด์ฌ์—์„œ ์•„๋ž˜์ฒ˜๋Ÿผ custom compare ํ•จ์ˆ˜๋ฅผ ๋งŒ๋“ค์–ด์„œ ๋„˜๊ธฐ๋Š” ๊ฒƒ๋ณด๋‹ค,

def cmp(i,j):
    if pos[i] != pos[j]:
        return pos[i] - pos[j]
    i += d
    j += d
    if i <n and j <n :
        return pos[i] - pos[j]
    else:
        return j - i
sa.sort(key=cmp_to_key(cmp))

๋ฏธ๋ฆฌ ๊ณ„์‚ฐํ•œ ๋’ค sort์‹œํ‚ค๋ฉด ํ›จ์”ฌ ์‹œ๊ฐ„์„ ๋‹จ์ถ• ์‹œํ‚ฌ ์ˆ˜ ์žˆ์—ˆ๋‹ค.

def getGroup(x):
        if x < n:
            return pos[x]
        else:
            return -1
sa.sort(key=lambda x: (getGroup(x), getGroup(x + d)))
profile
Jazzing๐Ÿ‘จโ€๐Ÿ’ป

0๊ฐœ์˜ ๋Œ“๊ธ€