๐Ÿฅ”ํ•ด์‰ฌ ํ…Œ์ด๋ธ”(Hash Table)

eunseo kim ๐Ÿ‘ฉโ€๐Ÿ’ปยท2020๋…„ 12์›” 27์ผ
1

โœจ์•Œ๊ณ ๋ฆฌ์ฆ˜ ๊ธฐ๋ณธ

๋ชฉ๋ก ๋ณด๊ธฐ
2/10

๐Ÿ“Œ ํ•ด์‰ฌ ๊ตฌ์กฐ

Hash Table : ํ‚ค(Key)์— ๋ฐ์ดํ„ฐ(Value)๋ฅผ ์ €์žฅํ•˜๋Š” ๋ฐ์ดํ„ฐ ๊ตฌ์กฐ์ด๋‹ค.

  • key๋ฅผ ํ†ตํ•ด ๋ฐ”๋กœ ๋ฐ์ดํ„ฐ๋ฅผ ๋ฐ›์•„์˜ฌ ์ˆ˜ ์žˆ์–ด ์†๋„๊ฐ€ ๋งค์šฐ ๋นจ๋ผ์ง„๋‹ค. (๋ฐฐ์—ด ๊ฐ™์€ ๋‹ค๋ฅธ ๋ฐ์ดํ„ฐ ๊ตฌ์กฐ๋ณด๋‹ค ํ›จ์”ฌ ๊ฒ€์ƒ‰ ์†๋„๊ฐ€ ๋น ๋ฆ„)
  • ๋ณดํ†ต ๋ฐฐ์—ด๋กœ ๋ฏธ๋ฆฌ Hash Table์˜ ์‚ฌ์ด์ฆˆ๋งŒํผ ๊ณต๊ฐ„์„ ์ƒ์„ฑํ•œ ํ›„ ์‚ฌ์šฉํ•œ๋‹ค. ๋”ฐ๋ผ์„œ ์ €์žฅ ๊ณต๊ฐ„์ด ๋งŽ์ด ํ•„์š”ํ•œ ๋Œ€์‹ , ํƒ์ƒ‰ ์‹œ๊ฐ„์ด ํ›จ์”ฌ ๊ฐ์†Œํ•˜๋ฏ€๋กœ ํ•ด์‰ฌ ํ…Œ์ด๋ธ”์€ ๊ณต๊ฐ„๊ณผ ํƒ์ƒ‰ ์‹œ๊ฐ„์„ ๋งž๋ฐ”๊พธ๋Š” ๊ธฐ๋ฒ•์ด๋‹ค.
  • ํ•ด์‰ฌ ํ…Œ์ด๋ธ”์˜ ๋Œ€ํ‘œ์ ์ธ ์˜ˆ๋กœ๋Š” ํŒŒ์ด์ฌ์˜ ๋”•์…”๋„ˆ๋ฆฌ(Dictionary) ํƒ€์ž…์ด ์žˆ๋‹ค. ๋”ฐ๋ผ์„œ ํŒŒ์ด์ฌ์—์„œ๋Š” ํ•ด์‰ฌ๋ฅผ ๋ณ„๋„๋กœ ๊ตฌํ˜„ํ•  ํ•„์š”๊ฐ€ ์—†๋‹ค. (๋”•์…”๋„ˆ๋ฆฌ ํƒ€์ž…์„ ์‚ฌ์šฉํ•˜๋ฉด ๋œ๋‹ค.)


๐Ÿ“Œ ๊ด€๋ จ ์šฉ์–ด ์ •๋ฆฌ

  • ํ•ด์‰ฌ(Hash) : ์ž„์˜์˜ ๊ฐ’์„ ๊ณ ์ •๋œ ๊ธธ์ด๋กœ ๋ณ€ํ™˜ํ•˜๋Š” ๊ฒƒ
  • ํ•ด์‰ฌ ํ…Œ์ด๋ธ”(Hash Table) : (key, value)๋กœ ๋ฐ์ดํ„ฐ๋ฅผ ์ €์žฅํ•˜์—ฌ, ํ‚ค ๊ฐ’์˜ ์—ฐ์‚ฐ์— ์˜ํ•ด ๋ฐ์ดํ„ฐ์— ์ ‘๊ทผํ•  ์ˆ˜ ์žˆ๋Š” ๋ฐ์ดํ„ฐ ๊ตฌ์กฐ
  • ํ•ด์‹ฑ ํ•จ์ˆ˜(Hashing function) : key์— ๋Œ€ํ•œ ์‚ฐ์ˆ  ์—ฐ์‚ฐ์„ ํ†ตํ•ด ๋ฐ์ดํ„ฐ์˜ ์œ„์น˜๋ฅผ ์ฐพ์„ ์ˆ˜ ์žˆ๋Š” ํ•จ์ˆ˜
  • ํ•ด์‰ฌ ์ฃผ์†Œ(Hash Address) or ํ•ด์‰ฌ ๊ฐ’(Hash Value) : key๋ฅผ ํ•ด์‹ฑ ํ•จ์ˆ˜๋กœ ์—ฐ์‚ฐํ•œ ๊ฒฐ๊ณผ๋กœ, ์ด๋ฅผ ์ด์šฉํ•˜์—ฌ ํ•ด์‰ฌ ํ…Œ์ด๋ธ”์—์„œ ํ•ด๋‹น key์— ๋Œ€ํ•œ ๋ฐ์ดํ„ฐ ์œ„์น˜๋ฅผ ์ผ๊ด€์„ฑ ์žˆ๊ฒŒ ์ฐพ์„ ์ˆ˜ ์žˆ์Œ
  • ์Šฌ๋กฏ(Slot) : ํ•œ ๊ฐœ์˜ ๋ฐ์ดํ„ฐ๋ฅผ ์ €์žฅํ•  ์ˆ˜ ์žˆ๋Š” ๊ณต๊ฐ„


๐Ÿ“Œ Hash Table ๋งŒ๋“ค๊ธฐ

๐Ÿ’ก ๋ฆฌ์ŠคํŠธ๋กœ ํ•ด์‰ฌ ํ…Œ์ด๋ธ” ๋งŒ๋“ค๊ธฐ

  • ์ถฉ๋Œ์€ ๊ณ ๋ คํ•˜์ง€ ์•Š์Œ
hash_table = list([0 for i in range(8)])
# ํŒŒ์ด์ฌ list comprehension ์‚ฌ์šฉ

def get_key(data):
    return hash(data)

def hash_function(key):
    return key % 8

def save_data(data, value):
    hash_address = hash_function(get_key(data))
    hash_table[hash_address] = value
    
def read_data(data):
    hash_address = hash_function(get_key(data))
    return hash_table[hash_address]

์ž…๋ ฅ

save_data('Dave', '0102030200')
save_data('Andy', '01033232200')
read_data('Dave')
print(hash_table)

์ถœ๋ ฅ

[0, 0, 0, 0, 0, 0, '0102030200', '01033232200']

๋ฌธ์ œ์ 

  • ํ•˜์ง€๋งŒ ๋งŒ์•ฝ ์„œ๋กœ ๋‹ค๋ฅธ key์ž„์—๋„ ํ•ด์‰ฌ ์ฃผ์†Œ๊ฐ€ ๋™์ผํ•˜๋‹ค๋ฉด ์–ด๋–ป๊ฒŒ ํ•ด์•ผ ํ• ๊นŒ?
  • ๋”ฐ๋ผ์„œ ์œ„์˜ ๋ฐฉ๋ฒ•์œผ๋กœ ํ•ด์‰ฌ ํ…Œ์ด๋ธ”์„ ๊ตฌํ˜„ํ•˜๊ฒŒ ๋˜๋ฉด ์ด๋Ÿฌํ•œ ํ•ด์‰ฌ ์ถฉ๋Œ(Hash Collision) ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜์ง€ ๋ชปํ•œ๋‹ค.
  • ํ•ด์‰ฌ ์ถฉ๋Œ์„ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” Chaining ๊ธฐ๋ฒ• or Linear Probing ๊ธฐ๋ฒ•์„ ์ด์šฉํ•ด์•ผ ํ•œ๋‹ค.

๐Ÿ’ก ์ถฉ๋Œ ํ•ด๊ฒฐ (1) - Chaining ๊ธฐ๋ฒ•

  • ๊ฐœ๋ฐฉ ํ•ด์Š(Open Hashing) ๊ธฐ๋ฒ• ์ค‘ ํ•˜๋‚˜์ด๋‹ค.
  • ํ•ด์‰ฌ ํ…Œ์ด๋ธ”์˜ ์ €์žฅ๊ณต๊ฐ„ ์™ธ์˜ ๊ณต๊ฐ„์„ ํ™œ์šฉํ•˜๋Š” ๊ธฐ๋ฒ•์ด๋‹ค.
  • ์ถฉ๋Œ์ด ์ผ์–ด๋‚˜๋ฉด Linked List ์ž๋ฃŒ๊ตฌ์กฐ๋ฅผ ์ด์šฉํ•˜์—ฌ ๋ฐ์ดํ„ฐ๋ฅผ ์ถ”๊ฐ€๋กœ ๋’ค์— ์—ฐ๊ฒฐ์‹œ์ผœ ์ €์žฅํ•œ๋‹ค.

Code

hash_table = list([0 for i in range(8)])

def get_key(data):
    return hash(data)

def hash_function(key):
    return key % 8

def save_data(data, value):
    index_key = get_key(data)
    hash_address = hash_function(index_key)
    if hash_table[hash_address] != 0: # ์ถฉ๋Œ์ด ์ผ์–ด๋‚˜๋Š” ๊ฒฝ์šฐ
        for index in range(len(hash_table[hash_address])): # ๋งŒ์•ฝ ํ‚ค ๊ฐ’์ด ๊ฐ™๋‹ค๋ฉด ๊ธฐ์กด์˜ ๋ฐ์ดํ„ฐ๋ฅผ ์—…๋ฐ์ดํŠธ
            if hash_table[hash_address][index][0] == index_key:
                hash_table[hash_address][index][1] = value
                return
        hash_table[hash_address].append([index_key, value]) # ์ผ์น˜ํ•˜๋Š” ํ‚ค ๊ฐ’์ด ์—†๋‹ค๋ฉด ์ถ”๊ฐ€๋กœ ๋’ค์— ์—ฐ๊ฒฐ
    else: # ์ถฉ๋Œ์ด ์ผ์–ด๋‚˜์ง€ ์•Š๋Š” ๊ฒฝ์šฐ
        hash_table[hash_address] = [[index_key, value]]
    
def read_data(data):
    index_key = get_key(data)
    hash_address = hash_function(index_key)

    if hash_table[hash_address] != 0: # ํ•ด๋‹น hash address์— ๋ฐ์ดํ„ฐ๊ฐ€ ์กด์žฌํ•˜๋Š” ๊ฒฝ์šฐ
        for index in range(len(hash_table[hash_address])): # ํ•ด๋‹น hash address์— ์—ฐ๊ฒฐ๋œ ๋ชจ๋“  key ์ˆœํšŒ
            if hash_table[hash_address][index][0] == index_key: 
                return hash_table[hash_address][index][1]
        return None # ์ˆœํšŒ ํ›„ ์ผ์น˜ํ•˜๋Š” key๊ฐ€ ์—†์œผ๋ฉด None ๋ฐ˜ํ™˜
    else:
        return None # ํ•ด๋‹น hash address์— ๋ฐ์ดํ„ฐ๊ฐ€ ์กด์žฌํ•˜์ง€ ์•Š๋Š” ๊ฒฝ์šฐ

๐Ÿ’ก ์ถฉ๋Œ ํ•ด๊ฒฐ (2) - Linear Probing ๊ธฐ๋ฒ•

  • ํ์‡„ ํ•ด์Š ๋˜๋Š” Close Hashing ๊ธฐ๋ฒ• ์ค‘ ํ•˜๋‚˜์ด๋‹ค.
  • ํ•ด์‰ฌ ํ…Œ์ด๋ธ” ์ €์žฅ๊ณต๊ฐ„ ์•ˆ์—์„œ ์ถฉ๋Œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๋Š” ๊ธฐ๋ฒ•์ด๋‹ค.
  • ์ถฉ๋Œ์ด ์ผ์–ด๋‚˜๋ฉด, ํ•ด๋‹น hash address์˜ ๋‹ค์Œ address๋ถ€ํ„ฐ ๋งจ ์ฒ˜์Œ ๋‚˜์˜ค๋Š” ๋นˆ๊ณต๊ฐ„์— ์ €์žฅํ•œ๋‹ค.
  • ์ €์žฅ ๊ณต๊ฐ„์˜ ํ™œ์šฉ๋„๋ฅผ ๋†’์ด๋Š” ๊ธฐ๋ฒ•์ด๋‹ค.

Code

hash_table = list([0 for i in range(8)])

def get_key(data):
    return hash(data)

def hash_function(key):
    return key % 8

def save_data(data, value):
    index_key = get_key(data)
    hash_address = hash_function(index_key)
    if hash_table[hash_address] != 0: # ์ถฉ๋Œ์ด ์ผ์–ด๋‚˜๋Š” ๊ฒฝ์šฐ
        for index in range(hash_address, len(hash_table)): # ํ˜„์žฌ hash address๋ถ€ํ„ฐ ํ•ด์‰ฌ ํ…Œ์ด๋ธ” ๋๊นŒ์ง€ ์ˆœํšŒ
            if hash_table[index] == 0: # ์ฒ˜์Œ ๋‚˜์˜ค๋Š” ๋นˆ ๊ณต๊ฐ„์— ๋ฐ์ดํ„ฐ๋ฅผ ์ €์žฅ
                hash_table[index] = [index_key, value]
                return
            elif hash_table[index][0] == index_key: # ๊ฐ™์€ ํ‚ค์ด๋ฉด ๋ฐ์ดํ„ฐ๋ฅผ ์—…๋ฐ์ดํŠธ
                hash_table[index][1] = value
                return
    else: # ์ถฉ๋Œ์ด ์ผ์–ด๋‚˜์ง€ ์•Š๋Š” ๊ฒฝ์šฐ
        hash_table[hash_address] = [index_key, value]

def read_data(data):
    index_key = get_key(data)
    hash_address = hash_function(index_key)
    
    if hash_table[hash_address] != 0: # ํ•ด๋‹น hash address์— ๋ฐ์ดํ„ฐ๊ฐ€ ์กด์žฌํ•˜๋Š” ๊ฒฝ์šฐ
        for index in range(hash_address, len(hash_table)): # ํ˜„์žฌ hash address๋ถ€ํ„ฐ ํ•ด์‰ฌ ํ…Œ์ด๋ธ” ๋๊นŒ์ง€ ์ˆœํšŒ
            if hash_table[index] == 0: #๋นˆ ์Šฌ๋กฏ์ด ๋‚˜ํƒ€๋‚˜๊ฒŒ ๋˜๋ฉด ์ €์žฅ๋œ ์ ์ด ์—†๋Š” ๋ฐ์ดํ„ฐ์ž„
                return None
            elif hash_table[index][0] == index_key: 
                return hash_table[index][1]
    else: # ํ•ด๋‹น hash address์— ๋ฐ์ดํ„ฐ๊ฐ€ ์กด์žฌํ•˜์ง€ ์•Š๋Š” ๊ฒฝ์šฐ
        return None

๐Ÿ“Œ ํ•ด์‰ฌ ํ…Œ์ด๋ธ”์˜ ์žฅ๋‹จ์ ๊ณผ ์ฃผ์š” ์šฉ๋„

  • ์žฅ์ 
    • ๋ฐ์ดํ„ฐ ์ €์žฅ/์ฝ๊ธฐ ์†๋„๊ฐ€ ๋น ๋ฅด๋‹ค. (๊ฒ€์ƒ‰ ์†๋„๊ฐ€ ๋น ๋ฅด๋‹ค)
    • ํ•ด์‰ฌ๋Š” ํ‚ค์— ๋Œ€ํ•œ ๋ฐ์ดํ„ฐ๊ฐ€ ์žˆ๋Š”์ง€(์ค‘๋ณต) ํ™•์ธ์ด ์‰ฌ์›€
  • ๋‹จ์ 
    • ์ผ๋ฐ˜์ ์œผ๋กœ ์ €์žฅ๊ณต๊ฐ„์ด ์ข€๋” ๋งŽ์ด ํ•„์š”ํ•˜๋‹ค.
    • ์—ฌ๋Ÿฌ ํ‚ค์— ํ•ด๋‹นํ•˜๋Š” ์ฃผ์†Œ๊ฐ€ ๋™์ผํ•  ๊ฒฝ์šฐ ์ถฉ๋Œ์„ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•œ ๋ณ„๋„ ์ž๋ฃŒ๊ตฌ์กฐ๊ฐ€ ํ•„์š”ํ•จ
  • ์ฃผ์š” ์šฉ๋„
    • ๊ฒ€์ƒ‰์ด ๋งŽ์ด ํ•„์š”ํ•œ ๊ฒฝ์šฐ
    • ์ €์žฅ, ์‚ญ์ œ, ์ฝ๊ธฐ๊ฐ€ ๋นˆ๋ฒˆํ•œ ๊ฒฝ์šฐ
    • ์บ์‰ฌ ๊ตฌํ˜„์‹œ (์ค‘๋ณต ํ™•์ธ์ด ์‰ฝ๊ธฐ ๋•Œ๋ฌธ)

๐Ÿ“Œ ์‹œ๊ฐ„ ๋ณต์žก๋„

  • ์ผ๋ฐ˜์ ์ธ ๊ฒฝ์šฐ(Collision์ด ์—†๋Š” ๊ฒฝ์šฐ)๋Š” O(1)O(1)
  • ์ตœ์•…์˜ ๊ฒฝ์šฐ(Collision์ด ๋ชจ๋‘ ๋ฐœ์ƒํ•˜๋Š” ๊ฒฝ์šฐ)๋Š” O(n)O(n)

ํ•ด์‰ฌ ํ…Œ์ด๋ธ”์˜ ๊ฒฝ์šฐ, ์ผ๋ฐ˜์ ์ธ ๊ฒฝ์šฐ๋ฅผ ๊ธฐ๋Œ€ํ•˜๊ณ  ๋งŒ๋“ค๊ธฐ ๋•Œ๋ฌธ์—, ์‹œ๊ฐ„ ๋ณต์žก๋„๋Š” O(1)O(1) ์ด๋ผ๊ณ  ๋งํ•  ์ˆ˜ ์žˆ์Œ

profile
์—ด์‹ฌํžˆ๐Ÿ’จ (์•Œ๊ณ ๋ฆฌ์ฆ˜ ๋ธ”๋กœ๊ทธ)

0๊ฐœ์˜ ๋Œ“๊ธ€