Hash table을 쓰는 Set 자료구조(built-in Python)

yebright·2025년 2월 12일

algorithm using Python

목록 보기

1/3

def longestConsecutive(self, nums: List[int]) -> int:
	numSet = set(nums)
    longest = 0

    for num in numSet:
    	if (num - 1) not in numSet:
        	length = 1
            while (num + length) in numSet:
            	length += 1
            longest = max(length, longest)
	return longest

에서 for num in numSet:부분이 set이 아니라 배열내에서 해당요소를 찾는거였으면 최악의 경우 모든 요소를 다 순회해야하므로 O(n)의 시간이 걸린다.
예를 들어

numArray=[33,21,3,14,8,26,17]

처럼 순서가 무작위인 배열에서 17이 있는지 없는지를 알아내려면 배열길이만큼을 다 순회해야했다.
그러나 Set을 사용하면 그 17(key)에 해당하는 hash value를 계산하고 (계산만 하면 되니까 O(1)시간) 그 hash value만을 가지고 이 key가 numSet에 있는지 없는지 판단이 가능하기 때문에 상수 시간밖에 안 걸린다. 참고로 set은 무질서한 고유 요소들의 collection으로, hash 함수를 사용해 그 요소들의 메모리상의 위치를 결정한다. 어떤 요소를 set에 추가하면, python은 먼저 그 요소의 hash value를 계산하고 그 값을 이용해 메모리공간 어디에 저장할지를 결정한다. 그래서 이 hash value를 이용해서 set 내의 요소인지 아닌지를 바로 결정할 수 있기 때문에 검색 속도가 매우 빠르다고 할 수 있다. (index별로 iterate하는게 아니고 input을 hash 함수에 넣어서 계산만 하면 바로 주소값이 (즉, set에 있는지 여부가) 나오니까 constant한 시간만 소요된다.)

그러나 해쉬도 충돌이 일어나는 최악의 경우에는 O(n)이 걸린다고 한다.

A Python set is an unordered collection of unique elements. Sets use a hash function to determine the position of elements in memory. When you add an element to a set, Python first computes the hash of the element and then uses this hash to decide where to store it in memory. This makes membership tests (in keyword) extremely fast.

A hash table lookup has a time complexity of O(1) because, on average, it only requires a constant number of operations to access a value using its key, thanks to the hash function which directly maps the key to a specific location in the table, allowing for near-instantaneous access regardless of the size of the data stored; essentially, you can "jump directly" to the desired element using the calculated hash code, making the lookup operation constant time.

Hash function:
The core component is the hash function, which takes a key and produces a unique index (hash code) that points to the location where the corresponding value is stored in the hash table.
Direct access:
By using the hash code, the lookup operation can directly access the desired bucket in the table, eliminating the need to search through a large dataset.
However, potential issues to consider:
Collisions:
If multiple keys map to the same hash code (collision), the lookup time might increase depending on the collision resolution strategy used (e.g., chaining, open addressing).
Poor hash function:
A poorly designed hash function can lead to many collisions, degrading the performance of the hash table and causing the lookup time to approach O(n) in worst-case scenarios.

Set자료구조는 이러한 hash table을 써서 lookup/insert/delete 기능을 O(1)만에 해결하게 한다.

*)해쉬함수를 이용해서 각 데이터의 해쉬값도 확인해볼 수 있다.

print("a".__hash__() )

myset = {"a", "b"}
print({element.__hash__() for element in myset})

output:
-24645259431269796
{2941982044835736999, -24645259431269796}

참조: https://medium.com/@kuldeepkumawat195/hashing-in-python-sets-and-dictionaries-aa2fdbb3861f
https://medium.com/@chakravartyutkarsh/understanding-why-hashset-provides-o-1-search-time-complexity-15cee2f96cec
https://medium.com/@abasaeed/understanding-hash-maps-hash-tables-and-hash-sets-in-python-c0a78b1e131
https://bluese05.tistory.com/58
https://analytics4everything.tistory.com/180#:~:text=Summary:%20set%2C%20dict%EC%9D%80%20%ED%95%B4%EC%8B%9C%ED%95%A8%EC%88%98%EB%A5%BC%20%EC%9D%B4%EC%9A%A9%ED%95%9C%20%ED%95%B4%EC%8B%9C%EA%B0%92%EC%9D%84%20%EC%A0%80%EC%9E%A5%ED%95%98%EA%B3%A0%EC%9E%88%EB%8A%94,%EC%88%AB%EC%9E%90%EB%A1%9C%20%EB%A7%A4%ED%95%91%ED%95%9C%20%EA%B2%BD%EC%9A%B0%20%EC%9D%B8%EB%8D%B1%EC%8A%A4%EC%B2%98%EB%9F%BC%20%EC%82%AC%EC%9A%A9%ED%95%A0%20%EC%88%98%20%EC%9E%88%EC%96%B4

yebright

조합론, 강화학습 좋아하는 개발자

다음 포스트

Hash table을 쓰는 Set 자료구조(built-in Python)

algorithm using Python

Map 함수- built-in Python

0개의 댓글