14. Hashing

Jerry·2025년 8월 1일

C언어로 쉽게 풀어쓴 자료구조

목록 보기

12/12

What is hashing?

Hashing is a technique that maps data (keys) to fixed-size values (hash codes) using a mathematical hash function, for fast data access.

It's commonly used to implement hash tables, which allow constant-time access on average.

Analogy

Think of a library:

Book title = key
Shelf number = hash value
Instead of searching the whole library, you compute the shelf number from the book title using a hash function and go directly to it.

Key Components

Term	Description
Key	Data you want to store or search (e.g., string, number)
Hash Function	Converts key to index in array (e.g., `h(key) % table_size`)
Hash Table	Array where data is stored at index = hash(key)
Collision	When two keys map to the same index
Load Factor	Ratio of elements stored to table size (`n / table_size`)

Hash Function Example

int hash(String key, int tableSize) {
    int hash = 0;
    for (char c : key.toCharArray()) {
        hash = (hash * 31 + c) % tableSize;
    }
    return hash;
}

This maps a string to an index between 0 and tableSize - 1.

Collision Resolution Strategies

Technique	Description
Chaining	Use a linked list (or ArrayList) at each index
Open Addressing	Find next available slot using probing methods like:
- Linear Probing	Try next slot: `(h + 1) % n`
- Quadratic Probing	Try slots `(h + i²) % n`
- Double Hashing	Use a second hash function to resolve collisions

Time Complexities (Average Case)

Operation	Time
Insert	O(1)
Search	O(1)
Delete	O(1)

Worst-case: O(n), if many collisions occur (but can be minimized with good hash functions & resizing).

Applications of Hashing

HashMap / HashSet in Java, unordered_map in C++
Symbol tables in compilers
Database indexing
Caching / Memoization
Password storage (with cryptographic hashes)
Bloom Filters
Data deduplication

Advantages of Hashing

Very fast access, insert, delete
Easy to implement
Good for large data sets when quick lookup is needed

Limitations

Collisions can degrade performance
Needs good hash functions
Not suitable for ordered data (can’t do range queries efficiently)

Summary

Feature	Value
Access Time	O(1) average
Storage	Array + keys/values
Order-preserving	❌ No
Collision Handling	Chaining, Open Addressing
Best Use Case	Fast key-value access, caching, lookup tables

How HashMap works internally?

What is a HashMap?

A HashMap is a key-value data structure that provides average O(1) time complexity for insertion, deletion, and lookup using hashing.

It is part of Java's java.util package, and it's not synchronized (unlike Hashtable).

How Does HashMap Work Internally?

Steps for put(key, value):

Hash the key:
Use hashCode() of the key and then apply a secondary hash function to avoid poor distribution.
```
int hash = hash(key.hashCode());
```
Index calculation:
Compute the bucket index by doing hash % table.length (using bitmasking: hash & (n - 1)).
Store Entry:
- If the bucket is empty → directly insert.
- If the bucket has entries → handle collision:
  - Use a linked list (pre-Java 8)
  - Convert to a balanced tree (Red-Black Tree) if the list becomes too long (Java 8+)
Replace or Append:
- If key already exists → update value.
- Else → add new node.

Steps for get(key):

Hash the key → get the hash.
Calculate index using hash & (n - 1)
Search in the bucket:
- If linked list/tree, traverse using .equals() to match key.

HashMap Internal Structure (Simplified)

Node<K, V>[] table; // Array of linked lists or trees

static class Node<K, V> {
    final int hash;
    final K key;
    V value;
    Node<K, V> next; // For chaining
}

From Java 8 onwards, the chain in a bucket becomes a Red-Black Tree if it exceeds TREEIFY_THRESHOLD = 8 for faster lookup.

Time Complexity

Operation	Avg Case	Worst Case
`get()`	O(1)	O(n) (if all keys hash to same index)
`put()`	O(1)	O(n) (if many collisions, rare)

But in Java 8+, worst case becomes O(log n) after converting the list to a Red-Black Tree.

Example Walkthrough

Map<String, Integer> map = new HashMap<>();
map.put("Alice", 23);
map.put("Bob", 30);
map.put("Charlie", 25);

"Alice".hashCode() → some int → hash → index
At index, store Node(hash, "Alice", 23)
If "Charlie" hashes to the same index → add it to the chain

Key Points & Pitfalls

Concept	Description
Load Factor	Default is `0.75` — controls resizing threshold
Resizing	Doubles capacity when size > loadFactor × capacity
Null keys/values	Allows 1 `null` key, and multiple `null` values
Hash collision	Multiple keys mapping to same index
Treeification	Happens when collisions per bucket > 8 and size ≥ 64
Not thread-safe	Use `ConcurrentHashMap` for multi-threading

Custom HashMap Behavior

Override hashCode() and equals() properly when using custom keys.
Poor hashCode() leads to clustering → degrades performance.

class Employee {
    int id;
    String name;

    @Override
    public int hashCode() {
        return id % 10;
    }

    @Override
    public boolean equals(Object o) {
        return this.id == ((Employee) o).id;
    }
}

Resize Operation

When current size exceeds capacity × loadFactor, the table is resized (usually doubled), and all entries are rehashed to the new table.

Costly operation: O(n)
Happens lazily during insertion

Summary Table

Feature	Value
Underlying Structure	Array of buckets (linked list or tree)
Hash Function	Modified `hashCode()`
Collision Handling	Chaining → then Tree (Java 8+)
Time Complexity	O(1) avg, O(log n) worst (Java 8+)
Thread Safety	❌ No (`ConcurrentHashMap` for that)
Resize Threshold	`capacity × loadFactor`
Null Keys / Values	✅ 1 null key, multiple null values

Real-World Use Cases

Implementing maps/sets
Memoization / caching
Frequency counters
Indexing in compilers, databases
User session management

Jerry

Backend engineer

이전 포스트