14. Hashing

Jerry·2025년 8월 1일

What is hashing?

Hashing is a technique that maps data (keys) to fixed-size values (hash codes) using a mathematical hash function, for fast data access.

It's commonly used to implement hash tables, which allow constant-time access on average.

Analogy

Think of a library:

  • Book title = key
  • Shelf number = hash value
  • Instead of searching the whole library, you compute the shelf number from the book title using a hash function and go directly to it.

Key Components

TermDescription
KeyData you want to store or search (e.g., string, number)
Hash FunctionConverts key to index in array (e.g., h(key) % table_size)
Hash TableArray where data is stored at index = hash(key)
CollisionWhen two keys map to the same index
Load FactorRatio of elements stored to table size (n / table_size)

Hash Function Example

int hash(String key, int tableSize) {
    int hash = 0;
    for (char c : key.toCharArray()) {
        hash = (hash * 31 + c) % tableSize;
    }
    return hash;
}

This maps a string to an index between 0 and tableSize - 1.

Collision Resolution Strategies

TechniqueDescription
ChainingUse a linked list (or ArrayList) at each index
Open AddressingFind next available slot using probing methods like:
- Linear ProbingTry next slot: (h + 1) % n
- Quadratic ProbingTry slots (h + i²) % n
- Double HashingUse a second hash function to resolve collisions

Time Complexities (Average Case)

OperationTime
InsertO(1)
SearchO(1)
DeleteO(1)

Worst-case: O(n), if many collisions occur (but can be minimized with good hash functions & resizing).

Applications of Hashing

  • HashMap / HashSet in Java, unordered_map in C++
  • Symbol tables in compilers
  • Database indexing
  • Caching / Memoization
  • Password storage (with cryptographic hashes)
  • Bloom Filters
  • Data deduplication

Advantages of Hashing

  • Very fast access, insert, delete
  • Easy to implement
  • Good for large data sets when quick lookup is needed

Limitations

  • Collisions can degrade performance
  • Needs good hash functions
  • Not suitable for ordered data (can’t do range queries efficiently)

Summary

FeatureValue
Access TimeO(1) average
StorageArray + keys/values
Order-preserving❌ No
Collision HandlingChaining, Open Addressing
Best Use CaseFast key-value access, caching, lookup tables

How HashMap works internally?

What is a HashMap?

A HashMap is a key-value data structure that provides average O(1) time complexity for insertion, deletion, and lookup using hashing.

It is part of Java's java.util package, and it's not synchronized (unlike Hashtable).

How Does HashMap Work Internally?

Steps for put(key, value):

  1. Hash the key:
    Use hashCode() of the key and then apply a secondary hash function to avoid poor distribution.
    int hash = hash(key.hashCode());
  2. Index calculation:
    Compute the bucket index by doing hash % table.length (using bitmasking: hash & (n - 1)).
  3. Store Entry:
    • If the bucket is empty → directly insert.
    • If the bucket has entries → handle collision:
      • Use a linked list (pre-Java 8)
      • Convert to a balanced tree (Red-Black Tree) if the list becomes too long (Java 8+)
  4. Replace or Append:
    • If key already exists → update value.
    • Else → add new node.

Steps for get(key):

  1. Hash the key → get the hash.
  2. Calculate index using hash & (n - 1)
  3. Search in the bucket:
    • If linked list/tree, traverse using .equals() to match key.

HashMap Internal Structure (Simplified)

Node<K, V>[] table; // Array of linked lists or trees

static class Node<K, V> {
    final int hash;
    final K key;
    V value;
    Node<K, V> next; // For chaining
}

From Java 8 onwards, the chain in a bucket becomes a Red-Black Tree if it exceeds TREEIFY_THRESHOLD = 8 for faster lookup.

Time Complexity

OperationAvg CaseWorst Case
get()O(1)O(n) (if all keys hash to same index)
put()O(1)O(n) (if many collisions, rare)

But in Java 8+, worst case becomes O(log n) after converting the list to a Red-Black Tree.

Example Walkthrough

Map<String, Integer> map = new HashMap<>();
map.put("Alice", 23);
map.put("Bob", 30);
map.put("Charlie", 25);
  1. "Alice".hashCode() → some int → hash → index
  2. At index, store Node(hash, "Alice", 23)
  3. If "Charlie" hashes to the same index → add it to the chain

Key Points & Pitfalls

ConceptDescription
Load FactorDefault is 0.75 — controls resizing threshold
ResizingDoubles capacity when size > loadFactor × capacity
Null keys/valuesAllows 1 null key, and multiple null values
Hash collisionMultiple keys mapping to same index
TreeificationHappens when collisions per bucket > 8 and size ≥ 64
Not thread-safeUse ConcurrentHashMap for multi-threading

Custom HashMap Behavior

  • Override hashCode() and equals() properly when using custom keys.
  • Poor hashCode() leads to clustering → degrades performance.
class Employee {
    int id;
    String name;

    @Override
    public int hashCode() {
        return id % 10;
    }

    @Override
    public boolean equals(Object o) {
        return this.id == ((Employee) o).id;
    }
}

Resize Operation

When current size exceeds capacity × loadFactor, the table is resized (usually doubled), and all entries are rehashed to the new table.

  • Costly operation: O(n)
  • Happens lazily during insertion

Summary Table

FeatureValue
Underlying StructureArray of buckets (linked list or tree)
Hash FunctionModified hashCode()
Collision HandlingChaining → then Tree (Java 8+)
Time ComplexityO(1) avg, O(log n) worst (Java 8+)
Thread Safety❌ No (ConcurrentHashMap for that)
Resize Thresholdcapacity × loadFactor
Null Keys / Values✅ 1 null key, multiple null values

Real-World Use Cases

  • Implementing maps/sets
  • Memoization / caching
  • Frequency counters
  • Indexing in compilers, databases
  • User session management
profile
Backend engineer

0개의 댓글